Cross-platform software to download (scrape) websites locally so you don't have to stay online all the time.
Best for reading documentation and the like when you won't have internet for a long time!
Going on a vacation and unsure about the internet?? Just download (scrape) static/dynamic websites that don't depend on continuous server requests and you're good to go!!!
Just published the first release on GitHub! It's the first official complete release with GUI + backend, and it includes the following features:
- 2-click website downloading
- Offline launching of sites
- History maintenance to revisit downloads
- Multiple settings controls, allowing full access
- Thread limiting to support both high- and low-end PCs
- Storage access for sites
- Visible logs to monitor downloads
- Easy pause/resume, even after closing and reopening the app
- Revisiting the settings of any downloaded website later
- Easy cancellation & deletion in a click
- In-app control of opened sites
- Credits page (both owner & indirect helpers)
- Custom mouse cursors for an elegant feel
- Purple spider-themed assets for scraper vibes
These functionalities make up the beautiful Webber application! Currently Windows only, but Linux and Mac versions are coming soooon...
Got a fever and had to stay off the screen for quite a few days! Back to grinding through code! Fingers crossed, I'll get it shipped today!!
In the clouds rn after doing some final touches like alignment and extra info for users, because I just compiled it for Windows. It's a success! Just going to finish the Linux & Mac versions and hop onto the ship certifications!
Was just about to finish the compilation when I hit some bugs, like path issues, because some characters that browsers support in URLs aren't valid locally as folder names! Also, adding a feature letting the user restrict the number of threads on their system was quite needed. Just finalizing some bits and tests, and I hope that'll be all!!
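For context, the path issue comes from characters like `?`, `:` or `*` that are legal in URLs but rejected by Windows in folder names. A minimal sketch of the kind of sanitization involved (the function name and replacement choice are my illustration, not Webber's actual code):

```python
import re

# Characters allowed in URLs but rejected by Windows in file/folder names.
_INVALID = r'[<>:"/\\|?*]'

def safe_folder_name(url_part: str) -> str:
    """Replace characters that browsers accept in URLs but the
    local filesystem (especially on Windows) rejects in names."""
    cleaned = re.sub(_INVALID, "_", url_part)
    # Windows also dislikes trailing dots and spaces.
    return cleaned.rstrip(". ") or "_"

print(safe_folder_name("search?q=python"))  # -> search_q=python
```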
It is completed! Yup, it is! Just finalized small bits like implementing checks for essential assets, folders, etc. to prevent crashes, resolved last-minute bugs, improved logging, added credits for everyone involved directly or indirectly (along with my own mention), and completely got rid of the system crash issue!!
Balle Balle! It's working flawlessly and I just pushed version 1 to GitHub. Just going to do some final bits (approx. 10 hours) so it turns from a project into a good product. Tonight it's gonna get shipped!
YES! YES! YES! I found the problem. Since I've rarely used pygame, I was making the biggest mistake: redrawing even static content every 60th of a second, which I have now changed. It doesn't lag and it's working great. These structural changes gave birth to some bugs which I'll be fixing; I'd guess hardly 2 more hours until the bugs are resolved, then 2-3 more features and we can move on to shipping this project.
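In pygame terms, the fix is to stop repainting static surfaces every frame and only redraw when something actually changes. A minimal sketch of the idea (not Webber's actual loop):

```python
import pygame

pygame.init()
screen = pygame.display.set_mode((640, 480))
clock = pygame.time.Clock()

dirty = True   # repaint only when something changed
running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
        elif event.type in (pygame.MOUSEBUTTONDOWN, pygame.KEYDOWN):
            dirty = True  # user interaction may change the UI

    if dirty:
        screen.fill((40, 0, 60))   # static background
        # ... blit widgets here ...
        pygame.display.flip()
        dirty = False

    clock.tick(60)  # still cap the loop, but skip redraws when idle

pygame.quit()
```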
Going through the toughest decision because the UI lags! Both Pygame and the subprocesses were making the system crash, but I noticed Pygame is the one responsible, since the subprocess runs in a console. Will try going for a console-based UI + subprocess tmrw!!
Spent a good amount of time finalizing stuff to ship! But stumbled upon something: whenever I download a website, the GUI becomes laggy because the download threads are somehow tied to the main GUI thread, causing the FPS to drop. So I spent some good time on it and am moving to a different approach: shifting the downloading etc. onto a subprocess! Hope to get it done today!! :)
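The usual reason for this in Python is that threads share one interpreter (the GIL), so heavy download work can still starve the GUI thread; a separate process sidesteps that entirely. A minimal sketch using the standard multiprocessing module (the worker name and queue protocol are illustrative, not Webber's actual design):

```python
import multiprocessing as mp

def download_worker(url, progress):
    """Runs in its own process, so it cannot lag the GUI."""
    for pct in range(0, 101, 20):
        # ... real scraping work would happen here ...
        progress.put(pct)
    progress.put(None)  # sentinel: done

if __name__ == "__main__":
    progress = mp.Queue()
    proc = mp.Process(target=download_worker,
                      args=("https://example.com", progress))
    proc.start()

    # The GUI loop can poll this queue without blocking its redraws.
    while (pct := progress.get()) is not None:
        print(f"progress: {pct}%")

    proc.join()
```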
Fixed lots of bugs & also improved the downloading page, along with implementing the pause and cancel backend!
Front page improved! Added a page for log checking and other controls! Also connected the backend and frontend.
In love with this settings interface! Mixed my idea with ChatGPT's image generation and my code. (ChatGPT only created the image, not the code for the interface!)
Speedy BOI! At first, it took 10 minutes to scrape summer.hackclub.com even at the lowest settings, but I've shrunk that to around 1.5 minutes using threading and better organization. #LESSON LEARNED: planning carefully at the beginning can help a lot :)
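A sketch of the kind of threading that gives this speedup, using only the standard library (my illustration, not Webber's code; the URL list is a placeholder):

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def fetch(url):
    with urlopen(url, timeout=10) as resp:
        return url, resp.read()

urls = ["https://example.com"]  # placeholder: the real list comes from the crawl

# Network-bound requests overlap across threads, so total time approaches
# the slowest single request instead of the sum of all of them.
with ThreadPoolExecutor(max_workers=8) as pool:
    for url, body in pool.map(fetch, urls):
        print(url, len(body), "bytes")
```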
A lot of the time, a simpler approach is faaar better than complex & lengthy code. That's what happened with me: I switched from changing the file contents on every thread while maintaining integrity to storing them in one place and switching at the end! #FASTER DOWNLOADING
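The pattern described: instead of every thread editing files on disk (which needs careful locking to stay consistent), each thread records its rewritten content in memory and a single pass writes everything at the end. A rough sketch of that idea (all names are mine):

```python
import threading

results = {}                     # path -> final content, filled by workers
results_lock = threading.Lock()

def process(path, content):
    rewritten = content.replace("http://", "https://")  # example rewrite
    with results_lock:
        results[path] = rewritten    # no file I/O inside the threads

def flush_to_disk():
    # One writer, at the very end: no partial files, no cross-thread clobbering.
    for path, content in results.items():
        with open(path, "w", encoding="utf-8") as f:
            f.write(content)
```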
Hurray! Timings in SOM, as well as my errors, are now fixed!
After literally questioning my existence a million times, I finally solved all the bugs. This new approach is fast, really lightning fast, but the errors I ran into while implementing it were a real pain.
A simple error like not assigning the replaced value back to the variable & even difficult ones like the file erasing itself!
Let's go do some testing now!
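The "not assigning the replaced value back" bug is a classic Python one: strings are immutable, so str.replace returns a new string instead of modifying in place. For example:

```python
html = "<a href='page.html'>"

html.replace("page.html", "local/page.html")          # result discarded!
print(html)                                            # unchanged

html = html.replace("page.html", "local/page.html")   # correct: reassign
print(html)                                            # rewritten
```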
Best Time Ever Spent!
I completely changed my way of fetching URLs and optimized my code by running it on multiple threads.
Here's all I did (a sketch of the first point follows the list):
1) Used yield rather than return
2) Stopped spending resources on unnecessary checks
3) Implemented simple caching
4) Improved logging #helpsindebugging
5) Solved like a million bugs
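To illustrate point 1: a function that yields links as it finds them lets the thread pool start fetching children immediately, instead of waiting for a complete list to be returned. A minimal sketch of the idea (my illustration, not Webber's code):

```python
import re
from concurrent.futures import ThreadPoolExecutor

def find_links(page_html):
    """Yield each child URL as soon as it is parsed, instead of
    building the whole list first and returning it at the end."""
    for match in re.finditer(r'href="([^"]+)"', page_html):
        yield match.group(1)

def crawl_children(page_html, fetch):
    with ThreadPoolExecutor(max_workers=8) as pool:
        # submit() fires the moment the generator produces a link,
        # so downloads overlap with parsing.
        futures = [pool.submit(fetch, url) for url in find_links(page_html)]
        return [f.result() for f in futures]
```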
Next mission: get the UI built & add support for Windows XP (just giving it a try).
Changed the input to settings.json so as to allow multiple settings per user.
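The devlog doesn't show the file's schema, but loading such a settings file typically looks like this (a hypothetical sketch; every key name here is my guess, not Webber's actual schema):

```python
import json
from pathlib import Path

DEFAULTS = {"max_threads": 4, "max_depth": 2}  # hypothetical keys

def load_settings(path="settings.json"):
    p = Path(path)
    if p.exists():
        # User-provided values override the defaults.
        return {**DEFAULTS, **json.loads(p.read_text(encoding="utf-8"))}
    return dict(DEFAULTS)

settings = load_settings()
print(settings["max_threads"])
```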
Also, found a major bug which, if resolved, could increase scrape speed by 10x!
Every URL being scraped doesn't call its child URLs immediately on a separate thread, but rather spends time finding every child and only then calls them, which leads to unnecessary waiting. Also, in multiple instances, yield in Python would serve better than return.
Off to solving it!
PS: Attached an image of www.google.com scraped!
Refined the project more and solved some logical errors (the toughest kind)!
Tested it, and now anyone can go to my GitHub and, by following the instructions, use Webber.
(Application coming soon....)
It's working!
Yep! Just completed the main functionality with proper scraping capabilities! Just enter a URL and it scrapes it fully based on the settings you provide.
Solved like a million bugs, altered my approach, and used object-oriented programming for better organisation, along with threads to speed up the process!
TODO: Add the best UI (currently console only), speed up the scraping using yield, and implement caching.
PS: Attached a video showcasing website scraping for www.example.com, along with running the SUMMER OF MAKING website offline at the end!
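A rough skeleton of the OOP-plus-threads organization described above (class and method names are my own illustration; the actual Webber structure isn't shown in the devlog):

```python
import threading
from queue import Queue
from urllib.request import urlopen

class Scraper:
    def __init__(self, start_url, settings):
        self.settings = settings        # e.g. depth limit, thread count
        self.queue = Queue()
        self.queue.put(start_url)
        self.seen = {start_url}

    def worker(self):
        while True:
            url = self.queue.get()
            try:
                with urlopen(url, timeout=10) as resp:
                    resp.read()   # parse links & enqueue unseen children here
            except Exception as exc:
                print(f"failed {url}: {exc}")
            finally:
                self.queue.task_done()

    def run(self):
        for _ in range(self.settings.get("threads", 4)):
            threading.Thread(target=self.worker, daemon=True).start()
        self.queue.join()   # block until every queued URL is handled
```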
print("Hello world")
Started with basic fetching techniques using Python's urllib library, got threads working, and tested on my own website, which worked perfectly. Attached a SS for example.com.
Next step is to arrange everything into classes and functions, support better threading, and improve URL rewriting.
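The basic fetching step described here looks roughly like this with urllib (a minimal sketch, not the project's actual code):

```python
from urllib.request import urlopen

# Fetch a single page and decode it for link extraction / local saving.
with urlopen("https://example.com", timeout=10) as resp:
    html = resp.read().decode("utf-8", errors="replace")

print(html[:200])  # first few characters of the fetched page
```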