u-crawler

8 devlogs
7h 1m
Created by obob

u-crawler is a web scraper that uses BeautifulSoup, requests and Selenium to collect data about courses and programs from the University of New South Wales.
It adheres to robots.txt and outputs its results into a neatly formatted directory of JSON files.
I created this for the Anansi YSWS and I learnt a lot about web scraping :D

Timeline

I added GitHub Actions support so I can build with PyInstaller for macOS, Windows and Linux!

Update attachment

I added PyInstaller support, which builds u-crawler into an executable that is easy to run, with no need to install dependencies or even have Python!
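
For anyone curious, a one-file build like this can be kicked off straight from Python. This is only a rough sketch, and the entry-point name main.py and the output name are placeholders rather than the project's actual layout:

```python
# Rough sketch: build a one-file executable through PyInstaller's Python API.
# "main.py" and the output name are placeholders, not the real project layout.
import PyInstaller.__main__

PyInstaller.__main__.run([
    "main.py",              # entry point of the scraper (placeholder)
    "--onefile",            # bundle everything into a single executable
    "--name", "u-crawler",  # name of the resulting binary / .exe
])
```

The GitHub Actions builds from the newer devlog presumably run the equivalent of this once per runner, since PyInstaller only produces executables for the platform it runs on.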

Update attachment

Ship 1

0 payouts of 0 shells

obob

about 1 month ago

Covers 6 devlogs and 5h 57m

Finished readme

Update attachment

I added error catching and logging in programs.py. Now I just need to write a readme
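
The handlers themselves live in programs.py and categories.py, but the general pattern is roughly this simplified sketch; the log file name, logger name and fetch helper are placeholders, not the actual contents of those files:

```python
# Sketch of the error-catching-and-logging pattern; the log file name,
# logger name and fetch_page helper are placeholders for illustration.
import logging

import requests

logging.basicConfig(
    filename="u-crawler.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger("u-crawler")

def fetch_page(url: str) -> str | None:
    """Fetch a page, logging failures instead of crashing the whole crawl."""
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
    except requests.RequestException as exc:
        logger.error("Failed to fetch %s: %s", url, exc)
        return None
    logger.info("Fetched %s", url)
    return response.text
```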

Update attachment

I implemented error catching and logging in categories.py

Update attachment

I implemented a robots.txt checker to make sure u-crawler doesn't scrape forbidden pages
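
The check itself is simple in principle because the standard library's robotparser does most of the work. A rough sketch, where the robots.txt URL, user agent string and example page are placeholders:

```python
# Simplified sketch of a robots.txt check using the standard library;
# the URLs and user agent below are placeholders.
from urllib.robotparser import RobotFileParser

ROBOTS_URL = "https://www.unsw.edu.au/robots.txt"  # placeholder location
USER_AGENT = "u-crawler"

parser = RobotFileParser()
parser.set_url(ROBOTS_URL)
parser.read()

def can_scrape(url: str) -> bool:
    """Return True only if robots.txt allows this user agent to fetch url."""
    return parser.can_fetch(USER_AGENT, url)

# Example: skip any page the site has marked as off-limits.
if can_scrape("https://www.unsw.edu.au/study/some-program"):  # placeholder page
    ...  # fetch and parse the page
```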

Update attachment

Now my scraper uses the results in categories.json to crawl every program inside each category. Instead of simply using requests and BeautifulSoup like I did for the category scraper, I had to use Selenium to launch a headless browser, because the data for the programs is rendered with JavaScript. In the screenshot below, you can see the format of the results, with categories.json and the programs in each category as their own files, plus some of the code on the right.
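
Stripped down, the Selenium part looks roughly like this; the URL, wait time and CSS selector are placeholders rather than the real ones:

```python
# Simplified sketch of scraping a JavaScript-rendered page with headless Chrome;
# the URL and the CSS selector are placeholders, not the real program pages.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless=new")  # run Chrome without opening a window

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://www.unsw.edu.au/study/some-category")  # placeholder URL
    driver.implicitly_wait(10)  # wait up to 10 s for the JavaScript-rendered elements
    cards = driver.find_elements(By.CSS_SELECTOR, ".program-card")  # placeholder selector
    programs = [card.text for card in cards]
finally:
    driver.quit()
```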

Update attachment

I finished the code that gets all of the areas of interest on the home page and puts them in a JSON file
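
In essence that code boils down to something like this simplified sketch; the home page URL, the CSS selector and the output file name are placeholders:

```python
# Simplified sketch of the category scraper: fetch the home page with requests,
# pull out the areas of interest with BeautifulSoup, and dump them to JSON.
# The URL, selector and file name are placeholders, not the real ones.
import json

import requests
from bs4 import BeautifulSoup

HOME_URL = "https://www.unsw.edu.au/study"  # placeholder URL

response = requests.get(HOME_URL, timeout=30)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")
areas = [link.get_text(strip=True) for link in soup.select("a.area-of-interest")]  # placeholder selector

with open("categories.json", "w", encoding="utf-8") as f:
    json.dump(areas, f, indent=2)
```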

Update attachment