u-crawler is a web scraper that uses BeautifulSoup, requests, and Selenium to collect data about courses and programs from the University of New South Wales.
It adheres to robots.txt and writes its results to a neatly organized directory of JSON files.
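As a rough illustration of what adhering to robots.txt can look like, here is a minimal sketch using Python's standard `urllib.robotparser`. This is not necessarily how u-crawler does it: the `USER_AGENT` string, the helper names, and the example rules are all assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "u-crawler"  # assumed user-agent string, not confirmed by the project


def build_robots_parser(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the parsed rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


# Made-up rules for illustration only; the real site's robots.txt will differ.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""
```

A crawler would call `is_allowed(...)` before every request and skip any URL for which it returns `False`.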
I created this for the Anansi YSWS and I learnt a lot about web scraping :D
I added PyInstaller support, which builds u-crawler into an exe that is easy to run, with no need to install dependencies or even to have Python!
Now my scraper uses the results in categories.json to crawl every program inside each category. Instead of simply using requests and BeautifulSoup like I did for the category scraper, I had to use Selenium to launch a headless browser, because the data for the programs is rendered with JavaScript. In the screenshot below, you can see the format of the results: categories.json, plus the programs in each category as their own file, with some of the code on the right.
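The two pieces described above, rendering a JavaScript page in a headless browser and saving each category's programs to its own file, might look roughly like the sketch below. It assumes Chrome as the browser (the devlog does not say which one is used), and the function names, the file naming scheme, and the shape of a program record are all hypothetical.

```python
import json
import re
from pathlib import Path


def fetch_rendered_html(url: str) -> str:
    """Load a page in headless Chrome and return the browser's current DOM as HTML."""
    # Imported here so the file-writing helpers work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # A real crawler would usually wait for a specific element to appear
        # before reading page_source, since scripts may still be rendering.
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if loading fails


def slugify(name: str) -> str:
    """Turn a category name into a safe file name stem (lowercase, dash-separated)."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def write_category_programs(out_dir: Path, category: str, programs: list[dict]) -> Path:
    """Write one category's programs to <out_dir>/<category-slug>.json."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{slugify(category)}.json"
    path.write_text(json.dumps(programs, indent=2), encoding="utf-8")
    return path
```

Keeping the browser step separate from the file-writing step means the output format can be checked without launching a browser at all.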
I finished the code that gets all of the areas of interest on the home page and puts them in a JSON file.
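For a sense of how the category step can work, here is a small sketch that pulls area-of-interest links out of home page HTML with BeautifulSoup and serializes them as JSON. The `a.area-of-interest` selector, the `name`/`url` fields, and both function names are assumptions for illustration; the real page structure and the real categories.json format may differ.

```python
import json
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_areas_of_interest(html: str, base_url: str) -> list[dict]:
    """Collect the name and absolute URL of every area-of-interest link."""
    soup = BeautifulSoup(html, "html.parser")
    areas = []
    # "a.area-of-interest" is a hypothetical selector, not the site's real markup.
    for link in soup.select("a.area-of-interest"):
        href = link.get("href")
        if not href:
            continue  # skip anchors that do not point anywhere
        areas.append({
            "name": link.get_text(strip=True),
            "url": urljoin(base_url, href),  # resolve relative links
        })
    return areas


def to_categories_json(areas: list[dict]) -> str:
    """Serialize the areas as pretty-printed JSON, ready to be written to a file."""
    return json.dumps(areas, indent=2)
```

In practice the `html` argument would come from something like `requests.get(url).text`, and the returned string would be written to categories.json.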