
Better Project Explorer

13 devlogs
26h 39m
Created by ryan

Summer of Making has an awful project explorer: it's slow to load, laggy to scroll, and there's no way to search for projects. Unfortunately, SOM does not provide a nice API, so I decided to do a little web scraping to make a blazingly fast search tool that uses page ranking algorithms to find relevant projects.

Timeline

Stayed up till 12am doing more data analysis. Today (technically yesterday), I found every project that is leaking secrets in its .env file. Surprisingly, only about 0.5% of projects leak secrets there. It's mostly database keys, but I found some cool ones like a Gemini key and an AWS key. Tomorrow I will properly report it to staff, because everyone is sleeping rn.

Jakov C, 15 days ago
Isn't SoM open source? It says it isn't in your description.

Today I did some analysis on the current dataset of projects and found the most popular websites that people put as their repo link. To nobody's surprise, GitHub is by far the most popular, then Codeberg, then GitLab, and finally a bunch of self-hosters.
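A rough sketch of how that kind of tally could be done in Python. The `count_repo_hosts` name and the sample links are made up for illustration; the real dataset and script are not shown here.

```python
from collections import Counter
from urllib.parse import urlparse

def count_repo_hosts(repo_links):
    """Count how often each host appears in a list of repo URLs."""
    hosts = Counter()
    for link in repo_links:
        host = urlparse(link).netloc.lower()
        # Treat www.github.com and github.com as the same host.
        if host.startswith("www."):
            host = host[4:]
        if host:  # skip empty or malformed links
            hosts[host] += 1
    return hosts

# Hypothetical sample links, not taken from the real dataset.
sample_links = [
    "https://github.com/a/b",
    "https://www.github.com/c/d",
    "https://codeberg.org/e/f",
    "not a url",
]
print(count_repo_hosts(sample_links).most_common())
```

Anything that does not parse to a host gets skipped, so junk links don't pollute the counts.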

Spent a while tuning the core algorithms. It's still really simple right now because I don’t have all the data. Next week I will do another scrape to go from 6k to 9k projects and get a lot more devlogs. Ranking right now is hard because there isn't enough data yet to do better algorithms. I will try my best to make it hard for people to game the future algorithms and rank high even if the project is mid.
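For a sense of what a "really simple" ranking can look like, here is a minimal TF-IDF sketch in Python. This is not the site's actual algorithm (which isn't described in detail here); the `rank` and `tokenize` names and the sample projects are assumptions for illustration only.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; a real tokenizer would do more."""
    return text.lower().split()

def rank(query, docs):
    """Return names of matching docs, best TF-IDF score first.

    docs maps a project name to its text (hypothetical structure).
    """
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    total_docs = len(docs)
    scores = {}
    for name, terms in counts.items():
        length = sum(terms.values()) or 1
        score = 0.0
        for term in tokenize(query):
            doc_freq = sum(1 for c in counts.values() if term in c)
            if doc_freq == 0:
                continue  # term appears nowhere, contributes nothing
            # Smoothed inverse document frequency.
            idf = math.log((1 + total_docs) / (1 + doc_freq)) + 1
            score += (terms[term] / length) * idf
        if score > 0:
            scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical sample projects.
docs = {
    "a": "rust search engine",
    "b": "python discord bot",
    "c": "search search tool",
}
print(rank("search", docs))
```

A score like this is easy to inflate by repeating keywords, which is exactly the kind of thing a sturdier ranking would need to guard against.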

Eucatastrophe, 16 days ago
haha that’s really cool :)

Now links to searches can be shared! It used to be hard to link someone the results of a search; now the query is embedded in the URL.
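The idea of putting the query in the URL can be sketched like this in Python. The `q` parameter name and the `example.com` base URL are assumptions; the site's real URL format may differ.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def build_search_url(base_url, query):
    """Embed the search query in the URL so the link can be shared."""
    return f"{base_url}?{urlencode({'q': query})}"

def read_query(url):
    """Recover the search query from a shared link ('' if there is none)."""
    params = parse_qs(urlparse(url).query)
    return params.get("q", [""])[0]

# Hypothetical base URL, used only for illustration.
link = build_search_url("https://example.com/search", "rust game engine")
print(link)
print(read_query(link))
```

Encoding through `urlencode` means spaces and special characters in a query survive the round trip.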

Went away for a robotics competition and was expecting to see the website down. I guess the Nest team managed to scale up the servers and fix a lot of the issues. Huge props to them!

Wow! Not even a day later and Nest is broken again. Eventually I need to switch to a proper host for this because this is getting annoying. A $5/month plan should be more than sufficient; heck, maybe I'll try one of those serverless things.

Nest has frequent downtime, so I made a setup script that will hopefully start the website automatically.

Got it hosted! Try it out at https://searxing.hackclub.app/. It’s fairly fast and contains over 6k projects. There are still some improvements I can make to the website, but it's nearly complete. Searching is pretty good, and feel free to use the results as inspiration for your own projects.

Doing just a little bit of scraping. Also, I made a custom library to help me with parsing the HTML schema, and after a bit of debugging, it is surprisingly good. Spent some time preventing other assets from loading to not overwhelm Hack Club's servers. They might get mad if I used up a few gigabytes of bandwidth. It took only like 1h 30m to scrape all projects, and it fits in a very small JSON file. Now I need to learn how to write a good search engine and host the website. Yay progress!
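As a rough illustration of the parsing side, here is a small sketch using Python's standard `html.parser`. The custom library mentioned above isn't shown, so this is a stand-in; the `h2` tag, the `project-title` class, and the sample HTML are all invented. Since it only reads the HTML text, it never requests images or other assets.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text of <h2 class="project-title"> elements.

    The tag and class name are hypothetical, not the real page markup.
    """

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h2" and "project-title" in classes:
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data

# Hypothetical page fragment.
sample_html = (
    '<div><h2 class="project-title">Better Project Explorer</h2>'
    '<img src="banner.png"><h2>Other heading</h2></div>'
)
parser = TitleParser()
parser.feed(sample_html)
print(parser.titles)
```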

Spent way too long deciding on what type of database to use, so I ended up trying a lot of them. Instead of doing the correct thing and watching a YouTube video, reading the docs, or even reading a blog post, I decided to blindly try ones that I have heard of, even if they didn't match my use case. In the end, I ended up with terrible implementations of all of them, like an SQL DB that copies everything to Python for ranking, a Redis DB that is somehow relational, etc., etc. Anyways, instead of actually reading about which to use, I will write my own in Rust that will exactly match my use case. Remember, coding for 6 hours can save you 30 minutes of reading documentation.

Very clean web scraper API. I might also use this to yoink some projects from other Hack Club-run sites. Figuring out how to get devlogs took some work, but all good now.

Searching mostly works, and the scraper got all the projects into one large JSON. I need to figure out how to implement some kind of ranking algorithm and what to do about the JSON size being 90% base64 images.
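One possible way to deal with the base64 bloat is to strip inline image data before saving. A minimal Python sketch, assuming the images are stored as `data:image/...;base64,` strings somewhere inside the JSON; the field names in the sample are invented.

```python
import json
import re

# Matches inline images such as data:image/png;base64,iVBORw0...
DATA_URI = re.compile(r"data:image/[a-zA-Z0-9.+-]+;base64,[A-Za-z0-9+/=]+")

def strip_images(value):
    """Recursively remove base64 image data URIs from parsed JSON."""
    if isinstance(value, dict):
        return {key: strip_images(item) for key, item in value.items()}
    if isinstance(value, list):
        return [strip_images(item) for item in value]
    if isinstance(value, str):
        return DATA_URI.sub("", value)
    return value

# Hypothetical project record; the real schema may look different.
project = {
    "title": "Better Project Explorer",
    "banner": "data:image/png;base64,iVBORw0KGgo=",
    "devlogs": [{"text": "hi", "image": "data:image/jpeg;base64,/9j/4AAQ"}],
}
cleaned = strip_images(project)
print(json.dumps(cleaned))
```

If the images are still wanted, the same walk could write them to separate files and leave a path behind instead of an empty string.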

I'm going to use AI only on the frontend because I suck at webdev and I want this tool to actually be useful. Projects are manually copied from journey.hackclub.com and I was heavily inspired by their style. Currently nothing is interactive, but give me a week and I can make this something amazing.
