Searxing - A Better Project Explorer

15 devlogs

41h 43m

• Ship certified

Created by ryan

Summer of Making has an awful project explorer: it's slow to load, laggy to scroll, and there's no way to search for projects. So I decided to do a little web scraping to make a blazingly fast search tool that uses page ranking algorithms to find relevant projects.

Demo

Repository

Timeline

Ship 1

1 payout of 848.0 shells

21 days ago

ryan

• Covers 15 devlogs and 39h 48m

ryan

3h 9m • 21 days ago

Finally got embedding search working, running nomic-embed-text:v1.5. I will disable this feature for now because my current host (Hack Club's Nest program) doesn't have enough performance to be viable.
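
Embedding search generally means turning the query and every project into vectors and ranking projects by cosine similarity to the query vector. A minimal sketch, assuming the vectors have already been produced by an embedding model such as the one above; the toy 3-dimensional vectors and project names are hypothetical stand-ins, not real model output:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, doc_vecs, k=3):
    """Return the k (name, score) pairs most similar to query_vec, best first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors; a real index stores one model embedding per project.
projects = {
    "robot-arm": [0.9, 0.1, 0.0],
    "chess-engine": [0.1, 0.9, 0.2],
    "drone-firmware": [0.8, 0.0, 0.3],
}
print(semantic_search([1.0, 0.0, 0.1], projects, k=2))
```

This is a brute-force scan over every project; a larger deployment would more likely keep the vectors in a NumPy matrix and compute all similarities with a single matrix product.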

ryan

11h 55m • about 1 month ago

Spent a few days refactoring and cleaning up the codebase. I was originally embarrassed to upload this to GitHub, but I think it's presentable enough. There are also a couple of miscellaneous additions and optimizations. The vibecoded stuff is nearly gone, and the codebase is approaching 2k lines of code.

ryan

1h 7m • 3 months ago

Stayed up till 12am doing more data analysis. Today (technically yesterday), I found every project that is leaking secrets in its .env file. Surprisingly, only about 0.5% of projects leak secrets there. It's mostly database keys, but I found some cool ones like a Gemini key and an AWS key. Tomorrow I will properly report it to staff, because everyone is asleep right now.
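
A scan like this usually boils down to fetching each repo's committed .env file and matching well-known key formats. A minimal sketch, assuming the file contents are already downloaded; the AWS (AKIA prefix) and Google (AIza prefix) patterns are common public formats, while the database-URL check is an illustrative heuristic, not the exact rules used for the analysis above:

```python
import re

# Illustrative patterns only; dedicated secret scanners ship hundreds of rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "google_api_key": re.compile(r"\bAIza[0-9A-Za-z_\-]{35}"),
    # A connection string that embeds user:password@ before the host.
    "database_url_with_password": re.compile(
        r"\b(?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis)://[^\s:/@]+:[^\s@]+@"
    ),
}


def find_secrets(env_text: str) -> set[str]:
    """Return the names of the patterns that match anywhere in a .env file."""
    return {name for name, pattern in SECRET_PATTERNS.items() if pattern.search(env_text)}


def leak_rate(env_files: dict[str, str]) -> float:
    """Fraction of projects whose .env text matched at least one pattern.

    Pass an empty string for projects that have no .env file so they still
    count toward the total.
    """
    if not env_files:
        return 0.0
    leaking = sum(1 for text in env_files.values() if find_secrets(text))
    return leaking / len(env_files)
```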

Jakov 3 months ago

Isn't SoM open source? It says it isn't in your description.

ryan

34m • 3 months ago

Today I did some analysis on the current dataset of projects and found the most popular websites people use for their repo link. To nobody's surprise, GitHub is by far the most popular, then Codeberg, then GitLab, and finally a bunch of self-hosters.
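
Tallying repo hosts is mostly URL parsing plus a counter. A minimal sketch using only the standard library, assuming the scraped data provides one repo URL string per project (the sample URLs in any test data are made up):

```python
from collections import Counter
from urllib.parse import urlparse


def repo_host(url: str) -> str:
    """Normalize a repo URL to its hostname (urlparse lowercases it), minus a leading 'www.'."""
    url = url.strip()
    if "://" not in url:
        url = "https://" + url  # tolerate links pasted without a scheme
    host = urlparse(url).hostname or ""
    return host.removeprefix("www.")


def host_counts(repo_urls: list[str]) -> list[tuple[str, int]]:
    """Count how many projects point at each host, most common first; blank links are skipped."""
    return Counter(repo_host(u) for u in repo_urls if u.strip()).most_common()
```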

ryan

8m • 3 months ago

Spent a while tuning the core algorithms. They're still really simple right now because I don't have all the data. Next week I will do another scrape to go from 6k to 9k projects and get a lot more devlogs. Ranking is hard right now because there isn't quite enough data yet for better algorithms. I will try my best to make the future algorithms hard to game, so a mid project can't easily rank high.
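
The ranking algorithm itself isn't described here. As one illustration of a simple, hard-to-game text baseline, here is Okapi BM25, a standard scoring function swapped in for the example (not necessarily what this project uses). Its k1 term saturates term frequency, so stuffing a description with the same keyword gives sharply diminishing returns. The naive tokenizer and parameter defaults are assumptions:

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of ASCII letters/digits (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    def __init__(self, docs: dict[str, str], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        self.lengths = {doc_id: sum(tf.values()) for doc_id, tf in self.tfs.items()}
        self.avgdl = sum(self.lengths.values()) / max(len(docs), 1)
        # Document frequency: how many documents contain each term at least once.
        self.df = Counter(term for tf in self.tfs.values() for term in tf)
        self.n = len(docs)

    def idf(self, term: str) -> float:
        n_q = self.df.get(term, 0)
        return math.log((self.n - n_q + 0.5) / (n_q + 0.5) + 1.0)

    def score(self, query: str, doc_id: str) -> float:
        tf, length = self.tfs[doc_id], self.lengths[doc_id]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Length normalization: long documents need more matches to score as high.
            norm = self.k1 * (1 - self.b + self.b * length / self.avgdl)
            total += self.idf(term) * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query: str, k: int = 10) -> list[tuple[str, float]]:
        """Top-k (doc_id, score) pairs with a positive score, best first."""
        ranked = sorted(
            ((d, self.score(query, d)) for d in self.tfs), key=lambda p: p[1], reverse=True
        )
        return [(d, s) for d, s in ranked[:k] if s > 0]
```

With six repetitions of "python", a document scores well under six times as much as one that mentions it once, which is the saturation property described above.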

Eucatastrophe 3 months ago

haha that’s really cool :)

ryan

18m • 3 months ago

Now links to searches can be shared! It used to be hard to send someone the results of a search; now the query is embedded in the URL.
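
Embedding a search in a link usually means putting it in the URL's query string, percent-encoded. A minimal sketch with Python's standard urllib.parse; the parameter name q is a hypothetical choice, not necessarily what the site uses:

```python
from urllib.parse import parse_qs, urlencode, urlparse

BASE_URL = "https://searxing.hackclub.app/"


def share_link(query: str) -> str:
    """Build a link whose query string carries the search text (spaces become '+')."""
    return BASE_URL + "?" + urlencode({"q": query})


def query_from_link(url: str) -> str:
    """Recover the search text from a shared link; empty string if absent."""
    params = parse_qs(urlparse(url).query)
    return params.get("q", [""])[0]
```

On page load, the server or frontend reads the same parameter back and runs the search, so opening the link reproduces the results.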

ryan

1h 6m • 3 months ago

Went away for a robotics competition and was expecting to see the website down. I guess the Nest team managed to scale up the servers and fix a lot of the issues. Huge props to them!

ryan

20m • 3 months ago

Wow! Not even a day later and Nest is broken again. Eventually I need to switch to a proper host, because this is getting annoying. A $5/month plan should be more than sufficient; heck, maybe I'll try one of those serverless things.

ryan

20m • 3 months ago

Nest has frequent downtime, so I made a setup script that will hopefully start the website automatically.
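
The setup script itself isn't shown. One common shape for keeping a site up is a small supervisor loop that relaunches the server whenever it exits with an error. A minimal sketch; the server command, restart budget, and delay are hypothetical, and on hosts that allow systemd user services, a unit with Restart=on-failure is the more conventional choice:

```python
import subprocess
import sys
import time


def supervise(cmd: list[str], max_restarts: int = 5, delay: float = 2.0) -> int:
    """Run cmd, relaunching it after every non-zero exit.

    Stops when the process exits cleanly (code 0) or the restart budget is
    used up, and returns the number of restarts performed.
    """
    restarts = 0
    while True:
        code = subprocess.run(cmd).returncode
        if code == 0 or restarts >= max_restarts:
            return restarts
        restarts += 1
        print(f"server exited with {code}; restart {restarts}/{max_restarts} in {delay}s",
              file=sys.stderr)
        time.sleep(delay)
```

A call such as supervise([sys.executable, "server.py"], max_restarts=10, delay=5.0), with server.py standing in for the real entry point, keeps the process alive between crashes; something like a cron @reboot entry or a systemd user service would still be needed to launch it again after the host itself restarts.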

ryan

45m • 3 months ago

Got it hosted! Try it out at https://searxing.hackclub.app/. It's fairly fast and contains over 6k projects. There are still some improvements I can make to the website, but it's nearly complete. Searching works pretty well, so feel free to use the results as inspiration for your own projects.

ryan

6h • 3 months ago

Doing just a little bit of scraping. I also made a custom library to help me parse the HTML structure, and after a bit of debugging, it is surprisingly good. Spent some time preventing other assets from loading so I don't overwhelm Hack Club's servers; they might get mad if I used up a few gigabytes of bandwidth. It took only about 1h 30m to scrape all projects, and the result fits in a very small JSON file. Now I need to learn how to write a good search engine and host the website. Yay progress!
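
If a scraper drives a headless browser, blocking assets is typically done by intercepting requests. A minimal sketch, assuming Playwright's sync Python API (page.route with a handler that calls route.abort() or route.continue_(), and request.resource_type to classify requests); the set of blocked types is a hypothetical choice, and the handler is written so it can be exercised without launching a browser:

```python
# Resource types (as reported by Playwright's request.resource_type) that a
# text-only scraper never needs.
BLOCKED_TYPES = frozenset({"image", "media", "font", "stylesheet"})


def should_block(resource_type: str) -> bool:
    """True for requests that only cost bandwidth and add nothing to the scraped data."""
    return resource_type in BLOCKED_TYPES


def handle_route(route) -> None:
    """Route handler: abort heavy assets, let documents, scripts, and XHR through."""
    if should_block(route.request.resource_type):
        route.abort()
    else:
        route.continue_()


def block_heavy_assets(page) -> None:
    """Install the handler on a Playwright Page for every URL it requests."""
    page.route("**/*", handle_route)
```

Calling block_heavy_assets(page) right after browser.new_page() and before page.goto(url) applies the filter to the whole page load.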

ryan

5h 51m • 3 months ago

Spent way too long deciding on what type of database to use, so I ended up trying a lot of them. Instead of doing the correct thing (watching a YouTube video, reading the docs, or even reading a blog post), I decided to blindly try ones I had heard of, even if they didn't match my use case. In the end, I had terrible implementations of all of them, like an SQL DB that copies everything to Python for ranking, a Redis DB that is somehow relational, etc. Anyway, instead of actually reading about which one to use, I will write my own in Rust that exactly matches my use case. Remember, coding for 6 hours can save you 30 minutes of reading documentation.
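
For a read-mostly search workload like this, the core of a purpose-built store is usually an inverted index: a map from each term to the set of projects that contain it. The planned version is in Rust; here is a language-agnostic sketch in Python with hypothetical names, supporting AND queries by intersecting posting sets:

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Map each lowercase token to the set of document ids that contain it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def tokens(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def add(self, doc_id: str, text: str) -> None:
        for token in self.tokens(text):
            self.postings[token].add(doc_id)

    def search_all(self, query: str) -> set[str]:
        """Ids of documents containing every query token (boolean AND)."""
        terms = self.tokens(query)
        if not terms:
            return set()
        # Start from the rarest term so the running intersection stays small.
        # .get() avoids inserting empty sets for unknown terms.
        ordered = sorted(terms, key=lambda t: len(self.postings.get(t, ())))
        result = set(self.postings.get(ordered[0], ()))
        for term in ordered[1:]:
            result &= self.postings.get(term, set())
            if not result:
                break
        return result
```

Intersecting from the smallest posting set first is the standard way to keep AND queries cheap; a scoring function can then run only over the surviving candidates instead of the whole dataset.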

ryan

1h 54m • 4 months ago

Very clean web scraper API. I might also use it to yoink some projects from other Hack Club-run sites. Figuring out how to get devlogs took some work, but it's all good now.
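
The scraper's own API isn't shown. As an illustration of pulling devlog entries out of a page with only the standard library, here is a sketch built on html.parser.HTMLParser; the markup it expects (an article with class "devlog" containing a time element and a paragraph) is entirely hypothetical and would need to match the real page structure:

```python
from html.parser import HTMLParser
from typing import Optional


class DevlogParser(HTMLParser):
    """Collect {'time': ..., 'text': ...} dicts from <article class="devlog"> blocks.

    Hypothetical markup: <article class="devlog"><time>..</time><p>..</p></article>
    """

    def __init__(self) -> None:
        super().__init__()
        self.devlogs: list[dict[str, str]] = []
        self._current: Optional[dict[str, str]] = None  # devlog being built
        self._field: Optional[str] = None  # which field text currently goes into

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "article" and "devlog" in classes:
            self._current = {"time": "", "text": ""}
        elif self._current is not None and tag in ("time", "p"):
            self._field = "time" if tag == "time" else "text"

    def handle_endtag(self, tag):
        if tag in ("time", "p"):
            self._field = None
        elif tag == "article" and self._current is not None:
            self.devlogs.append({k: v.strip() for k, v in self._current.items()})
            self._current = None

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b> within <p>) is kept as well.
        if self._current is not None and self._field is not None:
            self._current[self._field] += data
```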

ryan

2h 41m • 4 months ago

Searching mostly works, and the scraper got all the projects into one large JSON file. I need to figure out how to implement some kind of ranking algorithm, and what to do about the JSON being 90% base64 images.
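
One way to shrink a JSON file that is mostly inline images is to move each data: URI into a separate content-addressed store and keep only a short reference in the record. A minimal sketch, assuming images appear as data:image/...;base64,... strings in a hypothetical "image" field; it names each image by its SHA-256 hash, so duplicates are stored once:

```python
import base64
import hashlib
import re

DATA_URI = re.compile(r"^data:image/[\w.+-]+;base64,(.*)$", re.DOTALL)


def externalize_images(projects: list[dict], field: str = "image") -> dict[str, bytes]:
    """Replace inline base64 images with 'sha256:<hex>' references, in place.

    Returns a mapping from reference to raw image bytes, ready to be written
    to disk or object storage.
    """
    blobs: dict[str, bytes] = {}
    for project in projects:
        value = project.get(field)
        if not isinstance(value, str):
            continue
        match = DATA_URI.match(value)
        if match is None:
            continue  # already a normal URL or reference
        raw = base64.b64decode(match.group(1))
        ref = "sha256:" + hashlib.sha256(raw).hexdigest()
        blobs[ref] = raw
        project[field] = ref
    return blobs
```

Base64 encodes every 3 bytes as 4 characters, so storing the decoded bytes separately saves about a quarter of the image payload on its own, before any deduplication.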

ryan

5h 30m • 4 months ago

I'm going to use AI only on the frontend, because I suck at webdev and I want this tool to actually be useful. Projects are manually copied from journey.hackclub.com, and I was heavily inspired by their style. Currently nothing is interactive, but give me a week and I can make this something amazing.
