<- 7% chance that one is AI -> surprisingly, not a lot! I measured the heuristics of text and trained a KMeans model to check whether text is AI.
Satyam Raj
Check their projects out: Satyam Hub, Stock Simulator
Chris
Check their projects out: C Calculator, CartCommands Framework, Strings Lite, Strings, NameColor, Enchanted Happy Ghast Harnesses, Karatasi | Command Line Web Browser
Austin's SDK
Check their projects out: WikiBeachia, Lockdown, SoM profile view tracker, Austin's SDK Portfolio, Journly
Youssef
Check their project out: ReeTui
obob
Check their projects out: Markdown Converter for Raycast, shells, spotify-mood, tinypie, mirrored, hackpad, portfolio, Read More, treeboard, pcb keychain, u-crawler, another personal website, Chronotime
Advick
No projects yet.
Retrained the model and downloaded the latest SoM data. I also completely automated my build process: I can run the rebuild.nu script and it will:
1. download new SoM data
2. retrain the model
3. write the new metrics to the index.html
4. compile sonai-metrics
5. compile sonai for wasm
6. open vite for testing
Everything is 100% reproducible since all RNG is seeded: the same data will give the same WASM, metrics, and sonai crate.
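Purely as an illustration of the seeded-RNG point (the seed value, RNG type, and dimensions here are assumptions, not the actual sonai internals), a reproducible setup with the rand 0.8 API could look like this:

```rust
use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};

/// Hypothetical fixed seed; the real build pins its own value.
const TRAINING_SEED: u64 = 42;

fn main() {
    // With a fixed seed, every run is deterministic: the same SoM data
    // always yields the same centroids, and therefore the same metrics,
    // WASM blob, and published crate.
    let mut rng = StdRng::seed_from_u64(TRAINING_SEED);

    // e.g. a reproducible random starting centroid in a 14-dimensional feature space
    let initial_centroid: Vec<f64> = (0..14).map(|_| rng.gen_range(0.0..1.0)).collect();
    println!("{initial_centroid:?}");
}
```

Because the seed never changes, re-running the rebuild script on the same downloaded data produces identical artifacts every time.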
This release:
1. use the same distance fn across the app (see the sketch below)
2. improve the fingerprinting lists
3. fix the irr perspective bug where parts of words would count
4. add the rest of the unicode dashes (the bad ones)
5. tune the model for a better split, not just 50% each time
6. fetch new data and retrain
7. add a new metric display trait for training
Bumped versions and published to crates.io. I also fixed the Aho-Corasick date detection.
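As a sketch of the "same dist fn" idea (the function name is mine, not necessarily sonai's), a single shared Euclidean-distance helper used by training, classification, and the metrics display might look like this:

```rust
/// Hypothetical shared distance helper: one definition used everywhere,
/// so training and inference can never drift apart.
pub fn euclidean_distance(a: &[f64], b: &[f64]) -> f64 {
    debug_assert_eq!(a.len(), b.len(), "feature vectors must have the same length");
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f64>()
        .sqrt()
}
```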
Changed the distance formula, added 4 more model parameters, and retrained a few times. This required a total rework of the previous system. AI% is up to 6!
Published a new version and dealt with the HC API SSL outage by adding retries to the fetcher.
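A minimal sketch of the kind of retry loop that covers a transient outage like this, assuming a blocking fetch closure standing in for the real request; the actual fetcher's signature and backoff policy may differ:

```rust
use std::thread::sleep;
use std::time::Duration;

/// Retry a fallible fetch with exponential backoff so transient failures
/// (like an SSL outage on the API) don't kill a whole data pull.
fn fetch_with_retries<F>(mut fetch: F, max_attempts: u32) -> Result<String, std::io::Error>
where
    F: FnMut() -> Result<String, std::io::Error>,
{
    let mut delay = Duration::from_secs(1);
    let mut last_err = None;
    for _ in 0..max_attempts {
        match fetch() {
            Ok(body) => return Ok(body),
            Err(e) => {
                last_err = Some(e);
                sleep(delay);
                delay *= 2; // back off a bit more after each failure
            }
        }
    }
    Err(last_err.expect("max_attempts must be at least 1"))
}
```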
I revamped the entire repository structure, added a license, and published the crate for human consumption!
Added more metrics: hashtags and labels. Certain users post devlogs with #NextJS2025, like wtf. Additionally, AIs do things like:
Project description:
Devlog #1:
I added:
These are all caught now (see the sketch below).
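A rough sketch of how label openings and hashtags like these could be flagged with plain string matching (the prefix list and function names are illustrative, not the actual sonai metrics):

```rust
/// Hypothetical label prefixes that AI-generated devlogs tend to start lines with.
const LABEL_PREFIXES: &[&str] = &["Project description:", "Devlog #", "I added:"];

/// Count how many lines of a devlog start with one of the suspicious labels.
fn label_count(text: &str) -> usize {
    text.lines()
        .map(str::trim_start)
        .filter(|line| LABEL_PREFIXES.iter().any(|p| line.starts_with(p)))
        .count()
}

/// Count hashtag-style tokens like `#NextJS2025`.
fn hashtag_count(text: &str) -> usize {
    text.split_whitespace()
        .filter(|w| w.starts_with('#') && w.len() > 1)
        .count()
}
```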
I trained a new model! This new iteration is based on these brand spankin' new f64 features: emoji_rate, buzzword_ratio, markdown_use, irregular_ellipsis, rule_of_threes, devlog_day_count, html_escapes, irregular_quotations, irregular_dashes.

I reworked the WASM compat layer: it now outputs a JsValue composed of the aforementioned features, plus the centroid distance and certainty for each cluster, which lets the web UI display how confident the model is that the specified devlog is either AI or human. I now lazy-load the model so the WASM module does not re-decode the binary each time it is run; this is actually not even that much faster, since the model is so small.

Text metrics are now LIGHTNING FAST, as regex has been completely eliminated from the pipeline in favor of summing str::matches and Iterator::count() for a fast text matching algo (see the sketch below). Markdown stats now only flag a subset of MD; lists using - are now allowed. I added a buzzword list with phrases like sleek ui, interactive experience, etc. You get the point lol. The model is now trained on project descriptions as well as the original devlogs. I have abstracted the page fetcher into a trait, which saved maybe 15 lines of code :skull:
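Here is a tiny sketch of that regex-free counting style; the dash and escape lists are abbreviated and the real metric definitions in sonai may differ:

```rust
/// Rate of irregular dashes per character, computed without regex:
/// `str::matches` finds substring hits and `Iterator::count` sums them.
fn irregular_dash_rate(text: &str) -> f64 {
    // A few of the "bad" unicode dashes; the real list is longer.
    let dashes = ["\u{2014}", "\u{2013}", "\u{2012}"];
    let hits: usize = dashes.iter().map(|d| text.matches(*d).count()).sum();
    hits as f64 / text.chars().count().max(1) as f64
}

/// HTML escapes like `&amp;` or `&#39;` that often leak out of AI output.
fn html_escape_count(text: &str) -> usize {
    ["&amp;", "&lt;", "&gt;", "&quot;", "&#39;"]
        .iter()
        .map(|e| text.matches(*e).count())
        .sum()
}
```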
I'm so fucking proud of this, but I still need to yap about it! I managed to train an AI model that is only 255 bytes big. It classifies a devlog as AI or NOT AI: cluster 1 means AI, and cluster 2 means NOT AI. The pipeline first measures 14 metrics from every devlog on SoM, then trains a KMeans model on them, normalizes the vectors, and separates them into 2 distinct categories. The category with the most emojis (a common AI metric) is the AI category. I tested this on many examples from real-world devlogs, comparing it to the output of GPTZero, and they matched up! I also got ChatGPT to generate a few devlogs, and they were all flagged as AI! Next, I will try to get this shit to run inside of a browser. Wish me luck! Currently SoM is 13% AI, btw.
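To make the classification step concrete, here is a rough sketch of nearest-centroid assignment over normalized features; the struct layout and field names are assumptions, not the actual 255-byte model format:

```rust
/// Hypothetical trained model: two centroids in the 14-dimensional,
/// normalized feature space, plus which cluster was labelled "AI"
/// (the one whose centroid has the higher emoji rate).
struct Model {
    centroids: [[f64; 14]; 2],
    ai_cluster: usize,
}

impl Model {
    /// Classify a normalized feature vector by its nearest centroid.
    fn is_ai(&self, features: &[f64; 14]) -> bool {
        let dist = |c: &[f64; 14]| -> f64 {
            c.iter()
                .zip(features)
                .map(|(a, b)| (a - b).powi(2))
                .sum::<f64>()
                .sqrt()
        };
        let nearest = if dist(&self.centroids[0]) <= dist(&self.centroids[1]) { 0 } else { 1 };
        nearest == self.ai_cluster
    }
}
```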
Working on the classifier: so far I have classified a few sets of vibecoded text using a few metrics like emojis, em-dashes, etc.
Working on embedding all of the devlogs, projects, and users. So far I have successfully embedded every devlog.