how much of som is vibecoded

14 devlogs
33h 27m
Ship certified
Created by Gangsta Ozcan

7% chance that one is AI. Surprisingly, not a lot! I measured text heuristics and trained a KMeans model to check whether text is AI-generated.

Timeline

Ship 3

0 payouts: 0 shells

Gangsta Ozcan

2 days ago

Covers 5 devlogs and 8h 10m

Retrained the model and downloaded the latest SoM data. I also fully automated my build process: running the rebuild.nu script will:
1. download new SoM data
2. retrain the model
3. write the new metrics to index.html
4. compile sonai-metrics
5. compile sonai for WASM
6. open Vite for testing

Everything is 100% reproducible since all RNG is seeded: the same data will produce the same WASM, metrics, and sonai crate.
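Seeding is what makes this reproducibility work. Below is a minimal sketch of the idea, assuming a hand-rolled SplitMix64 generator and a hypothetical `pick_initial_centroids` helper (the devlog doesn't say which RNG or crate sonai actually uses). Given the same seed and the same data, the starting centroids come out identical, and so does every deterministic k-means step after them.

```rust
/// SplitMix64: a tiny, well-known PRNG. The same seed always yields the same sequence.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Index in 0..n (the slight modulo bias is fine for a sketch).
    fn index(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Hypothetical helper: pick `k` distinct rows as starting centroids,
/// deterministically for a given seed.
fn pick_initial_centroids(data: &[Vec<f64>], k: usize, seed: u64) -> Vec<Vec<f64>> {
    let mut rng = SplitMix64(seed);
    let mut chosen: Vec<usize> = Vec::new();
    while chosen.len() < k.min(data.len()) {
        let i = rng.index(data.len());
        if !chosen.contains(&i) {
            chosen.push(i);
        }
    }
    chosen.into_iter().map(|i| data[i].clone()).collect()
}

fn main() {
    let data: Vec<Vec<f64>> = (0..10).map(|i| vec![i as f64, (i * i) as f64]).collect();
    let a = pick_initial_centroids(&data, 2, 42);
    let b = pick_initial_centroids(&data, 2, 42);
    // Same seed + same data -> identical starting point for training.
    assert_eq!(a, b);
    println!("{:?}", a);
}
```

A fixed seed trades "random restarts" for byte-for-byte reproducible builds, which is what lets the same data map to the same WASM and crate.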

Worked on more metrics. Note: not published yet.

Used the same dist fn across the app; improved the fingerprinting lists; fixed an irr perspective bug where parts of words would count as matches; added the rest of the (bad) Unicode dashes; tuned the model for a better split rather than just 50% each time; fetched new data and retrained; added a new metric display trait for training. Bumped versions and published to crates.io. I also fixed the Aho-Corasick date detection.
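The "parts of words would count" bug is the classic substring-vs-whole-word problem: counting "i" naively also hits "it", "in", and "think". A minimal sketch of a whole-word counter, assuming ASCII case-insensitive matching and that a match only counts when it is not flanked by alphanumeric characters; `count_whole_word` is a hypothetical name, not sonai's API.

```rust
/// Count whole-word, ASCII case-insensitive occurrences of `word` in `text`.
/// A match counts only if the characters directly before and after it
/// are not alphanumeric (or are the start/end of the text).
fn count_whole_word(text: &str, word: &str) -> usize {
    // ASCII lowercasing keeps byte offsets unchanged, so slicing stays valid.
    let haystack = text.to_ascii_lowercase();
    let needle = word.to_ascii_lowercase();
    let is_word_char = |c: char| c.is_alphanumeric();
    haystack
        .match_indices(needle.as_str())
        .filter(|&(start, m)| {
            let before = haystack[..start].chars().next_back();
            let after = haystack[start + m.len()..].chars().next();
            !before.map_or(false, is_word_char) && !after.map_or(false, is_word_char)
        })
        .count()
}

fn main() {
    let text = "I think it is fine; in my view I agree";
    // Naive substring counting also matches the 'i' inside other words.
    let naive = text.to_ascii_lowercase().matches('i').count();
    println!("naive: {naive}, whole-word: {}", count_whole_word(text, "i"));
}
```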

Fixed the header on the site to show the current stats.

Changed the distance formula, added 4 more model parameters, and retrained a few times. This required a total rework of the previous system. AI% is up to 6!
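The devlog doesn't say what the new distance formula or the 4 new parameters are. As an illustration only: a weighted Euclidean distance is one common shape where per-feature parameters enter the distance, and keeping a single `dist` function for both training and classification (the "same dist fn across the app" idea) prevents the two from drifting apart. `dist`, `nearest`, and the weights are all hypothetical here.

```rust
/// Hypothetical shared distance: weighted Euclidean,
/// sqrt(sum_i w_i * (a_i - b_i)^2).
fn dist(a: &[f64], b: &[f64], weights: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len());
    assert_eq!(a.len(), weights.len());
    a.iter()
        .zip(b)
        .zip(weights)
        .map(|((x, y), w)| w * (x - y).powi(2))
        .sum::<f64>()
        .sqrt()
}

/// Index of the nearest centroid, using the same `dist` as training would.
fn nearest(point: &[f64], centroids: &[Vec<f64>], weights: &[f64]) -> usize {
    centroids
        .iter()
        .enumerate()
        .map(|(i, c)| (i, dist(point, c, weights)))
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
        .expect("at least one centroid")
}

fn main() {
    let centroids = vec![vec![0.0, 0.0], vec![10.0, 10.0]];
    let weights = [1.0, 1.0];
    println!("{}", dist(&[0.0, 0.0], &[3.0, 4.0], &weights)); // 5
    println!("{}", nearest(&[1.0, 1.0], &centroids, &weights)); // 0
}
```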

Ship 2

1 payout: 216.0 shells

Gangsta Ozcan

12 days ago

Covers 4 devlogs and 11h 23m

Published a new version and dealt with the HC API SSL outage by adding retries to the fetcher.
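Adding retries to a fetcher usually means wrapping the request in a loop with a growing delay. A minimal sketch, assuming exponential backoff and a hypothetical `with_retries` helper; the devlog doesn't say how many attempts or what delays the real fetcher uses, and the "fetch" below is simulated rather than a real HTTP call.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical helper: run `op` up to `max_attempts` times, doubling the
/// delay after each failure. Returns the first success or the last error.
fn with_retries<T, E, F>(max_attempts: u32, base_delay: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    assert!(max_attempts > 0);
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(v) => return Ok(v),
            // Out of attempts: surface the last error to the caller.
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => {
                thread::sleep(delay);
                delay *= 2; // exponential backoff
                attempt += 1;
            }
        }
    }
}

fn main() {
    // Simulated flaky fetch: fails twice (e.g. an SSL error), then succeeds.
    let result: Result<&str, String> = with_retries(5, Duration::from_millis(10), |attempt| {
        if attempt < 3 {
            Err(format!("attempt {attempt}: SSL handshake failed"))
        } else {
            Ok("page body")
        }
    });
    assert_eq!(result, Ok("page body"));
}
```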

I revamped the entire repository structure, added a license, and published the crate for human consumption!

Added more metrics: hashtags and labels. Certain users post devlogs with #NextJS2025. Like, wtf. Additionally, AIs do things like:

Project description:
Devlog #1:
I added:

These are all caught.
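A rough sketch of how hashtags and label lines like the ones above could be counted. The rules here are assumptions, not sonai's actual definitions: a hashtag is `#` followed by a letter and then letters or digits (so "Devlog #1" doesn't count), and a label is a short line (at most 4 words, an arbitrary threshold) ending in `:`. `count_hashtags` and `count_labels` are hypothetical names.

```rust
/// Count hashtag-like tokens such as "#NextJS2025": '#', then a letter,
/// then only letters/digits (trailing punctuation is ignored).
fn count_hashtags(text: &str) -> usize {
    text.split_whitespace()
        .filter(|tok| {
            let mut chars = tok
                .trim_end_matches(|c: char| c.is_ascii_punctuation())
                .chars();
            chars.next() == Some('#')
                && chars.next().map_or(false, |c| c.is_alphabetic())
                && chars.all(|c| c.is_alphanumeric())
        })
        .count()
}

/// Count "label" lines such as "Project description:" or "I added:":
/// short lines (<= 4 words, an assumed threshold) that end with ':'.
fn count_labels(text: &str) -> usize {
    text.lines()
        .map(str::trim)
        .filter(|line| line.ends_with(':') && line.split_whitespace().count() <= 4)
        .count()
}

fn main() {
    let devlog = "Project description:\nA cool app\nDevlog #1:\nI added:\n- login\n#NextJS2025 #rust, see #1";
    println!("labels: {}, hashtags: {}", count_labels(devlog), count_hashtags(devlog));
}
```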

I trained a new model! This iteration is based on these brand spankin' new f64 features: emoji_rate, buzzword_ratio, markdown_use, irregular_ellipsis, rule_of_threes, devlog_day_count, html_escapes, irregular_quotations, irregular_dashes.

I reworked the WASM compat layer. It now outputs a JsValue containing the aforementioned features, as well as the centroid distance and certainty for each cluster. This lets the web UI show how confident the model is that a given devlog is AI or human.

I now lazy-load the model so the WASM module does not re-decode the binary on every run. This isn't even that much faster, since the model is so small.

Text metrics are now LIGHTNING FAST: regex has been completely eliminated from the pipeline, replaced by summing str::matches with Iter::count() for fast text matching. Markdown stats now flag only a subset of Markdown; lists using - are now allowed. I added a buzzword list (sleek ui, interactive experience, etc. You get the point lol).

The model is now trained on project descriptions as well as the original devlogs. I also abstracted the page fetcher into a trait, which saved maybe 15 lines of code :skull:
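Lazy-loading means the model bytes are decoded on first use and the result is cached for every later call. A minimal sketch using `std::sync::OnceLock` (stable since Rust 1.70); the devlog doesn't say whether sonai uses `OnceLock`, `once_cell`, or `lazy_static`. The model layout (2 centroids of little-endian f64s), `model_bytes` (standing in for something like `include_bytes!`), and the `DECODES` counter are all hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Hypothetical layout for the sketch: 2 centroids x DIM features,
// serialized as little-endian f64s.
const DIM: usize = 2;

struct Model {
    centroids: [[f64; DIM]; 2],
}

// Decoded at most once, then shared for the lifetime of the program.
static MODEL: OnceLock<Model> = OnceLock::new();
// Counts how many times decoding actually ran, to show the laziness.
static DECODES: AtomicUsize = AtomicUsize::new(0);

// Stand-in for embedded model bytes (e.g. via include_bytes!).
fn model_bytes() -> Vec<u8> {
    [0.1f64, 0.2, 0.8, 0.9].iter().flat_map(|v| v.to_le_bytes()).collect()
}

fn decode(bytes: &[u8]) -> Model {
    DECODES.fetch_add(1, Ordering::Relaxed);
    let mut vals = bytes
        .chunks_exact(8)
        .map(|c| f64::from_le_bytes(c.try_into().expect("8-byte chunk")));
    let mut centroids = [[0.0; DIM]; 2];
    for row in centroids.iter_mut() {
        for v in row.iter_mut() {
            *v = vals.next().expect("model bytes too short");
        }
    }
    Model { centroids }
}

/// Returns the shared model, decoding it only on the first call.
fn model() -> &'static Model {
    MODEL.get_or_init(|| decode(&model_bytes()))
}

fn main() {
    let first = model();
    let second = model();
    assert!(std::ptr::eq(first, second)); // same cached instance
    assert_eq!(DECODES.load(Ordering::Relaxed), 1); // decoded exactly once
    println!("{:?}", first.centroids);
}
```

For a model this small the saving is tiny, which matches the observation above; the pattern matters more when decoding is expensive.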

Ship 1

1 payout: 258.0 shells

Gangsta Ozcan

17 days ago

Covers 5 devlogs and 13h 53m

Added a demo; the project is finished! Refinements still needed.

I'm so fucking proud of this, but I still need to yap about it! I managed to train an AI model that is only 255 bytes. It classifies a devlog as AI or NOT AI: cluster 1 means AI, and cluster 2 means NOT AI. The model first measures 14 metrics from every devlog on SoM, then trains a KMeans model on them, normalizes the vectors, and separates them into 2 distinct categories. The category with the most emojis (a common AI tell) is the AI category. I tested this on many real-world devlogs, comparing it to the output of GPTZero, and they matched up! I also got ChatGPT to generate a few devlogs, and they were all classified as AI! Next, I will try to get this shit to run inside a browser. Wish me luck! Currently SoM is 13% AI btw.
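The core idea (feature vectors, normalization, k-means with k = 2, then calling the higher-emoji cluster "AI") can be sketched as below. This is a minimal sketch, not sonai's code: it assumes min-max normalization applied before clustering, a deterministic farthest-point initialization instead of a seeded RNG, plain Lloyd iterations, and 2 made-up features instead of the 14 real metrics.

```rust
/// Min-max normalize each feature column to [0, 1] in place.
fn normalize(data: &mut [Vec<f64>]) {
    if data.is_empty() {
        return;
    }
    for j in 0..data[0].len() {
        let (lo, hi) = data.iter().fold((f64::INFINITY, f64::NEG_INFINITY), |(lo, hi), row| {
            (lo.min(row[j]), hi.max(row[j]))
        });
        let span = if hi > lo { hi - lo } else { 1.0 };
        for row in data.iter_mut() {
            row[j] = (row[j] - lo) / span;
        }
    }
}

fn sq_dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum()
}

/// Plain Lloyd's k-means with k = 2. Returns (centroids, cluster index per row).
fn kmeans2(data: &[Vec<f64>], iters: usize) -> ([Vec<f64>; 2], Vec<usize>) {
    assert!(!data.is_empty());
    // Deterministic init for the sketch: row 0, then the row farthest from it.
    let far = (0..data.len())
        .max_by(|&a, &b| sq_dist(&data[a], &data[0]).total_cmp(&sq_dist(&data[b], &data[0])))
        .unwrap();
    let mut centroids = [data[0].clone(), data[far].clone()];
    let mut assign = vec![0; data.len()];
    for _ in 0..iters {
        // Assignment step: each row joins its nearest centroid.
        for (i, row) in data.iter().enumerate() {
            assign[i] = if sq_dist(row, &centroids[0]) <= sq_dist(row, &centroids[1]) { 0 } else { 1 };
        }
        // Update step: each centroid moves to the mean of its members.
        for k in 0..2 {
            let members: Vec<&Vec<f64>> = data
                .iter()
                .zip(&assign)
                .filter(|&(_, &a)| a == k)
                .map(|(r, _)| r)
                .collect();
            if members.is_empty() {
                continue;
            }
            for j in 0..centroids[k].len() {
                centroids[k][j] = members.iter().map(|r| r[j]).sum::<f64>() / members.len() as f64;
            }
        }
    }
    (centroids, assign)
}

fn main() {
    // Toy features per devlog: [emoji_rate, em_dash_rate] (made-up numbers).
    let mut data = vec![
        vec![0.00, 0.01], vec![0.01, 0.00], vec![0.02, 0.02],
        vec![0.30, 0.20], vec![0.25, 0.30], vec![0.35, 0.25],
    ];
    normalize(&mut data);
    let (centroids, assign) = kmeans2(&data, 10);
    // The cluster whose centroid has the higher emoji rate (feature 0) is "AI".
    let ai = if centroids[0][0] > centroids[1][0] { 0 } else { 1 };
    let labels: Vec<&str> = assign.iter().map(|&a| if a == ai { "AI" } else { "human" }).collect();
    println!("{labels:?}");
}
```

Because k-means itself only produces unlabeled clusters, some rule like the emoji heuristic is needed afterwards to decide which cluster means "AI".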

Uhh, I did the classification.

It's bad...

human=6886, ai=15619, human%=30, ai%=69
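The two percentages add up to 99, not 100. That is consistent with truncating integer division: 6886 / 22505 is about 30.6% and 15619 / 22505 is about 69.4%, and truncation drops both fractional parts. A tiny check, assuming truncation (the devlog doesn't show the code that printed these numbers); `percent` is a hypothetical helper.

```rust
/// Integer percentage with truncating division (fractional part dropped).
fn percent(part: u64, total: u64) -> u64 {
    part * 100 / total
}

fn main() {
    let (human, ai) = (6886u64, 15619u64);
    let total = human + ai; // 22505 devlogs
    let (h, a) = (percent(human, total), percent(ai, total));
    println!("human%={h}, ai%={a}, sum={}", h + a); // human%=30, ai%=69, sum=99
}
```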

Working on the classifier. So far I have classified a few sets of vibecoded text using a few metrics like emojis, em-dashes, etc.
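Emoji and em-dash counts are cheap to compute without regex. A rough sketch, assuming a hand-picked set of emoji code-point blocks (not exhaustive) and hypothetical `emoji_rate` / `dash_count` helpers; sonai's actual metric definitions may differ. It counts dashes with `str::matches` plus `Iterator::count`, the regex-free approach a later devlog describes.

```rust
/// Rough emoji test covering a few common Unicode blocks. Not exhaustive.
fn is_emoji(c: char) -> bool {
    matches!(c as u32,
        0x1F300..=0x1F5FF     // Miscellaneous Symbols and Pictographs
        | 0x1F600..=0x1F64F   // Emoticons
        | 0x1F680..=0x1F6FF   // Transport and Map Symbols
        | 0x1F900..=0x1F9FF   // Supplemental Symbols and Pictographs
        | 0x2600..=0x27BF)    // Miscellaneous Symbols + Dingbats
}

/// Emojis per whitespace-separated word.
fn emoji_rate(text: &str) -> f64 {
    let words = text.split_whitespace().count().max(1);
    text.chars().filter(|&c| is_emoji(c)).count() as f64 / words as f64
}

/// Count em dashes (U+2014) and en dashes (U+2013); plain hyphens are ignored.
fn dash_count(text: &str) -> usize {
    ['\u{2014}', '\u{2013}']
        .iter()
        .map(|&d| text.matches(d).count())
        .sum()
}

fn main() {
    let text = "Shipped it \u{1F680} today \u{1F389}";
    println!("emoji_rate = {}", emoji_rate(text));
    println!("dashes = {}", dash_count("fast \u{2014} really fast \u{2013} ok - fine"));
}
```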

Working on embedding all of the devlogs, projects, and users. So far I have successfully embedded every devlog.
