slop_detector1020 - an ai code detector

16 devlogs
17h 35m
•  Ship certified
Created by divpreet

using AI to learn scikit-learn and more about ML training with python
note - it might not be accurate, since the datasets are pretty small (1k each) due to system constraints and ai rate limits!

slop_detector1020 is an ai code detector made in python

Timeline

Ship 2

1 payout of 75.0 shells

divpreet

3 months ago

divpreet Covers 6 devlogs and 3h 59m

end of project! i want to move on to something new lol. also, there were 20-something commits, but i switched from mint to arch and got fed up with head ref issues, so i did a force rebase and push, which deleted all the git commits on gh too. so please, reviewers, don't flag me. now there are 4 commits.

finally added the rust python function. also, after switching from mint to arch, i had to force push and delete my git history :( so the repo only says 2 commits. also the flask app is throwing an ssl error, hope it's just provisioning!

tried to add rust support, and this was by far the hardest thing to add, since i've never really programmed in rust and don't know the syntax that well. so i had to skim through tons of ai rust code, github samples of rust code, and the rust book. i haven't read the whole thing, but i have a basic idea of how it works now. also asked the hc slack and got some tips there too! i have gotten a model ready and am getting 95% accuracy!

adding rust dataset, also using nvim as my editor now! got human dataset ready, just need to get ai code.

added ts support to the flask app. probably going to add rust or something next lol

adding ts support. gathered the dataset yesterday, and today i added the feature extraction. i also needed to understand a bit of typescript to know what features to extract. the extracted features are:
'num_l', 'num_b', 'comment_r', 'avg_l_len', 'indent_var', 'num_funcs', 'arrow_r', 'avg_ind_len',
'num_interfaces', 'num_types', 'num_enums', 'num_classes', 'num_imports', 'num_exports',
'type_annotations', 'generics', 'access_mods'

with these features, i'm getting 98% accuracy
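
A rough sketch of how a few of the TypeScript-specific counts above could be pulled out with regular expressions. The regexes and the `ts_features` helper are assumptions for illustration, not the project's actual extraction code, and they only cover a subset of the listed features (no `type_annotations` or `generics`).

```python
import re

# Hypothetical patterns: each one counts declarations at the start of a line,
# except access_mods, which counts keyword occurrences anywhere.
TS_PATTERNS = {
    "num_interfaces": r"^[ \t]*(?:export[ \t]+)?interface[ \t]+\w+",
    "num_types": r"^[ \t]*(?:export[ \t]+)?type[ \t]+\w+",
    "num_enums": r"^[ \t]*(?:export[ \t]+)?(?:const[ \t]+)?enum[ \t]+\w+",
    "num_classes": r"^[ \t]*(?:export[ \t]+)?(?:default[ \t]+)?(?:abstract[ \t]+)?class[ \t]+\w+",
    "num_imports": r"^[ \t]*import\b",
    "num_exports": r"^[ \t]*export\b",
    "access_mods": r"\b(?:public|private|protected)\b",
}


def ts_features(source: str) -> dict:
    """Count how often each pattern matches in a TypeScript source string."""
    return {
        name: len(re.findall(pattern, source, flags=re.MULTILINE))
        for name, pattern in TS_PATTERNS.items()
    }
```

Regex counting like this is crude (it will also match keywords inside strings and comments), but it is cheap and needs no TypeScript parser.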

Ship 1

1 payout of 192.0 shells

divpreet

3 months ago

divpreet Covers 10 devlogs and 13h 35m

ready to ship! caddy is finally done provisioning the ssl certificate. also needed to install scikit-learn on nest. pretty solid mvp imo, plan on adding more languages, like ts support!

was ready to ship the app, so tried deploying. vercel didn't work, tried render, but that failed too. then decided to use nest and got it set up, but caddy is still provisioning the ssl certificate.

made a very simple flask front end. right now only js code works, but you can still choose any other file. need to fix that.

started working on the flask app, created files to keep terminal prediction and the flask app separate. might make a python package, or not really, idk yet

added JS support! got 408 files each of human (from gh) and ai (gemini, gpt 4.1, claude), then i just copy pasted the feature extraction and model training, getting 96% accuracy! changed the feature extraction to get stuff like the arrow to func ratio now!
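
One way the arrow to func ratio could be computed, as a sketch. The exact definition used in the project isn't given, so this assumes "share of arrow functions among all function-like constructs", which also avoids dividing by zero when a file has no `function` keyword. The `arrow_ratio` name is made up for the example.

```python
import re

ARROW_RE = re.compile(r"=>")           # arrow functions: (x) => x + 1
FUNC_RE = re.compile(r"\bfunction\b")  # classic function declarations/expressions


def arrow_ratio(source: str) -> float:
    """Fraction of function-like constructs that are arrow functions (0.0 if none)."""
    arrows = len(ARROW_RE.findall(source))
    funcs = len(FUNC_RE.findall(source))
    total = arrows + funcs
    return arrows / total if total else 0.0
```

This counts `=>` inside strings and comments too, so it is only a rough signal.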

added a percentage instead of a blunt AI or human. also added a few gpt5 samples and it seems to be pretty accurate. i plan on adding emoji extraction, so if the code uses emojis, which gpt code does tend to, it's more likely to be AI.
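
A small sketch of both ideas: turning a class probability into a percentage, and counting emojis as an extra feature. It assumes the model can return a probability for the AI class (scikit-learn classifiers with `predict_proba` do), and the emoji ranges below are a rough approximation rather than a complete emoji list. `emoji_count` and `as_percentage` are hypothetical names.

```python
import re

# Approximate emoji coverage: pictographs/emoticons/transport (U+1F300-U+1FAFF)
# plus miscellaneous symbols and dingbats (U+2600-U+27BF).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def emoji_count(source: str) -> int:
    """Number of emoji-like characters in the source text."""
    return len(EMOJI_RE.findall(source))


def as_percentage(p_ai: float) -> str:
    """Format a probability in [0, 1] for the AI class as a readable percentage."""
    return f"{p_ai * 100:.1f}% likely AI"
```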

removed html support, since it was really finicky. removed the dataset and associated apps.

html support is added. scraped 400 human samples from github. getting human samples isn't a problem, it's seamless, but ai samples are a problem. got 200 from gemini, and the rest were from chatgpt 4.1.

needed to use AI to get a better result, since with the right prompt you can easily get past the detector. even after using AI (and tons of fixing ai code), it's still pretty dodgy and isn't that reliable. ig it's because of the lack of proper data, and i currently won't be able to solve that without paying for an AI service, so ig HTML detection will be marked as a rough estimate.

also i fixed the scraper by using github's search api, and then decoding its base64 contents!
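
For reference, the decoding half of that fix could look roughly like this. GitHub's contents API returns a JSON object whose `content` field is base64 text (with line breaks) and whose `encoding` field is `"base64"`. The fetching step and the `decode_contents` name are left as assumptions, so this sketch only handles a response that has already been parsed into a dict.

```python
import base64


def decode_contents(payload: dict) -> str:
    """Decode the base64 'content' field of a GitHub contents API response."""
    if payload.get("encoding") != "base64":
        raise ValueError(f"unexpected encoding: {payload.get('encoding')!r}")
    # b64decode ignores the newlines GitHub inserts into the encoded text.
    raw = base64.b64decode(payload["content"])
    # Replace undecodable bytes instead of crashing on odd files.
    return raw.decode("utf-8", errors="replace")
```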

increased dataset to 400 files each for AI and human. scraped github for human, and used gemini 1.5 flash, claude and ai.hackclub to generate ai code. used tfidf to get a better result, ended up getting 97% accuracy!
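
A minimal sketch of a tf-idf text classifier in scikit-learn. The devlog doesn't say which classifier or settings were used, so `LogisticRegression`, the token pattern, and `max_features=5000` are all assumptions here, and `train_tfidf_classifier` is a made-up name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def train_tfidf_classifier(sources, labels):
    """Fit a tf-idf + logistic regression pipeline on raw source-code strings."""
    model = make_pipeline(
        # Treat identifiers and keywords as tokens (assumed tokenization).
        TfidfVectorizer(token_pattern=r"[A-Za-z_]\w*", max_features=5000),
        LogisticRegression(max_iter=1000),
    )
    model.fit(sources, labels)
    return model
```

With only a few hundred files per class, an accuracy number is more trustworthy when measured on a held-out split (for example with `sklearn.model_selection.train_test_split`) than on the training files themselves.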

also made a simple prediction script. it uses the same extraction code from features.py, renames the dictionary's keys to suit the model's requirements, and then loads the trained model to get the prediction.
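
The key-renaming step might look something like this. The mapping below is a guess that pairs the CSV column names from the first devlog with the short names that show up in later devlogs. The real names, the feature order, and the `to_model_row` helper are assumptions.

```python
# Hypothetical mapping from extractor keys to the names the model was trained on.
KEY_MAP = {
    "lines": "num_l",
    "blanks": "num_b",
    "comment ratio": "comment_r",
    "line length": "avg_l_len",
    "indent variations": "indent_var",
    "functions": "num_funcs",
}

# The model expects features in a fixed column order (assumed here).
FEATURE_ORDER = ["num_l", "num_b", "comment_r", "avg_l_len", "indent_var", "num_funcs"]


def to_model_row(features: dict) -> list:
    """Rename extractor keys and return values in the order the model expects."""
    renamed = {KEY_MAP.get(key, key): value for key, value in features.items()}
    missing = [name for name in FEATURE_ORDER if name not in renamed]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [renamed[name] for name in FEATURE_ORDER]
```

The resulting row can then be passed to a loaded scikit-learn model as `model.predict([row])`.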

currently only works for python code, but i plan on adding html, css and js

first devlog! today, i got 20 samples each: ai samples from ai hackclub, using a simple ai code gen python script i created, and human samples gathered manually. also built feature extraction and got it to output .csv! in the format - filename,label,lines,blanks,comment ratio,line length,indent variations,functions
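
A sketch of what feature extraction into that CSV format could look like for python files. How each feature is defined is an assumption (for example, "indent variations" is taken as the number of distinct indent widths and "line length" as the mean length of non-blank lines), and `extract_features` / `to_csv` are illustrative names.

```python
import csv
import io
import re
import statistics

FIELDS = ["filename", "label", "lines", "blanks", "comment_ratio",
          "line_length", "indent_variations", "functions"]


def extract_features(source: str) -> dict:
    """Compute simple style features from a python source string."""
    lines = source.splitlines()
    non_blank = [ln for ln in lines if ln.strip()]
    indents = [len(ln) - len(ln.lstrip(" \t")) for ln in non_blank]
    comments = sum(1 for ln in non_blank if ln.lstrip().startswith("#"))
    return {
        "lines": len(lines),
        "blanks": len(lines) - len(non_blank),
        "comment_ratio": comments / len(non_blank) if non_blank else 0.0,
        "line_length": statistics.mean(len(ln) for ln in non_blank) if non_blank else 0.0,
        "indent_variations": len(set(indents)),
        "functions": len(re.findall(r"^[ \t]*def[ \t]+\w+", source, flags=re.MULTILINE)),
    }


def to_csv(rows) -> str:
    """Serialize feature rows (each including filename and label) to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Each row is the feature dict plus `filename` and `label` (for example `{"filename": "a.py", "label": "human", **extract_features(code)}`).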

also i need to gather larger datasets, perhaps from web scraping on gh and pypi
