slop_detector1020 - an AI code detector


6 devlogs
10h 15m
Created by divpreet

Using AI to learn scikit-learn and more about ML training with Python.

slop_detector1020 is an AI code detector in its early stages. It's made in Python and only works for Python code right now.

Timeline

Added JS support! Got 408 files each of human code (from GitHub) and AI code (Gemini, GPT-4.1, Claude), then I just copy-pasted the feature extraction and model training, getting 96% accuracy! Changed the feature extraction to get stuff like the arrow-to-function ratio now!
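One rough way an arrow-to-function ratio could be computed from JS source in Python is sketched below. The regexes and the name `arrow_to_func_ratio` are illustrative assumptions, not the project's actual code, and the ratio is defined here as the share of arrow functions among all functions found so that it never divides by zero.

```python
import re

# Hypothetical patterns: a rough regex heuristic, not a real JavaScript parser.
# Matches inside strings or comments will be counted too.
ARROW_RE = re.compile(r"=>")
FUNCTION_RE = re.compile(r"\bfunction\b")


def arrow_to_func_ratio(source: str) -> float:
    """Return the share of arrow functions among all functions (0.0 if none)."""
    arrows = len(ARROW_RE.findall(source))
    functions = len(FUNCTION_RE.findall(source))
    total = arrows + functions
    return arrows / total if total else 0.0


sample = """
function add(a, b) { return a + b; }
const double = (x) => x * 2;
const square = x => x * x;
"""
print(arrow_to_func_ratio(sample))
```

A parser-based count would be more accurate, but a regex like this is cheap enough to run over hundreds of scraped files.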

Added a percentage instead of a blunt AI or human verdict. Also added a few GPT-5 samples and it seems to be pretty accurate. I plan on adding emoji extraction, so if the code uses emojis, which GPT code does tend to, it's more likely to be AI.
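The planned emoji extraction could look something like the sketch below. The code point ranges and the feature names `emoji_count` and `emoji_per_line` are assumptions chosen for illustration; the ranges cover most common emoji but are not exhaustive.

```python
import re

# Assumed code point ranges covering most common emoji; not an exhaustive list.
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, newer symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]"
)


def emoji_features(source: str) -> dict:
    """Count emoji characters in a source file and normalise by line count."""
    count = len(EMOJI_RE.findall(source))
    lines = source.splitlines()
    return {
        "emoji_count": count,
        "emoji_per_line": count / len(lines) if lines else 0.0,
    }


# Escapes are used so the sample survives any editor encoding:
# \u2705 is a check mark, \U0001F680 is a rocket.
sample = 'print("Done! \u2705")\n# \U0001F680 launch the app\nx = 1\n'
print(emoji_features(sample))
```

Normalising by line count keeps long files from looking more "AI" just because they have more room for emoji.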

Removed HTML support, since it was really finicky. Removed the dataset and associated apps.


HTML support is added. Scraped 400 human samples from GitHub. Getting human samples isn't a problem, it's seamless; AI samples are the problem. Got 200 from Gemini, and the rest were from ChatGPT 4.1.

Needed to use AI to get a better result, since with the right prompt, AI code can easily pass the detector. Even after using AI (and tons of fixing AI code), it's still pretty dodgy and isn't that reliable. I guess it's because of the lack of proper data, and I currently won't be able to solve that without paying for an AI service. So HTML detection will be marked as a rough estimate, and it can be pretty dodgy.

Also, I fixed the scraper by using GitHub's search API and then decoding the base64 file contents!
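The decoding step can be sketched as follows. GitHub's contents endpoint returns JSON with a `content` field holding base64 text (wrapped across lines) and an `encoding` field set to `base64`; code search results link to that endpoint per file. The helper name `decode_contents` and the fake payload are made up for illustration, and no network request is made here.

```python
import base64


def decode_contents(payload: dict) -> str:
    """Decode the file body from a GitHub contents API JSON response."""
    if payload.get("encoding") != "base64":
        raise ValueError(f"unexpected encoding: {payload.get('encoding')!r}")
    # GitHub wraps the base64 text across lines; b64decode ignores the
    # newlines by default, so no manual stripping is needed.
    raw = base64.b64decode(payload["content"])
    return raw.decode("utf-8", errors="replace")


# Fake payload shaped like a contents API response (illustrative only).
fake_payload = {
    "name": "hello.py",
    "encoding": "base64",
    "content": base64.encodebytes(b"print('hello')\n").decode("ascii"),
}
print(decode_contents(fake_payload))
```

Decoding with `errors="replace"` is a design choice for scraping: one oddly encoded file should not crash a run over hundreds of files.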


Increased the dataset to 400 files each for AI and human. Scraped GitHub for human code, and used Gemini 1.5 Flash, Claude and ai.hackclub to generate AI code. Used TF-IDF to get a better result, and ended up getting 97% accuracy!
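For readers unfamiliar with TF-IDF on source code, here is a minimal scikit-learn sketch. The classifier (`LogisticRegression`), the token pattern and the four toy samples are all assumptions for illustration; the devlog does not say which model was used or how TF-IDF was combined with the other features.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up samples purely for illustration; label 1 = AI, 0 = human.
samples = [
    "# Calculate the sum of two numbers\ndef calculate_sum(a, b):\n    return a + b\n",
    "# Initialize the result list\nresult_list = []\nfor item in items:\n    result_list.append(item)\n",
    "def f(x): return x+1\n",
    "l=[]\nfor i in xs: l.append(i)\n",
]
labels = [1, 1, 0, 0]

model = make_pipeline(
    # Assumed token pattern: identifiers plus single punctuation characters.
    TfidfVectorizer(token_pattern=r"[A-Za-z_]\w*|\S"),
    LogisticRegression(max_iter=1000),
)
model.fit(samples, labels)

# predict_proba returns one probability per class, in the order of classes_.
proba = model.predict_proba(["def add(a, b):\n    return a + b\n"])[0]
ai_index = list(model.classes_).index(1)
print(f"AI likelihood: {proba[ai_index]:.0%}")
```

With only four samples the number printed means nothing. The point is the shape of the pipeline, and that `predict_proba` yields a likelihood rather than a hard AI or human label.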

Also made a simple prediction script. It uses the same extraction code from features.py, renames the dictionary's keys to suit the model's requirements, and then loads the trained model to get the prediction.
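The key-renaming step might look roughly like this. Every name in `KEY_MAP` and `MODEL_COLUMNS` is hypothetical, as is the `model.joblib` file name in the closing comment; the real column names depend on how the model was trained.

```python
# Hypothetical mapping from extractor keys to the column names used in training.
KEY_MAP = {
    "lines": "line_count",
    "blanks": "blank_count",
    "comment ratio": "comment_ratio",
    "line length": "avg_line_length",
    "indent variations": "indent_variations",
    "functions": "function_count",
}
# The model expects features in a fixed order (assumed here).
MODEL_COLUMNS = [
    "line_count", "blank_count", "comment_ratio",
    "avg_line_length", "indent_variations", "function_count",
]


def to_model_row(features: dict) -> list:
    """Rename extractor keys and return values in the order the model expects."""
    renamed = {KEY_MAP.get(key, key): value for key, value in features.items()}
    missing = [col for col in MODEL_COLUMNS if col not in renamed]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [renamed[col] for col in MODEL_COLUMNS]


extracted = {
    "lines": 5, "blanks": 1, "comment ratio": 0.25,
    "line length": 15.75, "indent variations": 2, "functions": 1,
}
row = to_model_row(extracted)
print(row)

# With a scikit-learn model saved via joblib (file name is an assumption):
#   model = joblib.load("model.joblib")
#   prediction = model.predict([row])
```

Failing loudly on a missing column is deliberate: a silently reordered or incomplete row would still produce a prediction, just a wrong one.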

Currently only works for Python code, but I plan on adding HTML, CSS and JS.

First devlog! Today I got 20 samples each: AI samples from ai.hackclub using a simple AI code gen Python script I created, and human samples gathered manually. Also built feature extraction and got it to output .csv in the format: filename,label,lines,blanks,comment ratio,line length,indent variations,functions
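A minimal sketch of a feature extractor producing those columns is shown below. How each feature is defined (for example, comment ratio as comment lines over non-blank lines, or line length as the average over non-blank lines) is an assumption, since the devlog only lists the column names; the header uses underscores instead of spaces for easier loading later.

```python
import csv
import io
import re

FIELDS = ["filename", "label", "lines", "blanks", "comment_ratio",
          "line_length", "indent_variations", "functions"]


def extract_features(filename: str, label: str, source: str) -> dict:
    """Compute simple style features for one Python source file."""
    lines = source.splitlines()
    non_blank = [ln for ln in lines if ln.strip()]
    comments = [ln for ln in non_blank if ln.lstrip().startswith("#")]
    # Distinct indentation widths seen across non-blank lines.
    indents = {len(ln) - len(ln.lstrip()) for ln in non_blank}
    n = len(non_blank)
    return {
        "filename": filename,
        "label": label,
        "lines": len(lines),
        "blanks": len(lines) - n,
        "comment_ratio": len(comments) / n if n else 0.0,
        "line_length": sum(len(ln) for ln in non_blank) / n if n else 0.0,
        "indent_variations": len(indents),
        "functions": len(re.findall(r"^\s*def\s+\w+", source, flags=re.MULTILINE)),
    }


sample = "# add two numbers\ndef add(a, b):\n    return a + b\n\nprint(add(1, 2))\n"
row = extract_features("add.py", "human", sample)

# Written to an in-memory buffer here; a real run would open a .csv file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```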

Also, I need to gather larger datasets, perhaps from web scraping on GitHub and PyPI.
