***DEMO TAKES A MINUTE TO LOAD*** This is a computationally efficient text-generation algorithm. Unlike LLMs, this lightweight program uses Markov chains and other rules of probability to analyze the relationships between words in the inputted text. It uses these relationships to reform sentences and paragraphs, generating a new text from the original corpus. What makes it so efficient is that once the word-transition table is built, generating each new word is a constant-time dictionary lookup (O(1)), whereas a transformer-based model compares each token in its context against every other token, which is quadratic in sequence length (O(n^2)).
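The core idea can be sketched in a few lines (a minimal first-order chain; the function names here are mine, not the actual source):

```python
import random
from collections import defaultdict

def build_chain(corpus):
    """Map each word to the list of words that follow it in the corpus."""
    words = corpus.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length, seed=None):
    """Walk the chain: each step is one dictionary lookup plus a random choice."""
    rng = random.Random(seed)
    word, output = start, [start]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: this word never had a follower
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)
```

Each generated word costs one lookup in `chain`, which is where the constant-per-word claim comes from.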
I got some feedback that people didn't really understand what each of the buttons and labels meant, so I added some helpful tooltips that display on hover. Hopefully this makes the program more accessible for non-computer-obsessed people :).
Switched to CustomTkinter and got everything working again. The UI looks A LOT better! I'm gonna fix some spacing bugs tomorrow, but it's much better overall.
I wanted this app to look better but after researching I found that I had mostly exhausted the capabilities of the normal tkinter library. I'm gonna add round corners and better colors in my next devlog.
Fixed some issues with padding and spacing so the parameters line up and scale to fullscreen. Fiddled with the size of the output box and disabled character-level text wrapping so words aren't cut in half.
Still haven't updated the version or .exe on GitHub, but I did commit a new version of the code with interactable UI elements. Added a button to copy text after generation, which is bugging me because it's annoyingly close to the bottom of the screen and feels cramped. Hopefully will be fixed soon!
Spent a while trying to make the UI pretty. I finally got an animation working so the loading labels have a "..." that types out one dot at a time -- I think it's pretty cool! I tried to put in a progress bar that updates as the program moves along, but it didn't really work :(. Will probably upload a new .exe to GitHub with the latest GUI soon.
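A typing-dots animation like this is usually just a frame generator driven by tkinter's `after` timer. A rough sketch (the label/window names in the comments are assumptions, not the actual code):

```python
import itertools

def dot_frames(base="Loading", max_dots=3):
    """Yield 'Loading', 'Loading.', 'Loading..', 'Loading...' forever."""
    for n in itertools.cycle(range(max_dots + 1)):
        yield base + "." * n

# Hypothetical tkinter wiring (assumes a `label` and `root` already exist):
# frames = dot_frames()
# def tick():
#     label.config(text=next(frames))
#     root.after(300, tick)  # re-schedule the update every 300 ms
# tick()
```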
Spent a LOT of time coding today and overhauled the tkinter GUI with fonts, labels, and other stuff. I'm going to keep making the window look better so it's a nicer GUI tool for users. I also want to make the output more relevant to the input, which isn't too much of a problem (the irrelevant output is still coherent) but would make the tool more useful for people. I also need to implement text wrapping (also not a priority, but it will be nice) so that the outputted text is formatted correctly.
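For the text-wrapping part, the standard library already does word-safe re-flowing; something like this would cover it (a sketch, not the actual implementation):

```python
import textwrap

def wrap_output(text, width=60):
    """Re-flow generated text to a fixed column width without splitting words."""
    return textwrap.fill(text, width=width, break_long_words=False)
```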
Since I want to make this an accessible tool, I decided to package it neatly with tkinter and create a runnable program that displays in its own window rather than the terminal, which is not very user friendly. I read through the tkinter documentation and threw together an unpolished demo (featured in the picture). It's not perfect or pretty yet, but it feels nice and works even better than the terminal version. This tkinter version will be my next release on GitHub. More to come!
Latest release (v2.0.0) is OUT ON GITHUB NOW!! This includes all the AI updates I've added in the past week and a fully functional text-generation program that runs easily on your laptop. The text is completely coherent when parsed with the phi3 model at this point, and it's still undetectable by AI and plagiarism checkers. Note: to run the latest version of the program, you must have Ollama installed on your computer if you want to use the available coherence models (phi3 or phi3-mini).
Updated the coherence model prompt so it adheres more to the type of text I want it to output (e.g., don't change the length of the text). The text still shows 0% on almost all AI checkers and plagiarism checkers.
I experimented with other Ollama models, namely phi3-mini. I found that this takes about half as much time to parse the text but the output is significantly worse. Because of these tradeoffs, I decided to let the user choose which model would correct their generated text so I added a user input query for them to choose.
Switched the model run plan to jumpstart the server at the beginning of the program and keep it running, instead of spinning the model up only when it's needed. Startup takes roughly the same amount of time, but text generation is quicker later because the model is continuously running in the background.
Took out the LanguageModel API because it seemed redundant with the final coherence model. After a few tests, it's unneeded, though its absence does not significantly increase the program's speed.
Updated the scraper and website-searcher functions to be more optimized and WAY faster. This saved a lot of time because I can now parse multiple websites at once. I think the scraping step is now fast enough, so I'm moving on to cutting time from the Phi-3 checking step (which takes the longest).
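Parsing multiple websites at once usually means a thread pool, since scraping is I/O-bound. A minimal sketch of the pattern (the `fetch` function here is a stand-in; the real one would make an HTTP request):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    """Stand-in for the real scraper; imagine an HTTP request + parse here."""
    return f"text from {url}"

def scrape_all(urls, max_workers=5):
    """Fetch several pages concurrently instead of one after another.

    pool.map preserves input order, so results line up with urls.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))
```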
Fixed demo so the text is displayed for longer. The model works very well with Ollama Phi-3, but it requires a user to download Ollama beforehand (which is why the version with this model is not included in the demo). It also can be very slow, so I am working on breaking up the most arduous step of the process (where the AI parses the text) into subprocesses that can be run at the same time on smaller chunks of text.
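Breaking the parsing step into subprocesses starts with splitting the text into chunks; a sketch of that splitting step (names and chunk size are my assumptions):

```python
def chunk_text(text, chunk_words=200):
    """Split text into roughly equal word-count chunks for parallel parsing."""
    words = text.split()
    return [" ".join(words[i:i + chunk_words])
            for i in range(0, len(words), chunk_words)]
```

Each chunk can then be handed to its own worker running the model, and the corrected pieces joined back in order.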
Fiddled with the synonym function because it was changing a lot of words, so I decreased the temperature in that function. Experimented with a smaller phi-3 model (phi-3-mini), but it didn't work and wouldn't output any text. I managed to decrease the runtime of the parser model by warming it up at the beginning of the program.
Updated the scraper function to exclude footers and other divs with similar attributes from the scraped text. Updated the AI function because it was actually not called in the right part of the main function (oops!). This takes a little longer to load, but the final result is GREAT. Although this version of the project is a bit heftier than I had originally planned, my math still shows its asymptotic runtime is significantly lower, not to mention its memory usage during runtime. I am worried that the model I'm using (although it's the smallest I could find) mobilizes more capacity than I need for the job of parsing text, which would cause unnecessary CPU usage.
Imported Phi-3 from Ollama to correct the text. This works relatively well, but it's a bigger package and a user must download Ollama separately. For this reason, I have not created a new .exe for GitHub with the model included. I might try to make my own model too, because depending on such a large external one seems contrary to my original goals.
Ran into trouble scraping DuckDuckGo because it would randomly not return any links, so I switched to an engine-specific scraper, which works better. Fixed problems with the first ship and shipped a real version of the project. Next, all my efforts will be focused on clearer output.
Updated the synonym function to use a better identifier so the words will be more uniform and contextual; I had a problem with random synonyms popping up and ruining the flow of the sentence. Experimented with larger GPT models (too much memory) and other systems for smoothing the final text, but so far they either haven't worked with my Python version or have taken too much CPU power (my goal is still to make this as computationally efficient as possible). I also added some more conditionals to make the output conform to English grammar rules (capitals after punctuation, etc.).
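The capitals-after-punctuation rule can be done with a single regex pass; a sketch of that conditional (not the actual source):

```python
import re

def fix_capitals(text):
    """Capitalize the first letter of the text and of every sentence."""
    return re.sub(r"(^|[.!?]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), text)
```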
Created an unusual character clause with the re module to limit the amount of unwanted text being spit out.
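An unusual-character clause with `re` typically works as a whitelist: keep letters, digits, whitespace, and basic punctuation, and drop everything else. A sketch of the idea (the exact allowed set is my assumption):

```python
import re

# Keep letters, digits, whitespace, and common sentence punctuation; drop the rest.
UNUSUAL = re.compile(r"[^A-Za-z0-9\s.,;:'\"!?()-]")

def strip_unusual(text):
    """Remove characters outside the allowed set from scraped/generated text."""
    return UNUSUAL.sub("", text)
```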
Successfully installed inflect, so I switched the pluralize function from my manual version to using the library to handle the transitions. Tried to enhance the corpus by using the newspaper3k library to scrape only pure text, but it was buggy so I deleted it. Imported the LanguageTool library to use for grammar checking as well.
Upgraded the number of sources from 1 to 5. This allows for more diversity and variety within the generated text. Updated the scraper function to check for binary or other illegible text symbols, and made other bug fixes.
Integrated user input into the scraper search by incorporating DuckDuckGo searches. The search returns a url that is scraped by the scraper and then turned into the corpus for the machine. Next steps will be to refine plurals and synonyms even more and make the output more coherent without plagiarizing.
Fixed bugs within the pluralize function. Created conditionals that check various base cases and apply the right ending. I wanted to use the inflect Python library for this step but was unable to install it for some reason, so I've had to hard-code the function. It's less accurate but still handles most cases.
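A hard-coded pluralize with base-case conditionals usually looks something like this (a sketch of the common English rules, not the actual function):

```python
def pluralize(word):
    """Hand-rolled plural rules covering the most common English cases."""
    irregular = {"child": "children", "person": "people", "man": "men",
                 "woman": "women", "mouse": "mice", "foot": "feet"}
    if word in irregular:
        return irregular[word]
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"          # box -> boxes, church -> churches
    if len(word) > 1 and word.endswith("y") and word[-2] not in "aeiou":
        return word[:-1] + "ies"    # city -> cities, but day -> days
    return word + "s"
```

As the devlog says, rules like these miss plenty of edge cases, which is exactly what a library like inflect handles.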
The scraper is more efficient due to various tweaks to the length and quality of the scraped text. Fully implemented synonym detection and replacement through the nltk module. I also check for plurals to match the synonym to the plurality or conjugation of the original word.
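The matching step can be sketched roughly like this (a crude stdlib stand-in for the nltk-based version, handling only plural and -ing endings):

```python
def match_form(original, synonym):
    """Give the synonym the same surface form as the word it replaces."""
    if original.endswith("ing") and not synonym.endswith("ing"):
        return synonym + "ing"      # running -> sprint -> sprinting
    if original.endswith("s") and not synonym.endswith("s"):
        return synonym + "s"        # cats -> feline -> felines
    return synonym
```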
Incorporated a Beautiful Soup scraper into the chain to generate a corpus from the web. Next steps will be to optimize the scraper's runtime and expand to keyword search.
Integrated a web scraper into the design. Planning on using it to search for suitable text based on user-inputted keywords and then use that as the corpus for the generator.
Created a new version that organizes text into a word chart displaying the probability of each word leading to another. Switched the function from a max word count inputted by the user to a min word count, to give the program more freedom during generation.
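A word chart like this is just the transition counts normalized into probabilities; a minimal sketch of that structure (names are mine):

```python
from collections import Counter, defaultdict

def word_chart(corpus):
    """For each word, the probability of every word that can follow it."""
    words = corpus.split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    # Normalize each word's follower counts into probabilities summing to 1.
    return {word: {nxt: n / sum(c.values()) for nxt, n in c.items()}
            for word, c in counts.items()}
```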
Fixed bugs with capital letters and punctuation. Experimented with longer pieces of text (semi-nonsensical but getting there!) from Wikipedia and other sources. A larger corpus produces better results.