Markov Chain Text Generator

30 devlogs
14h 48m
•  Ship certified
Created by Ezra Aslan

***DEMO TAKES A MINUTE TO LOAD*** This is a computationally efficient text-generation algorithm. Unlike LLMs, this lightweight program uses Markov chains and other rules of probability to analyze the relationships between words in the input text. It uses those relationships to re-form sentences and paragraphs and generate new text from the original corpus. What makes this approach so efficient is that, after a single linear pass over the corpus (O(n)) to record which words follow which, picking each new word is an average constant-time table lookup (O(1)). A transformer-based AI model instead compares every token with every other token in its context, which takes quadratic time (O(n^2)).
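To make the idea concrete, here is a minimal sketch of a word-level Markov chain generator. It is not the project's actual code: the function names `build_chain` and `generate`, the `state_size` and `min_words` parameters, and the dead-end handling are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_chain(text, state_size=2):
    """Map each run of `state_size` words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        chain[state].append(words[i + state_size])
    if not chain:
        raise ValueError("corpus is too short for this state size")
    return dict(chain)


def generate(chain, min_words=20, seed=None):
    """Walk the chain, picking each next word in proportion to how often it followed."""
    rng = random.Random(seed)
    states = list(chain)
    state = rng.choice(states)
    output = list(state)
    while len(output) < min_words:
        followers = chain.get(state)
        if not followers:
            # Dead end (the state only appeared at the end of the corpus):
            # restart from a random state so generation can continue.
            state = rng.choice(states)
            output.extend(state)
            continue
        next_word = rng.choice(followers)
        output.append(next_word)
        state = state[1:] + (next_word,)
    return " ".join(output)
```

Building the table is one pass over the corpus, and each generated word costs one dictionary lookup plus one random choice. A larger `state_size` keeps more context per lookup, which tends to make the output more coherent but closer to the source text.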

Timeline

I got some feedback that people didn't really understand what each of the buttons and labels meant, so I added some helpful tooltips that display on hover. Hopefully this makes the program more accessible for non-computer-obsessed people :).

Ship 2

0 payouts of shell 0 shells

Ezra Aslan

1 day ago

Ezra Aslan Covers 28 devlogs and 13h 16m

Switched to CustomTkinter and got everything working again. The UI looks A LOT better! I'm gonna fix some spacing bugs tomorrow but it's much better overall.

I wanted this app to look better but after researching I found that I had mostly exhausted the capabilities of the normal tkinter library. I'm gonna add round corners and better colors in my next devlog.

Fixed some issues with padding and spacing so the parameters line up and scale to fullscreen. Fiddled with the size of the output box and disabled the wrapping that was cutting words in half at random.

Still have not updated the version or .exe on GitHub, but I did commit a new version of the code with interactive UI elements. Added a button to copy text after generation, which is bugging me because it's annoyingly close to the bottom of the screen and feels cramped. Hopefully it will be fixed soon!

Spent a while trying to make the UI pretty. I finally got this animation thing to work so the loading labels end in a "..." that types itself out. I think it's pretty cool! I tried to put in a progress bar that updates as the program moves along, but it didn't really work :(. Will probably upload a new .exe to GitHub with the latest GUI soon.
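One common way to get a typing-dots effect in tkinter is to cycle through label texts with the widget's `after` method. The sketch below is a guess at the general pattern, not the project's code; `dot_frames`, `animate`, and the 400 ms delay are hypothetical.

```python
import itertools


def dot_frames(base="Loading", max_dots=3):
    """Cycle forever through 'Loading', 'Loading.', 'Loading..', 'Loading...'."""
    return itertools.cycle([base + "." * n for n in range(max_dots + 1)])


def animate(label, frames, delay_ms=400):
    """Show the next frame, then schedule this function to run again.

    `label` can be any widget with `config` and `after`, such as a tkinter Label.
    """
    label.config(text=next(frames))
    label.after(delay_ms, animate, label, frames, delay_ms)
```

Because `after` hands control back to the event loop between frames, the window stays responsive while the dots type out. A real app would also need a way to stop the loop, for example by keeping the id returned by `after` and passing it to `after_cancel`.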

Spent a LOT of time coding today and overhauled the tkinter GUI with fonts, labels, and other stuff. I'm going to keep making the window look better so it's a nicer GUI tool for users. I also want to make the output relevant to the input, which isn't too much of a problem (because the irrelevant output is still coherent) but would make the tool more useful for people. Finally, I need to implement text wrapping (also not a priority, but it will be nice) so that the outputted text is formatted correctly.

Ezra Aslan, 2 days ago
This is extra, but the irrelevant output is only really a problem with extremely low state sizes like 1, so it shouldn’t be much of an issue for users, because I might limit the state size choices to 2-4 instead of 1-4.

Since I want to make this an accessible tool, I decided to package it neatly with tkinter and create a runnable program that displays in its own window rather than the terminal, which is not very user-friendly. I read through the tkinter documentation and threw together an unpolished demo (featured in the picture). It's not perfect or pretty yet, but it feels nice and works even better than the terminal version. This tkinter version will be my next release on GitHub. More to come!

Latest release (v2.0.0) is OUT ON GITHUB NOW!! This includes all the AI updates that I have added in the past week and a fully functional text generation program that can run easily on your laptop. The text is completely coherent when parsed with the phi3 model at this point, and still goes undetected by the AI and plagiarism checkers I tested. Note: to run the latest version of the program, you must have Ollama installed on your computer if you want to use the available coherence models (phi3 or phi3-mini).

Updated the coherence model prompt so it adheres more to the type of text I want it to output (e.g., don't change the length of the text). The text still shows 0% on almost all AI checkers and plagiarism checkers.
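For illustration, a prompt with a length constraint could be sent to a locally running Ollama server roughly like this. The sketch assumes Ollama's default local endpoint and its `/api/generate` route; the prompt wording and the names `build_coherence_prompt` and `correct_text` are hypothetical, not the project's actual prompt or code.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_coherence_prompt(text):
    """Wrap the generated text in instructions that constrain the rewrite."""
    return (
        "Fix the grammar and flow of the text below. "
        "Keep roughly the same length and do not add new facts. "
        "Return only the corrected text.\n\n" + text
    )


def correct_text(text, model="phi3"):
    """Send the prompt to the local Ollama server and return its reply."""
    payload = json.dumps({
        "model": model,
        "prompt": build_coherence_prompt(text),
        "stream": False,  # ask for one complete JSON reply, not a token stream
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"].strip()
```

Keeping the prompt builder separate from the network call makes it easy to tweak the instructions and inspect the exact text the model receives.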

I experimented with other Ollama models, namely phi3-mini. I found that it takes about half as much time to parse the text, but the output is significantly worse. Because of this tradeoff, I decided to let the user choose which model corrects their generated text, so I added an input prompt where they can pick one.

Changed how the model is run: the program now starts the model server at the beginning and keeps it running, instead of launching the model once when it is needed. Startup takes roughly the same amount of time, but generating text later is quicker because the model is continuously running in the background.

Took out the LanguageModel API because it seemed redundant with the final coherence model. After a few tests, it turns out to be unneeded, although its absence does not significantly increase the program's speed.

Updated the scraper and website searcher functions to be more optimized and WAY faster. This saved a lot of time because I am now able to parse multiple websites at once. I think that the scraping step is now fast enough, so I am moving on to cutting time from the Phi-3 checking step (which takes the longest).
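Fetching several pages at once is usually done with a thread pool, since most of the time is spent waiting on the network. The following is a minimal sketch of that pattern, not the project's scraper; `fetch`, `fetch_all`, and the `fetch_one` parameter are hypothetical names.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=10):
    """Download one page and return its HTML as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all(urls, fetch_one=fetch, max_workers=5):
    """Download several pages concurrently, skipping any that fail."""
    def safe_fetch(url):
        try:
            return fetch_one(url)
        except Exception:
            return None  # one bad site should not sink the whole batch

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = list(pool.map(safe_fetch, urls))  # results keep the input order
    return [page for page in pages if page is not None]
```

With five sources, the total wait is roughly the time of the slowest page rather than the sum of all five.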

Fixed demo so the text is displayed for longer. The model works very well with Ollama Phi-3, but it requires a user to download Ollama beforehand (which is why the version with this model is not included in the demo). It also can be very slow, so I am working on breaking up the most arduous step of the process (where the AI parses the text) into subprocesses that can be run at the same time on smaller chunks of text.
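The chunking idea can be sketched as follows: split the text on sentence boundaries, hand each chunk to a worker, and join the results in order. This is an illustration under assumptions, not the project's code. It uses threads rather than subprocesses, which is usually enough when the work is waiting on a model server, and `split_into_chunks`, `process_in_parallel`, and `process_chunk` are hypothetical names.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(text, sentences_per_chunk=3):
    """Group the text into chunks of a few whole sentences each."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


def process_in_parallel(text, process_chunk, sentences_per_chunk=3, max_workers=4):
    """Run `process_chunk` on every chunk at the same time and rejoin in order."""
    chunks = split_into_chunks(text, sentences_per_chunk)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return " ".join(pool.map(process_chunk, chunks))
```

Splitting on sentence boundaries matters here: a model that receives half a sentence has no way to repair it. The tradeoff is that each chunk is corrected without seeing its neighbors, so transitions between chunks may read less smoothly.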

Fiddled with the synonym function because it was changing too many words, so I decreased the temperature in that function. Experimented with a smaller Phi-3 model (phi-3-mini), but it didn't work and wouldn't output any text. I managed to decrease the runtime of the parser model by warming it up at the beginning of the program.

Updated the scraper function to exclude footers and other divs with similar attributes from the scraped text. Updated the AI function because it was actually not being called in the right part of the main function (oops!). This takes a little longer to load but the final result is GREAT. Although this version of the project is a bit heftier than I had originally planned, my math still shows its asymptotic runtime is significantly lower than an LLM's, not to mention its memory usage during runtime. I am worried that the model I am using (although it is the smallest I could find) has more capacity than I need for the job of parsing text, which would cause unnecessary CPU usage.
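As an illustration of footer filtering, the sketch below skips `<footer>` elements and any element whose class or id mentions "footer". It uses the standard library's `html.parser` so it runs without extra installs, which is a swap from the Beautiful Soup scraper the project uses. The tag list, the "footer" substring rule, and the names `BodyTextExtractor` and `extract_text` are assumptions.

```python
from html.parser import HTMLParser

SKIPPED_TAGS = {"footer", "nav", "script", "style"}  # assumed list of tags to drop
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # tags with no closing tag


class BodyTextExtractor(HTMLParser):
    """Collect page text while ignoring footers and footer-like containers."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside an element being skipped

    def _should_skip(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            return True
        # Assumption: footer-like divs mention "footer" in their class or id.
        return any(
            name in ("class", "id") and "footer" in (value or "").lower()
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._skip_depth or self._should_skip(tag, attrs):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

The depth counter assumes well-formed HTML inside the skipped region. Pages that leave tags such as `<p>` or `<li>` unclosed would need a more forgiving parser.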

Ezra Aslan, 4 days ago
Forgot to mention that the generated text is grammatically correct, mostly coherent, and is undetectable by any AI and plagiarism checkers that I put it through!

Imported Phi-3 from Ollama to correct the text. This works relatively well, but it is a bigger package and a user must download Ollama separately. For this reason, I have not created a new .exe for GitHub with the model included. I might try to make my own model too, because relying on a large external model seems contrary to my original goals.

Ran into trouble scraping DuckDuckGo because it would randomly return no links, so I switched to an engine-specific scraper, which works better. Fixed the problems with the first ship and shipped a real version of the project. Next, all my efforts will be focused on a clearer output.

Updated the synonym function to use a better identifier so the replacement words are more uniform and fit their context. I had a problem with random synonyms popping up and ruining the flow of the sentence. Experimented with larger GPT models (too much memory) and other systems for smoothing the final text, but so far they either haven't worked with my Python version or have taken too much CPU power (my goal is still to make this as computationally efficient as possible). I also added some more conditionals to make the output conform to English grammar rules (capitals after punctuation, etc.).
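A rule like "capitals after punctuation" can be expressed with a single regular expression. This is a minimal sketch of that one rule, not the project's conditionals; `capitalize_sentences` is a hypothetical name.

```python
import re


def capitalize_sentences(text):
    """Uppercase the first letter of the text and of each new sentence."""
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda match: match.group(1) + match.group(2).upper(),
        text,
    )
```

The pattern treats every period as the end of a sentence, so abbreviations such as "e.g." followed by a lowercase word would be capitalized too. Handling those would need an exception list.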

Added a check for unusual characters using the re module to limit the amount of unwanted text in the output.
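Such a filter might look like the sketch below, which keeps letters, digits, whitespace, and common punctuation, and drops everything else. The exact set of allowed characters is an assumption, as is the name `strip_unusual`.

```python
import re

# Anything that is not a word character, whitespace, or common punctuation.
UNUSUAL = re.compile(r"[^\w\s.,;:!?'\"()\-]")


def strip_unusual(text):
    """Replace unusual symbols with spaces, then collapse repeated whitespace."""
    cleaned = UNUSUAL.sub(" ", text)
    return re.sub(r"\s+", " ", cleaned).strip()
```

In Python 3, `\w` matches Unicode letters, so accented words like "café" survive while symbols and control characters are removed.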

Successfully installed inflect, so I switched the pluralize function from my manual rules to using the library to handle the transitions. Tried to enhance the corpus by using the newspaper3k library to scrape only pure text, but it was buggy so I removed it. Imported the language tools library to use for grammar checking as well.

Increased the number of sources from 1 to 5. This allows for more diversity and variety within the generated text. Updated the scraper function to check for binary or other illegible text symbols, and fixed other bugs.
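One simple way to detect binary or illegible content is to measure how much of a sample is readable. The sketch below is a guess at that kind of check, not the project's code; `looks_like_text` and the 0.9 threshold are hypothetical.

```python
def looks_like_text(sample, min_ratio=0.9):
    """Return True if most characters are printable or whitespace."""
    if not sample:
        return False
    readable = sum(
        (ch.isprintable() or ch.isspace()) and ch != "\ufffd"
        for ch in sample
    )
    return readable / len(sample) >= min_ratio
```

The explicit `"\ufffd"` test catches the replacement character that appears when binary data is decoded with `errors="replace"`, which would otherwise count as printable.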

Integrated user input into the scraper search by incorporating DuckDuckGo searches. The search returns a URL, which the scraper fetches and turns into the corpus for the machine. Next steps will be to refine plurals and synonyms even more and make the output more coherent without plagiarizing.

Fixed bugs within the pluralize function. Created conditionals that check various base cases and adjust the word accordingly. I wanted to use the inflect Python library for this step but was unable to install it for some reason, so I have had to hard-code the function. It is less accurate but still handles most cases.
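A hard-coded pluralizer of this kind typically checks a few endings in order. The rules and the small irregular-word table below are illustrative assumptions, not the project's actual conditionals.

```python
def pluralize(word):
    """Pluralize an English noun with a few hand-written rules."""
    irregular = {"man": "men", "child": "children", "person": "people", "mouse": "mice"}
    lower = word.lower()
    if lower in irregular:
        return irregular[lower]
    if lower.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"  # box -> boxes, wish -> wishes
    if lower.endswith("y") and len(lower) > 1 and lower[-2] not in "aeiou":
        return word[:-1] + "ies"  # city -> cities, but day -> days
    return word + "s"
```

Rules like these miss plenty of cases (for example "leaf" or "sheep"), which is the accuracy gap a dedicated library closes.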

Scraper is more efficient due to various tweaks I have made in the length of text and quality of text. Fully implemented synonym detection and replacement through the nltk module. I also check for plurals to match the synonym to the plurality or conjugation of the original word.

Incorporated a Beautiful Soup scraper into the chain to generate a corpus from the web. Next steps will be to optimize the scraper's runtime and expand to keyword search.

Integrated a web scraper into the design. Planning on using it to search for suitable text based on user-inputted keywords and then use that as the corpus for the generator.

Created a new version that organizes text into a word chart that displays the probability of each word leading to another. Switched the function from a max word count inputted by the user to a min word count to give the program more freedom during generation.
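A word chart of this kind is a table of transition probabilities: for each word, the share of times each other word followed it. Here is a minimal sketch, with `transition_table` as a hypothetical name and single words as states.

```python
from collections import Counter, defaultdict


def transition_table(text):
    """For each word, return the probability of every word that follows it."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    table = {}
    for word, followers in counts.items():
        total = sum(followers.values())
        table[word] = {nxt: n / total for nxt, n in followers.items()}
    return table
```

For the text "a b a c a b", the word "a" is followed by "b" twice and "c" once, so the table gives "b" a probability of 2/3 and "c" a probability of 1/3.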

Ship 1

1 payout of shell 19.0 shells

Ezra Aslan

15 days ago

Ezra Aslan Covers 1 devlog and 1h 8m

Fixed bugs with capital letters and punctuation. Experimented with longer pieces of text (semi-nonsensical but getting there!) from Wikipedia and other sources. A larger corpus produces better results.
