Markov Chain Text Generator

30 devlogs

14h 48m

• Ship certified

Created by Ezra Aslan

***DEMO TAKES A MINUTE TO LOAD*** This is a computationally efficient text-generation algorithm. Unlike LLMs, this lightweight program uses Markov chains and other rules of probability to analyze relationships between words in the input text. It uses those relationships to rebuild sentences and paragraphs, generating new text from the original corpus. What makes it so efficient is that, once the word relationships have been analyzed, my program picks each next word with a constant-time table lookup (O(1) per word), whereas an LLM's attention step compares every token against the others, which scales quadratically with the length of the text (O(n^2)).
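To make the idea concrete, here is a minimal sketch of a first-order Markov chain text generator. It is an illustration only, not the project's actual code: the function names `build_chain` and `generate`, the toy corpus, and the `seed` parameter are all assumptions made for this example.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the corpus."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length, seed=None):
    """Walk the chain: each step is one dictionary lookup plus a random pick."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: nothing ever followed this word
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)

# Toy corpus (hypothetical); a real corpus would be far larger
corpus = "the cat sat on the mat and the cat ran to the door"
chain = build_chain(corpus)
print(generate(chain, "the", 8, seed=1))
```

Building the table takes one pass over the corpus, and each generated word then costs a single average-constant-time dictionary lookup. Repeated followers are kept in the lists on purpose, so more frequent transitions are picked more often.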

Timeline

Ezra Aslan

24m • about 11 hours ago

I got some feedback that people didn't really understand what each button and label meant, so I added helpful tooltips that appear on hover. Hopefully this makes the program more accessible for non-computer-obsessed people :).

Ship 2

0 payouts of shell 0 shells

1 day ago

Ezra Aslan • Covers 28 devlogs and 13h 16m

Ezra Aslan

34m • 1 day ago

Switched to CustomTkinter and got everything working again. The UI looks A LOT better! I'm gonna fix some spacing bugs tomorrow but it's much better overall.

Ezra Aslan

27m • 1 day ago

I wanted this app to look better but after researching I found that I had mostly exhausted the capabilities of the normal tkinter library. I'm gonna add round corners and better colors in my next devlog.

Ezra Aslan

14m • 1 day ago

Fixed some issues with padding and spacing so the parameters line up and scale to fullscreen. Fiddled with the size of the output box and turned off mid-word text wrapping so words are no longer cut in half.

Ezra Aslan

25m • 1 day ago

Still have not updated the version or .exe on GitHub, but I did commit a new version of the code with interactive UI elements. Added a button to copy text after generation, which is bugging me because it's annoyingly close to the bottom of the screen and feels cramped. Hopefully that will be fixed soon!

Ezra Aslan

55m • 1 day ago

Spent a while trying to make the UI pretty. I finally got an animation working so the "..." on the loading labels types itself out, and I think it's pretty cool! I tried to put in a progress bar that updates as the program moves along but it didn't really work :(. Will probably upload a new .exe to GitHub with the latest GUI soon.

Ezra Aslan

1h 9m • 2 days ago

Spent a LOT of time coding today and overhauled the tkinter GUI with fonts, labels, and other stuff. I'm going to keep improving the window so it's a nicer GUI tool for users. I also want to make the output relevant to the input; that isn't too much of a problem (because the irrelevant output is still coherent), but it would make the tool more useful for people. I also need to implement text wrapping (also not a priority, but it will be nice) so that the output text is formatted correctly.

Ezra Aslan 2 days ago

This is extra but the irrelevant output is only really a problem with extremely low state sizes like 1, so it shouldn’t be that much of an issue for users because I might make the state size choices 2-4 instead of 1-4.
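For readers unfamiliar with the term, "state size" is the number of preceding words the chain looks at before choosing the next one. The sketch below is a hedged illustration, not the project's code: `build_chain`, the toy sentence, and the "branching" measure are assumptions chosen to show why a state size of 1 wanders more than a state size of 2.

```python
from collections import defaultdict

def build_chain(words, state_size):
    """Key the table on tuples of `state_size` consecutive words."""
    chain = defaultdict(list)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        chain[state].append(words[i + state_size])
    return chain

# Toy corpus (hypothetical)
words = "the cat sat on the mat and the dog sat on the rug".split()

for size in (1, 2):
    chain = build_chain(words, size)
    # Average number of distinct next words per state:
    # a higher value means the generator has more freedom to drift
    branching = sum(len(set(v)) for v in chain.values()) / len(chain)
    print(size, round(branching, 2))
```

With one word of context, "the" can lead to four different nouns, so the output can jump between unrelated phrases. With two words of context most states have a single continuation, which keeps the text closer to the source at the cost of variety.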

Ezra Aslan

46m • 2 days ago

Since I want to make this an accessible tool, I decided to package it neatly with tkinter and create a runnable program that displays in its own window rather than the terminal, which is not very user-friendly. I read through the tkinter documentation and threw together an unpolished demo (featured in the picture). It's not perfect or pretty yet, but it feels nice and works even better than the terminal version. This tkinter version will be my next release on GitHub. More to come!

Ezra Aslan

3m • 3 days ago

Latest release (v2.0.0) is OUT ON GITHUB NOW!! This includes all the AI updates that I have added in the past week and a fully functional text-generation program that can run easily on your laptop. The text is completely coherent when parsed with the phi3 model at this point, and it still goes undetected by AI and plagiarism checkers. Note: to run the latest version of the program, you must have Ollama installed on your computer if you want to use the available coherence models (phi3 or phi3-mini).

Ezra Aslan

11m • 3 days ago

Updated the coherence model prompt so it adheres more to the type of text I want it to output (e.g., don't change the length of the text). The text still shows 0% on almost all AI checkers and plagiarism checkers.

Ezra Aslan

10m • 3 days ago

I experimented with other Ollama models, namely phi3-mini. It takes about half as much time to parse the text, but the output is significantly worse. Because of this tradeoff, I decided to let users choose which model corrects their generated text, so I added an input prompt for the choice.

Ezra Aslan

6m • 3 days ago

Changed how the model is run: the server is now started at the beginning of the program instead of being launched once on demand. The total time is roughly the same, but generating text later is quicker because the model keeps running in the background.

Ezra Aslan

6m • 3 days ago

Took out the LanguageModel API because it seemed redundant with the final coherence model. After a few tests, it is unneeded, but its absence does not significantly increase the program's speed.

Ezra Aslan

12m • 3 days ago

Updated the scraper and website searcher functions to be more optimized and WAY faster. This saved a lot of time because I am now able to parse multiple websites at once. I think the scraping step is now fast enough, so I am moving on to cutting time from the Phi-3 checking step (which takes the longest).
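One common way to handle several websites at once is a thread pool, since scraping spends most of its time waiting on the network. The sketch below assumes that approach; the devlog does not say which mechanism the project uses. `fetch_page` is a hypothetical placeholder that returns a string instead of downloading anything, and the URLs are made up.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_page(url):
    """Placeholder for the real download-and-parse step (hypothetical)."""
    return f"text scraped from {url}"

def scrape_all(urls, fetch=fetch_page, max_workers=5):
    """Run `fetch` on every URL in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))

# Example URLs (made up for illustration)
urls = [f"https://example.com/page{i}" for i in range(5)]
pages = scrape_all(urls)
corpus = "\n".join(pages)  # the combined text becomes the generator's corpus
print(len(pages), "pages scraped")
```

With a real `fetch` function, the total wait is roughly that of the slowest page rather than the sum of all pages.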

Ezra Aslan

6m • 4 days ago

Fixed demo so the text is displayed for longer. The model works very well with Ollama Phi-3, but it requires a user to download Ollama beforehand (which is why the version with this model is not included in the demo). It also can be very slow, so I am working on breaking up the most arduous step of the process (where the AI parses the text) into subprocesses that can be run at the same time on smaller chunks of text.
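Before the slow step can be spread across subprocesses, the text has to be cut into pieces that each make sense on their own. The following is a minimal sketch of that splitting step under my own assumptions: `split_into_chunks`, the `max_words` limit, and the do-nothing `correct_chunk` stand-in are hypothetical, and the chunks are processed one after another here for simplicity.

```python
import re

def split_into_chunks(text, max_words=40):
    """Group whole sentences into chunks of at most roughly `max_words` words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

def correct_chunk(chunk):
    """Stand-in for the slow model call (hypothetical); returns the chunk unchanged."""
    return chunk

text = "One two three. Four five six seven. Eight nine. Ten eleven twelve thirteen."
chunks = split_into_chunks(text, max_words=6)
# Each chunk is independent, so these calls could be handed to a worker pool
result = " ".join(correct_chunk(c) for c in chunks)
print(chunks)
```

Splitting on sentence boundaries means no sentence is cut in half, and joining the results in their original order rebuilds the full text.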

Ezra Aslan

22m • 4 days ago

Fiddled with the synonym function because it was changing a lot of words, so I decreased the temperature in that function. Experimented with a smaller phi-3 model (phi-3-mini), but it didn't work and wouldn't output any text. I managed to decrease the runtime from the parser model by warming it up at the beginning of the program.

Ezra Aslan

7m • 6 days ago

Updated the scraper function to exclude footers and other divs with similar attributes from the scraped text. Updated the AI function because it was actually not called in the right part of the main function (oops!). This takes a little longer to load but the final result is GREAT. Although this version of the project is a bit heftier than I had originally planned, my math still shows that its asymptotic runtime is significantly lower than an LLM's, not to mention its memory usage at runtime. I am worried that the model I am using (although it is the smallest I could find) is using more capacity than I need for the job of parsing text, which would cause unnecessary CPU usage.
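As an illustration of skipping footer content while scraping, here is a small sketch. Note the swap: the project uses Beautiful Soup, while this example uses Python's built-in `html.parser` so it runs without extra installs. The class name, the `SKIP_TAGS` set, and the sample HTML are assumptions, and filtering divs by their attributes is not covered.

```python
from html.parser import HTMLParser

# Tags whose contents should never reach the corpus (an assumed list)
SKIP_TAGS = {"footer", "nav", "script", "style"}

class BodyTextExtractor(HTMLParser):
    """Collect page text while ignoring everything inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # how many skipped tags we are currently inside
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def extract_text(html):
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

html = "<body><p>Main article text.</p><footer><p>Copyright notice</p></footer></body>"
print(extract_text(html))
```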

Ezra Aslan 4 days ago

Forgot to mention that the generated text is grammatically correct, mostly coherent, and is undetectable by any AI and plagiarism checkers that I put it through!

Ezra Aslan

12m • 7 days ago

Pulled in Phi-3 through Ollama to correct the text. This works relatively well, but it is a bigger package and users must download Ollama separately. For this reason, I have not created a new .exe for GitHub with the model included. I might try to make my own model too, because this seems contrary to my original goals.

Ezra Aslan

11m • 8 days ago

Ran into trouble scraping DuckDuckGo because it would randomly return no links, so I switched to an engine-specific scraper, which works better. Fixed problems with the first ship and shipped a real version of the project. Next, all my efforts will be focused on clearer output.

Ezra Aslan

32m • 9 days ago

Updated the synonym function to use a better identifier so the words will be more uniform and contextual. I had a problem with random synonyms popping up and ruining the flow of the sentence. Experimented with larger GPT models (too much memory) and other systems for smoothing the final text, but so far they either haven't worked with my Python version or have taken too much CPU power (my goal is still to make this as computationally efficient as possible). I also added some more conditionals to make the output conform to English grammar rules (capitals after punctuation, etc.).
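A rule like "capitals after punctuation" can be expressed in a few lines. This is a hedged sketch rather than the project's implementation: `fix_capitalization` is a hypothetical name, and the rule is deliberately simple.

```python
import re

def fix_capitalization(text):
    """Capitalize the first letter of the text and of each new sentence."""
    text = text.strip()
    # Uppercase a lowercase letter that follows ., ! or ? plus whitespace
    text = re.sub(
        r"([.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )
    return text[:1].upper() + text[1:]

print(fix_capitalization("the model ran. it worked! was it fast? yes."))
# prints: The model ran. It worked! Was it fast? Yes.
```

A rule this simple also capitalizes the word after an abbreviation such as "etc." in the middle of a sentence, so a real version would need extra checks for those cases.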

Ezra Aslan

13m • 11 days ago

Added a check for unusual characters, using the re module, to limit the amount of unwanted text being spit out.
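A character filter of this kind might look like the sketch below. The exact set of characters the project treats as "unusual" is not stated, so the whitelist here (ASCII letters, digits, whitespace, and common punctuation) is an assumption, as are the names `UNUSUAL` and `clean_text`.

```python
import re

# Anything that is not an ASCII letter, digit, whitespace, or common punctuation.
# Assumed whitelist: it also drops accented letters, which may be too strict.
UNUSUAL = re.compile(r"[^A-Za-z0-9\s.,;:!?'\"()\-]")

def clean_text(text):
    """Drop unusual characters, then collapse the leftover runs of whitespace."""
    text = UNUSUAL.sub("", text)
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("Hello \u00a9 world\u2122!  {{nav}} It's 100% fine."))
# prints: Hello world! nav It's 100 fine.
```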

Ezra Aslan

35m • 11 days ago

Successfully installed inflect, so I switched the pluralize function from my manual rules to the library, which now handles the transformations. Tried to enhance the corpus by using the newspaper3k library to scrape only pure text, but it was buggy so I removed it. Imported the language tools library to use for grammar checking as well.

Ezra Aslan

17m • 11 days ago

Increased the number of sources from 1 to 5. This allows for more diversity and variety within the generated text. Updated the scraper function to check for binary or other illegible text symbols, and made other bug fixes.

Ezra Aslan

14m • 11 days ago

Integrated user input into the scraper search by incorporating DuckDuckGo searches. The search returns a URL that is scraped by the scraper and then turned into the corpus for the machine. Next steps will be to refine plurals and synonyms even more and make the output more coherent without plagiarizing.

Ezra Aslan

18m • 11 days ago

Fixed bugs within the pluralize function. Created conditionals that check various base cases and adjust the word ending accordingly. I wanted to use the inflect Python library for this step but was unable to install it for some reason, so I have had to hard-code the function. It is less accurate but still fixes most cases.
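To show what a hard-coded pluralizer with base cases can look like, here is a small sketch. The specific rules, the `IRREGULAR` table, and the function name are my own assumptions for illustration, not the rules the project actually uses.

```python
VOWELS = "aeiou"
# A few irregular nouns handled by direct lookup (an assumed, incomplete list)
IRREGULAR = {"man": "men", "child": "children", "mouse": "mice", "person": "people"}

def pluralize(word):
    """Rule-based English pluralization covering the most common cases."""
    lower = word.lower()
    if lower in IRREGULAR:
        return IRREGULAR[lower]
    if lower.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"        # bus -> buses, box -> boxes
    if lower.endswith("y") and len(lower) > 1 and lower[-2] not in VOWELS:
        return word[:-1] + "ies"  # city -> cities (but day -> days)
    if lower.endswith("fe"):
        return word[:-2] + "ves"  # knife -> knives
    if lower.endswith("f"):
        return word[:-1] + "ves"  # leaf -> leaves
    return word + "s"

for noun in ["box", "city", "day", "knife", "child", "cat"]:
    print(noun, "->", pluralize(noun))
```

Rules like these get most everyday nouns right but miss exceptions such as "roof" or "quiz", which is the accuracy tradeoff of hard-coding compared with a dedicated library.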

Ezra Aslan

1h • 12 days ago

Scraper is more efficient due to various tweaks I have made in the length of text and quality of text. Fully implemented synonym detection and replacement through the nltk module. I also check for plurals to match the synonym to the plurality or conjugation of the original word.

Ezra Aslan

13m • 12 days ago

Incorporated a Beautiful Soup scraper into the chain to generate a corpus from the web. Next steps will be to optimize the scraper's runtime and expand to keyword search.

Ezra Aslan

1h 14m • 13 days ago

Integrated a web scraper into the design. I plan to use it to search for suitable text based on user-provided keywords and then use that text as the corpus for the generator.

Ezra Aslan

2h 11m • 14 days ago

Created a new version that organizes text into a word chart showing the probability of each word leading to another. Switched the word-count setting from a maximum entered by the user to a minimum, giving the program more freedom during generation.
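The two ideas in this devlog, a chart of word-to-word probabilities and a minimum word count, can be sketched as follows. This is an illustration under my own assumptions: `transition_table`, `generate`, the `hard_limit` safety cap, and the toy corpus are hypothetical and not taken from the project.

```python
import random
from collections import Counter, defaultdict

def transition_table(text):
    """Return {word: {next_word: probability}} estimated from the corpus."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return {
        word: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for word, followers in counts.items()
    }

def generate(table, start, min_words, seed=None, hard_limit=200):
    """Keep sampling until at least `min_words` words exist and a sentence ends."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < hard_limit:
        row = table.get(out[-1])
        if not row:  # no known continuation for this word
            break
        nxt = rng.choices(list(row), weights=list(row.values()))[0]
        out.append(nxt)
        if len(out) >= min_words and nxt.endswith((".", "!", "?")):
            break
    return " ".join(out)

table = transition_table("the cat sat. the cat ran. the dog sat.")
for word, row in table.items():
    print(word, row)
print(generate(table, "the", min_words=5, seed=0))
```

With a minimum instead of a maximum, the generator is free to run past the requested length until it reaches the end of a sentence, so the output does not stop mid-thought.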

Ship 1

1 payout of shell 19.0 shells

15 days ago

Ezra Aslan • Covers 1 devlog and 1h 8m

Ezra Aslan

1h 8m • 15 days ago

Fixed bugs with capital letters and punctuation. Experimented with longer pieces of text (semi-nonsensical but getting there!) from Wikipedia and other sources. A larger corpus produces better results.
