July 22, 2025
Added text-to-speech support for Talking Diary. Had to tinker with many libraries, including gTTS, pyttsx3, and espeak, before finally settling on Piper. Much of the time wasn't tracked, but it definitely took me more than 2 hours. But yes... here it is - with a cute lady talking to you instead of bland text.
And yes, the modular design I kept in mind from the start did help a lot while adding new features like this. All this organisation of files and modules does pay off. I've also added a new module for voice input and output, named 'voice_io'. It will contain all the functions needed for converting speech to text and vice versa. So you have a little idea of what's coming next after this TTS update.
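For a rough idea of how a Piper-backed helper inside voice_io could look, here's a sketch that shells out to the Piper CLI. Everything here (the function names, the model filename) is my own illustration, not the project's actual code, and the CLI flags are based on Piper's documented command-line usage:

```python
# Illustrative sketch of a voice_io TTS helper that shells out to the
# Piper CLI. Function names and the model path are assumptions, not
# the real Talking Diary code.
import subprocess

def build_piper_cmd(model_path: str, out_wav: str) -> list:
    """Build the Piper command line; the text itself is piped via stdin."""
    return ["piper", "--model", model_path, "--output_file", out_wav]

def speak(text: str, model_path: str = "en_US-amy-medium.onnx",
          out_wav: str = "reply.wav") -> None:
    """Synthesize `text` into a wav file using the Piper CLI."""
    subprocess.run(build_piper_cmd(model_path, out_wav),
                   input=text.encode("utf-8"), check=True)
```

Shelling out keeps the dependency optional: if Piper isn't installed, only the voice path breaks, not the rest of the app.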
Have a great day!
Nice work!! Really nice theming!
Damn.
I didn't want this to be simple, and hence the plot is a little twisted. No offence to Summer of Making or Hack Club; this webpage is just fictional lore (or not?).
It took me some time, and I still have a bit more to do, such as adding sound effects and BGM, but it already looks quite good, so I'm happy with it. I hope you like it as well.
Something is off... If you're reading this, you found my last project site. Don't stay here. This isn't just a webpage. It's a crime scene. My crime scene. Try not to click on anything. There are whispers! It's looking for its next project, and if you're here, it's already found you.
A simple application of prompting, but it works beautifully... I recommend taking advantage of GitHub Releases; it's much better than having a demo.md that points to a folder for the executable... Hoping to see more of this soon...
Final blow! This might be my last devlog before shipping the MVP. Before you is the final prototype of my product, which works fine on Linux. I will also make a Windows executable and host it online as a web app, but a Linux app was most convenient for me since I use Linux Mint as my primary OS.
The demo link (https://github.com/yashnarang000/talkingdiary2/blob/main/demo.md) has all the instructions required to use the dummy version of Talking Diary with the first default personality - Kaya.
The attached screenshot is proof of how functional the most basic version of Talking Diary (with its first personality, Kaya) is.
Hello, I'm back with another devlog, and this one took quite a lot of time: 19 days. There is, in fact, a reason for it. My laptop's SSD got corrupted. Most of the data was lost, but thank god I had a backup. Talking Diary was safe too, since its repository was on GitHub. There have been other reasons for the delay as well, and I apologise for it.
Now, let's focus on the important stuff. All the necessary modules for our first prototype are ready, and after connecting them, I found problems that needed me to spend some time tracing their roots and fixing them.
So, here is how our first prototype works:
1. You start main.py
2. Chatloop is initiated
3. You talk to Kaya - a friendly Talking Diary
4. You press Esc when you're done expressing yourself
5. Wait until you see 'done'
6. The diary entry is saved in Kaya/output.md
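The six steps above can be sketched roughly like this. Every name here (chat_loop, save_entry, the empty line standing in for Esc) is an illustrative guess, not the project's real code:

```python
# Rough sketch of the prototype flow: chat loop in, diary entry out.
# All identifiers here are illustrative, not Talking Diary's actual ones.
from pathlib import Path

def chat_loop(get_reply) -> list:
    """Collect the conversation until an empty line (stand-in for Esc)."""
    transcript = []
    while True:
        user = input("you> ")
        if not user:                       # stand-in for pressing Esc
            break
        transcript.append(f"User: {user}")
        transcript.append(f"Kaya: {get_reply(user)}")
    return transcript

def save_entry(transcript: list, out_dir: str = "Kaya") -> Path:
    """Write the finished conversation to Kaya/output.md and report 'done'."""
    path = Path(out_dir) / "output.md"
    path.parent.mkdir(exist_ok=True)
    path.write_text("\n".join(transcript) + "\n")
    print("done")                          # step 5: wait until you see 'done'
    return path
```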
This is the most basic form of our project, and it has the core functions. More will be added later, such as TTS and STT support, before the MVP ships.
Phew! These 6 hours of coding flew by like nothing. In the previous devlog, I implemented the chatting function along with the ability to save to and fetch from a JSONL file that stores the conversation history.
Now it was time to go deeper and get technical, because it was time to program the modules that handle what happens in the background: journalization and data/log management.
What we still required:
1. A journalizer function that converts the text of your conversation with the Diary into literal diary entries that read like the real thing.
2. A set of functions for log management. What we needed to log:
- Session data (timestamp and number of entries per session)
- The conversation history yet to be converted into diary entries (this is different from the universal history)
Also, I upgraded from .json to .jsonl for better flexibility and efficiency.
Now, in any talking diary instance, you will need these four necessary files:
1. universalhistory.jsonl
2. sessionwisehistory.jsonl
3. tempsessionlog.jsonl
4. session_log.jsonl
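For a sense of why JSONL suits this, here is a minimal pair of helpers in the spirit of log_utils.py. The file names above are from the devlog; the function names below are my own sketch, not the project's real API:

```python
# Minimal JSONL helpers: one JSON object per line, appended as it happens.
# Function names are illustrative, not the actual log_utils.py interface.
import json
from pathlib import Path

def append_jsonl(path: str, record: dict) -> None:
    """Append one record as a single JSON line -- cheap, append-only logging."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def read_jsonl(path: str) -> list:
    """Read every record back; each line is an independent JSON document."""
    p = Path(path)
    if not p.exists():
        return []
    return [json.loads(line) for line in p.read_text().splitlines() if line]
```

Appending a record never rewrites the file, which is exactly what a single .json blob can't do; that's the flexibility and efficiency win.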
The functions are all ready in log_utils.py and journalizer.py.
What awaits is the first prototype, which you will be able to see soon!
Okay. Time for a devlog? Fine.
So, as covered in the previous devlog, I created the automation module for the unmute.sh website, which is now an available option in the final software.
Moving further, I did some research and built modules that I don't think I ever needed, and finally deleted them. It was more a test of my OOP skills than a module I'd ever use, so... say goodbye to that good-for-nothing module.
Recreated that module, but this time with no OOP (trust me, it wasn't necessary; there are plenty of well-known modules built on plain functions, and classes are optional).
Back to the research: it was on using LLMs in projects without hosting them ourselves. That's how I learned about OpenRouter, a treasure trove of LLM APIs you can use in your projects. The limitation: you need to pay. Sure, there are free LLMs, but they have serious rate limits, and to remove them you need at least $10 of credits in your OpenRouter account.
For the learning part, OpenRouter introduced me to the openai library in Python. It's an amazing module for integrating any LLM into your project, no matter whether it was developed by Google or OpenAI. It's really versatile: just change the base URL and you've switched platforms.
I hope it's not too much and I'm explaining it right. Nevertheless, let me get this straight: one module, a hundred platforms, a thousand models to use. So, if I'm unsatisfied with OpenRouter's free rate limits, I can just turn to Gemini by changing the client's base URL and the model name.
Considering all this, I made a module named api_adapter_openai.py, which is just a bunch of functions that help me integrate the openai module into my software. It's also reusable for future projects :)
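As a sketch of the idea: the base URLs below are the publicly documented OpenAI-compatible endpoints for OpenRouter and Gemini, but the function names and structure are illustrative, not the actual api_adapter_openai.py:

```python
# Sketch of the "one module, many platforms" idea: same openai client,
# different base URL. Names here are illustrative, not the real adapter.
BASE_URLS = {
    "openrouter": "https://openrouter.ai/api/v1",
    "gemini": "https://generativelanguage.googleapis.com/v1beta/openai/",
}

def make_client(provider: str, api_key: str):
    """Build an OpenAI-compatible client pointed at the chosen platform."""
    from openai import OpenAI  # deferred so the module loads without the dep
    return OpenAI(base_url=BASE_URLS[provider], api_key=api_key)

def build_messages(history: list, user_input: str) -> list:
    """Assemble the chat payload: prior turns plus the new user message."""
    return history + [{"role": "user", "content": user_input}]

def ask(client, model: str, messages: list) -> str:
    """One chat-completion round trip (needs a valid API key and network)."""
    resp = client.chat.completions.create(model=model, messages=messages)
    return resp.choices[0].message.content
```

Swapping providers is then just `make_client("gemini", key)` instead of `make_client("openrouter", key)` plus a different model name.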
What has come true so far: attached is a simple demo of what I created.
Made the plan and did some initial code today.
The system is modular. Two AIs will be used in the project.
One AI will chat with the user.
The other will work in the background, converting the conversation into a daily diary entry.
Four major functions of the Talking Diary:
- Speech To Text
- First AI (understanding)
- Text To Speech (natural and emotional tone)
- Second AI (to convert conversations into entries)
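The four pieces line up into a simple pipeline. This skeleton only pictures the plan; every function is a stub standing in for a real STT, LLM, or TTS model:

```python
# Stub pipeline for the plan above: STT -> first AI -> TTS per turn,
# with the second AI journalizing the transcript afterwards.
def speech_to_text(audio: bytes) -> str:
    return audio.decode("utf-8")               # stub: pretend audio is text

def first_ai(user_text: str) -> str:
    return f"Tell me more about: {user_text}"  # stub chat companion

def text_to_speech(reply: str) -> bytes:
    return reply.encode("utf-8")               # stub: real TTS emits audio

def second_ai(transcript: list) -> str:
    # stub journalizer: a real LLM would rewrite this as a diary entry
    return "Dear diary,\n" + "\n".join(transcript)

def one_turn(audio: bytes, transcript: list) -> bytes:
    """One conversational round trip through all three live stages."""
    text = speech_to_text(audio)
    reply = first_ai(text)
    transcript += [f"User: {text}", f"Diary: {reply}"]
    return text_to_speech(reply)
```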
Nominations for first AI:
- Pi AI
- Unmute.sh (just experimenting right now)
Nominations for second AI:
- Gemini
- Any other LLM that can process a large number of tokens efficiently
I've never integrated local LLMs into personal projects, so I might try going that route, or even host them in the cloud and pay for it.
And I might even work with local open-source TTS and STT models.
For today, I just tried using Playwright to build back-end modules for websites like 'pi.ai' and 'unmute.sh'.
To make a module for Gemini, its official API can be used.
Ever hit that 2AM spiral like “what even was today?”, but you're too tired or just can't be bothered to type it all out? Same. So I am building Talking Diary: a voice-first, emotionally-aware diary that replies to you. You speak or type, it listens, talks back, helps you reflect, and writes your day for you. Built to explore: * Natural, human-like conversations * Letting people vent without overthinking * Designing a friend more than a log keeper. A little weird, a little personal, and kinda alive. (Feedback is always welcome)
This was widely regarded as a great move by everyone.