June 17, 2025
Fine-tuned the GIT-large model instead for better performance
Pushed the model to Hugging Face and updated the code to work with the Hugging Face model. Deployed the app to Streamlit for a public demo.
Refined the app, added weight download, and made a quick demo video.
Created a demo and made the code more future-proof
Fine-tuned the first version of the model and got a basic app working
By fine-tuning an image captioning transformer, I made a simple Streamlit app that can give a one-line description for a scene from GTA.
Updated the search code to run searches in parallel (learnt TypeScript promises in the process) and fixed bugs in the extension
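A rough sketch of what running searches in parallel with promises can look like. This is not the extension's actual code: `searchAllInParallel`, the `SearchFn` type, and the idea of returning video IDs as strings are all assumptions made for illustration, and the real search call (youtube-sr in the extension) is passed in as a plain function so nothing about its API is assumed here.

```typescript
// Hypothetical signature: any function that takes a query and resolves to video IDs.
type SearchFn = (query: string) => Promise<string[]>;

// Start every search at once, then wait for all of them together.
// A failed search is turned into an empty list so one bad query
// does not reject the whole batch.
async function searchAllInParallel(
  queries: string[],
  search: SearchFn
): Promise<string[]> {
  const pending = queries.map((query) =>
    search(query).catch(() => [] as string[])
  );
  const results = await Promise.all(pending);
  // Flatten the per-query lists into one list, keeping query order.
  return ([] as string[]).concat(...results);
}
```

Called with a list of queries and a search function, this waits roughly as long as the slowest single search instead of the sum of all of them, which is the main benefit over awaiting each search in a loop.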
Created a basic extension with youtube-sr to search for videos and play them using the YouTube IFrame API.
Scroll YouTube Shorts in VS Code with this extension while you vibe code!
Fixed bugs and added a spinner for a better user experience
Added option to choose different text-to-speech services (ElevenLabs and gTTS)
Completed basic LLM integration and deployed to Streamlit Cloud
Tried a few different approaches for speech detection, and finally finished the speech-to-speech part.
TODO: Refine the app, add LLM logic
Implemented a real-time speech-to-speech system to democratize access to personalized therapy sessions.
This was widely regarded as a great move by everyone.