Just Another Weird Intelligent Entity, better known as J.A.W.I.E., is a smart AI assistant that will help you with all your questions.
J.A.W.I.E. uses the Kokoro TTS AI to turn the AI's responses into human-like speech; for this we use the af_heart voice model.
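A minimal sketch of what that pipeline can look like with the kokoro Python package (assuming kokoro and sounddevice are installed; the to_int16 helper and speak wrapper are our own names, not J.A.W.I.E.'s actual code):

```python
import numpy as np

def to_int16(audio):
    """Convert float audio in [-1, 1] to int16 PCM for playback."""
    return (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)

def speak(text, voice="af_heart"):
    """Synthesize `text` with Kokoro and play it (sketch, not the exact code)."""
    from kokoro import KPipeline   # lazy import: needs the kokoro package
    import sounddevice as sd       # assumption: sounddevice handles playback
    pipeline = KPipeline(lang_code="a")  # 'a' = American English voices
    for _, _, audio in pipeline(text, voice=voice):
        sd.play(to_int16(audio), samplerate=24000)  # Kokoro outputs 24 kHz audio
        sd.wait()
```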
Because we are privacy-focused, we use an intent-based activation system. Thanks to this system you don't need to say "Hey JAWIE" and wait three seconds before giving your command.
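The exact activation rules aren't public, but the idea can be sketched like this (the shows_intent helper and its keyword list are hypothetical, not the real filter):

```python
# Hypothetical intent filter: only transcripts that look like a question or
# a command addressed to the assistant get forwarded; everything else is dropped.
QUESTION_WORDS = {"what", "who", "when", "where", "why", "how",
                  "can", "could", "is", "are", "do", "does", "will", "please"}

def shows_intent(transcript: str) -> bool:
    words = transcript.lower().rstrip("?!. ").split()
    if not words:
        return False
    # Addressed by name, or phrased as a question/command
    return "jawie" in words or words[0] in QUESTION_WORDS
```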
We use ollama to run the model that answers the questions, because it's very fast and supports streaming (getting the response token by token).
Explanation:
Since I saw that my mic was a little too quiet for the transcriber to consistently pick up, I added an automatic gain system that calculates the current level in dB and applies gain until it reaches a preconfigured level (-30 or -40 dB).
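The gain math behind this is simple; here is a sketch in plain Python (the function names are ours, and a real implementation would work on NumPy audio buffers):

```python
import math

def rms_dbfs(samples, full_scale=32768.0):
    """RMS level of int16 samples in dBFS (0 dBFS = full scale)."""
    if not samples:
        return -math.inf
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return -math.inf if rms == 0 else 20 * math.log10(rms / full_scale)

def gain_factor(current_db, target_db=-40.0):
    """Linear gain that lifts `current_db` up to `target_db` (never attenuates)."""
    return 10 ** (max(0.0, target_db - current_db) / 20)

def boost(samples, factor):
    """Apply the gain, clipping to the int16 range."""
    return [max(-32768, min(32767, round(s * factor))) for s in samples]
```

So a frame measuring -50 dBFS with a -40 dBFS target gets a 10 dB boost, i.e. a linear factor of about 3.16.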
I tried integrating my own tool system using JSON in the responses, but this caused issues with the TTS (due to bad stream integration), and the model was often hallucinating actions that didn't exist or not using the tools at all.
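One way around the stream/TTS clash is to buffer the token stream into whole sentences before handing anything to the TTS. A sketch (the sentences helper is our own idea, and stream_reply assumes a local ollama server):

```python
import re

def sentences(token_stream):
    """Group a token-per-token stream into full sentences for the TTS,
    so the speech engine never gets half a word or half a sentence."""
    buf = ""
    for token in token_stream:
        buf += token
        while (m := re.search(r"[.!?]\s", buf)):
            yield buf[: m.end()].strip()
            buf = buf[m.end():]
    if buf.strip():
        yield buf.strip()  # flush whatever remains at end of stream

def stream_reply(prompt, model="qwen2.5"):
    """Yield the model's reply token by token (assumes ollama is running)."""
    import ollama  # lazy import so the helper above works without it
    stream = ollama.chat(model=model,
                         messages=[{"role": "user", "content": prompt}],
                         stream=True)
    for chunk in stream:
        yield chunk["message"]["content"]

# for sentence in sentences(stream_reply("Tell me a joke")):
#     speak(sentence)  # hand each complete sentence to the TTS
```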
After reading some blog posts (ollama's docs and a Medium post) I saw that ollama has a Python package for interacting with self-hosted models. This package includes native tool support for models that support it. Here I pass along all the functions the model can use (currently only getWeather) using OpenAI's JSON schema format (you can also just pass along the functions themselves, but I chose this to give more context).
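Roughly, the tool registration looks like this (a sketch with the ollama Python package; getWeather's body is a stand-in, and the response field access may differ slightly between package versions):

```python
def get_weather(city: str) -> str:
    """Stand-in implementation — the real one would call a weather API."""
    return f"It is 18°C and cloudy in {city}."

# OpenAI-style JSON schema, passed instead of the bare function for extra context
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "getWeather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string",
                                    "description": "Name of the city"}},
            "required": ["city"],
        },
    },
}

TOOLS = {"getWeather": get_weather}

def run_tool_calls(tool_calls):
    """Execute the calls the model asked for, skipping hallucinated names."""
    results = []
    for call in tool_calls or []:
        fn = TOOLS.get(call["function"]["name"])
        if fn:
            results.append(fn(**call["function"]["arguments"]))
    return results

def ask(prompt, model="qwen2.5"):
    import ollama  # lazy import: assumes a local ollama server is running
    response = ollama.chat(model=model,
                           messages=[{"role": "user", "content": prompt}],
                           tools=[WEATHER_TOOL])
    results = run_tool_calls(response["message"].get("tool_calls"))
    return results or [response["message"]["content"]]
```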
Half the models I downloaded that supposedly support tools didn't actually use them; I don't know yet if this is an issue on my side. Qwen2.5 did use them and gave decent assistant answers, so that is the model I am using as of right now. Still testing a few more models.
There were a lot of audio glitches to debug in the activation system; in short, the conclusion is that I have a bad mic 🤣
To make sure you pick the right channel for the voice recognition, I also added a small tkinter GUI where you can debug your microphone.
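A stripped-down version of such a GUI could look like this (the device list would come from something like sounddevice.query_devices() — an assumption, since the real code isn't shown):

```python
def device_labels(devices):
    """Format (index, name, input_channels) rows, keeping only input devices."""
    return [f"[{i}] {name} ({ch} in)" for i, name, ch in devices if ch > 0]

def mic_debug_gui(devices, on_select):
    """Tiny tkinter picker for the microphone channel (sketch)."""
    import tkinter as tk  # lazy import so headless machines can use the rest
    root = tk.Tk()
    root.title("JAWIE mic debug")
    box = tk.Listbox(root, width=48)
    for label in device_labels(devices):
        box.insert(tk.END, label)
    box.pack(padx=8, pady=8)
    box.bind("<<ListboxSelect>>",
             lambda _e: on_select(box.curselection()[0]))
    root.mainloop()
```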
I have also experimented a lot with different AIs and AI engines to find the best one for our use case. First I used transformers with a big 7B model, but this took 2 minutes 37 seconds to generate a small response. After reading some Reddit and blog posts I learned that for this use case you shouldn't use transformers directly and should use ollama instead.
I switched over to ollama's llama3 model and it gave a response within a second (with streaming on), which is very good for our use case.
I'm currently searching for the best model to run on ollama and integrate it into our flow.
Integrated an AI model to test how good its conversational skills are. It still needs fine-tuning and access to real-world, current data. Also integrated a system that listens to your voice and turns your speech into text so the AI can understand it. This system includes privacy filters, so your commands are only passed on if you show intent to ask the AI something.
(Didn't have anything to showcase, so I added a short video of asking the AI a question.)