J.A.W.I.E. - Intelligent Jarvis-like AI assistant

3 devlogs
14h 9m
Created by Galaxic dev

Just Another Weird Intelligent Entity, or J.A.W.I.E. for short, is a smart AI that will help you with all your questions.
J.A.W.I.E. uses Kokoro TTS to turn the AI's responses into human-like speech; for this we use the af_heart voice model.
Because we are privacy-focused, we have an intent-based activation system. Thanks to this system you don't need to say "Hey JAWIE" and wait for 3 seconds before giving your command.
We used ollama to run the model that answers the questions because it's very fast and supports streaming (receiving the response token by token).
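One practical detail when combining a streaming LLM with sentence-based TTS like Kokoro is buffering the token stream into complete sentences before speaking them. This is a minimal sketch of that idea; the function and token examples are illustrative, not the project's actual code (in the real assistant the tokens would come from an ollama stream).

```python
import re

def buffer_sentences(token_stream):
    """Accumulate streamed tokens and yield complete sentences.

    Useful when piping a token-by-token LLM stream into a TTS engine
    that works best on full sentences rather than individual tokens.
    """
    buffer = ""
    for token in token_stream:
        buffer += token
        # Split off any complete sentences (ending in . ! or ?).
        while True:
            match = re.search(r"[.!?]\s", buffer)
            if not match:
                break
            end = match.end()
            yield buffer[:end].strip()
            buffer = buffer[end:]
    if buffer.strip():
        yield buffer.strip()

# With the real assistant, token_stream would come from something like:
#   (chunk["message"]["content"] for chunk in
#    ollama.chat(model="qwen2.5:7b", messages=msgs, stream=True))
tokens = ["Hel", "lo", " there", ". How", " can", " I", " help", "?"]
print(list(buffer_sentences(tokens)))  # ['Hello there.', 'How can I help?']
```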

Timeline

  • Added automatic gain control to SmartListener
  • Changed the AI interface to use ollama's package instead of using API calls
  • Added a tool system to the AI so it can fetch real time weather information
  • Chose a new AI model (Qwen2.5:7b) because this model supports tools (model might change later)
  • Various other small fixes

Explanation:
Since I saw that my mic was a little too quiet for the transcriber to consistently pick up, I added an automatic gain system that measures the current input level in dB and applies gain until the audio reaches a preconfigured decibel level (-30 to -40 dB).
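The gain calculation described above can be sketched in a few lines. This is not the SmartListener's actual implementation, just the underlying math: measure the chunk's RMS level in dBFS and derive the linear gain factor needed to hit the target level.

```python
import math

TARGET_DBFS = -35.0  # midpoint of the -30 to -40 dB target window

def rms_dbfs(samples):
    """Level of a chunk of float samples (-1.0..1.0) in dBFS."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20 * math.log10(rms)

def agc_gain(samples, target_dbfs=TARGET_DBFS):
    """Linear gain factor that brings the chunk to the target level."""
    level = rms_dbfs(samples)
    if level == float("-inf"):
        return 1.0  # silence: leave unchanged
    return 10 ** ((target_dbfs - level) / 20)

# A quiet chunk at roughly -46 dBFS gets boosted up toward -35 dBFS.
quiet = [0.005, -0.005, 0.005, -0.005]
gain = agc_gain(quiet)
boosted = [s * gain for s in quiet]
print(round(rms_dbfs(boosted), 1))  # -35.0
```

In practice you would smooth the gain over time instead of applying it per chunk, so sudden loud sounds don't cause pumping.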

I tried integrating my own tool system using JSON in the responses, but this caused issues with the TTS (due to bad stream integration), and the model often hallucinated actions that didn't exist or didn't use them at all.
After reading some blog posts (ollama's docs and a Medium post) I saw that ollama has a package to interact with self-hosted models. This package includes native tool support for models that support it. Here I passed along all the functions the model can use (currently only getWeather) using OpenAI's JSON schema format (you can also just pass along the functions directly, but I chose this to give more context).
Half the models I downloaded that claimed tool support didn't actually use it; I don't know yet if that's an issue on my side. Qwen2.5 did use it and gave decent assistant answers, so that is the model I am using as of right now. I'm still testing a few more models.
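To make the tool setup concrete, here is a sketch of an OpenAI-style tool schema plus a dispatcher for the model's tool calls. The `get_weather` function here is a hypothetical stand-in for the project's weather tool, and the tool-call shape mirrors what ollama's Python package returns in `response['message']['tool_calls']` for tool-capable models.

```python
# Hypothetical weather tool; the project's real getWeather calls an API.
def get_weather(city: str) -> str:
    return f"It is 21°C and sunny in {city}."

# OpenAI-style JSON schema describing the tool; a list of these is
# passed to the model via the chat call's tools parameter.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Fetch the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

AVAILABLE_TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(call):
    """Run one tool call as the model returns it and give back the result."""
    func = AVAILABLE_TOOLS[call["function"]["name"]]
    return func(**call["function"]["arguments"])

# Example of what a model's tool call might look like:
fake_call = {"function": {"name": "get_weather", "arguments": {"city": "Ghent"}}}
print(dispatch_tool_call(fake_call))  # It is 21°C and sunny in Ghent.
```

The tool result is then appended to the conversation as a `tool` message so the model can phrase a natural-language answer around it.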

There were a lot of audio glitches that needed to be debugged in the activation system; in short, the conclusion is that I got a bad mic 🤣
To make sure you've got the right channel for the voice recognition, I also added a small tkinter GUI where you can debug your microphone.

I have also experimented a lot with different AIs and AI engines to find the best one for our use case. First I used transformers with a big 7B model, but it took 2 minutes 37 seconds to generate a small response. After reading some Reddit and blog posts I learned that you shouldn't use transformers for this and should rather use ollama.
I switched over to the llama3 model on ollama and it gave a response within a second (with streaming on), which is very good for our use case.
I'm currently searching for the best model to run on ollama and will integrate it into our flow.
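When comparing engines and models like this, the metric that matters most for a voice assistant is time to first token rather than total generation time. A small helper for measuring it, shown here against a fake stream standing in for a real streamed response:

```python
import time

def time_to_first_token(token_stream):
    """Seconds until the stream yields its first token, plus that token.

    token_stream can be any iterator; with ollama it would be the
    streamed chat response.
    """
    start = time.perf_counter()
    first = next(iter(token_stream), None)
    return time.perf_counter() - start, first

def slow_fake_stream():
    time.sleep(0.05)  # simulate model latency before the first token
    yield "Hello"
    yield " world"

latency, token = time_to_first_token(slow_fake_stream())
print(f"first token {token!r} after {latency:.2f}s")
```

With streaming plus the sentence buffering for TTS, the assistant can start speaking as soon as the first sentence is complete instead of waiting for the full answer.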

Integrated an AI model to test how good its conversational skills are. It still needs fine-tuning and access to real-world, current data. Also integrated a system that listens to your voice and turns your speech into text so the AI can understand it. This system includes privacy filters, so your commands are only passed on if you show intent to ask the AI something.
(I didn't have anything to showcase, so I added a short video of asking the AI a question.)
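The intent-based privacy filter described above could look something like this minimal sketch: a transcript is only forwarded to the AI when it resembles a question or command. The keyword lists are purely illustrative; the real system may use a very different (and smarter) check.

```python
# Illustrative intent markers; not the project's actual list.
INTENT_STARTERS = ("what", "who", "when", "where", "why", "how",
                   "can you", "could you", "please", "tell me", "jawie")

def shows_intent(transcript: str) -> bool:
    """Heuristic: does this utterance look like it's aimed at the AI?"""
    text = transcript.strip().lower()
    return text.endswith("?") or text.startswith(INTENT_STARTERS)

def maybe_forward(transcript: str):
    """Return the transcript for the AI, or None to drop it for privacy."""
    return transcript if shows_intent(transcript) else None

print(maybe_forward("What is the weather like today"))  # forwarded
print(maybe_forward("I'm heading out in a minute"))     # None (dropped)
```

The privacy win is that background conversation never leaves the device; only utterances that pass the intent check reach the model.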