June 17, 2025
Now you can change the max speed factor (which takes effect when the TTS audio is oversized)
I'm happy to tell you that the XTTS here has better TTS performance
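A minimal sketch of how a max speed factor could work, assuming "oversized" means the TTS clip is longer than the time slot of the original line. The function name `fit_speed`, its parameters, and the default cap of 1.5 are all hypothetical and not taken from the project:

```python
def fit_speed(tts_seconds: float, slot_seconds: float, max_speed: float = 1.5) -> float:
    """Return the playback speed factor for one TTS clip (hypothetical helper)."""
    if tts_seconds <= 0 or slot_seconds <= 0:
        raise ValueError("durations must be positive")
    # Speed needed for the clip to fit exactly into its slot.
    needed = tts_seconds / slot_seconds
    # Never slow the clip down (factor below 1) and never go past the cap.
    return min(max(needed, 1.0), max_speed)
```

With a cap like this, a clip that is only a little too long gets sped up just enough to fit, while a badly oversized clip stops at the cap instead of turning into unintelligible fast speech.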
We have a brand new log
The alpha version is here. The audio is quite bad, which is not good for a podcast, though for anime it is less of a problem. Hopefully I can make that better (maybe not)
Introducing the stage-based pipeline, which helps you resume when a crash occurs. The newly added TTS library is Edge-TTS: not the best at voice cloning, but good at clear speech
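One way a stage-based pipeline with resume could look, as a rough sketch only. It assumes each stage is a function that takes and returns a JSON-serializable dict, and that progress is saved to a state file after every stage. `run_pipeline`, the `Stage` alias, and the state file layout are hypothetical names, not the project's actual code:

```python
import json
from pathlib import Path
from typing import Callable

# A stage is a (name, function) pair; the function maps data dict -> data dict.
Stage = tuple[str, Callable[[dict], dict]]

def run_pipeline(stages: list[Stage], state_file: Path) -> dict:
    # Load saved progress if a previous run left any behind.
    if state_file.exists():
        state = json.loads(state_file.read_text())
    else:
        state = {"done": [], "data": {}}
    for name, func in stages:
        if name in state["done"]:
            continue  # finished in an earlier run, skip it
        state["data"] = func(state["data"])
        state["done"].append(name)
        # Persist after every stage so a crash loses at most one stage of work.
        state_file.write_text(json.dumps(state))
    return state["data"]
```

Running the same call again after a crash skips every stage already listed in the state file and continues from the one that failed.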
loved the detailed readme!!
I decided to use F5-TTS for now (maybe I will add more). Some fine-tuned F5 models are under the CC-BY-NC license, so be careful when you use them
After many rounds of refactoring, we can continue with transcription
For some reason the loss is so high, so I decided to use WhisperX to decrease the loss
I tried different approaches. By the way, the first pipeline used Speaker-Diarization-3.1, but I found that it has a significant problem when many people talk at the same time. So I switched to VAD with pyannote/speech-separation-ami-1.0 and voice embeddings to make it more usable. In that case, the new solution takes every object to a depth of 1 (but the loss is so high; hopefully it decreases in the next devlog)
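A rough sketch of how voice embeddings can be used to decide who is speaking in a segment, assuming each separated segment has already been turned into an embedding vector by some model. The helper names, the 0.7 threshold, and the `speaker_N` naming are hypothetical and only illustrate the idea, not the project's implementation:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def assign_speaker(embedding, speakers, threshold=0.7):
    """Return the best-matching speaker id, registering a new speaker if none is close enough."""
    best_id, best_score = None, threshold
    for speaker_id, reference in speakers.items():
        score = cosine(embedding, reference)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    if best_id is None:
        # No known voice is similar enough: treat this as a new speaker.
        best_id = f"speaker_{len(speakers)}"
        speakers[best_id] = list(embedding)
    return best_id
```

Starting from an empty `speakers` dict, the first segment registers `speaker_0`, and later segments either match a known voice or add a new one.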
I hope that Lavalink runs
We have i18n, a lyric searcher (that does not work well), shuffle, and the prefix is changed to l
We have autoplay, yeah, let's go
We have a summary, and chat is back (my credit card is crying; hopefully you don't overuse it, or the https://ai.hackclub.com problem gets solved)
The bot can play music
This is a clone of one-prefix, many-bot music bots like Jockie. The bot can join a room to play music for you. If the main bot is busy, invite the slave bot and continue chilling with a friend
The good news: we have flashcards
The bad news: we have this issue https://github.com/hackclub/ai/issues/22 but this pull request is some good news for it https://github.com/hackclub/ai/pull/24
I will be working on bringing the chat back up, hopefully
I have the quiz collection, guys. Let me try to create something for learning, like flashcards
We have the banner, yeah
We have Markdown, let's goo, yeah
It can look for emotion in the transcription, maybe
This is anime dubbing (or subtitles) with AI support (yes!) **Important**: The models I use in this project require a large amount of VRAM and RAM, so you need to have about 6 GB (VRAM) or more. **Important 2**: I RECOMMEND YOU USE A GPU >= RTX 2050. THE MORE POWERFUL YOUR GPU, THE LESS TIME YOU SPEND. Ping @nbth for help
I am waiting for this pull request to be merged because I depend on it
https://github.com/hackclub/ai/pull/21
We have the chat UI, let's go
Almost done with the backend; now I am creating the frontend
This is an AI that you can upload your documents to for help
We are starting with the login page
I think it would be fun if we do that
Pterodactyl clone
This was widely regarded as a great move by everyone.