This is anime dubbing (or subtitling) with AI support (yes!)
**Important**: The models used in this project require a large amount of VRAM and RAM, so you need about 6 GB of VRAM or more
**Important 2**: I RECOMMEND A GPU >= RTX 2050. THE MORE POWERFUL YOUR GPU, THE LESS TIME YOU SPEND
ping @nbth for help
Alpha version is here. The audio is quite bad, which would be a problem for a podcast, but for anime it's passable. Hopefully I can improve it (maybe not)
Introducing the stage-based pipeline, which lets you resume when a crash occurs. The newly added TTS lib is Edge-TTS: not the best at voice cloning, but good at clear speech
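A minimal sketch of how a stage-based pipeline with crash resume can work (this is illustrative, not the project's actual code; the stage names and the `checkpoint.json` format are made up): each stage records its completion in a checkpoint file, so a re-run skips the stages that already finished.

```python
import json
from pathlib import Path

CHECKPOINT = Path("checkpoint.json")

def load_done():
    """Return the set of stage names already completed."""
    if CHECKPOINT.exists():
        return set(json.loads(CHECKPOINT.read_text()))
    return set()

def mark_done(done, stage):
    """Record a finished stage so the next run can skip it."""
    done.add(stage)
    CHECKPOINT.write_text(json.dumps(sorted(done)))

def run_pipeline(stages):
    done = load_done()
    for name, fn in stages:
        if name in done:
            print(f"skip {name} (already finished)")
            continue
        fn()  # may crash; on the next run we resume right here
        mark_done(done, name)

if __name__ == "__main__":
    run_pipeline([
        ("extract_audio", lambda: print("extracting audio")),
        ("diarize",       lambda: print("diarizing speakers")),
        ("tts",           lambda: print("synthesizing dub")),
    ])
```

If a stage raises midway, its name never reaches the checkpoint, so the next invocation redoes only that stage and everything after it.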
I've decided to use F5-TTS for now (maybe I'll add more later). Some fine-tuned F5 models are licensed CC-BY-NC, so be careful when you use them
I tried different approaches. The first pipeline used Speaker-Diarization-3.1, but I found that when many people talk at the same time, overlapping speech poses a significant problem. So I switched to VAD with pyannote/speech-separation-ami-1.0 plus voice embeddings to make it more robust. With that, the new solution separates every overlapping segment to a depth of 1 (but the loss is still high; hopefully that decreases in the next devlog)
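The voice-embedding step above can be sketched roughly like this (a toy illustration, not the project's code: real embeddings would come from a speaker-embedding model such as pyannote's, and the threshold is a made-up value). Each separated segment is assigned to the most similar known speaker by cosine similarity, or starts a new speaker if nothing is close enough.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def assign_speakers(embeddings, threshold=0.8):
    """Greedy online clustering of segment embeddings into speakers."""
    centroids = []   # one running-average embedding per discovered speaker
    labels = []
    for emb in embeddings:
        sims = [cosine(emb, c) for c in centroids]
        if sims and max(sims) >= threshold:
            idx = int(np.argmax(sims))
            # fold the new segment into the matched speaker's centroid
            centroids[idx] = (centroids[idx] + emb) / 2
        else:
            idx = len(centroids)
            centroids.append(emb.astype(float))
        labels.append(idx)
    return labels
```

Two segments with near-identical embeddings get the same label, while a dissimilar one is treated as a new speaker.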