anime dubbing cli tool

Used AI

10 devlogs
38h 9m
Created by nbth

This is an anime dubbing (or subtitling) tool with AI support (yes!)
**Important**: The model I use in this project requires a large amount of VRAM and RAM, so you need about 6 GB of VRAM or more
**Important 2**: I RECOMMEND A GPU >= RTX 2050. THE MORE POWERFUL YOUR GPU, THE LESS TIME YOU SPEND
Ping @nbth for help

Timeline

nbth
2h 17m 5 months ago

Now you can change the max speed factor (which takes effect when the generated TTS audio is oversized)
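
A rough sketch of how a max speed factor could work, assuming "oversize" means the TTS clip is longer than the time slot of the original line. The function name `fit_speed` and the default of 1.5 are made up for illustration and are not taken from the tool:

```python
def fit_speed(tts_duration: float, slot_duration: float,
              max_speed: float = 1.5) -> tuple[float, float]:
    """Pick a playback speed for a TTS clip that must fit a time slot.

    Returns (speed, overflow_seconds). The speed never drops below 1.0
    (short clips are left alone) and never exceeds max_speed.
    """
    if tts_duration <= 0 or slot_duration <= 0:
        raise ValueError("durations must be positive")
    if max_speed < 1.0:
        raise ValueError("max_speed must be at least 1.0")

    # Speed needed to squeeze the clip exactly into the slot.
    needed = tts_duration / slot_duration
    # Clamp: no slowdown for short clips, no speedup beyond the cap.
    speed = min(max(needed, 1.0), max_speed)
    # Whatever still does not fit after speeding up.
    overflow = max(0.0, tts_duration / speed - slot_duration)
    return speed, overflow
```

With a cap like this, a 4-second clip in a 2-second slot is sped up only to 1.5x and the rest is reported as overflow, so the caller can decide whether to trim it or let it run into the next line.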

nbth
1h 58m 5 months ago

I'm happy to tell you that XTTS is here and has better TTS performance

Ship 1

nbth

5 months ago

nbth Covers 8 devlogs and 33h 49m

We have a brand new log

nbth
3h 22m 5 months ago

The alpha version is here. The audio is quite bad, which would not be good enough for a podcast, but for anime it is acceptable. Hopefully I can make it better (maybe not)

nbth
5h 54m 5 months ago

Introducing the stage-based pipeline, which lets you resume after a crash. The newly added TTS library is Edge-TTS: not the best at voice cloning, but good at clear speech

nbth
2h 18m 5 months ago

I decided to use F5-TTS for now (maybe I will add more). Some fine-tuned F5 models are under the CC-BY-NC license, so be careful when you use them

nbth
3h 23m 5 months ago

After many rounds of refactoring, we can continue with transcription

nbth
2h 35m 6 months ago

For some reason the loss is very high, so I decided to use WhisperX to reduce it

nbth
10h 2m 6 months ago

I tried different approaches. By the way, the first pipeline used Speaker-Diarization-3.1, but I found that it has a significant problem when many people talk at the same time. So I switched to VAD with pyannote/speech-separation-ami-1.0 and voice embeddings to make that more workable. In that case, the new solution takes every object to a depth of 1 (but the loss is very high, hopefully that decreases in the next devlog)
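
To illustrate the voice embedding idea, here is a small sketch of one way to group speech segments by speaker: compare each segment's embedding to the speakers seen so far using cosine similarity, and start a new speaker when nothing is close enough. This is an assumption about how embeddings might be used, not the tool's actual method. `assign_speakers` and the 0.7 threshold are made up, and real embeddings would come from a speaker embedding model rather than hand-written lists:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_speakers(embeddings: list[list[float]],
                    threshold: float = 0.7) -> list[int]:
    """Greedy assignment: one speaker label per segment embedding."""
    sums: list[list[float]] = []  # running sum of embeddings per speaker
    labels: list[int] = []
    for emb in embeddings:
        best, best_sim = -1, threshold
        for i, total in enumerate(sums):
            # The summed vector points the same way as the mean, so
            # cosine against it equals cosine against the centroid.
            sim = cosine(emb, total)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best == -1:
            sums.append(list(emb))  # nobody close enough: new speaker
            best = len(sums) - 1
        else:
            sums[best] = [t + e for t, e in zip(sums[best], emb)]
        labels.append(best)
    return labels
```

A greedy pass like this depends on segment order and on the threshold, so a wrong threshold can split one speaker into two or merge two speakers into one.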

nbth
5h 59m 6 months ago

It can look for emotion in the transcription, maybe
