Website + API that provides advanced media tools over the web, powered by ffmpeg, openai-whisper, Pillow, and other packages. Features include file type conversion, audio language detection, and audio transcription.
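Audio language detection can be sketched with openai-whisper's documented low-level API (`load_model`, `load_audio`, `pad_or_trim`, `log_mel_spectrogram`, `detect_language`). This is a minimal sketch, not the project's code: the helper names `top_language` and `detect_audio_language` are hypothetical, and defaulting to the `tiny` model is an assumption.

```python
from typing import Dict, Tuple


def top_language(probs: Dict[str, float]) -> Tuple[str, float]:
    """Return the most likely language code and its probability."""
    lang = max(probs, key=probs.get)
    return lang, probs[lang]


def detect_audio_language(path: str, model_name: str = "tiny") -> Tuple[str, float]:
    # Lazy import: openai-whisper (and PyTorch) are only needed when this runs.
    import whisper

    model = whisper.load_model(model_name)
    # Detection looks at a single 30-second window, so pad or trim the audio to fit.
    audio = whisper.pad_or_trim(whisper.load_audio(path))
    # Default log-Mel settings (80 bins) match the smaller models such as "tiny".
    mel = whisper.log_mel_spectrogram(audio).to(model.device)
    # detect_language returns (language tokens, {language_code: probability}).
    _, probs = model.detect_language(mel)
    return top_language(probs)
```

`detect_audio_language("clip.mp3")` returns a `(code, probability)` pair; keeping the probability lets an API report how confident the detection was.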
Finally managed to package everything into a Docker image, and all tests pass in a container running Linux!
After some research, I added faster-whisper, a faster (albeit more limited) implementation of openai-whisper that is built on CTranslate2 and supports quantization. It also uses less RAM (!!!) and less CPU time. Because I haven't tested it enough yet, it lives under a new endpoint and is considered a BETA feature for now. I also wrote tests for it; the endpoint is largely the same as the existing transcription endpoint, except that it has no language field.
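A minimal sketch of how a beta endpoint might call faster-whisper, assuming CPU inference, int8 quantization, and the `tiny` model; `get_fast_model`, `transcribe_fast`, and `join_segments` are hypothetical names, not the project's code. Like the endpoint described in the devlog, the function takes no language argument and relies on the library's automatic language detection.

```python
from functools import lru_cache
from typing import Iterable


def join_segments(texts: Iterable[str]) -> str:
    """Concatenate segment texts into a single transcript string."""
    return "".join(texts).strip()


@lru_cache(maxsize=None)
def get_fast_model(size: str = "tiny"):
    # Lazy import so this module loads even when faster-whisper is not installed.
    from faster_whisper import WhisperModel

    # compute_type="int8" runs an 8-bit quantized model in CTranslate2 on CPU.
    # lru_cache keeps one loaded model per size instead of reloading per request.
    return WhisperModel(size, device="cpu", compute_type="int8")


def transcribe_fast(path: str, size: str = "tiny") -> dict:
    # No language parameter: faster-whisper detects the language itself.
    segments, info = get_fast_model(size).transcribe(path)
    # `segments` is a lazy generator; decoding happens while it is consumed.
    text = join_segments(seg.text for seg in segments)
    return {"language": info.language, "text": text}
```

Caching the model matters here as much as quantization: loading weights on every request would cancel out much of the RAM and CPU savings.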
After fiddling with language support and making sure the API doesn't use a ridiculous amount of resources, I finally improved the transcription feature and wrote tests for it. It now also supports translation (albeit only with the 'tiny' model).
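A minimal sketch of transcription with optional translation using openai-whisper's `transcribe`, which accepts `task="translate"` to produce English output. The "translation only with `tiny`" rule mirrors the devlog; `build_decode_options` and `transcribe_audio` are hypothetical helpers, and mapping an empty language field to `language=None` (auto-detect) is an assumption about how the API's fields reach the library.

```python
from typing import Optional

# Per the devlog, translation is currently limited to the "tiny" model.
TRANSLATION_MODELS = {"tiny"}


def build_decode_options(model_name: str, translate: bool,
                         language: Optional[str] = None) -> dict:
    """Validate request fields and turn them into whisper decode options."""
    if translate and model_name not in TRANSLATION_MODELS:
        raise ValueError(f"translation is only supported with: {sorted(TRANSLATION_MODELS)}")
    # language=None lets Whisper detect the spoken language itself.
    return {"task": "translate" if translate else "transcribe", "language": language}


def transcribe_audio(path: str, model_name: str = "tiny", translate: bool = False,
                     language: Optional[str] = None) -> str:
    # Validate before loading the model so bad requests fail cheaply.
    options = build_decode_options(model_name, translate, language)
    # Lazy import: openai-whisper is only needed when a request is processed.
    import whisper

    model = whisper.load_model(model_name)
    # transcribe() forwards task/language to the decoder; "translate" outputs English.
    result = model.transcribe(path, **options)
    return result["text"]
```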
After a long fight with ffmpeg, I finally found compatible codecs and ffmpeg args for all conversions that this platform plans to support.
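One way to keep per-format ffmpeg arguments manageable is a single table from output extension to encoder arguments, with the command built from it. The codec choices below are common ffmpeg encoders picked for illustration (`libmp3lame`, `libvorbis`, `libx264` with `aac`, `libvpx-vp9` with `libopus`); they are assumptions, not the combinations the devlog settled on, and `FFMPEG_ARGS`, `build_ffmpeg_command`, and `convert` are hypothetical names.

```python
import subprocess
from pathlib import Path
from typing import List

# Hypothetical output-extension -> encoder arguments table (illustrative choices only).
FFMPEG_ARGS = {
    ".mp3": ["-vn", "-c:a", "libmp3lame", "-q:a", "2"],
    ".ogg": ["-vn", "-c:a", "libvorbis"],
    ".mp4": ["-c:v", "libx264", "-pix_fmt", "yuv420p", "-c:a", "aac"],
    ".webm": ["-c:v", "libvpx-vp9", "-c:a", "libopus"],
}


def build_ffmpeg_command(src: str, dst: str) -> List[str]:
    """Build an ffmpeg argv list that converts src into dst's format."""
    suffix = Path(dst).suffix.lower()
    if suffix not in FFMPEG_ARGS:
        raise ValueError(f"unsupported output format: {suffix}")
    # -y overwrites dst; -i names the input; encoder args go before the output path.
    return ["ffmpeg", "-y", "-i", src, *FFMPEG_ARGS[suffix], dst]


def convert(src: str, dst: str) -> None:
    # Passing a list (no shell) avoids shell quoting problems with file names.
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
```

Keeping the arguments as data also makes them easy to cover with parametrized tests, one case per table entry.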
I also wrote 172 tests (most of them parametrized) for the API, file handling (upload, download, deletion), and file conversion (images, audio, video). In addition, I moved image conversion from ffmpeg to PIL (Pillow), which improved performance.
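A minimal sketch of converting images with Pillow instead of spawning ffmpeg, assuming the output format is chosen from the destination file extension. One detail such a converter has to handle is that JPEG cannot store an alpha channel or a palette, so those images are converted to RGB first; `JPEG_MODES`, `target_mode`, and `convert_image` are hypothetical names.

```python
from pathlib import Path

# Modes that can be saved as JPEG without conversion (conservative subset).
JPEG_MODES = {"L", "RGB", "CMYK"}


def target_mode(mode: str, fmt: str) -> str:
    """Return the mode an image should have before saving in format `fmt`."""
    if fmt == "JPEG" and mode not in JPEG_MODES:
        # Alpha (RGBA, LA) and palette (P) images become RGB; transparency is discarded.
        return "RGB"
    return mode


def convert_image(src: str, dst: str) -> None:
    # Lazy import so the helper above can be used without Pillow installed.
    from PIL import Image

    # Map the destination extension (e.g. ".jpg") to a Pillow format name (e.g. "JPEG").
    fmt = Image.registered_extensions().get(Path(dst).suffix.lower())
    if fmt is None:
        raise ValueError(f"unsupported output extension: {dst}")
    with Image.open(src) as img:
        mode = target_mode(img.mode, fmt)
        out = img if mode == img.mode else img.convert(mode)
        out.save(dst, format=fmt)
```

Doing this in-process avoids starting an ffmpeg subprocess per image, which is one plausible source of the performance gain.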
TODO: write tests for transcription, implement cleanup of old files
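One possible shape for the cleanup item: delete uploaded or converted files whose modification time is older than a cutoff. This is a sketch under assumptions: files live in a single flat directory, age is judged by `mtime`, and `cleanup_old_files` and its one-hour default are hypothetical.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional


def cleanup_old_files(directory: str, max_age_seconds: float = 3600,
                      now: Optional[float] = None) -> List[Path]:
    """Delete regular files in `directory` older than `max_age_seconds`; return them."""
    now = time.time() if now is None else now
    removed: List[Path] = []
    for path in Path(directory).iterdir():
        try:
            # Only plain files are considered; subdirectories are left alone.
            if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
                path.unlink()
                removed.append(path)
        except FileNotFoundError:
            # Another worker may have deleted the file between listing and unlinking.
            continue
    return removed


if __name__ == "__main__":
    # Demo: one stale file (mtime two hours ago) and one fresh file.
    with tempfile.TemporaryDirectory() as tmp:
        stale, fresh = Path(tmp, "stale.mp3"), Path(tmp, "fresh.mp3")
        stale.write_bytes(b"old")
        fresh.write_bytes(b"new")
        two_hours_ago = time.time() - 7200
        os.utime(stale, (two_hours_ago, two_hours_ago))
        print([p.name for p in cleanup_old_files(tmp)])  # ['stale.mp3']
```

The `now` parameter makes the function deterministic in tests; in production it could run on a timer or before handling each upload.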
After making some edits to the openai-whisper codebase and submitting a PR, I finally have some core features complete in the API: