
Media Toolbox

Used AI

5 devlogs
12h
Created by spyguy

Website + API that provides advanced media tools over the web, powered by ffmpeg, openai-whisper, pillow, and other packages. Features include file type conversion, audio language detection, and audio transcription.

Timeline

I was finally able to package everything into a Docker image, and all tests pass in a container running Linux!


After doing some research, I was able to include a faster (albeit more limited) implementation of openai-whisper called faster-whisper, which is based on CTranslate2 and supports quantization. It also uses less RAM (!!!) and less CPU time. It lives under a new endpoint because I have not tested it enough yet, so it is considered a BETA feature for now. I also wrote tests for it and kept the endpoint largely the same, except that it has no language field.
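
A minimal sketch of what a quantized faster-whisper call can look like. The function names, the `Segment` stand-in, and the `"tiny"` default are assumptions for illustration, not the project's endpoint code. No language argument is passed, so the language is auto-detected, which mirrors the missing language field.

```python
from typing import Iterable, NamedTuple, Tuple


class Segment(NamedTuple):
    """Minimal stand-in for the fields read from a faster-whisper segment."""
    start: float
    end: float
    text: str


def format_segments(segments: Iterable) -> str:
    """Render segments as '[start -> end] text' lines."""
    return "\n".join(
        f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text.strip()}" for seg in segments
    )


def transcribe_fast(path: str, model_size: str = "tiny") -> Tuple[str, str]:
    """Transcribe an audio file; returns (detected_language, transcript)."""
    # Imported lazily so format_segments works without the package installed.
    from faster_whisper import WhisperModel

    # compute_type="int8" asks CTranslate2 for 8-bit quantized weights.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # transcribe() returns a lazy generator of segments plus an info object.
    segments, info = model.transcribe(path)
    return info.language, format_segments(segments)
```

Because the segments come back as a generator, the actual decoding only happens while `format_segments` iterates over them.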


After fiddling around with language support and ensuring that the API does not use a ridiculous amount of resources, I was finally able to improve the transcription feature and write tests for it. It now supports translation (albeit only with the 'tiny' model) and is covered by tests.


After a long fight with ffmpeg, I finally found compatible codecs and ffmpeg args for all conversions that this platform plans to support.
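
One way to keep per-format codec choices in a single place is a lookup table keyed by the target extension. This is only a sketch: the codec table below is an illustrative assumption (the encoder names are real ffmpeg encoders, but the devlog does not list the ones the project settled on), and `build_ffmpeg_command` and `convert` are hypothetical helpers.

```python
import subprocess
from pathlib import Path
from typing import Dict, List

# Illustrative codec choices only; not the project's actual table.
CODEC_ARGS: Dict[str, List[str]] = {
    ".mp3": ["-vn", "-c:a", "libmp3lame"],
    ".ogg": ["-vn", "-c:a", "libvorbis"],
    ".wav": ["-vn", "-c:a", "pcm_s16le"],
    ".mp4": ["-c:v", "libx264", "-c:a", "aac"],
    ".webm": ["-c:v", "libvpx-vp9", "-c:a", "libopus"],
}


def build_ffmpeg_command(src: str, dst: str) -> List[str]:
    """Build the ffmpeg argument list for converting src into dst's format."""
    suffix = Path(dst).suffix.lower()
    if suffix not in CODEC_ARGS:
        raise ValueError(f"unsupported target format: {suffix!r}")
    # -y overwrites an existing output file; -vn drops video for audio targets.
    return ["ffmpeg", "-y", "-i", src, *CODEC_ARGS[suffix], dst]


def convert(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg exits non-zero."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with uploaded file names.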

I also wrote 172 tests (most are parametrized) for the API, file handling (upload, download, deletion), and file conversion (images, audio, video). In addition, I moved image conversion from ffmpeg over to PIL (pillow), which improved performance.
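
For reference, image conversion with pillow can be sketched roughly as follows. The helper names are hypothetical and this is not the project's code; the one real subtlety it shows is that JPEG cannot store alpha or palette data, so such images have to be converted to RGB before saving.

```python
def normalize_format(target_format: str) -> str:
    """Map user-facing names to Pillow format names ('jpg' is saved as 'JPEG')."""
    fmt = target_format.upper()
    return "JPEG" if fmt == "JPG" else fmt


def needs_rgb(target_format: str, mode: str) -> bool:
    """True when the image mode cannot be written as JPEG without conversion."""
    return normalize_format(target_format) == "JPEG" and mode in {"RGBA", "LA", "P"}


def convert_image(src: str, dst: str, target_format: str) -> None:
    """Open src with Pillow and save it to dst in target_format."""
    # Imported lazily so the pure helpers above work without Pillow installed.
    from PIL import Image

    with Image.open(src) as img:
        if needs_rgb(target_format, img.mode):
            # Drops transparency; transparent pixels keep their stored colour.
            img = img.convert("RGB")
        img.save(dst, format=normalize_format(target_format))
```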

TODO: write tests for transcription, implement cleanup of old files
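
The cleanup part of this TODO could start from something like the sketch below, which deletes files by modification time. The function name, the flat upload directory, and the age threshold are all assumptions; nothing here reflects how the project ends up doing it.

```python
import time
from pathlib import Path
from typing import List, Optional


def cleanup_old_files(
    directory: str, max_age_seconds: float, now: Optional[float] = None
) -> List[str]:
    """Delete regular files older than max_age_seconds; return the names removed."""
    now = time.time() if now is None else now
    removed = []
    for path in Path(directory).iterdir():
        # Age is measured from the last modification time of the file.
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Taking `now` as a parameter keeps the function easy to test without sleeping.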


After making some edits to the openai-whisper codebase and submitting a PR, I finally have some core features complete in the API:

  • File upload/download (stored with random UUID)
  • File conversion (powered by ffmpeg, so incredibly versatile)
  • File transcription (powered by openai-whisper, with a custom progress callback function to track progress through the transcription because it takes a while!)
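
The progress-callback idea from the last bullet can be sketched generically. This is not openai-whisper's API and not the change from the PR; `track_progress` and the `(end_time_seconds, text)` segment shape are hypothetical, chosen only to show how a callback can report the fraction of audio processed so far.

```python
from typing import Callable, Iterable, Iterator, Tuple

ProgressCallback = Callable[[float], None]


def track_progress(
    segments: Iterable[Tuple[float, str]],
    total_duration: float,
    on_progress: ProgressCallback,
) -> Iterator[str]:
    """Yield each segment's text, reporting progress as a fraction in [0, 1].

    `segments` is a hypothetical stream of (end_time_seconds, text) pairs.
    """
    for end_time, text in segments:
        if total_duration > 0:
            fraction = min(end_time / total_duration, 1.0)
        else:
            fraction = 1.0  # nothing to measure against; treat as done
        on_progress(fraction)
        yield text
```

A caller could pass a function that stores the latest fraction under the file's UUID, so a status endpoint can report it while the transcription is still running.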