Media Toolbox Used AI

5 devlogs

12h

Created by spyguy

A website and API that provide advanced media tools over the web, powered by ffmpeg, openai-whisper, pillow, and other packages. Features include file type conversion, audio language detection, and audio transcription.

Timeline

spyguy

46m • 26 days ago

I was finally able to package everything into a Docker image, and all tests pass in a container running Linux!

spyguy

58m • 26 days ago

After doing some research, I was able to include faster-whisper, a faster (albeit more limited) reimplementation of openai-whisper that is based on CTranslate2 and supports quantization. It also uses less RAM (!!!) and less CPU time. It lives under a new endpoint and is considered a BETA feature for now because I have not tested it enough. I also wrote tests for it and kept the endpoint largely the same as the existing transcription endpoint, except that it has no language field.
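As a rough illustration of the quantized setup described above, here is a minimal sketch using faster-whisper's `WhisperModel` with `compute_type="int8"`. The function names `transcribe_fast` and `segments_to_text`, the default model size, and the shape of the returned dictionary are assumptions made for this example, not the project's actual endpoint code.

```python
from typing import Any, Iterable


def segments_to_text(segments: Iterable[Any]) -> str:
    """Join the text of transcription segments into a single string."""
    return " ".join(segment.text.strip() for segment in segments)


def transcribe_fast(path: str, model_size: str = "tiny") -> dict:
    """Transcribe an audio file with an int8-quantized faster-whisper model.

    Hypothetical helper: a real service would likely load the model once
    and reuse it instead of loading it on every call.
    """
    # Third-party dependency (pip install faster-whisper), imported lazily.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # transcribe() returns a lazy generator of segments plus detection info;
    # no language is passed here, so the model detects it on its own.
    segments, info = model.transcribe(path)
    return {"language": info.language, "text": segments_to_text(segments)}
```

Because the segments come back as a generator, the actual decoding work only happens while `segments_to_text` iterates over them.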

spyguy

54m • 26 days ago

After fiddling around with language support and making sure the API does not use a ridiculous amount of resources, I was finally able to improve the transcription feature. It now supports translation (albeit only with the 'tiny' model) and is covered by tests.
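For context, openai-whisper switches between transcription and translation through the `task` decoding option. The sketch below shows one way the options could be assembled; `build_decode_options` and `transcribe` are hypothetical helpers for this example, and the result dictionary is an assumed shape rather than the API's real response.

```python
from typing import Optional


def build_decode_options(language: Optional[str] = None, translate: bool = False) -> dict:
    """Build keyword arguments for whisper's transcribe() call."""
    options = {"task": "translate" if translate else "transcribe"}
    if language is not None:
        # Skip auto-detection when the caller already knows the language.
        options["language"] = language
    return options


def transcribe(path: str, language: Optional[str] = None, translate: bool = False) -> dict:
    """Transcribe (or translate to English) an audio file with the 'tiny' model."""
    # Third-party dependency (pip install openai-whisper), imported lazily.
    import whisper

    model = whisper.load_model("tiny")
    result = model.transcribe(path, **build_decode_options(language, translate))
    return {"language": result["language"], "text": result["text"].strip()}
```

Sticking to the 'tiny' model keeps memory and CPU use low, at the cost of accuracy.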

spyguy

4h 26m • 27 days ago

After a long fight with ffmpeg, I finally found compatible codecs and ffmpeg args for all conversions that this platform plans to support.
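To give an idea of what a per-format table of ffmpeg args can look like, here is a small sketch. The codec choices in `CODEC_ARGS` are common defaults picked for this example and are an assumption; they are not necessarily the combinations this project settled on. `build_ffmpeg_command` and `convert` are likewise hypothetical names.

```python
import subprocess
from pathlib import Path

# Assumed mapping from target extension to ffmpeg codec arguments.
# "-vn" drops any video stream when the target is audio-only.
CODEC_ARGS = {
    ".mp3": ["-vn", "-c:a", "libmp3lame"],
    ".wav": ["-vn", "-c:a", "pcm_s16le"],
    ".ogg": ["-vn", "-c:a", "libvorbis"],
    ".flac": ["-vn", "-c:a", "flac"],
    ".mp4": ["-c:v", "libx264", "-c:a", "aac"],
    ".webm": ["-c:v", "libvpx-vp9", "-c:a", "libopus"],
}


def build_ffmpeg_command(src: str, dst: str) -> list:
    """Return the ffmpeg argument list for converting src into dst."""
    suffix = Path(dst).suffix.lower()
    if suffix not in CODEC_ARGS:
        raise ValueError(f"unsupported target format: {suffix!r}")
    # "-y" overwrites an existing output file without prompting.
    return ["ffmpeg", "-y", "-i", str(src), *CODEC_ARGS[suffix], str(dst)]


def convert(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
```

Keeping command construction separate from execution means the argument table can be tested without ffmpeg installed.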

I also wrote 172 tests (most of them parametrized) covering the API, file handling (upload, download, deletion), and file conversion (images, audio, video). In addition, I moved image conversion from ffmpeg to PIL, which improved performance.
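Image conversion with PIL can be quite compact. The sketch below is an assumed shape for it, not the project's code: `target_mode` and `convert_image` are hypothetical names, and the mode handling only covers the well-known case that JPEG cannot store an alpha channel or a palette.

```python
def target_mode(mode: str, fmt: str) -> str:
    """Pick a pixel mode that the target format can actually store."""
    # Pillow can write JPEG only from L, RGB, or CMYK images.
    if fmt.upper() == "JPEG" and mode not in {"RGB", "L", "CMYK"}:
        return "RGB"
    return mode


def convert_image(src: str, dst: str, fmt: str) -> None:
    """Convert an image file to another format with Pillow."""
    # Third-party dependency (pip install pillow), imported lazily.
    from PIL import Image

    with Image.open(src) as img:
        mode = target_mode(img.mode, fmt)
        # Converting RGBA to RGB discards transparency.
        converted = img.convert(mode) if mode != img.mode else img
        converted.save(dst, format=fmt)
```

A pure helper like `target_mode` also fits parametrized tests well, since each (mode, format, expected) combination becomes one test case.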

TODO: write tests for transcription, implement cleanup of old files

spyguy

4h 54m • 28 days ago

After making some edits to the openai-whisper codebase and submitting a PR, I finally have some core features complete in the API:

File upload/download (stored with random uuid)
File conversion (powered by ffmpeg, so incredibly versatile)
File transcription (powered by openai-whisper, with a custom progress callback function to track progress through the transcription because it takes a while!)
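The storage and progress-tracking ideas in the list above can be sketched with the standard library alone. Everything named here (`STORAGE_DIR`, `store_upload`, `find_upload`, `ProgressTracker`) is hypothetical, and the hook that makes openai-whisper call a progress callback comes from the author's own patch, which is not reproduced; this only shows the receiving side.

```python
import uuid
from pathlib import Path
from typing import Callable, Dict

STORAGE_DIR = Path("uploads")  # assumed storage location


def store_upload(data: bytes, original_name: str, storage_dir: Path = STORAGE_DIR) -> str:
    """Save uploaded bytes under a random UUID and return that ID."""
    storage_dir.mkdir(parents=True, exist_ok=True)
    file_id = uuid.uuid4().hex
    # Keep only the extension; the client-supplied name never becomes a path.
    suffix = Path(original_name).suffix.lower()
    (storage_dir / f"{file_id}{suffix}").write_bytes(data)
    return file_id


def find_upload(file_id: str, storage_dir: Path = STORAGE_DIR) -> Path:
    """Locate a stored file by ID, whatever its extension."""
    # Accept only IDs shaped like uuid4().hex, which blocks path traversal.
    if len(file_id) != 32 or any(c not in "0123456789abcdef" for c in file_id):
        raise ValueError("invalid file id")
    matches = sorted(storage_dir.glob(f"{file_id}*"))
    if not matches:
        raise FileNotFoundError(file_id)
    return matches[0]


class ProgressTracker:
    """Remember the latest progress fraction (0.0 to 1.0) per file ID."""

    def __init__(self) -> None:
        self._progress: Dict[str, float] = {}

    def callback_for(self, file_id: str) -> Callable[[float], None]:
        """Return a callback that a long-running job can call with its progress."""
        def update(fraction: float) -> None:
            self._progress[file_id] = max(0.0, min(1.0, fraction))
        return update

    def get(self, file_id: str) -> float:
        """Return the last reported progress, or 0.0 if nothing was reported."""
        return self._progress.get(file_id, 0.0)
```

An API could hand `tracker.callback_for(file_id)` to the transcription job and expose `tracker.get(file_id)` through a status endpoint that clients poll.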
