github.com/BingLingGroup/autosub
Command-line utility to transcribe/translate from video/audio/subtitles to subtitles
Contributors: 29
Lines of Code: 1,288
From: 2015-06-30
To: 2022-04-11
About BingLingGroup/autosub
Autosub is a Python command-line utility for automatically generating subtitles from video, audio, or existing subtitle files. It works by detecting speech regions using audio analysis, splitting the media into segments, transcribing speech using various cloud APIs, and optionally translating the resulting subtitles. The tool supports multiple speech-to-text providers including Google Speech V2, Google Cloud Speech-to-Text, Xfyun, and Baidu ASR APIs.
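The speech-region detection step can be illustrated with a simple energy threshold. This is only a minimal sketch and not Autosub's or Auditok's actual algorithm: the function name detect_speech_regions, its parameters, and the per-frame energy input are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def detect_speech_regions(
    frame_energies: Sequence[float],
    frame_sec: float,
    threshold: float,
    min_frames: int = 2,
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for runs of loud frames.

    Hypothetical helper: a frame counts as speech when its energy is at
    or above `threshold`; runs shorter than `min_frames` are discarded.
    """
    regions = []
    start = None  # index of the first frame of the current run, if any
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended on the previous frame; keep it if long enough.
            if i - start >= min_frames:
                regions.append((start * frame_sec, i * frame_sec))
            start = None
    # Close a run that extends to the end of the audio.
    if start is not None and len(frame_energies) - start >= min_frames:
        regions.append((start * frame_sec, len(frame_energies) * frame_sec))
    return regions


# Made-up energies, one value per 0.5-second frame.
energies = [0, 0, 5, 6, 7, 0, 0, 8, 0, 9, 9, 9]
regions = detect_speech_regions(energies, frame_sec=0.5, threshold=1)
print(regions)
```

Each returned pair marks one segment that would then be cut out of the media and sent for transcription. A real detector would typically also merge regions separated by very short silences and cap the maximum region length.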
The project handles the complete workflow of subtitle generation with several preprocessing and post-processing capabilities. It can normalize and filter audio before transcription, automatically detect speech boundaries using the Auditok library, and process audio in parallel to speed up API requests. The tool supports conversion between different audio formats and bitrates depending on API requirements, with FLAC being the default format for Google APIs and PCM for Baidu and Xfyun services. Output can be generated in multiple subtitle formats including SRT, ASS, and VTT.
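As a rough sketch of the parallel transcription and subtitle output described above, the example below sends regions to a stand-in transcriber through a thread pool and formats the results as SRT. The names fake_transcribe, transcribe_regions, and to_srt are hypothetical and no real speech API is called; Autosub's own implementation may differ.

```python
from concurrent.futures import ThreadPoolExecutor


def srt_timestamp(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def fake_transcribe(region):
    """Stand-in for a cloud speech-to-text request (assumption)."""
    start, end = region
    return f"speech from {start:.1f}s to {end:.1f}s"


def transcribe_regions(regions, transcribe, workers=4):
    """Run `transcribe` on every region concurrently.

    Executor.map yields results in input order, so each text stays
    aligned with its region even if requests finish out of order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transcribe, regions))


def to_srt(regions, texts):
    """Build SRT text, skipping regions with an empty transcript."""
    blocks = []
    index = 0
    for (start, end), text in zip(regions, texts):
        if not text:
            continue
        index += 1
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


regions = [(1.0, 2.5), (4.5, 6.0)]
texts = transcribe_regions(regions, fake_transcribe)
print(to_srt(regions, texts))
```

Threads suit this step because the work is dominated by waiting on network requests rather than CPU time.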
Notable features include language code detection and conversion to match each API's requirements, support for custom speech recognition configs, the ability to use existing subtitle files to adjust speech regions manually, and flexible translation options using Google Translate. The project maintains multiple branches: the alpha branch contains the most stable enhanced features, while the dev branch receives the latest updates. It is designed for both Linux and Windows, and standalone executable releases make it accessible to users without experience installing Python.
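Language code conversion of this kind could look roughly like the sketch below, which normalizes a user-supplied code and picks the closest entry from a list of codes an API accepts. The functions normalize_code and best_match and the supported list are illustrative assumptions, not Autosub's actual matching logic.

```python
def normalize_code(code):
    """Normalize casing and separators, e.g. 'EN_us' -> 'en-US'."""
    parts = code.replace("_", "-").split("-")
    out = [parts[0].lower()]  # primary language subtag
    for part in parts[1:]:
        if len(part) == 2:
            out.append(part.upper())   # region, e.g. US
        elif len(part) == 4:
            out.append(part.title())   # script, e.g. Hans
        else:
            out.append(part.lower())
    return "-".join(out)


def best_match(code, supported):
    """Return the supported code closest to `code`, or None.

    Tries an exact match after normalization, then falls back to the
    first supported code that shares the primary language subtag.
    """
    wanted = normalize_code(code)
    normalized = {normalize_code(s): s for s in supported}
    if wanted in normalized:
        return normalized[wanted]
    primary = wanted.split("-")[0]
    for norm, original in normalized.items():
        if norm.split("-")[0] == primary:
            return original
    return None


# Hypothetical list of codes accepted by some speech API.
supported = ["en-US", "zh-CN", "cmn-Hans-CN"]
print(best_match("EN_us", supported))
print(best_match("zh", supported))
print(best_match("fr", supported))
```

Falling back to the primary subtag is a deliberate simplification; a stricter tool might ask the user to confirm the substitution instead of choosing silently.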