Discover an elegant speech-to-text solution in Python
Imagine you maintain a podcast platform and want to transcribe the audio into text. Would your reflex be to use an online service like AWS Transcribe? Or perhaps you manage videos, as Substack does, and want to add subtitles to help users with disabilities follow them. Would you need an online service like AmberScript? What if I told you that it is now easier to build your own solution with few resources…
In this article, I will present Whisper, a fantastic tool for transcribing audio files into text. It is an open-source project maintained by OpenAI, the company best known for ChatGPT. I will also show how to add subtitles to videos, so fasten your seatbelts!
Installation
The first prerequisite is the ffmpeg binary, the standard tool for working with audio and video files. The installation depends on your platform.
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
# on Arch Linux
sudo pacman -S ffmpeg
# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg
# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg
# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
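Once ffmpeg is installed, a quick sanity check from Python can confirm that the binary is reachable on your PATH. The snippet below is only a minimal sketch and is not part of Whisper: the helper names `find_ffmpeg` and `ffmpeg_version` are illustrative choices of mine, and it relies only on the standard library.

```python
import shutil
import subprocess


def find_ffmpeg(binary="ffmpeg"):
    """Return the full path of the binary, or None if it is not on PATH."""
    return shutil.which(binary)


def ffmpeg_version(binary="ffmpeg"):
    """Return the first line printed by `<binary> -version`, or None if the binary is missing."""
    path = find_ffmpeg(binary)
    if path is None:
        return None
    # check=False: a non-zero exit code should not raise, we only want the output
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ffmpeg_version() or "ffmpeg not found on PATH")
```

If the script reports that ffmpeg was not found, open a new terminal so the updated PATH is picked up, or rerun the install command for your platform.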
You will also need tools to build binary dependencies. On Linux/Unix systems, having gcc installed will…