
Discover an elegant speech-to-text solution in Python

9 min read · Dec 18, 2023

Photo by Suzanne D. Williams on Unsplash

Imagine you maintain a podcast platform and want to transcribe the audio into text. Would your first reflex be to use an online service like AWS Transcribe? Or perhaps you manage videos, as Substack does, and want to add subtitles to help users with disabilities follow them. Would you need an online service like AmberScript? What if I told you that today it is easy to build your own solution with modest resources?

In this article, I will present Whisper, a fantastic tool for transcribing audio files into text. It is an open-source project maintained by OpenAI, the company best known for ChatGPT. I will also show how to add subtitles to videos, so fasten your seatbelts!

Installation

The first prerequisite is the ffmpeg binary, the standard tool for working with audio and video files. How you install it depends on your platform:

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
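Before going further, you may want to confirm from Python that ffmpeg is actually reachable on your PATH. The sketch below is only an illustration: the helper names `find_tool` and `ffmpeg_version` are hypothetical (they are not part of Whisper), and the code relies solely on the standard library's `shutil.which` and `subprocess.run`.

```python
import shutil
import subprocess
from typing import Optional


def find_tool(name: str) -> Optional[str]:
    """Hypothetical helper: return the full path of an executable on PATH, or None."""
    return shutil.which(name)


def ffmpeg_version() -> Optional[str]:
    """Hypothetical helper: return the first line of `ffmpeg -version`, or None if missing."""
    path = find_tool("ffmpeg")
    if path is None:
        return None
    # `ffmpeg -version` prints build information on stdout and exits with status 0.
    completed = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    lines = completed.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    version = ffmpeg_version()
    if version is None:
        print("ffmpeg was not found on PATH; install it with one of the commands above.")
    else:
        print(version)
```

If the script prints a line starting with "ffmpeg version", the binary is installed and visible to Python processes launched from the same environment.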

You will also need tools to build binary dependencies. On Linux/Unix systems, having gcc installed will…

Written by Kevin Tewouda