
Discover an elegant speech-to-text solution in Python

Kevin Tewouda
9 min read · Dec 18, 2023


Photo by Suzanne D. Williams on Unsplash

Imagine you maintain a podcast platform and want to transcribe the audio into text. Would your reflex be to use an online service like AWS Transcribe? Or perhaps you manage videos, as Substack does, and want to add subtitles to help users with disabilities follow them. Would you turn to an online service like AmberScript? What if I told you that it is now easier to build your own solution, with few resources…

In this article, I will present Whisper, a fantastic tool for transcribing audio files into text. It is an open-source project maintained by OpenAI, the company best known for ChatGPT. I will also show how to add subtitles to videos, so fasten your seatbelts!

Installation

The first prerequisite is the ffmpeg binary, the standard tool for working with audio and video files. Installation depends on your platform.

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
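
Once one of the commands above has run, it can be handy to confirm from Python that the binary is actually reachable. The snippet below is a minimal sketch that uses only the standard library; the helper names `find_ffmpeg` and `ffmpeg_version` are made up for this illustration and are not part of Whisper or ffmpeg.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg(binary: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the binary if it is on PATH, else None."""
    return shutil.which(binary)


def ffmpeg_version(path: str) -> str:
    """Return the first line printed by `<path> -version` (hypothetical helper)."""
    completed = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    lines = completed.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg was not found on PATH, install it first")
    else:
        print(f"Found {path}: {ffmpeg_version(path)}")
```

If the script reports that ffmpeg is missing right after installation, opening a new terminal so that PATH is reloaded is usually worth trying first.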

You will also need tools to build binary dependencies. On Linux/Unix systems, having gcc installed will…


Written by Kevin Tewouda

Cameroonian deserter now living in France. Passionate about programming, sports, cinema, and manga. I write in French and English because of my origins.
