
You’ve used the Hugging Face pipeline to understand text and images. Now let’s teach it to understand audio with speech-to-text.
We’ll use OpenAI’s Whisper, a state-of-the-art speech recognition model, to build a script that transcribes an audio file into text.
Step 1: Installation
You’ll need transformers, torch, and a library to load audio files.
pip install transformers torch
pip install librosa soundfile
Step 2: The Code
The pipeline handles all the complexity. You just need to point it at an audio file.
from transformers import pipeline
import librosa  # We'll use this to load the audio

# 1. Load the pipeline
# This downloads the 'whisper-base' model
transcriber = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base"
)

# 2. Load your audio file
# (You'll need your own .wav or .mp3 file for this)
audio_file = "my_speech.wav"

try:
    # Load the audio and get the raw data + sample rate
    # Whisper expects 16 kHz audio, so we resample on load
    speech_data, sample_rate = librosa.load(audio_file, sr=16000)
except FileNotFoundError:
    print(f"Error: '{audio_file}' not found. Please provide an audio file.")
    exit()

# 3. Transcribe!
# Pass the raw data directly to the pipeline
result = transcriber(speech_data)

# 4. Print the result
print("--- Transcription ---")
print(result['text'])

Step 3: The Result
If your my_speech.wav file contained someone saying “Python is the future,” the output will be:

--- Transcription ---
Python is the future.
You’ve just built a powerful, accurate transcription service in a few lines of Python!
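As a variation, the pipeline can also load the audio itself: instead of pre-loading with librosa, you can pass it a path to the file, and decoding happens internally (this route typically requires ffmpeg to be installed on your system). A sketch, reusing the same model and file name as above:

```python
from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base"
)

# Pass the file path directly; the pipeline decodes and resamples it
result = transcriber("my_speech.wav")
print(result["text"])
```

Loading with librosa yourself, as in the main script, gives you more control (e.g., trimming or inspecting the waveform first), while passing a path keeps the script shorter.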
Key Takeaways
- Use Hugging Face to transcribe audio files into text with OpenAI’s Whisper model.
- Install necessary libraries like transformers and torch for the transcription process.
- The pipeline handles the complexity; you pass it the loaded audio data (or a file path) and get the transcription back.
- For example, transcribing ‘my_speech.wav’ outputs the text ‘Python is the future.’
- You’ve successfully created an accurate transcription service using only a few lines of Python!
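One last practical note: Whisper processes audio in roughly 30-second windows, so for longer recordings you can enable the pipeline’s built-in chunking via chunk_length_s, and optionally ask for timestamps. A sketch, assuming the same whisper-base model; here a minute of silence stands in for a real long recording:

```python
import numpy as np
from transformers import pipeline

# chunk_length_s tells the pipeline to split long audio into
# overlapping windows and stitch the transcripts back together
transcriber = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base",
    chunk_length_s=30,
)

# Placeholder input: 60 seconds of silence at 16 kHz
long_audio = np.zeros(16000 * 60, dtype=np.float32)

result = transcriber(long_audio, return_timestamps=True)
print(result["text"])
```

With return_timestamps=True, the result also includes per-segment text with start/end times, which is handy for generating subtitles.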
