
This is a true “2026 Vision” project: we will chain two Hugging Face models together to build a speech translator. This guide walks through the build step by step.
The Pipeline:
- Input: An audio file (e.g., Spanish speech).
- Model 1: Whisper (Speech-to-Text) will transcribe the Spanish audio into Spanish text.
- Model 2: A Translation model will translate the Spanish text into English text.
- Output: The final English translation.
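The data flow above can be sketched independently of any particular model. The snippet below is a minimal illustration, not the code used later in this guide: `translate_speech` and the two stand-in stages are hypothetical names, and the stages are plain Python callables so the flow can be checked without downloading anything.

```python
from typing import Callable


def translate_speech(
    audio: object,
    transcribe: Callable[[object], str],
    translate: Callable[[str], str],
) -> dict:
    """Run the two stages in order and keep the intermediate text."""
    source_text = transcribe(audio).strip()
    if not source_text:
        # Nothing was recognized, so skip the translation stage.
        return {"source_text": "", "translated_text": ""}
    return {
        "source_text": source_text,
        "translated_text": translate(source_text),
    }


# Stand-in stages (hypothetical) that mimic Model 1 and Model 2.
def fake_transcribe(audio):
    return " hola mundo "


def fake_translate(text):
    return {"hola mundo": "hello world"}[text]


result = translate_speech(b"", fake_transcribe, fake_translate)
print(result)
```

Keeping the intermediate Spanish text in the result makes it easier to tell whether a bad translation came from the transcription stage or the translation stage.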
Step 1: Installation
pip install transformers torch
pip install librosa soundfile sentencepiece
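Before moving on, it can help to confirm that the packages are visible to the Python interpreter you will run the code with. The following is a small sketch using only the standard library; `missing_modules` is a hypothetical helper name, and the list assumes each package's import name matches its pip name, which is the case for these five.

```python
import importlib.util

# Import names for the packages installed in Step 1.
REQUIRED_MODULES = ["transformers", "torch", "librosa", "soundfile", "sentencepiece"]


def missing_modules(names):
    """Return the names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

The check only looks the modules up without importing them, so it runs quickly even though `torch` and `transformers` are large.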
Step 2: The Code
We will load two separate pipelines.
from transformers import pipeline
import librosa
# 1. Load the Speech-to-Text (ASR) pipeline
# We use OpenAI's Whisper model
transcriber = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base"
)
# 2. Load the Translation pipeline
# We'll use a model for Spanish (es) to English (en)
translator = pipeline(
    "translation_es_to_en",
    model="Helsinki-NLP/opus-mt-es-en"
)
# 3. Load your audio file (You need a Spanish audio file)
audio_file = "my_spanish_speech.wav"
try:
    # Whisper expects 16 kHz mono audio; librosa resamples and downmixes on load
    speech_data, sample_rate = librosa.load(audio_file, sr=16000)
except FileNotFoundError:
    print(f"Error: '{audio_file}' not found.")
    raise SystemExit(1)
# --- Run the 2-Step Pipeline ---
# Step 1: Transcribe
print("Step 1: Transcribing audio to text...")
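# Pass the sampling rate explicitly so the pipeline does not have to assume it
transcription = transcriber({"raw": speech_data, "sampling_rate": sample_rate})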
spanish_text = transcription['text']
print(f"Spanish Text: {spanish_text}")
# Step 2: Translate
print("\nStep 2: Translating text to English...")
translation = translator(spanish_text)
english_text = translation[0]['translation_text']
print(f"English Text: {english_text}")
You’ve just built an AI that can listen to one language and output another!
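One practical caveat: translation models accept only a limited number of tokens per input, so a long transcript may be cut off if it is passed in as one string. A possible workaround, sketched below, is to split the transcript into groups of whole sentences and translate each group separately. The helper names and the 400-character budget are assumptions made for illustration, not values taken from the model's documentation.

```python
import re


def split_into_chunks(text, max_chars=400):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: start a new chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def translate_long_text(text, translate_batch, max_chars=400):
    """Translate chunk by chunk; translate_batch maps list[str] -> list[str]."""
    chunks = split_into_chunks(text, max_chars)
    return " ".join(translate_batch(chunks)) if chunks else ""
```

With the `translator` pipeline from Step 2, `translate_batch` could be written as `lambda chunks: [r["translation_text"] for r in translator(chunks)]`, since the pipeline accepts a list of strings and returns one result per string.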
Key Takeaways
- The project aims to create a Hugging Face Speech Translation system by chaining two models together.
- First, the Whisper model transcribes Spanish audio into Spanish text.
- Next, a translation model converts the Spanish text into English text.
- The process results in English output from Spanish input, demonstrating the workflow of audio translation.
- The article includes steps for installation and code implementation to set up the system.





