
AI Project: Build a Speech-to-Speech Translator (Hugging Face)

Illustration: a 3D isometric machine converting blue sound waves into red sound waves, representing an AI speech translator built with Hugging Face.

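This is a true “2026 Vision” project. We will chain two Hugging Face models together to build a speech translator: the first transcribes the audio, and the second translates the transcript. This guide walks through the build step by step. Note that the pipeline below ends with translated text; producing spoken output would require an additional text-to-speech step, which is not covered here.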

The Pipeline:

  1. Input: An audio file (e.g., Spanish speech).
  2. Model 1: Whisper (Speech-to-Text) will transcribe the Spanish audio into Spanish text.
  3. Model 2: A Translation model will translate the Spanish text into English text.
  4. Output: The final English translation.
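
The four steps above amount to calling one function on the output of another. Here is a minimal sketch of that chaining, assuming the transcriber returns a dict with a 'text' key and the translator returns a list of dicts with a 'translation_text' key (the shapes used by the code in Step 2). The names translate_speech, fake_transcriber and fake_translator are hypothetical and exist only so the logic can be tried without downloading any model.

```python
def translate_speech(audio, transcriber, translator):
    """Chain a speech-to-text step and a translation step.

    `transcriber` is expected to return {"text": ...} and `translator`
    a list like [{"translation_text": ...}].
    """
    source_text = transcriber(audio)["text"]
    translated = translator(source_text)[0]["translation_text"]
    return source_text, translated


# Stand-in callables (hypothetical, for illustration only).
def fake_transcriber(audio):
    return {"text": "hola mundo"}


def fake_translator(text):
    lookup = {"hola mundo": "hello world"}
    return [{"translation_text": lookup.get(text, text)}]


source, target = translate_speech(None, fake_transcriber, fake_translator)
print(source, "->", target)
```

Because the two steps are passed in as arguments, the real pipelines can replace the stand-ins later without changing the function.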

Step 1: Installation

pip install transformers torch
pip install librosa soundfile sentencepiece
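
Before moving on, it can help to confirm that the packages installed correctly. The following is a small sketch using only the standard library; the helper name missing_packages is an assumption made for this example, not part of any of the libraries above.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found as importable modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["transformers", "torch", "librosa", "soundfile", "sentencepiece"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If anything is reported as missing, rerun the matching pip install command.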

Step 2: The Code

We will load two separate pipelines, one for speech recognition and one for translation, and run them in sequence.

from transformers import pipeline
import librosa

# 1. Load the Speech-to-Text (ASR) pipeline
# We use OpenAI's Whisper model
transcriber = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-base"
)

# 2. Load the Translation pipeline
# We'll use a model for Spanish (es) to English (en)
translator = pipeline(
    "translation_es_to_en",
    model="Helsinki-NLP/opus-mt-es-en"
)

# 3. Load your audio file (you need a Spanish audio file)
# sr=16000 resamples the audio to 16 kHz, the rate Whisper expects
audio_file = "my_spanish_speech.wav"
try:
    speech_data, sample_rate = librosa.load(audio_file, sr=16000)
except FileNotFoundError:
    print(f"Error: '{audio_file}' not found.")
    raise SystemExit(1)  # stop with a non-zero exit code

# --- Run the 2-Step Pipeline ---

# Step 1: Transcribe
print("Step 1: Transcribing audio to text...")
transcription = transcriber(speech_data)
spanish_text = transcription['text']
print(f"Spanish Text: {spanish_text}")

# Step 2: Translate
print("\nStep 2: Translating text to English...")
translation = translator(spanish_text)
english_text = translation[0]['translation_text']
print(f"English Text: {english_text}")

You’ve just built an AI that can listen to speech in one language and output text in another!
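
The script sends the whole transcript to the translator in one call. Translation models generally have a maximum input length, so a long transcript may be safer to translate sentence by sentence. Below is a rough sketch of that idea. The helpers split_sentences and translate_long_text are hypothetical, the regex split is deliberately naive (it will mishandle abbreviations, for example), and fake_translator is a stand-in that only mimics the translator's return shape.

```python
import re


def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace (naive)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def translate_long_text(text, translator):
    """Translate one sentence at a time and join the results with spaces."""
    pieces = [
        translator(sentence)[0]["translation_text"]
        for sentence in split_sentences(text)
    ]
    return " ".join(pieces)


# Hypothetical stand-in: uppercases instead of translating.
def fake_translator(sentence):
    return [{"translation_text": sentence.upper()}]


print(translate_long_text("Hola. ¿Cómo estás? Bien.", fake_translator))
```

With the real pipeline, the call would be translate_long_text(spanish_text, translator), using the variables from the script above.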


Key Takeaways

  • The project chains two Hugging Face models together to build a speech translation system.
  • First, the Whisper model transcribes Spanish audio into Spanish text.
  • Next, a translation model converts the Spanish text into English text.
  • The result is English text produced from Spanish audio, with each model handling one stage of the workflow.
  • The article includes steps for installation and code implementation to set up the system.
