
We’ve used Whisper to transcribe speech, but what if you just want to know what a sound is? If you want to identify sounds such as “dog barking” or “car horn”, Hugging Face has you covered.
This task is called audio classification. We can use a model trained on general-purpose sounds, and the Hugging Face pipeline API turns it into a few lines of code.
Step 1: Installation
You’ll need transformers and torch to run the model, and librosa to load audio files.
```bash
pip install transformers torch
pip install librosa
```
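If you don't have a recording handy for the next step, you can still check that everything runs. As a minimal sketch, the hypothetical helper write_test_tone below uses only Python's standard library (math, struct, wave) to write a short sine tone as a 16 kHz, mono, 16-bit WAV file. The 440 Hz frequency and 2-second duration are arbitrary choices. A pure tone is not a dog bark, so this only confirms that the pipeline works end to end.

```python
import math
import struct
import wave


def write_test_tone(path, freq_hz=440.0, duration_s=2.0, sample_rate=16000, amplitude=0.5):
    """Write a mono 16-bit PCM sine tone to `path` (hypothetical helper for testing).

    Returns the number of samples written.
    """
    n_samples = int(duration_s * sample_rate)
    peak = int(amplitude * 32767)  # scale the tone into the signed 16-bit range
    samples = [
        int(peak * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_samples)
    ]
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)  # 16 kHz, the rate used in Step 2
        # WAV stores PCM samples little-endian, hence "<" in the format string
        wav.writeframes(struct.pack(f"<{n_samples}h", *samples))
    return n_samples
```

Calling write_test_tone("test_tone.wav") and then setting audio_file = "test_tone.wav" in the next step would exercise the whole script without a real recording.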
Step 2: The Code
We will use the audio-classification pipeline.
```python
import sys
from pathlib import Path

import librosa
from transformers import pipeline

# 1. Load the pipeline
# 'MIT/ast-finetuned-audioset-10-10-0.4593' is an Audio Spectrogram Transformer
# fine-tuned on AudioSet, so it labels general sounds (dogs, car horns, music, ...).
# By contrast, 'superb/hubert-large-superb-er' only recognizes emotions in speech,
# so it is the wrong choice for identifying everyday sounds.
classifier = pipeline(
    "audio-classification",
    model="MIT/ast-finetuned-audioset-10-10-0.4593",
)

# 2. Load your audio file
# (You'll need your own .wav or .mp3 file of a sound)
audio_file = "my_dog_barking.wav"
if not Path(audio_file).is_file():
    sys.exit(f"Error: '{audio_file}' not found. Please provide an audio file.")
# librosa returns mono float32 samples, resampled to the 16 kHz this model expects
sound_data, sample_rate = librosa.load(audio_file, sr=16000)

# 3. Classify!
# A bare array is assumed to already be at the model's sampling rate;
# passing a dict states the rate explicitly. top_k=5 returns the 5 most likely labels.
results = classifier({"raw": sound_data, "sampling_rate": sample_rate}, top_k=5)

# 4. Print the results
print(f"--- Top 5 Sound Guesses for '{audio_file}' ---")
for result in results:
    print(f"Label: {result['label']} | Score: {result['score']:.4f}")
```
Step 3: The Result
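If your audio file was a dog barking, the output would look something like this (the scores are illustrative; exact labels and values depend on your clip and library versions):
```text
--- Top 5 Sound Guesses for 'my_dog_barking.wav' ---
Label: Dog | Score: 0.9812
Label: Bark | Score: 0.9750
Label: Domestic animals, pets | Score: 0.8800
...
```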
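A note on reading the scores: AudioSet is multi-label, so a single clip can be “Dog”, “Bark” and “Domestic animals, pets” at once. The Hugging Face audio-classification pipeline, however, converts the model's raw logits into scores with a softmax by default, which makes all labels compete for a total of 1. A sigmoid instead scores each label independently, and values like those in the sample output, each close to 1, look more like sigmoid scores. The sketch below mimics the final top-k step in plain Python; the three labels and their logits are hypothetical values chosen only to show the difference.

```python
import math


def softmax(logits):
    """Competing scores: all positive and summing to 1 across the labels."""
    m = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    """Independent per-label scores in (0, 1); they need not sum to 1."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


def top_k(logits, labels, k=5, fn=softmax):
    """Score every label with `fn`, then keep the k best, pipeline-style."""
    scores = fn(logits)
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return [{"label": label, "score": score} for label, score in ranked[:k]]


# Hypothetical logits for three overlapping labels
labels = ["Dog", "Bark", "Domestic animals, pets"]
logits = [4.0, 3.7, 2.0]

for fn in (softmax, sigmoid):
    ranked = top_k(logits, labels, k=3, fn=fn)
    print(fn.__name__, [f"{r['label']}={r['score']:.3f}" for r in ranked])
```

With these made-up logits the softmax scores add up to 1 while each sigmoid score is close to 1. The ranking is identical because both functions are monotonic; only the meaning of the numbers changes.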
Classifiers like this are a core building block for applications that need to recognize sounds, such as security, monitoring, and accessibility tools.
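For monitoring, recordings are usually much longer than a single sound. The AST feature extractor pads or truncates its input to a fixed number of frames (roughly 10 seconds by default), so a long file is best split into windows and classified window by window. The minimal sketch below assumes NumPy; the 10-second window, 5-second hop, 0.3 threshold and the watchlist of labels are illustrative choices, not values from the original, and sliding_windows and detect_events are hypothetical helpers.

```python
import numpy as np


def sliding_windows(samples, sample_rate, window_s=10.0, hop_s=5.0):
    """Yield (start_time_s, chunk) pairs that together cover the whole signal.

    The last chunk may be shorter than window_s; the feature extractor pads
    short inputs.
    """
    window = int(window_s * sample_rate)
    hop = int(hop_s * sample_rate)
    if len(samples) <= window:
        yield 0.0, samples
        return
    # Stop once a window reaches the end of the signal
    for start in range(0, len(samples) - window + hop, hop):
        yield start / sample_rate, samples[start:start + window]


def detect_events(window_results, watchlist, threshold=0.3):
    """Return (start_time_s, label, score) for watched labels above threshold.

    `window_results` is a list of (start_time_s, results) pairs, where each
    `results` is the list of {"label": ..., "score": ...} dicts the pipeline returns.
    """
    events = []
    for start_s, results in window_results:
        for r in results:
            if r["label"] in watchlist and r["score"] >= threshold:
                events.append((start_s, r["label"], r["score"]))
    return events


# Usage with classifier, sound_data and sample_rate from Step 2:
# window_results = [
#     (start, classifier({"raw": chunk, "sampling_rate": sample_rate}))
#     for start, chunk in sliding_windows(sound_data, sample_rate)
# ]
# print(detect_events(window_results, watchlist={"Dog", "Bark"}))
```

Overlapping windows cost extra inference but reduce the chance that a short sound is split across a window boundary and missed.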
Key Takeaways
- The article introduces Hugging Face Audio Classification, a method to identify sounds such as ‘dog barking’ or ‘car horn’.
- It explains how to install the necessary libraries: transformers and torch to run the model, and librosa to load audio files.
- The use of the audio-classification pipeline simplifies processing audio for classification tasks.
- The technology serves various applications including security, monitoring, and accessibility.
- Overall, it shows that a few lines of Hugging Face code are enough to label the sounds in your own recordings.





