
AI Project: Audio Classification with Hugging Face (Environmental Sounds)

3D isometric illustration of a robot in a forest and city environment classifying sound waves into icons, representing AI audio classification.

We’ve used Whisper to transcribe speech, but what if you just want to know what a sound is? Hugging Face Audio Classification is a powerful tool if you want to identify sounds such as “dog barking” or “car horn”.

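This task is called audio classification: given a clip, the model returns a list of labels with confidence scores. We can use a model trained on general-purpose sounds (AudioSet), and the Hugging Face pipeline reduces the work to a few lines of code.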

Step 1: Installation

You’ll need transformers and torch to run the model, and librosa to load audio files.

pip install transformers torch
pip install librosa
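
To confirm the install worked before moving on, a quick check along these lines can help. This is a minimal sketch using only the standard library; the helper name `installed_versions` is made up for illustration and is not part of any of the packages above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(["transformers", "torch", "librosa"]).items():
        print(f"{name}: {ver if ver else 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED needs another `pip install` in the same environment you will run the script from.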

Step 2: The Code

We will use the audio-classification pipeline.

from transformers import pipeline
import librosa

# 1. Load the pipeline
# 'MIT/ast-finetuned-audioset-10-10-0.4593' is an Audio Spectrogram
# Transformer fine-tuned on AudioSet, so it recognizes general sounds.
# (Models such as 'superb/hubert-large-superb-er' are trained for speech
# emotion recognition and are not suited to environmental sounds.)
classifier = pipeline("audio-classification", model="MIT/ast-finetuned-audioset-10-10-0.4593")

# 2. Load your audio file
# (You'll need your own .wav or .mp3 file of a sound.)
# sr=16000 resamples the audio to 16 kHz, the rate this model expects.
audio_file = "my_dog_barking.wav"
try:
    sound_data, sample_rate = librosa.load(audio_file, sr=16000)
except FileNotFoundError:
    print(f"Error: '{audio_file}' not found. Please provide an audio file.")
    raise SystemExit(1)

# 3. Classify!
# By default the pipeline returns the 5 highest-scoring labels;
# pass top_k=... in the call to change that.
results = classifier(sound_data)

# 4. Print the results
print(f"--- Top 5 Sound Guesses for '{audio_file}' ---")
for result in results:
    print(f"Label: {result['label']} | Score: {result['score']:.4f}")
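
If you don't have a recording handy, you can still check that the script runs end to end by generating a short test file. The sketch below writes a one-second sine tone as a 16 kHz mono WAV using only the standard library. The function name `write_test_tone` and the file name are assumptions for illustration, and a pure tone is not a realistic environmental sound, so treat the labels you get back as a smoke test only.

```python
import math
import struct
import wave


def write_test_tone(path, freq_hz=440.0, duration_s=1.0, sample_rate=16000):
    """Write a mono 16-bit PCM sine tone to `path`; return the frame count."""
    n_frames = int(duration_s * sample_rate)
    amplitude = 0.5 * 32767  # half of full scale, to avoid clipping
    frames = bytearray()
    for i in range(n_frames):
        sample = int(amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        frames += struct.pack("<h", sample)  # little-endian signed 16-bit
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)       # mono
        wav_file.setsampwidth(2)       # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return n_frames
```

Calling `write_test_tone("test_tone.wav")` and then setting `audio_file = "test_tone.wav"` in the script above gives librosa something to load.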

Step 3: The Result

If your audio file was a dog barking, the output would look something like this (the scores are illustrative; exact values depend on your recording):

--- Top 5 Sound Guesses for 'my_dog_barking.wav' ---
Label: Dog | Score: 0.9812
Label: Bark | Score: 0.9750
Label: Domestic animals, pets | Score: 0.8800
...

The same approach is the basis for sound identification in security, monitoring, and accessibility applications, for example alerting a user when a specific sound is detected.
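
In a monitoring-style application you usually care about a few specific labels and only want to react when the model is reasonably confident. One way to do that, sketched below, is to filter the pipeline's list of `{'label', 'score'}` dictionaries. The helper name `detect_events`, the 0.5 threshold, and the sample data are all assumptions made up for illustration, not real model output.

```python
def detect_events(results, watch_labels, min_score=0.5):
    """Return (label, score) pairs for watched labels at or above min_score."""
    watched = {label.lower() for label in watch_labels}
    return [
        (r["label"], r["score"])
        for r in results
        if r["label"].lower() in watched and r["score"] >= min_score
    ]


# Hypothetical pipeline output, invented for this example only.
sample_results = [
    {"label": "Dog", "score": 0.62},
    {"label": "Bark", "score": 0.21},
    {"label": "Speech", "score": 0.05},
]

alerts = detect_events(sample_results, watch_labels=["Dog", "Glass"], min_score=0.5)
print(alerts)  # [('Dog', 0.62)]
```

With real data you would pass the `results` list from the script in Step 2 instead of `sample_results`, and tune the threshold against your own recordings.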


Key Takeaways

  • The article introduces Hugging Face Audio Classification, a method to identify sounds such as ‘dog barking’ or ‘car horn’.
  • It covers installing transformers and torch to run the model, and librosa to load audio files.
  • The use of the audio-classification pipeline simplifies processing audio for classification tasks.
  • The technology serves various applications including security, monitoring, and accessibility.
  • Overall, it shows that a working sound classifier takes only a few lines of Python with a pretrained model.
