Welcome to the fascinating world of speech emotion recognition! In this blog, we will guide you through the process of building and training a Speech Emotion Recognition (SER) system. This tool can recognize human emotions from speech, which is not only a technological marvel but also serves various industries from product recommendations to affective computing. Let’s dive in!
1. Introduction
The Speech Emotion Recognition system trains machine learning and deep learning models to detect emotions in human speech. The recognized emotions are neutral, calm, happy, sad, angry, fear, disgust, pleasant surprise, and boredom.
2. Requirements
- Python 3.6+
- Python packages (pinned versions from requirements.txt):
  - tensorflow
  - librosa==0.6.3
  - numpy
  - pandas
  - soundfile==0.9.0
  - wave
  - scikit-learn==0.24.2
  - tqdm==4.28.1
  - matplotlib==2.2.3
  - pyaudio==0.2.11
- [ffmpeg](https://ffmpeg.org) (optional) – used to manipulate audio files when necessary.
Install these libraries with:

```
pip3 install -r requirements.txt
```
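After installing, a quick sanity check (a minimal snippet, nothing repo-specific) confirms the core packages import and shows their versions:

```python
# Verify that the core audio and ML packages are importable.
import librosa
import soundfile
import sklearn
import tensorflow

print('librosa', librosa.__version__)
print('soundfile', soundfile.__version__)
print('scikit-learn', sklearn.__version__)
print('tensorflow', tensorflow.__version__)
```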
3. Collecting the Dataset
The repository draws on four datasets, stored in the data folder:
- RAVDESS: the Ryerson Audio-Visual Database of Emotional Speech and Song (24 professional actors; covers neutral, calm, happy, sad, angry, fearful, disgust, and surprised)
- TESS: the Toronto Emotional Speech Set (two actresses; includes the "pleasant surprise" class)
- EMO-DB: the Berlin Database of Emotional Speech (German recordings; includes the "boredom" class)
- Custom: a small set of custom recordings included with the repository
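As a taste of what working with these datasets looks like, here is a small sketch that maps RAVDESS filename codes to emotion labels. RAVDESS encodes the emotion in the third field of each filename; note that mapping code 08 to "pleasant surprise" follows this repository's grouping, since RAVDESS itself labels that class "surprised":

```python
# RAVDESS files are named like "03-01-06-01-02-01-12.wav";
# the third dash-separated field is the emotion code.
RAVDESS_EMOTIONS = {
    '01': 'neutral', '02': 'calm', '03': 'happy', '04': 'sad',
    '05': 'angry', '06': 'fear', '07': 'disgust', '08': 'pleasant surprise',
}

def label_from_ravdess(filename):
    code = filename.split('-')[2]
    return RAVDESS_EMOTIONS.get(code, 'unknown')

print(label_from_ravdess('03-01-06-01-02-01-12.wav'))  # -> fear
```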
4. Feature Extraction
Feature extraction is akin to taking the fingerprints of emotions from speech. Just as fingerprints uniquely identify a person, audio features that capture emotional cues let the system recognize which emotion is being expressed. This repository extracts features with the librosa library (a code sketch follows the list), such as:
- MFCC (Mel-frequency cepstral coefficients)
- Chromagram
- Mel spectrogram
- Spectral contrast
- Tonnetz (tonal centroid features)
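To make the list concrete, here is a minimal sketch of how such features can be computed with librosa; the repository has its own extraction helper, which may differ in details:

```python
import numpy as np
import librosa

def extract_features(path):
    # Load the audio at its native sample rate.
    X, sample_rate = librosa.load(path, sr=None)
    stft = np.abs(librosa.stft(X))
    # Average each feature over time so every file yields a fixed-length vector.
    mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
    chroma = np.mean(librosa.feature.chroma_stft(S=stft, sr=sample_rate).T, axis=0)
    mel = np.mean(librosa.feature.melspectrogram(y=X, sr=sample_rate).T, axis=0)
    contrast = np.mean(librosa.feature.spectral_contrast(S=stft, sr=sample_rate).T, axis=0)
    tonnetz = np.mean(librosa.feature.tonnetz(y=librosa.effects.harmonic(X), sr=sample_rate).T, axis=0)
    return np.hstack([mfccs, chroma, mel, contrast, tonnetz])
```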
5. Building the Model
The excitement of building models lies in trial and error, much like baking a cake: you have to get the ingredients (features) just right for it to rise and taste good (accurate predictions).
Example 1: Using 3 Emotions
Here’s a simple way to build and train a model for recognizing three emotions (sad, neutral, happy):
```python
from emotion_recognition import EmotionRecognizer
from sklearn.svm import SVC

# Any scikit-learn estimator can be plugged in; here, a support vector classifier.
my_model = SVC()

# Restrict training to three emotions; balance=True evens out samples per class.
rec = EmotionRecognizer(model=my_model, emotions=['sad', 'neutral', 'happy'], balance=True, verbose=0)
rec.train()

print('Test score:', rec.test_score())
print('Train score:', rec.train_score())
```
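Once trained, the recognizer can classify a new recording. The path below is purely hypothetical; point it at any 16 kHz mono WAV file:

```python
# Predict the emotion of a single audio file (hypothetical path).
print('Prediction:', rec.predict('data/validation/example.wav'))
```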
Determining the Best Model
To find the best model, you can load the pre-trained estimators and check their accuracy:
```python
# Evaluate the bundled pre-trained estimators and keep the best performer.
rec.determine_best_model()

print(rec.model.__class__.__name__, 'is the best')
print('Test score:', rec.test_score())
```
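It is also worth inspecting where the winning model gets confused. Assuming your copy of the repository exposes the confusion-matrix helper, the call looks like this:

```python
# Rows are true emotions, columns are predictions (as percentages).
print(rec.confusion_matrix(percentage=True, labeled=True))
```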
6. Troubleshooting
If you encounter issues, consider the following:
- Ensure all required packages are installed.
- Verify that your audio files are in the expected format (16 kHz sample rate, mono channel); a conversion sketch follows this list.
- Check that ffmpeg is installed and added to your PATH.
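If a file is in the wrong format, you can resample it to 16 kHz mono with the libraries already installed; a minimal sketch with hypothetical file names:

```python
import librosa
import soundfile as sf

# Load at 16 kHz, downmixing to mono, then write the converted copy.
y, sr = librosa.load('input.wav', sr=16000, mono=True)
sf.write('input_16k_mono.wav', y, sr)
```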
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
7. Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
Now you have the tools to embark on your journey into Speech Emotion Recognition! Happy coding!

