Are you excited to dive into the world of Automatic Speech Recognition (ASR) using a deep neural network? This guide will walk you through the process, making it simple and user-friendly, while ensuring you can troubleshoot any issues you encounter along the way!
Project Overview
In this project, we will build a deep neural network that converts raw audio into transcriptions. Using the LibriSpeech dataset, we’ll design model architectures that map audio features to text. By the end, you’ll have the skills to create and test state-of-the-art models!
Getting Started
Let’s start our journey towards creating an ASR model by following these steps:
- Clone the Repository
  Begin by cloning the repository and navigating to the downloaded folder:
  git clone https://github.com/udacity/AIND-VUI-Capstone.git
  cd AIND-VUI-Capstone
- Create a New Environment
  Next, create and activate a new environment with Python 3.5 and the numpy package:
  - Linux or Mac:
    conda create --name aind-vui python=3.5 numpy
    source activate aind-vui
  - Windows:
    conda create --name aind-vui python=3.5 numpy scipy
    activate aind-vui
- Install TensorFlow
  Now install TensorFlow:
  - Option 1 (GPU Support): Follow this guide. For Udacity AMI, install only the package:
    pip install tensorflow-gpu==1.1.0
  - Option 2 (CPU Support):
    pip install tensorflow==1.1.0
- Install Required Packages
  Run the following command to install the required pip packages:
  pip install -r requirements.txt
- Switch Keras Backend to TensorFlow
  Now, switch the Keras backend to TensorFlow:
  - Linux or Mac:
    KERAS_BACKEND=tensorflow python -c "from keras import backend"
  - Windows:
    set KERAS_BACKEND=tensorflow
    python -c "from keras import backend"
- Obtain the Libav Package
  The installation process depends on your OS:
  - Linux:
    sudo apt-get install libav-tools
  - Mac:
    brew install libav
  - Windows: Visit the Libav website and follow the download instructions.
- Download and Prepare the Dataset
  Get the LibriSpeech dataset and convert the audio files from FLAC to WAV:
  - Linux or Mac:
    wget http://www.openslr.org/resources/12/dev-clean.tar.gz
    tar -xzvf dev-clean.tar.gz
    wget http://www.openslr.org/resources/12/test-clean.tar.gz
    tar -xzvf test-clean.tar.gz
    mv flac_to_wav.sh LibriSpeech
    cd LibriSpeech
    ./flac_to_wav.sh
  - Windows: Download the two archives (dev-clean.tar.gz and test-clean.tar.gz) in your browser, extract them with a tool that handles .tar.gz files, and convert the files from your terminal.
- Create JSON Files
  Create JSON files for the training and validation datasets:
  cd ..
  python create_desc_json.py LibriSpeech/dev-clean train_corpus.json
  python create_desc_json.py LibriSpeech/test-clean valid_corpus.json
- Create an IPython Kernel
  Create a kernel for your environment:
  python -m ipykernel install --user --name aind-vui --display-name aind-vui
- Run the Notebook
  Launch the notebook and set the kernel to match the aind-vui environment through the drop-down menu!
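Once the setup steps above are done, a quick check can confirm that the dataset was converted and that the corpus files were written. The following is a minimal sketch and not part of the repository. It assumes that flac_to_wav.sh writes each .wav file next to its .flac source, and that create_desc_json.py writes one JSON object per line with a numeric "duration" field. The function names find_unconverted and summarize_corpus are hypothetical.

```python
import json
from pathlib import Path


def find_unconverted(root):
    """Return the .flac files under root that have no sibling .wav file.

    Assumption: the conversion script writes <name>.wav next to <name>.flac.
    """
    root = Path(root)
    return sorted(
        flac for flac in root.rglob("*.flac")
        if not flac.with_suffix(".wav").exists()
    )


def summarize_corpus(json_path):
    """Return (utterance count, total seconds) for a corpus file.

    Assumption: one JSON object per line, each with a numeric "duration".
    """
    count, total = 0, 0.0
    with Path(json_path).open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # tolerate blank lines
            entry = json.loads(line)
            count += 1
            total += float(entry["duration"])
    return count, total


if __name__ == "__main__":
    # Run from the AIND-VUI-Capstone directory after the dataset steps.
    if Path("LibriSpeech").is_dir():
        missing = find_unconverted("LibriSpeech")
        print("{} .flac files without a .wav counterpart".format(len(missing)))
    for name in ("train_corpus.json", "valid_corpus.json"):
        if Path(name).exists():
            utterances, seconds = summarize_corpus(name)
            print("{}: {} utterances, {:.2f} hours".format(
                name, utterances, seconds / 3600))
```

If the first count is not zero, rerunning the conversion step before regenerating the JSON files is a reasonable next move. The sketch avoids f-strings so that it also runs on the Python 3.5 environment created above.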
Working with the ASR Model
While some code is pre-implemented, you will need to add functionality to answer key questions inside the notebook. Avoid modifying any existing code unless specified!
AWS for GPU Access
If you don’t have access to a local GPU, consider using Amazon Web Services to launch an EC2 GPU instance.
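Whether you work locally or on an EC2 instance, it can help to check if the machine appears to have an NVIDIA driver before choosing between the two TensorFlow packages. The sketch below is only a heuristic under stated assumptions: it looks for the nvidia-smi tool on the PATH and does not verify that CUDA or cuDNN are set up for TensorFlow. The function name suggest_tensorflow_package is hypothetical.

```python
import shutil


def suggest_tensorflow_package(which=shutil.which):
    """Suggest a pip requirement based on whether nvidia-smi is on the PATH.

    Finding nvidia-smi only hints that an NVIDIA driver is installed; it does
    not confirm that TensorFlow will be able to use the GPU.
    """
    if which("nvidia-smi"):
        return "tensorflow-gpu==1.1.0"
    return "tensorflow==1.1.0"


if __name__ == "__main__":
    print("pip install " + suggest_tensorflow_package())
```

The which parameter exists only so the lookup can be swapped out when experimenting; calling the function with no arguments uses the real PATH.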
Evaluation and Submission
Before submitting, ensure you meet all criteria set forth in the project rubric:
- Submit all necessary files, including the notebook and model architectures.
- Follow the completion guidelines for each model and analyze their performance differences thoroughly.
Troubleshooting Tips
- If you run into dependency issues, ensure that all required packages are correctly installed in the environment.
- Verify file paths when downloading and converting the dataset.
- If you use versions other than the ones in the commands above (Python 3.5, TensorFlow 1.1.0), check that they are compatible with each other.
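The dependency and version checks from the troubleshooting tips can be sketched in a few lines of Python run inside the activated aind-vui environment. This is an illustrative sketch, not a tool from the repository: the module list is an assumption based on the packages this guide installs explicitly, and requirements.txt remains the authoritative list. The name missing_modules is hypothetical.

```python
import importlib.util
import sys

# Top-level modules this guide installs explicitly. This list is an
# illustrative assumption; requirements.txt in the repository is authoritative.
EXPECTED_MODULES = ("numpy", "tensorflow", "keras")


def missing_modules(names):
    """Return the module names that cannot be located on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Show which interpreter is running, to confirm the environment is active.
    print("Python {}.{} at {}".format(
        sys.version_info[0], sys.version_info[1], sys.executable))
    missing = missing_modules(EXPECTED_MODULES)
    if missing:
        print("Not found: " + ", ".join(missing))
    else:
        print("All expected modules were found.")
```

Note that find_spec only locates a module without importing it, so a package that is present but broken would still pass this check. If the interpreter path printed here does not point into the aind-vui environment, the environment is probably not activated.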
Final Thoughts
By following this guide, you will not only gain hands-on experience in building an ASR system but also develop crucial skills in neural network design. Keep pushing those boundaries, and happy coding!
