How to Build RNNs and LSTMs from Scratch with NumPy

Jul 29, 2022 | Data Science

Welcome to this guide on building Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks from scratch using NumPy! This tutorial is designed to be user-friendly and provides a hands-on approach to understanding how these powerful models work, especially in the context of sequential data.

Introduction to RNNs and LSTMs

In the world of artificial intelligence, RNNs and LSTMs are like the dedicated librarians of a library, capable of remembering the context of the books (data) they read in order to provide a coherent narrative (output). In this case, our library is a sequence of words or tokens, where our goal is to predict the next token in a sequence.

Here, we will be using NumPy to understand the underlying mechanics before transitioning to a more powerful framework, PyTorch.

The Dataset

For our exercise, we will create a simple dataset comprising sequences of tokens. Each sequence will be structured as follows:

  • a b EOS
  • a a b b EOS
  • a a a a a b b b b b EOS

In this context, EOS marks the end of a sequence. The task is next-token prediction: each sequence consists of n ‘a’s followed by n ‘b’s, so after seeing, say, five ‘a’s the network must learn to emit exactly five ‘b’s and then EOS.
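
If you want to follow along, you can generate such a dataset yourself. Here is a minimal sketch; the function name, the range of repetitions, and the seed are our own illustration, not necessarily how the original data was produced:

import numpy as np

def generate_dataset(num_sequences=100, max_n=10, seed=0):
    """Sequences of n 'a's followed by n 'b's, terminated by EOS."""
    rng = np.random.default_rng(seed)
    return [['a'] * n + ['b'] * n + ['EOS']
            for n in rng.integers(1, max_n + 1, size=num_sequences)]

sequences = generate_dataset()
print(sequences[0])  # e.g. ['a', 'a', 'a', 'b', 'b', 'b', 'EOS']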

Steps to Build RNNs and LSTMs

Once you have set up your environment and created your dataset, follow these steps to build your RNN and LSTM:

1. Represent Categorical Variables


# Sample encoding
import numpy as np

def encode(sequence):
    mapping = {'a': 0, 'b': 1, 'EOS': 2}
    return np.array([mapping[token] for token in sequence])
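
Integer indices alone are not ideal inputs for the networks below, which expect a vector per token. A common next step is one-hot encoding; here is a minimal sketch (the helper name one_hot is ours) that produces the (input_size, 1) column vectors the NumPy forward passes below assume:

def one_hot(index, vocab_size=3):
    """Return a (vocab_size, 1) column vector with a 1 at the token's index."""
    vec = np.zeros((vocab_size, 1))
    vec[index] = 1.0
    return vec

inputs = [one_hot(i) for i in encode(['a', 'a', 'b', 'b', 'EOS'])]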

2. Build a Recurrent Neural Network (RNN) from Scratch

An RNN can be thought of as a series of interconnected neurons that remember inputs based on their cyclic structure, akin to a relay race where each runner passes the baton (information) to the next.


class SimpleRNN:
    def __init__(self, input_size, hidden_size):
        self.hidden_size = hidden_size
        # Small random weights keep tanh out of its saturated regions early on
        self.Wxh = np.random.randn(hidden_size, input_size) * 0.01   # input to hidden
        self.Whh = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden to hidden
        self.bh = np.zeros((hidden_size, 1))                         # hidden bias

    def forward(self, inputs, h_prev):
        # One time step: combine the current input with the previous hidden state
        h = np.tanh(np.dot(self.Wxh, inputs) + np.dot(self.Whh, h_prev) + self.bh)
        return h
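
To process a whole sequence, the same cell is applied step by step, carrying the hidden state forward like the baton in the relay race above. A minimal sketch, reusing the one_hot helper from earlier (the variable names are ours):

rnn = SimpleRNN(input_size=3, hidden_size=16)
h = np.zeros((rnn.hidden_size, 1))       # initial hidden state

for index in encode(['a', 'a', 'b', 'b', 'EOS']):
    h = rnn.forward(one_hot(index), h)   # pass the state to the next step

print(h.shape)  # (16, 1): a summary of everything seen so far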

3. Build an LSTM Network from Scratch

Just as a seasoned chef employs a variety of spices to create a memorable dish, an LSTM uses gates to manage the information it retains, allowing it to learn longer sequences without losing context.


class LSTM:
    def __init__(self, input_size, hidden_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        # Initialize the gate weights here (see the full sketch below)

    def forward(self, x, h_prev, c_prev):
        # Gate, cell-state, and hidden-state updates go here (full sketch below)
        return h, c
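
Filling in those ellipses, a minimal complete sketch might look as follows. We assume a single stacked weight matrix acting on the concatenation of h_prev and the one-hot input x, split into forget (f), input (i), candidate (g), and output (o) gates; this layout is our own choice, not necessarily the original post's:

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTM:
    def __init__(self, input_size, hidden_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        # One stacked matrix for all four gates, acting on [h_prev; x]
        self.W = np.random.randn(4 * hidden_size, hidden_size + input_size) * 0.01
        self.b = np.zeros((4 * hidden_size, 1))

    def forward(self, x, h_prev, c_prev):
        H = self.hidden_size
        z = np.vstack((h_prev, x))        # concatenate previous state and input
        gates = np.dot(self.W, z) + self.b
        f = sigmoid(gates[:H])            # forget gate: what to discard
        i = sigmoid(gates[H:2 * H])       # input gate: what to write
        g = np.tanh(gates[2 * H:3 * H])   # candidate cell values
        o = sigmoid(gates[3 * H:])        # output gate: what to expose
        c = f * c_prev + i * g            # update the long-term memory
        h = o * np.tanh(c)                # new hidden state
        return h, c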

4. Implement the LSTM Network in PyTorch

After mastering the building blocks, transitioning to PyTorch is like moving from a bicycle to a motorbike; it’s faster and offers more features for scaling your projects.


import torch
import torch.nn as nn

class PyTorchLSTM(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # batch_first=True means inputs arrive as (batch, seq_len, input_size)
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

    def forward(self, x):
        out, (h, c) = self.lstm(x)  # out holds the hidden state at every time step
        return out
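
A quick smoke test, assuming one-hot inputs over our three-token vocabulary (the tensor shapes and values are our own illustration):

model = PyTorchLSTM(input_size=3, hidden_size=16)
x = torch.zeros(1, 5, 3)   # batch of 1 sequence, 5 time steps, 3-dim one-hot
x[0, :2, 0] = 1.0          # two 'a' tokens
x[0, 2:4, 1] = 1.0         # two 'b' tokens
x[0, 4, 2] = 1.0           # EOS
out = model(x)
print(out.shape)           # torch.Size([1, 5, 16])

In practice you would add a linear head such as nn.Linear(hidden_size, vocab_size) on top of out to turn each hidden state into next-token logits.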

Results Observed

The learning process can be visualized through graphs depicting loss over time. Here’s a summary of results achieved:

  • The plain NumPy RNN takes considerably longer to converge.
  • The NumPy LSTM shows a significantly faster learning curve.
  • The PyTorch LSTM learns faster still and tends to reach a better local minimum.

Troubleshooting Ideas

As with any journey into coding, you may encounter some bumps along the way. Here are a few troubleshooting tips:

  • Issue: Slow training convergence. Try lowering or tuning the learning rate, or adjusting the model architecture (for example, the hidden size).
  • Issue: Model overfitting. Consider regularization techniques such as dropout, or enlarging your dataset; a minimal dropout sketch follows this list.
  • Issue: Incorrect predictions. Double-check your encoding and sequence handling to make sure the model is seeing the data you intend.
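
As one example of the dropout suggestion above, here is a minimal sketch that wraps the PyTorch model with a dropout layer, reusing the imports from the PyTorch section. The class name and probability are our own; note that nn.LSTM's built-in dropout argument only applies between stacked layers (num_layers > 1), so a separate nn.Dropout is used here:

class PyTorchLSTMWithDropout(nn.Module):
    def __init__(self, input_size, hidden_size, p=0.3):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.dropout = nn.Dropout(p)  # randomly zeroes activations during training

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.dropout(out)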

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

Through this exercise, you should now have a stronger grasp of how RNNs and LSTMs function, as well as the tools necessary to train them effectively. The overarching conclusion? Utilize PyTorch for its enhanced capabilities and faster convergence rates.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
