How to Run Megatron BERT Using Transformers

Sep 12, 2024 | Educational

Are you ready to harness the power of large transformer models? Today, we will walk through how to run the Megatron BERT model, developed by NVIDIA. This 345-million-parameter BERT-style model was trained with NVIDIA's Megatron-LM framework and performs well on a range of NLP tasks. Let’s dive in!

Prerequisites

Before we start, ensure you have the following:

  • A system with Python 3 and a recent version of PyTorch installed.
  • A CUDA-capable GPU (the examples below run the model in FP16 on a GPU).
  • A terminal or command-line interface to execute commands.
  • Access to clone the Transformers repository.

In this guide, we’ll run all commands from a folder called $MYDIR, which we define as:

export MYDIR=$HOME

Feel free to change this location to suit your setup.

Step 1: Clone the Transformers Repository

First, we need to clone the Transformers repository. You can do this with the following command:

git clone https://github.com/huggingface/transformers.git $MYDIR/transformers
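
The code in Step 4 imports from the transformers package, so make sure it is installed in your Python environment. One option (a minimal sketch; adjust to your setup) is to install PyTorch and then the library directly from the clone:

pip install torch
pip install -e $MYDIR/transformers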

Step 2: Get the Checkpoint from NVIDIA GPU Cloud

Next, create a directory to store the Megatron BERT model:

mkdir -p $MYDIR/nvidia_megatron-bert-cased-345m

To download the model checkpoint, you need to sign up for the NVIDIA GPU Cloud (NGC) and set up the NGC Registry CLI; see the NGC documentation for setup details.
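
With the CLI configured, the download typically looks like the following (a sketch based on standard NGC CLI usage and the model/version visible in the URL below; check ngc registry model download-version --help for your CLI version):

ngc registry model download-version "nvidia/megatron_bert_345m:v0.1_cased"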

Alternatively, you can directly download the checkpoint using:

wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/megatron_bert_345m/versions/v0.1_cased/zip -O $MYDIR/nvidia_megatron-bert-cased-345m/checkpoint.zip

Step 3: Convert the Checkpoint

Now we need to convert the checkpoint so it can be loaded into Transformers. Run the following command:

python3 $MYDIR/transformers/src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py $MYDIR/nvidia_megatron-bert-cased-345m/checkpoint.zip
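
If the conversion succeeds, the model directory should now contain a Transformers-style config and checkpoint next to the original zip. A quick check (file names assume the standard Transformers layout produced by the converter):

ls $MYDIR/nvidia_megatron-bert-cased-345m
# expected: checkpoint.zip  config.json  pytorch_model.bin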

Step 4: Using the Model

Now you can use Megatron BERT for tasks like Masked LM and Next Sentence Prediction. Below are code snippets showing how to use the model for each task.

Masked LM

Masked Language Modeling (Masked LM) allows the model to predict missing words in a sentence, like a mental fill-in-the-blank exercise. Here’s how you do it:

import os
import torch
from transformers import BertTokenizer, MegatronBertForMaskedLM

# Megatron BERT 345M was trained with the standard BERT cased WordPiece
# vocabulary, so we can reuse the stock tokenizer.
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
# Path to the converted config/checkpoint from Step 3.
directory = os.path.join(os.environ['MYDIR'], 'nvidia_megatron-bert-cased-345m')
model = MegatronBertForMaskedLM.from_pretrained(directory)

# Move the model to the GPU, switch to inference mode, and use FP16 weights.
device = torch.device('cuda')
model.to(device)
model.eval()
model.half()

# `inputs` holds the masked sentence; `labels` holds the token ids of the
# unmasked sentence, so the loss is computed against the true tokens.
inputs = tokenizer("The capital of France is [MASK].", return_tensors='pt').to(device)
labels = tokenizer("The capital of France is Paris.", return_tensors='pt')['input_ids'].to(device)

with torch.no_grad():
    output = model(**inputs, labels=labels)
    print(output)
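
To see the actual prediction rather than the raw output object, you can decode the most likely token at the [MASK] position (a short follow-up sketch reusing the inputs, output, and tokenizer variables from above):

# Find the position of the [MASK] token and take the argmax over the vocabulary.
mask_idx = (inputs['input_ids'] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = output.logits[0, mask_idx].argmax(dim=-1)
print(tokenizer.decode(predicted_id))  # expected: "Paris"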

Next Sentence Prediction

Next Sentence Prediction checks how well the model understands the sequence of sentences. Here’s how that’s done:

import os
import torch
from transformers import BertTokenizer, MegatronBertForNextSentencePrediction

# Reuse the standard BERT cased vocabulary (see the Masked LM example).
tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
# Path to the converted config/checkpoint from Step 3.
directory = os.path.join(os.environ['MYDIR'], 'nvidia_megatron-bert-cased-345m')
model = MegatronBertForNextSentencePrediction.from_pretrained(directory)

device = torch.device('cuda')
model.to(device)
model.eval()
model.half()

# Pass the two sentences as separate arguments so the tokenizer inserts the
# [SEP] token and sets the segment ids the NSP head expects.
inputs = tokenizer(
    "In Italy, pizza served in formal settings is presented unsliced.",
    "The sky is blue due to the shorter wavelength of blue light.",
    return_tensors='pt',
).to(device)
# Label 1 means the second sentence is NOT a continuation of the first.
labels = torch.LongTensor([1]).to(device)

with torch.no_grad():
    output = model(**inputs, labels=labels)
    print(output)
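
To turn the raw logits into a readable verdict, apply a softmax over the two classes; in the Transformers NSP convention, index 0 means "the second sentence follows the first" and index 1 means "random" (a short follow-up sketch reusing the variables from above):

# Probability for each class: [is-next, random].
probs = torch.softmax(output.logits, dim=-1)
print(probs)  # for these unrelated sentences, index 1 should dominate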

Troubleshooting

If you face any issues during the installation or execution of the model, here are some troubleshooting tips:

  • If you encounter a ModuleNotFoundError: No module named 'megatron.model.enums' error when converting the checkpoint, the conversion script needs the Megatron-LM code on its Python path. Clone the Megatron-LM repository, for example into /tmp:

    git clone https://github.com/NVIDIA/Megatron-LM /tmp/Megatron-LM

  • Then run the conversion again with PYTHONPATH pointing at that clone, using the same arguments as in Step 3:

    PYTHONPATH=/tmp/Megatron-LM python3 $MYDIR/transformers/src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py ...

  • If you already have a clone of Megatron-LM elsewhere, point PYTHONPATH at that existing clone instead. A quick way to sanity-check the clone is shown after this list.
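
To confirm the clone actually contains the module the conversion script is looking for, you can check that the file exists (the path assumes the /tmp/Megatron-LM clone from above; newer versions of Megatron-LM may organize the code differently):

ls /tmp/Megatron-LM/megatron/model/enums.py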


Conclusion

Congratulations! You’re now equipped with the knowledge to run the Megatron BERT model using the Transformers library. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
