Are you ready to dive into the vast ocean of NVLM 1.0, an innovative multimodal large language model? This article will guide you through using NVLM 1.0 for vision-language tasks at a level comparable to leading models like GPT-4o and Llama 3. With its seamless integration of image and text inputs, NVLM 1.0 is set to revolutionize how we interact with AI. So grab your digital diving gear, and let’s explore this new frontier!
Getting Started with NVLM 1.0
- Model Overview: NVLM 1.0 posts frontier-class results across vision-language benchmarks and, notably, improves on its LLM backbone’s text-only performance after multimodal training.
- Access the NVLM 1.0 Paper for a deeper understanding of its capabilities.
- Support your explorations by checking out the official NVLM website for more resources.
Preparing Your Environment
Before you begin using NVLM 1.0, you need to set up your environment correctly. Here’s how:
docker build -t nvidia/nvlm:1.0 .
Ensure your Docker image is based on nvcr.io/nvidia/pytorch:23.09-py3; building from this base helps you avoid discrepancies in benchmark results.
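Once the image is built, start a GPU-enabled container from it. The command below is a minimal sketch: the volume mount and working directory are assumptions about your local layout, so adjust them to match your setup.
# Start an interactive container with all GPUs visible; the mount path is an assumed layout
docker run --gpus all -it --rm -v "$(pwd)":/workspace nvidia/nvlm:1.0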
Loading the Model
Loading NVLM is like turning the lights on in a vast library; it opens up a world of knowledge at your fingertips. Here’s how to load the model:
import torch
from transformers import AutoModel
# Specify your model path
path = "nvidia/NVLM-D-72B"
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,   # load weights in half precision to cut memory use
    low_cpu_mem_usage=True,       # avoid keeping a second full copy of the weights in CPU RAM
    use_flash_attn=False,
    trust_remote_code=True).eval()
The example above loads the weights in bfloat16 to reduce memory use, and .eval() switches the model into inference mode, preparing it for your first queries.
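Before moving on, a quick sanity check (a small addition of ours, continuing from the snippet above) confirms the weights arrived in the expected precision:
# `model` is already in scope from the loading snippet above
n_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {n_params / 1e9:.1f}B")         # total includes the vision encoder
print(f"Weight dtype: {next(model.parameters()).dtype}")  # expect torch.bfloat16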
Using Multiple GPUs
Think of loading NVLM on multiple GPUs as teamwork in a relay race. Each runner handles a section of the track, making the entire event faster and more efficient. Here’s a code snippet to implement this:
import torch
import math
from transformers import AutoModel
def split_model():
    device_map = {}
    world_size = torch.cuda.device_count()
    num_layers = 80
    # GPU 0 also hosts the vision encoder, so count it as half a GPU
    # when distributing the language-model layers.
    num_layers_per_gpu = math.ceil(num_layers / (world_size - 0.5))
    num_layers_per_gpu = [num_layers_per_gpu] * world_size
    num_layers_per_gpu[0] = math.ceil(num_layers_per_gpu[0] * 0.5)
    layer_cnt = 0
    for i, num_layer in enumerate(num_layers_per_gpu):
        for j in range(num_layer):
            device_map[f'language_model.model.layers.{layer_cnt}'] = i
            layer_cnt += 1
    # Assign the vision encoder, projector, embeddings, and output head to GPU 0
    for component in ['vision_model', 'mlp1',
                      'language_model.model.tok_embeddings',
                      'language_model.model.embed_tokens',
                      'language_model.model.output',
                      'language_model.model.norm',
                      'language_model.lm_head']:
        device_map[component] = 0
    # Keep the final transformer layer on GPU 0 as well
    device_map[f'language_model.model.layers.{num_layers - 1}'] = 0
    return device_map
path = "nvidia/NVLM-D-72B"
device_map = split_model()
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=False,
    trust_remote_code=True,
    device_map=device_map).eval()
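You can sanity-check the layout the function produces before loading. The snippet below is a small illustrative addition, not part of the official example; it simply counts how many modules land on each GPU index:
from collections import Counter

# Summarize the device map as {gpu_index: number_of_modules}
print(Counter(split_model().values()))
Notice that GPU 0 deliberately receives roughly half as many language-model layers as the others, since it also carries the vision encoder, projector, and embeddings.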
Inference in Action
Now that you have your model loaded, it’s time to put it to work! Consider it like asking a wise friend a question: it processes information and gives you insightful answers. Here’s how to conduct inference:
import torch
from transformers import AutoTokenizer
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
# Generation settings: greedy decoding, up to 1024 new tokens (adjust as needed)
generation_config = dict(max_new_tokens=1024, do_sample=False)
# Prepare to chat with the model; None in place of pixel values means no image this turn
question = "Hello, who are you?"
response, history = model.chat(tokenizer, None, question, generation_config, history=None, return_history=True)
print(f"User: {question}\nAssistant: {response}")
Troubleshooting Tips
Should you encounter any bumps on your journey while using NVLM 1.0, consider these troubleshooting ideas:
- Verify that your Docker environment is set up according to the latest specifications by building from the provided Dockerfile.
- If you notice discrepancies in benchmark results, check for variations in your transformers and CUDA versions; the snippet below makes them easy to record.
- Ensure your model path and any image files are correctly defined to prevent file-not-found errors.
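A quick way to capture those versions when comparing runs (a small helper of ours):
import torch
import transformers

# Print the versions that most often explain benchmark drift
print(f"torch:        {torch.__version__}")
print(f"CUDA (torch): {torch.version.cuda}")
print(f"transformers: {transformers.__version__}")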
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
Ready to start your multimodal journey with NVLM 1.0? The future of AI is just a code snippet away!