How to Get Started with InternVL-Chat-V1-2

Aug 13, 2024 | Educational

Welcome to the world of InternVL-Chat-V1-2! This multimodal large language model (MLLM) is designed to bridge the gap between visual and text data, providing an efficient means of handling image-text tasks. In this guide, we will walk you through the process of running this powerful AI model and help you troubleshoot any issues you may encounter.

Quick Start

To begin, ensure you have the appropriate version of transformers installed; we recommend transformers==4.37.2 for optimal performance. The snippets below show how to install it and then load and run the model:
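A standard pip command pins the recommended version (this assumes pip is available in your active Python environment):

pip install transformers==4.37.2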

Model Loading

  • Using 16-bit (bf16 / fp16):

    import torch
    from transformers import AutoTokenizer, AutoModel

    path = "OpenGVLab/InternVL-Chat-V1-2"
    # Load the weights in bfloat16 and move the model to a single GPU.
    model = AutoModel.from_pretrained(
        path,
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        trust_remote_code=True
    ).eval().cuda()
  • BNB 8-bit Quantization (requires the bitsandbytes package):

    import torch
    from transformers import AutoTokenizer, AutoModel

    path = "OpenGVLab/InternVL-Chat-V1-2"
    # Quantize the weights to 8-bit on load to reduce GPU memory usage.
    model = AutoModel.from_pretrained(
        path,
        torch_dtype=torch.bfloat16,
        load_in_8bit=True,
        low_cpu_mem_usage=True,
        trust_remote_code=True
    ).eval()
  • Multi-GPU Setup: Ensure your model layers are distributed correctly across the GPUs to avoid inference errors (a quick hardware check after this list can confirm how many GPUs PyTorch sees):

    import math
    import torch
    from transformers import AutoTokenizer, AutoModel

    def split_model(model_name):
        # Build a device_map that spreads the language-model layers across all GPUs.
        # The module names below follow the InternVL chat model layout
        # (vision_model, mlp1, language_model); verify them against your checkpoint.
        device_map = {}
        world_size = torch.cuda.device_count()
        num_layers = 60  # layer count of the V1-2 language model; adjust for other checkpoints
        # GPU 0 also hosts the vision encoder, so treat it as half a GPU for LLM layers.
        num_layers_per_gpu = math.ceil(num_layers / (world_size - 0.5))
        num_layers_per_gpu = [num_layers_per_gpu] * world_size
        num_layers_per_gpu[0] = math.ceil(num_layers_per_gpu[0] * 0.5)
        layer_cnt = 0

        for i, num_layer in enumerate(num_layers_per_gpu):
            for j in range(num_layer):
                device_map[f'language_model.model.layers.{layer_cnt}'] = i
                layer_cnt += 1

        # Keep the vision encoder, projector, embeddings, and output head on GPU 0.
        device_map['vision_model'] = 0
        device_map['mlp1'] = 0
        device_map['language_model.model.embed_tokens'] = 0
        device_map['language_model.model.norm'] = 0
        device_map['language_model.lm_head'] = 0
        return device_map

    path = "OpenGVLab/InternVL-Chat-V1-2"
    device_map = split_model('InternVL-Chat-V1-2')
    model = AutoModel.from_pretrained(
        path,
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        trust_remote_code=True,
        device_map=device_map
    ).eval()
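Before picking between the single-GPU and multi-GPU loading paths, it can help to confirm what hardware PyTorch actually sees. The snippet below uses only standard torch.cuda calls:

import torch

print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB")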

Using the Model for Inference

Now that you’ve loaded the model, here are some examples of how you can interact with it:
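The examples below assume that a tokenizer, an image processor, and a generation_config dictionary are already available. A minimal setup might look like the following; using CLIPImageProcessor and these decoding settings follows the common pattern for this checkpoint, but verify the details against the current OpenGVLab/InternVL-Chat-V1-2 model card:

import torch
from transformers import AutoTokenizer, CLIPImageProcessor

path = "OpenGVLab/InternVL-Chat-V1-2"
# Load the tokenizer and image processor from the same checkpoint as the model.
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
image_processor = CLIPImageProcessor.from_pretrained(path)

# Plain greedy decoding; raise max_new_tokens if you need longer answers.
generation_config = dict(
    num_beams=1,
    max_new_tokens=512,
    do_sample=False,
)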

Text Conversation

question = "Hello, who are you?"
response, history = model.chat(tokenizer, None, question, generation_config, history=None, return_history=True)
print(f"User: {question}")
print(f"Assistant: {response}")

Image Interaction

To analyze an image, you will need to preprocess it:

from PIL import Image

# InternVL-Chat-V1-2 expects 448x448 inputs; cast pixel values to bf16 to match the model.
image = Image.open('./examples/image2.jpg').resize((448, 448))
pixel_values = image_processor(images=image, return_tensors='pt').pixel_values.to(torch.bfloat16).cuda()

question = "Please describe the image shortly."
response = model.chat(tokenizer, pixel_values, question, generation_config)
print(f"User: {question}")
print(f"Assistant: {response}")

Understanding the Code

Let’s think of the code setup as arranging a team of chefs in a large kitchen:

  • Load the Ingredients: You first import necessary tools (libraries) like torch and transformers which are like pots and pans.
  • Setup the Chefs: Each chef (model component) is assigned their workstation (device). You need to arrange them wisely across the kitchen (GPUs) to ensure everyone is working efficiently together.
  • Cook the Dish: When you run the model (i.e., when the chefs start cooking), you give them instructions (questions and images) to produce a meal (responses).

Troubleshooting Tips

If you encounter issues while implementing the model, try the following:

  • Ensure that you are using the recommended version of the transformers library (a quick version check is shown after this list).
  • When using multiple GPUs, double-check that your device map covers every module and that the combined GPU memory is sufficient.
  • For memory-related errors, try the BNB 8-bit loading path above, or keep low_cpu_mem_usage=True during model loading to reduce peak RAM while the weights are loaded.
  • For assisted troubleshooting, explore community insights or collaborate with experts by visiting **[fxis.ai](https://fxis.ai)**.
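For the version check mentioned in the first tip, one quick way to confirm what is installed in the environment running the model:

import transformers
print(transformers.__version__)  # this guide recommends 4.37.2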

Final Thoughts

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
