How to Utilize DeepSeek-V2.5 for Model Inference

Oct 28, 2024 | Educational

Welcome to the world of DeepSeek-V2.5! This guide walks you through running this sophisticated AI model locally with different inference frameworks. Think of DeepSeek-V2.5 as a master chef who combines the best cooking techniques from two cuisines (DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct) into a single menu. Let’s dive in!

1. Introduction

DeepSeek-V2.5 is an upgraded version that merges the general chat abilities of DeepSeek-V2-Chat and the coding strengths of DeepSeek-Coder-V2-Instruct into one powerful model. It handles a variety of tasks efficiently, just like a chef who has mastered several cuisines and moves between them seamlessly.

2. How to Run Locally

To run DeepSeek-V2.5 locally, you’ll need:

  • 8 GPUs
  • 80GB of memory per GPU (80GB × 8 for BF16 inference; a quick check is sketched below)
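
Before downloading any weights, it helps to confirm that your machine actually exposes eight GPUs with enough memory. Below is a minimal sketch using PyTorch; it assumes CUDA-capable GPUs and only reports what it finds, it does not enforce the 80GB figure.

python
import torch

# Sketch: confirm that 8 GPUs are visible and report their memory (assumes CUDA GPUs).
required_gpus = 8

if not torch.cuda.is_available():
    raise SystemExit("CUDA is not available - check your driver and PyTorch installation.")

gpu_count = torch.cuda.device_count()
print(f"Visible GPUs: {gpu_count} (need {required_gpus})")

for i in range(gpu_count):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB")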

Inference with Hugging Face Transformers

You can use Hugging Face Transformers for model inference. Below is a simple recipe to get started:

python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/DeepSeek-V2.5"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Cap memory usage at 75GB per GPU across the 8 GPUs
max_memory = {i: '75GB' for i in range(8)}

# Load model
model = AutoModelForCausalLM.from_pretrained(model_name, 
                                             trust_remote_code=True, 
                                             device_map="sequential", 
                                             torch_dtype=torch.bfloat16, 
                                             max_memory=max_memory)

# Configure generation
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

messages = [
    {"role": "user", "content": "Write a piece of quicksort code in C++"}
]

# Tokenize input
input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=100)
result = tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True)

print(result)

This code works like a recipe where each line contributes a vital ingredient to create a delicious dish (output). Loading the model is akin to gathering all your ingredients and setting the stage for cooking.
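
If you want to steer how creative the output is, you can adjust the generation settings before calling generate. The snippet below is a small sketch that reuses the model, tokenizer, and messages defined above; the specific values (temperature 0.3, top_p 0.9, 512 new tokens) are illustrative assumptions, not official recommendations.

python
# Sketch: adjust sampling on the loaded model (values are illustrative assumptions).
model.generation_config.do_sample = True
model.generation_config.temperature = 0.3
model.generation_config.top_p = 0.9

input_tensor = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(input_tensor.to(model.device), max_new_tokens=512)
print(tokenizer.decode(outputs[0][input_tensor.shape[1]:], skip_special_tokens=True))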

Inference with vLLM (Recommended)

For faster, more memory-efficient inference, consider using vLLM.

python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

max_model_len, tp_size = 8192, 8
model_name = "deepseek-ai/DeepSeek-V2.5"

tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name, tensor_parallel_size=tp_size, max_model_len=max_model_len, trust_remote_code=True, enforce_eager=True)

sampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])

messages_list = [
    [{"role": "user", "content": "Who are you?"}],
    [{"role": "user", "content": "Translate the following content into Chinese directly: DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference."}],
    [{"role": "user", "content": "Write a piece of quicksort code in C++."}],
]

prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]
outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)

This setup batches several independent requests, like drafting multiple letters at once: each letter (prompt) is distinct, yet all of them are processed together in a single pass.
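
To see which answer belongs to which question, you can pair each request with its generated text. This is a small sketch that reuses messages_list and outputs from the example above.

python
# Sketch: print each prompt alongside its generated answer.
for messages, output in zip(messages_list, outputs):
    question = messages[0]["content"]
    answer = output.outputs[0].text
    print(f"Q: {question}\nA: {answer}\n" + "-" * 40)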

3. Troubleshooting

If you encounter issues during setup or execution:

  • Ensure all dependencies and libraries are correctly installed.
  • Check your memory allocation and GPU availability to meet the requirements.
  • If the model doesn’t respond as expected, verify the input formats and parameters specified in your code (a quick environment check is sketched below).
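
A quick way to run these checks is to print your library versions, GPU visibility, and the rendered chat prompt. The snippet below is a minimal sketch, not an official diagnostic tool; it assumes the same model name used earlier in this guide.

python
import torch
import transformers
from transformers import AutoTokenizer

# Sketch: basic environment checks (library versions and GPU visibility).
print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available(), "| GPUs:", torch.cuda.device_count())

# Render the chat template as plain text to verify the input format.
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V2.5", trust_remote_code=True)
messages = [{"role": "user", "content": "Write a piece of quicksort code in C++"}]
print(tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False))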

For any persistent issues, please reach out to support or visit our community. For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

4. License

The code repository is licensed under the MIT License, while use of the DeepSeek-V2 series models is governed by DeepSeek's Model License, which supports commercial use.

5. Conclusion

Now you’re equipped to run and explore the capabilities of DeepSeek-V2.5 with confidence! Dive into the culinary adventure of AI, crafting intelligent solutions with every line of code.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
