How to Use MiniCPM3-4B for Text Generation

Oct 28, 2024 | Educational

Welcome to this user-friendly guide on leveraging the MiniCPM3-4B model for text generation! This language model improves markedly on earlier MiniCPM releases and offers a 32k context window; paired with the LLMxMapReduce technique, it can in theory handle unlimited context. Let’s dive right in and explore what this model has to offer!

Getting Started with MiniCPM3-4B

Before we jump into the nitty-gritty, make sure you have the necessary libraries installed. You’ll primarily need the Transformers library (and PyTorch) for inference.

1. Inference with Transformers

To get started with text generation using MiniCPM3-4B, follow the steps below:
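If the required packages aren’t installed yet, a typical setup looks like this (standard package names shown; pin versions as your environment requires):

bash
pip install transformers torch

With the environment ready, run the following script: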

python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

path = "openbmb/MiniCPM3-4B"
device = "cuda"

# trust_remote_code is required because MiniCPM3 ships custom modeling code
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map=device, trust_remote_code=True)

# a simple example prompt (the model card uses the Chinese equivalent of this request)
messages = [{"role": "user", "content": "Recommend 5 sights to see in Beijing."}]
model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(device)
model_outputs = model.generate(
    model_inputs,
    max_new_tokens=1024,
    do_sample=True,  # enables sampling so top_p and temperature take effect
    top_p=0.7,
    temperature=0.7
)

# strip the prompt tokens from each output, then decode the generated text
output_token_ids = [model_outputs[i][len(model_inputs[i]):] for i in range(len(model_inputs))]
response = tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0]
print(response)
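As a quick sanity check, you can decode model_inputs back to text to see exactly what the chat template handed to the model:

python
# decode the templated token IDs to inspect the final prompt string
print(tokenizer.decode(model_inputs[0]))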

Explanation of the Code

Here is what each piece of the script does:

  • The import statements bring in the Transformers model and tokenizer classes, plus PyTorch.
  • path identifies the MiniCPM3-4B repository on the Hugging Face Hub.
  • device selects where computation runs; "cuda" places the model on the GPU.
  • The tokenizer applies the chat template and converts your messages into the token IDs the model expects.
  • The model generates new tokens from that input; max_new_tokens caps the response length, while top_p and temperature control how adventurous the sampling is.
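
If you would rather watch the response appear token by token instead of waiting for the full completion, Transformers’ TextStreamer plugs into the same generate call. This is a minimal sketch reusing the model, tokenizer, and model_inputs from above:

python
from transformers import TextStreamer

# skip_prompt=True avoids echoing the input; skip_special_tokens cleans the stream
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(
    model_inputs,
    max_new_tokens=1024,
    do_sample=True,
    top_p=0.7,
    temperature=0.7,
    streamer=streamer
)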

2. Inference with vLLM

If you prefer using vLLM for inference, follow these instructions:

bash
pip install git+https://github.com/OpenBMB/vllm.git@minicpm3
Next, import the libraries and use the following code snippet:

python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "openbmb/MiniCPM3-4B"
prompt = [{"role": "user", "content": "Recommend 5 sights to see in Beijing."}]

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# tokenize=False returns the formatted prompt string rather than token IDs
input_text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)

llm = LLM(model=model_name, trust_remote_code=True, tensor_parallel_size=1)
sampling_params = SamplingParams(top_p=0.7, temperature=0.7, max_tokens=1024, repetition_penalty=1.02)

outputs = llm.generate(prompts=input_text, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
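Since vLLM excels at batched inference, you can also send several templated prompts through the same LLM instance in one call. A minimal sketch, reusing the tokenizer, llm, and sampling_params from above (the example questions are illustrative):

python
questions = [
    "Recommend 5 sights to see in Beijing.",
    "What can you do with a 32k context window?",
]
# apply the chat template to each question, then generate the whole batch at once
prompts = [
    tokenizer.apply_chat_template(
        [{"role": "user", "content": q}], tokenize=False, add_generation_prompt=True
    )
    for q in questions
]
outputs = llm.generate(prompts=prompts, sampling_params=sampling_params)
for output in outputs:
    print(output.outputs[0].text)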

Understanding Evaluation Results

Evaluating MiniCPM3-4B is essential to understanding where it stands relative to other models. The table below lists a selection of its reported benchmark scores:

| Benchmark   | MiniCPM3-4B |
|-------------|-------------|
| MMLU        | 66.3        |
| HumanEval+  | 68.9        |
| Average     | 66.3        |

Compared with the published scores of other models in its size class, MiniCPM3-4B holds its ground, making it a strong contender among compact language models.

Troubleshooting

While working with MiniCPM3-4B, you may face some hiccups. Here are a few common issues and their solutions:

  • Issue: Installation errors.
    Solution: Ensure that your environment meets all dependencies. Run pip install -U pip and try reinstalling the Transformers or vLLM library.
  • Issue: Memory errors during inference.
    Solution: Try a smaller model, reduce batch sizes, or load the weights at lower precision to fit within your GPU’s memory (see the sketch after this list).
  • Issue: Unexpected model outputs.
    Solution: Check your input prompts carefully; the prompt wording and chat formatting strongly influence what the model produces.
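
For memory errors specifically, one common workaround is loading the weights with 4-bit quantization through bitsandbytes. Treat this as a hedged sketch rather than an officially documented path for MiniCPM3-4B: it assumes bitsandbytes is installed and that the model’s custom code is compatible with quantized loading:

python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

path = "openbmb/MiniCPM3-4B"
# NF4 4-bit quantization cuts weight memory roughly 4x versus bfloat16
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
# assumes bitsandbytes works with this model's custom architecture
model = AutoModelForCausalLM.from_pretrained(
    path,
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,
)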

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

With MiniCPM3-4B, you have an advanced tool at your disposal that enables efficient and powerful text generation. This model’s versatility supports a range of applications, from casual conversation to complex programming advice.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
