Are you ready to dive into the world of lightweight language models? Meet Hare-1.1B-base, a creation of the LiteAI Team developed in collaboration with China Telecom Guizhou Branch. Compact yet capable, this model is designed for easy deployment on consumer-grade hardware. In this guide, we’ll walk you through the key features, usage instructions, and troubleshooting tips for this model.
Overview of Hare-1.1B-base
Hare-1.1B-base leverages a blend of high-quality open-source data and synthetically generated data for training. Here are some highlights:
- Model Size: 1.1 billion parameters
- Architecture: Based on Mistral (you can verify this from the model configuration; see the snippet after this list)
- Supports: Consumer-grade GPUs and mobile devices
- Performance: Demonstrated strong results on the Open LLM Leaderboard
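If you want to confirm these details locally, you can inspect the published configuration without downloading the full weights. The snippet below is a minimal sketch using the standard transformers AutoConfig API; the printed fields assume the repository ships a standard Mistral-style configuration:
from transformers import AutoConfig
# Loads only the model configuration file, not the weights
config = AutoConfig.from_pretrained('LiteAI-Team/Hare-1.1B-base')
print(config.model_type)         # architecture family (expected: mistral)
print(config.hidden_size)        # width of each transformer layer
print(config.num_hidden_layers)  # number of transformer layers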
How to Use Hare-1.1B-base
To utilize Hare-1.1B-base effectively, follow the steps below for both Python inference and deployment using vLLM.
Using Python for Inference
The following script will help you generate text using Hare-1.1B-base:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Run on a GPU if one is available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

model_path = 'LiteAI-Team/Hare-1.1B-base'
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
model.to(device)

prompt = "Write a poem based on the landscape of Guizhou:"
tokens = tokenizer(prompt, add_special_tokens=True, return_tensors='pt').to(device)

# Generate up to 128 new tokens after the prompt
output = model.generate(**tokens, max_new_tokens=128)

# Drop the prompt tokens from the front of the sequence and decode the rest
output_tokens = output[0][tokens.input_ids.shape[1]:]
output_string = tokenizer.decode(output_tokens, skip_special_tokens=True)
print(output_string)
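By default, generate() decodes greedily, which can produce repetitive text for open-ended prompts like this one. A common refinement is to enable sampling. The call below is a minimal sketch using standard transformers generation arguments; the specific temperature and top_p values are illustrative, not tuned for this model:
# Sample instead of decoding greedily for more varied output
output = model.generate(
    **tokens,
    max_new_tokens=128,
    do_sample=True,   # enable stochastic sampling
    temperature=0.8,  # soften the next-token distribution
    top_p=0.95,       # nucleus sampling: restrict to the top 95% probability mass
)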
Analogy for Understanding the Code
Think of using Hare-1.1B-base like a chef preparing a delightful dish. You first set the stage (import libraries), choose your ingredients (model and tokenizer), and prepare your kitchen (placing the model on the appropriate device). Once everything is ready, you craft your recipe (input prompt) and let the magic happen—out comes a beautifully plated dish (the generated text). Each component plays a crucial role in delivering a delicious result!
Deploying with vLLM
To install and deploy Hare-1.1B-base using vLLM, first install the package:
pip install vllm
Then run the following Python script:
from vllm import LLM, SamplingParams

model_path = 'LiteAI-Team/Hare-1.1B-base'

# tensor_parallel_size shards the model across that many GPUs;
# set it to 1 for a single consumer-grade GPU
llm = LLM(model=model_path, trust_remote_code=True, tensor_parallel_size=4)

query = "Write a poem based on the landscape of Guizhou:"
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# generate() returns a list of RequestOutput objects; print the generated text
outputs = llm.generate(query, sampling_params)
for output in outputs:
    print(output.outputs[0].text)
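vLLM really shines with batched requests: generate() accepts a list of prompts and schedules them together. The snippet below is a minimal sketch of that pattern, reusing the llm and sampling_params objects from the script above (the second prompt is purely illustrative):
# Pass several prompts at once; vLLM batches them internally
queries = [
    "Write a poem based on the landscape of Guizhou:",
    "Summarize the advantages of lightweight language models:",
]
outputs = llm.generate(queries, sampling_params)
for output in outputs:
    print(output.prompt)
    print(output.outputs[0].text)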
Edge Deployment
With its compact size, Hare-1.1B-base can be deployed on mobile devices: after Int4 quantization the model occupies only about 0.6GB, making it a practical choice for real-time, on-device applications.
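Mobile deployment normally goes through a dedicated on-device runtime, but if you want to approximate the Int4 footprint on a desktop GPU first, the bitsandbytes integration in transformers is a convenient stand-in. The snippet below is a hedged sketch, assuming the bitsandbytes package and a CUDA GPU are available; it is not the team's official mobile quantization pipeline:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit quantization: weights are stored in 4 bits, compute runs in fp16
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model_path = 'LiteAI-Team/Hare-1.1B-base'
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=quant_config,
    device_map='auto',  # place the quantized weights on the available GPU
)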
Troubleshooting and Support
While using Hare-1.1B-base, you might encounter some challenges. Here are some common issues and their solutions:
- Model not loading: Ensure that all relevant libraries are properly installed and that your device is set up correctly.
- Out of Memory Error: Try reducing the batch size, loading the model in half precision (see the sketch after this list), or running on a device with more memory.
- Unexpected Outputs: Hare-1.1B-base is a base model rather than an instruction-tuned one, so it continues text instead of following commands; phrase prompts as passages to be completed and refine them for better results.
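For the out-of-memory case, loading the weights in half precision roughly halves the footprint relative to fp32. A minimal sketch, assuming a CUDA-capable GPU:
import torch
from transformers import AutoModelForCausalLM

# Load weights directly in fp16 to roughly halve memory use versus fp32
model = AutoModelForCausalLM.from_pretrained(
    'LiteAI-Team/Hare-1.1B-base',
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,  # avoid materializing a full fp32 copy while loading
)
model.to('cuda')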
For further insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Final Thoughts
Hare-1.1B-base is a groundbreaking lightweight model poised for various applications. Its architecture and support for edge deployment place it at the forefront of modern AI technology.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
