In this blog post, we’ll take a closer look at the Maral 7B Alpha model, a large language model (LLM) specifically tailored for the Persian language. It opens new avenues for Persian language processing while retaining solid performance in English.
What is Maral?
Maral is based on the Mistral architecture and was fine-tuned on the Alpaca Persian dataset. The name “Maral” refers to the Red Deer native to Iran, a nod to the environmental and cultural significance bound up with this Persian LLM.
Setting Up Your Environment
To get started with Maral, you need to install a few libraries. Open your terminal or command prompt and run the following command:
pip install transformers accelerate bitsandbytes
Note: The bitsandbytes library is only required for the 8-bit version. If you’re using other configurations, you can skip installing this library.
Inference with Maral
When you want to interact with the Maral model, you need to follow a specific prompt format. This is akin to how you might approach a conversation by clearly stating your question and waiting for a response. Here’s the format to adhere to:
### Human: prompt
### Assistant: answer
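As an illustration, a small helper can wrap any question in this template before tokenization; the build_prompt name below is purely illustrative and not part of the Maral tooling.
def build_prompt(question: str) -> str:
    # Wrap a user question in the Human/Assistant template expected by Maral
    return f"### Human: {question}\n### Assistant:"

print(build_prompt("Who was the president of the USA in 1996?"))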
Example Code for Inference on Large GPUs
If you’re equipped with a powerful GPU, such as the A100, use the following code snippet:
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

model_name_or_id = "MaralGPT/Maral-7B-alpha-1"

# Load the weights in bfloat16 and let accelerate place them on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained(model_name_or_id, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id)

# "Who was the president of the United States in 1996?"
question = "در سال ۱۹۹۶ چه کسی رییس جمهور آمریکا بود؟"
# Wrap the question in the Human/Assistant prompt format described above
prompt = f"### Human: {question}\n### Assistant:"

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,
    temperature=0.5,
    max_new_tokens=300,
    pad_token_id=tokenizer.eos_token_id,
)

outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
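The decoded string contains the full prompt as well as the model’s reply, so you may want to keep only the text after the ### Assistant: marker. Here is a minimal sketch, assuming the prompt format shown above:
decoded = tokenizer.decode(outputs[0], skip_special_tokens=True)
# Keep only the text following the "### Assistant:" marker, if present
answer = decoded.split("### Assistant:")[-1].strip()
print(answer)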
Analogy for Clarity
Imagine you have a library filled with thousands of books (the data) and a librarian (the model) who can give you information about any book you ask for. The prompt you provide is like asking the librarian a precise question. If you ask clearly, you receive an accurate answer, just like how you should construct your query carefully to get the best output from Maral.
Inference on Small GPUs
If your hardware is less powerful, you can still use the model on consumer-grade GPUs or even the free version of Google Colab with some adjustments. Just ensure bitsandbytes is installed correctly and load the model as follows:
model = AutoModelForCausalLM.from_pretrained(model_name_or_id, load_in_8bit=True, torch_dtype=torch.bfloat16, device_map="auto")
In case you encounter RAM issues on the free version of Google Colab, try using the low_cpu_mem_usage=True option when loading the model.
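Putting the two adjustments together, a low-memory loading call might look like the sketch below; note that newer transformers releases may prefer passing a BitsAndBytesConfig instead of load_in_8bit, so treat this as one possible configuration rather than the only one.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name_or_id = "MaralGPT/Maral-7B-alpha-1"

# 8-bit weights via bitsandbytes, with reduced CPU RAM usage during loading
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_id,
    load_in_8bit=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    low_cpu_mem_usage=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id)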
Troubleshooting
While using Maral, you may encounter a few issues:
- The model produces text with roughly GPT-3.5-level grammar but can occasionally hallucinate. Improving the dataset could help.
- When handling reasoning tasks in Persian, the model may provide misleading answers.
- Since the model is resource-intensive, it requires substantial computational power. Exploring alternatives like GPTQ or GGUF versions could be beneficial.
- If you notice repetitive outputs, adjusting the temperature below 1, ideally between 0.5 and 0.7, may solve the problem; see the sketch after this list.
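As an illustration, a generation configuration tuned against repetition might look like the following sketch. It reuses the model and tokenizer loaded earlier, and the top_k and repetition_penalty values are illustrative choices rather than settings prescribed by the Maral authors.
from transformers import GenerationConfig

generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.6,  # keep temperature in the suggested 0.5-0.7 range
    top_k=50,  # illustrative value; the example above uses top_k=1
    repetition_penalty=1.1,  # illustrative: mildly discourage repeated tokens
    max_new_tokens=300,
    pad_token_id=tokenizer.eos_token_id,
)

outputs = model.generate(**inputs, generation_config=generation_config)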
For further assistance, consider connecting with the community or with experts who can offer insights and solutions. For more updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

