Welcome to the world of OpenMoE! This impressive project is at the forefront of open-source large language model development, focusing on the Mixture-of-Experts (MoE) architecture. In this guide, we’ll walk you through how to get started with OpenMoE, including installation, inference, and troubleshooting.
What is OpenMoE?
Launched in the summer of 2023, OpenMoE aims to create a vibrant community around open-source Mixture-of-Experts large language models. Following the successful release of intermediate checkpoints and models, the team is committed to sharing their innovations and encouraging collaborative contributions.
Released Models
- OpenMoE-base: A smaller MoE model intended for debugging.
- OpenMoE-8B: An 8B-parameter MoE model trained on 1.1 trillion tokens.
- OpenMoE-8B-Chat: OpenMoE-8B supervised fine-tuned on the WildChat GPT-4 subset.
- OpenMoE-34B: A larger model whose training is still in progress.
Installing OpenMoE
To get started with OpenMoE, follow these easy steps for installation:
- Ensure you are using Python version 3.10.12.
- Clone the forked version of ColossalAI:
  git clone --branch my_openmoe https://github.com/Orion-Zheng/ColossalAI.git
- Navigate into the directory:
  cd ColossalAI
- Install ColossalAI:
  pip install .
- Install the requirements:
  python -m pip install -r examples/language/openmoe/requirements.txt
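Before moving on, it can be worth running a quick sanity check on the environment. The snippet below is a minimal sketch that assumes the steps above completed without errors:

# Sanity-check the OpenMoE environment (minimal sketch; assumes the
# installation steps above succeeded)
import sys
import colossalai   # should import cleanly after `pip install .`
import transformers # installed via requirements.txt

print("Python:", sys.version.split()[0])          # expected: 3.10.12
print("ColossalAI:", colossalai.__version__)
print("Transformers:", transformers.__version__)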
Inference with OpenMoE
Now that OpenMoE is set up, you can perform inference using the following sample code. Think of the model as a highly skilled chef preparing a unique dish based on ingredients (data). The more specific and refined the ingredients (input prompts), the better the meal (response) will be. Here’s how to invoke your chef:
import torch
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM

model_path = "ckpts/openmoe-8b-chat"
config = AutoConfig.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # bfloat16 halves memory versus float32
    trust_remote_code=True,
    device_map="auto",           # place layers on available GPUs automatically
)

query = "Question: How do I kill a process? Answer:"
# Wrap the query in the chat template the model was fine-tuned with
prompt = f"<<SYS>>\nYou are a helpful, respectful and honest assistant.\n<</SYS>>\n\n<s>[INST] {query} [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
sample = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(sample[0]))
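By default, generate performs greedy decoding. For more varied responses you can enable sampling; the options below are standard Hugging Face transformers generation parameters rather than OpenMoE-specific settings, and the values are only illustrative starting points:

# Sampling-based generation (illustrative values, not tuned for OpenMoE)
sample = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,   # sample from the distribution instead of greedy decoding
    temperature=0.7,  # lower = more focused, higher = more varied
    top_p=0.9,        # nucleus sampling: restrict to the top 90% probability mass
)
print(tokenizer.decode(sample[0], skip_special_tokens=True))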
Troubleshooting
If you encounter issues while working with OpenMoE, consider the following troubleshooting ideas:
- Ensure your GPU settings are correct and check the memory requirements. For instance, OpenMoE-8B requires around 49GB of memory in float32 or 23GB in bfloat16; see the snippet after this list for a quick way to measure this on your own machine.
- In Google Colab, switch to a high-RAM runtime, or consider Colab Pro if you keep running into memory issues.
- If you’re unable to run the model locally, you can explore it on [Colab](https://colab.research.google.com/drive/1xIfIVafnlCP2XVICmRwkUFK3cwTJYjCY#scrollTo=62T-2mH_tsjG) directly.
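If you want to verify the numbers above on your own hardware, Hugging Face transformers provides get_memory_footprint() on loaded models. A short sketch, reusing the model object from the inference example:

# Report how much memory the loaded weights occupy (reuses `model` from
# the inference example above)
footprint_gb = model.get_memory_footprint() / 1024**3
print(f"Model weights occupy roughly {footprint_gb:.1f} GB")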
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

