Welcome to the world of OpenMoE! This impressive project is at the forefront of open-source large language model development, focusing on the Mixture-of-Experts (MoE) architecture. In this guide, we’ll walk you through how to get started with OpenMoE, including installation, inference, and troubleshooting.
What is OpenMoE?
Launched in the summer of 2023, OpenMoE aims to create a vibrant community around open-source Mixture-of-Experts large language models. Following the successful release of intermediate checkpoints and models, the team is committed to sharing their innovations and encouraging collaborative contributions.
Released Models
- OpenMoE-base: A smaller MoE model intended for debugging.
- OpenMoE-8B: An 8B-parameter MoE model trained on 1.1 trillion tokens.
- OpenMoE-8B-Chat: OpenMoE-8B supervised fine-tuned on the WildChat GPT-4 subset.
- OpenMoE-34B: A larger model whose training is still in progress.
Installing OpenMoE
To get started with OpenMoE, follow these easy steps for installation:
- Ensure you are using Python version 3.10.12.
- Clone the forked version of ColossalAI:
  git clone --branch my_openmoe https://github.com/Orion-Zheng/ColossalAI.git
- Navigate into the directory:
  cd ColossalAI
- Install ColossalAI:
  pip install .
- Install the requirements:
  python -m pip install -r examples/language/openmoe/requirements.txt
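Before moving on, it can be worth running a quick sanity check on the environment. The snippet below is a minimal sketch that assumes the steps above completed without errors:

# Sanity-check the OpenMoE environment (minimal sketch; assumes the
# installation steps above succeeded)
import sys
import colossalai   # should import cleanly after `pip install .`
import transformers # installed via requirements.txt

print("Python:", sys.version.split()[0])          # expected: 3.10.12
print("ColossalAI:", colossalai.__version__)
print("Transformers:", transformers.__version__)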
Inference with OpenMoE
Now that OpenMoE is set up, you can perform inference using the following sample code. Think of the model as a highly skilled chef preparing a unique dish based on ingredients (data). The more specific and refined the ingredients (input prompts), the better the meal (response) will be. Here’s how to invoke your chef:
import torch
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM

model_path = "ckpts/openmoe-8b-chat"
config = AutoConfig.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # bfloat16 halves memory versus float32
    trust_remote_code=True,
    device_map="auto",           # place layers on available GPUs automatically
)

query = "Question: How do I kill a process? Answer:"
# Wrap the query in the chat template the model was fine-tuned with
prompt = f"<<SYS>>\nYou are a helpful, respectful and honest assistant.\n<</SYS>>\n\n<s>[INST] {query} [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
sample = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(sample[0]))
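By default, generate performs greedy decoding. For more varied responses you can enable sampling; the options below are standard Hugging Face transformers generation parameters rather than OpenMoE-specific settings, and the values are only illustrative starting points:

# Sampling-based generation (illustrative values, not tuned for OpenMoE)
sample = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,   # sample from the distribution instead of greedy decoding
    temperature=0.7,  # lower = more focused, higher = more varied
    top_p=0.9,        # nucleus sampling: restrict to the top 90% probability mass
)
print(tokenizer.decode(sample[0], skip_special_tokens=True))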
Troubleshooting
If you encounter issues while working with OpenMoE, consider the following troubleshooting ideas:
- Ensure your GPU settings are correct and check the memory requirements. For instance, OpenMoE-8B requires around 49GB of memory in float32 or 23GB in bfloat16; see the snippet after this list for a quick way to measure this on your own machine.
- In Google Colab, switch to a high-RAM runtime, or consider Colab Pro if you keep running into memory issues.
- If you’re unable to run the model locally, you can explore it on [Colab](https://colab.research.google.com/drive/1xIfIVafnlCP2XVICmRwkUFK3cwTJYjCY#scrollTo=62T-2mH_tsjG) directly.
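If you want to verify the numbers above on your own hardware, Hugging Face transformers provides get_memory_footprint() on loaded models. A short sketch, reusing the model object from the inference example:

# Report how much memory the loaded weights occupy (reuses `model` from
# the inference example above)
footprint_gb = model.get_memory_footprint() / 1024**3
print(f"Model weights occupy roughly {footprint_gb:.1f} GB")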
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

