Merging Pre-Trained Language Models Using Mergekit

Jan 29, 2024 | Educational

In the world of artificial intelligence, merging pre-trained models can result in more powerful and nuanced AI systems. This guide will walk you through the process of merging language models using mergekit, detailing each step in a user-friendly manner.

Understanding the Merge Process

Imagine you are a chef creating a new dish by blending several existing recipes. Each recipe has its unique flavor, but when combined, they can create something extraordinary. Similarly, we will merge different language models to harness their strengths, resulting in a more capable AI assistant.

Merge Details

This merging was accomplished using the DARE merge method with the following models:

Configuration Overview

The following YAML configuration was utilized to create this model:

yaml
base_model:  
  model:    
    path: NickyNickyTinyDolphin-2.8-1.1b_oasst2_chatML_Cluster_1_V1
dtype: bfloat16
merge_method: dare_ties
slices:
- sources:  
  - layer_range: [0, 22]    
    model:      
      model:        
        path: NickyNickyTinyDolphin-2.8-1.1b_oasst2_chatML_Cluster_1_V1
  - layer_range: [0, 22]    
    model:      
      model:        
        path: NickyNickyTinyDolphin-2.8-1.1b_oasst2_chatML_Cluster_1_V1    
    parameters:      
      density: 0.55      
      weight: 0.55  
  - layer_range: [0, 22]    
    model:      
      model:        
        path: NickyNickyTinyDolphin-2.8-1.1b_oasst2_chatML_Cluster_2_V1    
    parameters:      
      density: 0.55      
      weight: 0.56  
  - layer_range: [0, 22]    
    model:      
      model:        
        path: NickyNickyTinyDolphin-2.8-1.1b_oasst2_chatML_Cluster_3_V1    
    parameters:      
      density: 0.55      
      weight: 0.56  
  - layer_range: [0, 22]    
    model:      
      model:        
        path: cognitivecomputationsTinyDolphin-2.8-1.1b    
    parameters:      
      density: 0.55      
      weight: 0.56

Implementing the Merge in Python

The merging process involves using specific Python libraries, much like assembling ingredients for your favorite dish. The following code implements the merge:

Python
from transformers import (    
    AutoModelForCausalLM,    
    AutoTokenizer,    
    BitsAndBytesConfig,    
    HfArgumentParser,    
    TrainingArguments,    
    pipeline,    
    logging,    
    GenerationConfig,    
    TextIteratorStreamer,
)
import torch

new_model = 'NickyNickyTinyDolphin-2.8-1.1b_oasst2_chatML_all_Cluster_merge_v1'

model = AutoModelForCausalLM.from_pretrained(
    new_model,
    device_map='auto',
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)

tokenizer = AutoTokenizer.from_pretrained(
    new_model,
    max_length=2048,
    trust_remote_code=True,
    use_fast=True,
)
tokenizer.pad_token = tokenizer.eos_token

prompt = 'im_start system You are a helpful AI assistant. im_end'
inputs = tokenizer.encode(prompt, return_tensors='pt', add_special_tokens=False).cuda()

generation_config = GenerationConfig(
    max_new_tokens=700,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

outputs = model.generate(
    generation_config=generation_config,
    input_ids=inputs,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=False))

Troubleshooting

Should you encounter issues, here are a few tips to help you resolve them:

Ensure all libraries are installed and up to date.
Check that your model paths are correctly specified and accessible.
Pay attention to the device settings; ensure that you have the appropriate CPU or GPU resources available.

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

By merging these pre-trained language models using mergekit, you can create a more robust AI assistant capable of performing complex tasks more efficiently. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

Stay Informed with the Newest F(x) Insights and Blogs

Tech News and Blog Highlights, Straight to Your Inbox