How to Use the WizardLM-2-8x22B EXL2 Model

Apr 17, 2024 | Educational

In this guide, we will walk you through working with the WizardLM-2-8x22B model, focusing on its EXL2 quantization and perplexity scoring. Whether you are a beginner or an experienced developer, you should find the walkthrough clear and practical.

What is WizardLM-2-8x22B?

WizardLM-2-8x22B is a sophisticated language model from Microsoft designed to handle complex text generation tasks. The EXL2 version compresses the model's weights through quantization, sharply reducing the VRAM required to run it at a modest cost in accuracy.
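
To build intuition for what a quant level means in practice: the level is the number of bits stored per weight (bpw), so weight memory scales roughly linearly with it. As a back-of-the-envelope sketch, assuming the roughly 141B total parameters commonly cited for the Mixtral-8x22B architecture this model builds on:

# Rough weight-memory estimate in GB: params_in_billions * bpw / 8
echo "141 * 4.0 / 8" | bc -l   # ~70.5 GB for a 4.0 bpw quant, before cache and overhead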

Getting Started with EXL2 Version

The EXL2 quants were built with exllamav2 version 0.0.18. If you are using an older version, you may run into compatibility issues, so update your exllamav2 install and your Text Generation WebUI to their latest versions before loading these quants.
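
As a minimal sketch, assuming you installed exllamav2 from PyPI into the same environment the scripts below use, updating looks like this:

# Upgrade exllamav2 in the active Python environment
pip install --upgrade exllamav2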

Perplexity Scoring Explained

Perplexity is an essential metric for evaluating language models: it indicates how well the model's probability distribution predicts a sample, and lower scores are preferable. Below is a table of the perplexity scores measured for the different quant levels of the EXL2 models:


Quant Level (bpw)  Perplexity Score
-----------------------------------
7.0         4.5859
6.0         4.6252
5.5         4.6493
5.0         4.6937
4.5         4.8029
4.0         4.9372
3.5         5.1336
3.25        5.3636
3.0         5.5468
2.75        5.8255
2.5         6.3362
2.25        7.7763
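
For reference, perplexity is the exponential of the average negative log-likelihood the model assigns to the evaluation tokens:

\mathrm{PPL} = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \log p(x_i \mid x_{<i}) \right)

So the jump from 4.5859 at 7.0 bpw to 7.7763 at 2.25 bpw means the most heavily compressed quant is, on average, noticeably more surprised by each token of the evaluation set.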

Running the Perplexity Test

To evaluate the model's perplexity at each quant level, you can use the following bash script. It downloads each quant if it is not already present locally, runs the evaluation, and prints one score per line:


#!/bin/bash
# Activate the conda environment
source ~/miniconda3/etc/profile.d/conda.sh
conda activate exllamav2

# Evaluation dataset (wikitext-2)
DATA_SET=/root/wikitext/wikitext-2-v1.parquet

# Set the model name and the bit sizes to test
MODEL_NAME=WizardLM-2-8x22B
BIT_PRECISIONS=(6.0 5.5 5.0 4.5 4.0 3.5 3.25 3.0 2.75 2.5 2.25)

# Print the table header
echo "Quant Level  Perplexity Score"
echo "-------------------------------"

for BIT_PRECISION in "${BIT_PRECISIONS[@]}"
do
  LOCAL_FOLDER=/root/models/${MODEL_NAME}_exl2_${BIT_PRECISION}bpw
  REMOTE_FOLDER=Dracones/${MODEL_NAME}_exl2_${BIT_PRECISION}bpw

  # Download the quant if it is not already present locally
  if [ ! -d "$LOCAL_FOLDER" ]; then
    huggingface-cli download --local-dir-use-symlinks=False \
      --local-dir "$LOCAL_FOLDER" "$REMOTE_FOLDER" > /root/download.log 2>&1
  fi

  # Run the evaluation and extract the perplexity value from the output
  output=$(python test_inference.py -m "$LOCAL_FOLDER" -gs 40,40,40,40 -ed "$DATA_SET")
  score=$(echo "$output" | grep -oP "Evaluation perplexity: \K[\d.]+")
  echo "$BIT_PRECISION  $score"
done
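
If you only want to check a single quant, you can run the evaluation step by hand with the same flags the script uses (the paths here are illustrative and should match wherever you downloaded the model):

# Evaluate one quant directly; -gs splits the weights across four GPUs
python test_inference.py -m /root/models/WizardLM-2-8x22B_exl2_4.0bpw \
  -gs 40,40,40,40 -ed /root/wikitext/wikitext-2-v1.parquet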

Understanding the Perplexity Script

Imagine you're a chef preparing a different dish each day. The perplexity script is like your recipe book: it lists the ingredients (the data set and model name) and the steps needed to produce the finished dish (a perplexity score). Each BIT_PRECISION is akin to adjusting the seasoning; by tweaking these values you trade quality against size until the result is well-optimized for your needs.

Quantization Process

To create the EXL2 quants of the WizardLM-2-8x22B model yourself, you can use the following bash script. It first builds a measurement file, then converts the model at each BIT_PRECISION value:


#!/bin/bash
# Activate the conda environment
source ~/miniconda3/etc/profile.d/conda.sh
conda activate exllamav2

# Set the model name
MODEL_NAME=WizardLM-2-8x22B

# Define variables
MODEL_DIR=/mnt/storage/models/$MODEL_NAME
OUTPUT_DIR=exl2_$MODEL_NAME
MEASUREMENT_FILE=measurements/$MODEL_NAME.json

# Create the measurement file if needed
if [ ! -f "$MEASUREMENT_FILE" ]; then
  echo "Creating $MEASUREMENT_FILE"
  # Start from a clean working directory
  if [ -d "$OUTPUT_DIR" ]; then
    rm -r "$OUTPUT_DIR"
  fi
  mkdir "$OUTPUT_DIR"
  python convert.py -i "$MODEL_DIR" -o "$OUTPUT_DIR" -nr -om "$MEASUREMENT_FILE"
fi

# Choose one of the below: either create a single quant for testing or a batch of them.
# BIT_PRECISIONS=(4.0)
BIT_PRECISIONS=(5.0 4.5 4.0 3.5 3.0 2.75 2.5 2.25)

for BIT_PRECISION in "${BIT_PRECISIONS[@]}"
do
  CONVERTED_FOLDER=models/${MODEL_NAME}_exl2_${BIT_PRECISION}bpw
  # If it doesn't already exist, make the quant
  if [ ! -d "$CONVERTED_FOLDER" ]; then
    echo "Creating $CONVERTED_FOLDER"
    # Start from a clean working directory
    if [ -d "$OUTPUT_DIR" ]; then
      rm -r "$OUTPUT_DIR"
    fi
    mkdir "$OUTPUT_DIR"
    mkdir -p "$CONVERTED_FOLDER"
    # Run the conversion, reusing the measurement file
    python convert.py -i "$MODEL_DIR" -o "$OUTPUT_DIR" -nr -m "$MEASUREMENT_FILE" -b "$BIT_PRECISION" -cf "$CONVERTED_FOLDER"
  fi
done
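
For a one-off conversion, the same convert.py invocation can be run by hand once the measurement file exists (paths are illustrative and should match your setup):

# Produce a single 4.0 bpw quant, reusing the existing measurement file
python convert.py -i /mnt/storage/models/WizardLM-2-8x22B \
  -o exl2_WizardLM-2-8x22B -nr \
  -m measurements/WizardLM-2-8x22B.json \
  -b 4.0 -cf models/WizardLM-2-8x22B_exl2_4.0bpw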

Common Troubleshooting Tips

If you encounter issues while using the WizardLM-2-8x22B model, consider the following troubleshooting steps:

  • Ensure that your exllamav2 package is up to date (a quick version check is shown after this list).
  • Double-check the paths used in your scripts to ensure they point to the correct directories.
  • Confirm that the model files have been downloaded properly without any interruptions.
  • Review any error messages carefully, as they often provide clues to what went wrong.
  • For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
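
As a quick sanity check for the first tip above, you can print the installed exllamav2 version (this assumes the package exposes __version__, which recent releases do):

# Print the installed exllamav2 version
python -c "import exllamav2; print(exllamav2.__version__)"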

Conclusion

By following this guide, you should now be equipped to work effectively with the WizardLM-2-8x22B model and understand its perplexity scoring system. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
