Are you ready to elevate your machine learning model by optimizing it with OpenVINO and the Neural Network Compression Framework (NNCF)? In this guide, we walk you through the steps to set up the required repositories, train a compressed downstream model with NNCF, evaluate it, and export it to ONNX.
Getting Started
This guide focuses on optimizing the model vuiseng9/bert-base-squadv1-block-pruning-hybrid-filled-lt, a block-pruned BERT-base model for SQuAD v1.1 question answering. On top of it, we apply:
- Magnitude sparsification at 50%
- NNCF Quantization-Aware Training (QAT) with symmetric 8-bit quantization
- Custom knowledge distillation from a BERT-large teacher (bert-large-uncased-whole-word-masking-finetuned-squad)
Setting Up Your Environment
Follow these instructions to set up your environment. Each repository is pinned to a specific branch and commit so that the results are reproducible:
```bash
# Clone OpenVINO NNCF
git clone https://github.com/vuiseng9/nncf
cd nncf
git checkout tld-poc
git reset --hard 1dec7afe7a4b567c059fcf287ea2c234980fded2
python setup.py develop
pip install -r examples/torch/requirements.txt
cd ..

# Clone Hugging Face nn_pruning
git clone https://github.com/vuiseng9/nn_pruning
cd nn_pruning
git checkout reproduce-evaluation
git reset --hard 2d4e196d694c465e43e5fbce6c3836d0a60e1446
pip install -e ".[dev]"
cd ..

# Clone Hugging Face Transformers
git clone https://github.com/vuiseng9/transformers
cd transformers
git checkout tld-poc
git reset --hard 10a1e29d84484e48fd106f58957d9ffc89dc43c5
pip install -e .
# Install the first requirement listed for the question-answering example
head -n 1 examples/pytorch/question-answering/requirements.txt | xargs -I {} pip install {}
cd ..

# Additional dependencies
pip install onnx
```
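Before training, it can save time to confirm that the forks sit at their pinned commits and that the key Python packages import. The helpers below are a minimal sketch that is not part of the original guide: `check_modules` and `check_pin` are hypothetical names, the import names torch, nncf, nn_pruning, transformers and onnx are assumed, and the example calls assume the three clones sit side by side in one parent directory (adjust the paths if yours differ).

```bash
# Hypothetical helpers (not part of the original guide). Paste into your shell
# or source this file; example calls are in the comments at the end.

# check_modules MOD...: try to import each Python module and print ok/MISSING.
# Returns 0 if every import succeeds, 1 otherwise. Set PYTHON to override python3.
check_modules() {
    local py="${PYTHON:-python3}" missing=0 mod
    for mod in "$@"; do
        if "$py" -c "import ${mod}" >/dev/null 2>&1; then
            printf '%-9s%s\n' "ok" "$mod"
        else
            printf '%-9s%s\n' "MISSING" "$mod"
            missing=1
        fi
    done
    return "$missing"
}

# check_pin DIR COMMIT: compare the HEAD of a git checkout with a pinned commit.
# Returns 0 on match, 1 on mismatch, 2 if DIR is not a readable git checkout.
check_pin() {
    local repo_dir="$1" expected="$2" actual
    if ! actual=$(git -C "$repo_dir" rev-parse HEAD 2>/dev/null); then
        printf '%-9s%s\n' "NO REPO" "$repo_dir"
        return 2
    fi
    if [ "$actual" = "$expected" ]; then
        printf '%-9s%s @ %s\n' "ok" "$repo_dir" "$actual"
        return 0
    fi
    printf '%-9s%s: expected %s, got %s\n' "MISMATCH" "$repo_dir" "$expected" "$actual"
    return 1
}

# Example, run from the parent directory that holds the three clones:
#   check_modules torch nncf nn_pruning transformers onnx
#   check_pin nncf         1dec7afe7a4b567c059fcf287ea2c234980fded2
#   check_pin nn_pruning   2d4e196d694c465e43e5fbce6c3836d0a60e1446
#   check_pin transformers 10a1e29d84484e48fd106f58957d9ffc89dc43c5
```

A non-zero return from either helper points you to the step in the setup block that needs to be repeated.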
Training the Model
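Once your environment is set up, train the model with the commands below. Lines marked # to-revise hold paths you need to adapt to your machine; absolute paths are safest, because the script changes into WORKDIR before training starts.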
```bash
# Train
# Clone the base model to optimize (Git LFS is needed to fetch the weights)
git clone https://huggingface.co/vuiseng9/bert-base-squadv1-block-pruning-hybrid-filled-lt
BASE_MODEL=/path/to/cloned_repo_above # to-revise
wget https://huggingface.co/vuiseng9/bert-base-squadv1-block-pruning-hybrid-filled-lt-nncf-50.0sparse-qat-lt/raw/main/nncf_bert_squad_sparsity.json
NNCF_CFG=/path/to/downloaded_nncf_cfg_above # to-revise
OUTROOT=/path/to/train_output_root # to-revise
WORKDIR=transformers/examples/pytorch/question-answering # to-revise
RUNID=bert-base-squadv1-block-pruning-hybrid-filled-lt-nncf-50.0sparse-qat-lt

cd "$WORKDIR"
OUTDIR="$OUTROOT/$RUNID"
mkdir -p "$OUTDIR"

export CUDA_VISIBLE_DEVICES=0
NEPOCH=5

python run_qa.py \
    --model_name_or_path vuiseng9/bert-base-squadv1-block-pruning-hybrid \
    --optimize_model_before_eval \
    --optimized_checkpoint "$BASE_MODEL" \
    --dataset_name squad \
    --do_eval \
    --do_train \
    --evaluation_strategy steps \
    --eval_steps 250 \
    --learning_rate 3e-5 \
    --lr_scheduler_type cosine_with_restarts \
    --warmup_ratio 0.25 \
    --cosine_cycles 1 \
    --teacher bert-large-uncased-whole-word-masking-finetuned-squad \
    --teacher_ratio 0.9 \
    --num_train_epochs $NEPOCH \
    --per_device_eval_batch_size 128 \
    --per_device_train_batch_size 16 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --save_steps 250 \
    --nncf_config "$NNCF_CFG" \
    --logging_steps 1 \
    --overwrite_output_dir \
    --run_name "$RUNID" \
    --output_dir "$OUTDIR"
```
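The schedule flags in the training command are counted in optimizer steps: the Hugging Face Trainer turns --warmup_ratio into ceil(total_steps × ratio) warmup steps, and --eval_steps and --save_steps fire every 250 steps, with each save written as a checkpoint-<step> directory. The hypothetical helper `train_schedule` below estimates these numbers. It is a sketch that assumes no gradient accumulation, one GPU by default, and that you supply the number of tokenized training features (the Trainer logs it as "Num examples" when training starts). Warmup is passed as an integer percentage (25 for --warmup_ratio 0.25) so the arithmetic stays in the shell.

```bash
# train_schedule NUM_FEATURES [BATCH] [GPUS] [EPOCHS] [WARMUP_PCT] [SAVE_STEPS]
# Hypothetical helper: estimate the optimizer-step schedule of the training run.
# Defaults mirror the command: batch 16, 1 GPU, 5 epochs, 25% warmup, save every 250.
# Assumes no gradient accumulation and that the last partial batch is kept.
train_schedule() {
    local n="$1" batch="${2:-16}" gpus="${3:-1}" epochs="${4:-5}"
    local warmup_pct="${5:-25}" save_steps="${6:-250}"
    local global_batch steps_per_epoch total warmup checkpoints
    global_batch=$((batch * gpus))
    # ceil(n / global_batch): batches (optimizer steps) per epoch
    steps_per_epoch=$(( (n + global_batch - 1) / global_batch ))
    total=$((steps_per_epoch * epochs))
    # ceil(total * warmup_pct / 100), i.e. ceil(total * warmup_ratio)
    warmup=$(( (total * warmup_pct + 99) / 100 ))
    # checkpoint-<step> directories are written at multiples of save_steps
    checkpoints=$((total / save_steps))
    echo "steps_per_epoch=$steps_per_epoch total_steps=$total warmup_steps=$warmup checkpoints=$checkpoints"
}
```

For example, `train_schedule 1000` prints `steps_per_epoch=63 total_steps=315 warmup_steps=79 checkpoints=1`; substitute the real feature count from your training log to see roughly how many checkpoints a run produces.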
Evaluating the Model
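After training, evaluate the model on the SQuAD validation set and export it to ONNX with the commands below. They use the published QAT checkpoint from the Hugging Face Hub; to evaluate your own run instead, point --qat_checkpoint at one of the checkpoint-* directories in your training output and --nncf_config at the configuration you trained with.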
```bash
# Eval
git clone https://huggingface.co/vuiseng9/bert-base-squadv1-block-pruning-hybrid-filled-lt-nncf-50.0sparse-qat-lt
MODELROOT=/path/to/cloned_repo_above # to-revise
export CUDA_VISIBLE_DEVICES=0
OUTDIR=eval-bert-base-squadv1-block-pruning-hybrid-filled-lt-nncf-50.0sparse-qat-lt
WORKDIR=transformers/examples/pytorch/question-answering # to-revise

cd "$WORKDIR"
mkdir -p "$OUTDIR"

nohup python run_qa.py \
    --model_name_or_path vuiseng9/bert-base-squadv1-block-pruning-hybrid \
    --dataset_name squad \
    --optimize_model_before_eval \
    --qat_checkpoint "$MODELROOT/checkpoint-26250" \
    --nncf_config "$MODELROOT/nncf_bert_squad_sparsity.json" \
    --to_onnx "$OUTDIR/bert-base-squadv1-block-pruning-hybrid-filled-lt-nncf-50.0sparse-qat-lt.onnx" \
    --do_eval \
    --per_device_eval_batch_size 128 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --overwrite_output_dir \
    --output_dir "$OUTDIR" 2>&1 | tee "$OUTDIR/run.log"
```
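Once evaluation finishes, a quick look at the output directory confirms that the ONNX export was written and pulls the SQuAD scores out of run.log. This is a sketch with a hypothetical function name, `summarize_eval`; it assumes the fork logs the metrics as exact_match and f1 (possibly with an eval_ prefix), so adjust the pattern if your log differs.

```bash
# summarize_eval OUTDIR (hypothetical helper): check the ONNX export and print
# the SQuAD metric lines from run.log. Returns 0 if both are found, 1 otherwise.
summarize_eval() {
    local outdir="$1" status=0 onnx
    # First *.onnx file directly inside OUTDIR, if any
    onnx=$(find "$outdir" -maxdepth 1 -name '*.onnx' 2>/dev/null | head -n 1)
    if [ -n "$onnx" ] && [ -s "$onnx" ]; then
        echo "onnx export: $onnx"
    else
        echo "onnx export: not found in $outdir"
        status=1
    fi
    # Match exact_match / f1 as whole keys, so tokens such as bf16 are ignored
    if [ ! -f "$outdir/run.log" ]; then
        echo "run.log: not found in $outdir"
        status=1
    elif ! grep -E '(^|[^a-z0-9])(eval_)?(exact_match|f1)([^a-z0-9_]|$)' "$outdir/run.log"; then
        echo "run.log: no exact_match/f1 lines found"
        status=1
    fi
    return "$status"
}

# Example, from $WORKDIR after the evaluation run:
#   summarize_eval "$OUTDIR"
```

The resulting .onnx file is the artifact you would then hand to OpenVINO for inference.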
Troubleshooting Tips
If you run into any issues during the process, here are some troubleshooting ideas:
- Ensure that all repository URLs are correct, and the required branches or commits are checked out.
- Validate that you have all the necessary dependencies installed. Missing packages can lead to unexpected errors.
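- Check your CUDA setup. Both scripts set CUDA_VISIBLE_DEVICES=0, so make sure GPU 0 exists and is not already busy with another job.
- If you run out of GPU memory during training, reduce --per_device_train_batch_size (or --per_device_eval_batch_size during evaluation). Keep in mind that the BERT-large teacher used for distillation also needs GPU memory.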
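If you lower the per-device batch size to fit in memory, gradient accumulation can keep the effective batch size close to the original 16. --gradient_accumulation_steps is a standard Hugging Face TrainingArguments option; that the fork's distillation and NNCF code paths behave identically with it is an assumption worth checking. The helper name `accum_for` is hypothetical; it rounds up, so the effective batch size (per-device batch × GPUs × accumulation steps) can slightly exceed the target when the numbers do not divide evenly.

```bash
# accum_for TARGET_BATCH PER_DEVICE_BATCH [GPUS] (hypothetical helper)
# Prints the smallest accumulation step count whose effective batch size
# (PER_DEVICE_BATCH * GPUS * steps) reaches TARGET_BATCH.
accum_for() {
    local target="$1" per_device="$2" gpus="${3:-1}" per_step
    per_step=$((per_device * gpus))
    if [ "$per_step" -le 0 ]; then
        echo "accum_for: batch size and GPU count must be positive" >&2
        return 1
    fi
    # ceil(target / per_step)
    echo $(( (target + per_step - 1) / per_step ))
}
```

For example, `--per_device_train_batch_size 8 --gradient_accumulation_steps "$(accum_for 16 8)"` accumulates over 2 steps for an effective batch size of 16.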
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
That’s it! By following these steps, you have compressed a block-pruned BERT-base model with 50% magnitude sparsity and 8-bit quantization-aware training, distilled from a BERT-large teacher, and exported it to ONNX, a format OpenVINO can consume directly. The process is a bit like refining a recipe: you adjust ingredients and methods until the dish is leaner but just as delicious. Here’s to all your future model optimizations!
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

