Are you looking to boost your CLIP inference speed? If so, you’re in the right place! CLIP-ONNX is an innovative library that significantly speeds up CLIP inference, promising acceleration of up to 3x on a K80 GPU. Let’s dive into how you can get started with this robust library, with a sprinkle of troubleshooting tips thrown in for good measure.
Getting Started: Usage Instructions
To begin using CLIP-ONNX, you’ll first need to install the necessary modules. Here are the steps to set you in the right direction:
- Open your terminal or command prompt.
- Run the following command to install the required libraries:
```
!pip install git+https://github.com/Lednik7/CLIP-ONNX.git
!pip install git+https://github.com/openai/CLIP.git
!pip install onnxruntime-gpu
```
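Before moving on, it can help to confirm that the three modules import cleanly. Here is a minimal sketch using only the standard library; the module names `clip`, `clip_onnx`, and `onnxruntime` are the ones this tutorial assumes:

```python
import importlib.util

def missing_packages(names):
    """Return the given module names that cannot be found on this system."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Module names assumed by this tutorial; an empty list means all are installed
print(missing_packages(['clip', 'clip_onnx', 'onnxruntime']))
```

If any name appears in the output, rerun the corresponding install command above.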
Example Process in 3 Simple Steps
The beauty of CLIP-ONNX lies in its simplicity. Here’s a straightforward approach to implementing it:
Step 1: Download the Sample Image
- Use the command below to download the sample CLIP.png image from the repository:
```
!wget -c -O CLIP.png https://github.com/openai/CLIP/blob/main/CLIP.png?raw=true
```
Step 2: Load the Standard CLIP Model, Image, and Text
- Run the following code to load the model:
```python
import clip
from PIL import Image
import numpy as np

# ONNX export is done from a CPU model, so load CLIP on the CPU
model, preprocess = clip.load('ViT-B/32', device='cpu', jit=False)

# batch first
image = preprocess(Image.open('CLIP.png')).unsqueeze(0).cpu()  # [1, 3, 224, 224]
image_onnx = image.detach().cpu().numpy().astype(np.float32)   # prepare for ONNX

text = clip.tokenize(['a diagram', 'a dog', 'a cat']).cpu()    # [3, 77]
text_onnx = text.detach().cpu().numpy().astype(np.int32)
```
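The `preprocess` transform returned by `clip.load` handles resizing and cropping; its final normalization and layout step can be sketched in plain NumPy to show the shape and dtype the ONNX session expects. The normalization constants below are the ones published in the OpenAI CLIP repository, and the helper name is illustrative:

```python
import numpy as np

# CLIP's published per-channel normalization constants (OpenAI CLIP repo)
MEAN = np.array([0.48145466, 0.4578275, 0.40821073], dtype=np.float32)
STD = np.array([0.26862954, 0.26130258, 0.27577711], dtype=np.float32)

def to_onnx_input(rgb):
    """rgb: HxWx3 float array in [0, 1] -> [1, 3, H, W] float32 batch."""
    chw = ((rgb - MEAN) / STD).transpose(2, 0, 1)  # normalize, then HWC -> CHW
    return chw[np.newaxis].astype(np.float32)      # add the batch dimension

x = to_onnx_input(np.random.rand(224, 224, 3).astype(np.float32))
print(x.shape, x.dtype)  # (1, 3, 224, 224) float32
```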
Step 3: Create and Use CLIP-ONNX Object
- Now, you can convert the model to ONNX and run inference:
```python
from clip_onnx import clip_onnx

visual_path = 'clip_visual.onnx'
textual_path = 'clip_textual.onnx'

onnx_model = clip_onnx(model, visual_path=visual_path, textual_path=textual_path)
onnx_model.convert2onnx(image, text, verbose=True)  # model conversion
onnx_model.start_sessions(providers=['CPUExecutionProvider'])  # run in CPU mode

# Inference
image_features = onnx_model.encode_image(image_onnx)
text_features = onnx_model.encode_text(text_onnx)
logits_per_image, logits_per_text = onnx_model(image_onnx, text_onnx)

probs = logits_per_image.softmax(dim=-1).detach().cpu().numpy()
print("Label probs:", probs)  # prints: [[0.9927937 0.00421067 0.00299571]]
```
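Since the ONNX sessions themselves return NumPy arrays, the final softmax step can also be done without torch. Here is a small sketch; the logits values are made up for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax over the given axis."""
    x = x - x.max(axis=axis, keepdims=True)  # shift for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Example logits shaped like logits_per_image for 1 image vs. 3 prompts
logits = np.array([[25.55, 20.09, 19.75]], dtype=np.float32)
probs = softmax(logits)
print(probs)  # highest probability on the first label; each row sums to 1
```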
Understanding the Code: An Analogy
Think of this code like a dedicated chef preparing an exquisite meal. The chef (your code) follows specific steps to prepare a gastronomic delight (CLIP model). Each ingredient (image, text, and model) is carefully selected and prepped:
- First, the chef downloads a fresh ingredient (the image).
- Next, the chef lays out the items: the main course (model) and sides (image and text).
- Finally, the chef combines everything into a pot (the ONNX object) and executes the recipe, resulting in a mouthwatering dish (inference results).
Troubleshooting Tips
Sometimes, during the process, things might not go as planned. Here are some helpful tips to fix common issues:
- If the ONNX conversion fails on the first attempt, simply rerun the command.
- Consider tweaking the export settings; the default settings may not work for every model:
```python
DEFAULT_EXPORT = dict(
    input_names=['input'],
    output_names=['output'],
    export_params=True,
    verbose=False,
    opset_version=12,
    do_constant_folding=True,
    dynamic_axes={'input': [0], 'output': [0]},  # axis 0 (batch) is dynamic
)
```
If changing the settings doesn’t help, consider adjusting either the visual or textual options, or even shifting the opset_version to a newer version (like 15). Remember, coding is all about experimentation!
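As one concrete experiment, here is what a modified configuration might look like as a plain dict of `torch.onnx.export` keyword arguments, with the opset bumped to 15. The `input`/`output` names and the dynamic batch axis mirror the defaults and are assumptions to adapt to your model, not guaranteed fixes:

```python
# Candidate export settings to try when the defaults fail
EXPORT_PARAMS = dict(
    input_names=['input'],
    output_names=['output'],
    export_params=True,
    verbose=False,
    opset_version=15,  # newer opset than the default 12
    do_constant_folding=True,
    dynamic_axes={'input': [0], 'output': [0]},  # keep the batch axis dynamic
)
print(EXPORT_PARAMS['opset_version'])  # 15
```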
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
More Information
Want more examples or best practices? Check out benchmark.md in the repository for performance insights and the examples folder for further details.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

