Welcome to your ultimate guide to the EVA-CLIP-18B model, a groundbreaking advance in contrastive language-image pretraining (CLIP). With 18 billion parameters, this model sets a new state of the art across a wide range of image classification tasks. Below, we explore its features, usage instructions, and troubleshooting tips so you can leverage this powerhouse effectively.
## Summary of EVA-CLIP Performance
- Achieves an impressive 80.7% average zero-shot top-1 accuracy across 27 image classification benchmarks.
- Outperforms its smaller EVA-CLIP predecessors, showcasing the consistent gains from scaling up.
- Trained on a refined dataset of 2 billion image-text pairs drawn from the LAION-2B and COYO-700M datasets.
## Model Card

| Model | Total Parameters | Average Accuracy | Weights |
| --- | --- | --- | --- |
| EVA-CLIP-8B | 8.1B | 79.4% | [Download PyTorch Weights](https://huggingface.co/BAAI/EVA-CLIP-8B) |
| EVA-CLIP-18B | 18.1B | 80.7% | Stay tuned for the release! |
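If you prefer to fetch the EVA-CLIP-8B weights ahead of time (for example, from a machine with a faster connection), here is a minimal sketch using the `huggingface_hub` library, assuming it is installed via `pip install huggingface_hub`:

```python
from huggingface_hub import snapshot_download

# Download the full EVA-CLIP-8B repository (config, weights, processor files)
# into the local Hugging Face cache and return the local directory path.
local_dir = snapshot_download(repo_id="BAAI/EVA-CLIP-8B")
print(f"Weights cached at: {local_dir}")
```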
## Usage Instructions
To harness the capabilities of EVA-CLIP, you can run it through the Hugging Face `transformers` stack or through the reference PyTorch implementation. The examples below use EVA-CLIP-8B, since the 18B weights have not yet been released. Instructions for both approaches follow.
### Using the Hugging Face Version
```python
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor, CLIPTokenizer
import torch

image_path = "CLIP.png"
model_name_or_path = "BAAI/EVA-CLIP-8B"

# Load the image processor, tokenizer, and model. EVA-CLIP ships custom
# modeling code, so trust_remote_code=True is required.
processor = CLIPImageProcessor.from_pretrained(model_name_or_path)
tokenizer = CLIPTokenizer.from_pretrained(model_name_or_path)
model = AutoModel.from_pretrained(
    model_name_or_path,
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to("cuda").eval()

image = Image.open(image_path)
captions = ["a diagram", "a dog", "a cat"]

input_ids = tokenizer(captions, return_tensors="pt", padding=True).input_ids.to("cuda")
input_pixels = processor(images=image, return_tensors="pt").pixel_values.to("cuda")

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(input_pixels)
    text_features = model.encode_text(input_ids)
    # Normalize to unit length so the dot product below is cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

label_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
print(f"Label probs: {label_probs}")
```
### Using the PyTorch Version
```python
import torch
from PIL import Image
from eva_clip import create_model_and_transforms, get_tokenizer

model_name = "EVA-CLIP-8B"
pretrained = "eva_clip"  # or a local path to the downloaded checkpoint
image_path = "CLIP.png"
captions = ["a diagram", "a dog", "a cat"]
device = "cuda" if torch.cuda.is_available() else "cpu"

# Build the model, its preprocessing transform, and the matching tokenizer.
model, _, processor = create_model_and_transforms(model_name, pretrained, force_custom_clip=True)
tokenizer = get_tokenizer(model_name)
model = model.to(device)

image = processor(Image.open(image_path)).unsqueeze(0).to(device)
text = tokenizer(captions).to(device)

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize to unit length so the dot product below is cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
print("Label probs:", text_probs)
```
## Troubleshooting
If you run into any issues while using EVA-CLIP-18B, here are some troubleshooting steps to consider:
- Memory Issues: Ensure you have sufficient GPU memory; an 8.1B-parameter model needs roughly 16 GB for its weights alone in float16. If you run out of memory, consider DeepSpeed for model-loading optimization, or try the lighter-weight option sketched after this list.
- Import Errors: Make sure you have installed all required libraries, especially `transformers`, `torch`, and relevant image-processing libraries such as `Pillow`.
- Model Download: Check your internet connection when attempting to download model weights. If issues persist, try downloading from a different network.
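As one lighter-weight alternative to DeepSpeed, `transformers` can shard a large checkpoint across the available devices at load time with `device_map="auto"`, which requires the `accelerate` package. A minimal sketch, not the only way to do it:

```python
import torch
from transformers import AutoModel

# Load in float16 and let accelerate place layers across the available
# GPUs (and CPU, if needed) instead of materializing everything on one device.
model = AutoModel.from_pretrained(
    "BAAI/EVA-CLIP-8B",
    torch_dtype=torch.float16,
    device_map="auto",       # requires `pip install accelerate`
    trust_remote_code=True,  # EVA-CLIP ships custom modeling code
).eval()
```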
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
## Conclusion
The EVA-CLIP-18B model is a monumental step in the field of multimodal AI, embodying the convergence of vision and language understanding. By equipping yourself with this knowledge and utilizing the available resources, you can unlock the potential for innovative applications and research developments in the world of AI.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

