CVNets is a computer vision library for researchers and engineers who want to train a variety of mobile and non-mobile models for tasks such as object classification, object detection, semantic segmentation, and adapting foundation models like CLIP. This guide walks you through CVNets, from installation to troubleshooting, in an easy-to-follow manner.
Table of Contents
- What’s New?
- Installation
- Getting Started
- Supported Models and Tasks
- Maintainers
- Research Effort at Apple Using CVNets
- Contributing to CVNets
- License
- Citation
What’s New?
As of July 2023, CVNets has released version 0.4, which adds features such as:
- Bytes Are All You Need: Transformers Operating Directly On File Bytes
- RangeAugment: Efficient online augmentation with Range Learning
- Training and evaluating foundation models (CLIP)
- Mask R-CNN
- EfficientNet, Swin Transformer, and ViT
- Enhanced distillation support
Installation
Follow the steps below to get CVNets up and running:
# Clone the repository
git clone git@github.com:apple/ml-cvnets.git
cd ml-cvnets
# Create a virtual environment
conda create -n cvnets python=3.10.8
conda activate cvnets
# Install requirements and CVNets package
pip install -r requirements.txt -c constraints.txt
pip install --editable .
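After the editable install, you can sanity-check the environment from Python. This is an optional sketch, assuming the package is importable under the name `cvnets`:

```python
import importlib.util
import sys

def is_installed(pkg: str) -> bool:
    """Return True if `pkg` is importable in the current environment."""
    return importlib.util.find_spec(pkg) is not None

# Assumes the editable install exposes the top-level package "cvnets".
if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
    print("cvnets importable:", is_installed("cvnets"))
```

If the import check fails, re-run the install steps inside the activated conda environment.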
Getting Started
To dive into CVNets, explore the following resources:
- General instructions for working with CVNets are available in the repository's documentation.
- Examples for training and evaluating models are available in the examples folder.
- Instructions for converting a PyTorch model to CoreML are also included in the documentation.
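Training and evaluation runs in CVNets are driven by YAML configuration files. The fragment below is only an illustrative sketch; the key names and values shown are assumptions, so consult the configuration files bundled with the repository rather than copying this verbatim:

```yaml
# Illustrative sketch of a classification config; these keys and values
# are assumptions -- use the configs shipped in the repository as the
# source of truth for your task and model.
common:
  run_label: "train"
dataset:
  category: "classification"
  root_train: "/path/to/train"
  root_val: "/path/to/val"
model:
  classification:
    name: "mobilevit"
```

A config like this is typically passed to the training entry point on the command line.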
Supported Models and Tasks
CVNets supports a wide range of models and tasks. For a detailed list, check out the Model Zoo and the examples folder.
Model Highlights Include:
- **CNN Models:** MobileNet variants, EfficientNet, ResNet, and RegNet
- **Transformers:** Vision Transformer, MobileViT variants, and SwinTransformer
- **Object Detection:** SSD, Mask R-CNN
- **Semantic Segmentation:** DeepLabv3, PSPNet
- **Foundation Models:** CLIP
- **Data Augmentation:** RangeAugment, AutoAugment, RandAugment
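As a toy illustration of the RandAugment idea listed above (apply N randomly chosen transforms, each at a shared magnitude M), here is a minimal pure-Python sketch. The operation names and magnitude handling are placeholders; real implementations operate on images with a fixed library of photometric and geometric ops:

```python
import random

# Placeholder ops on scalars; real augmentation ops transform images.
OPS = {
    "identity": lambda x, m: x,
    "add": lambda x, m: x + m,
    "scale": lambda x, m: x * (1 + m / 10),
}

def rand_augment(value, num_ops=2, magnitude=3, rng=None):
    """Apply `num_ops` randomly chosen ops, each at a shared magnitude."""
    rng = rng or random.Random()
    for name in rng.choices(list(OPS), k=num_ops):
        value = OPS[name](value, magnitude)
    return value
```

RangeAugment goes a step further by learning the magnitude range online during training instead of fixing it up front.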
Maintainers
CVNets is developed and maintained by:
- Sachin Mehta
- Maxwell Horton
- Mohammad Sekhavat
- Yanzi Jin
Research Effort at Apple Using CVNets
Researchers at Apple have published several influential papers utilizing CVNets:
- MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer (ICLR 2022)
- CVNets: High Performance Library for Computer Vision (ACM MM 2022)
- Separable Self-attention for Mobile Vision Transformers (MobileViTv2)
- RangeAugment: Efficient Online Augmentation with Range Learning
- Bytes Are All You Need: Transformers Operating Directly on File Bytes
Contributing to CVNets
CVNets encourages community contributions. Check out the contributing document for more information about how to get involved. Make sure to respect the Code of Conduct.
License
For licensing details, please visit the LICENSE file.
Citation
If you find CVNets useful in your work, please cite the papers provided in the documentation.
Troubleshooting
If you encounter any issues while using CVNets, here are some troubleshooting tips:
- Ensure that you have the correct Python version (3.10+) and a compatible PyTorch version (>= 1.12.0).
- Double-check that you activated the CVNets environment before running any commands.
- If installation fails, revisit the requirements file to ensure all dependencies are met.
- For installation or compatibility problems, consider reaching out to the maintainers via GitHub or relevant forums.
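To make the first check above concrete, here is a small self-contained sketch that verifies the interpreter version; a PyTorch check would be analogous, comparing against `torch.__version__`:

```python
import sys

def meets_min_version(current, required):
    """Return True if version tuple `current` satisfies `required`."""
    return tuple(current) >= tuple(required)

# CVNets expects Python 3.10 or newer.
if __name__ == "__main__":
    ok = meets_min_version(sys.version_info[:2], (3, 10))
    print("Python version OK:", ok)
```

Running this inside the activated conda environment confirms you are on the interpreter you think you are.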