Hello
Welcome to your essential toolkit for experimenting with and building on the OpenAI Vision API. This repository acts as a creativity hub, where innovative experiments unfold from simple image classifications to advanced zero-shot learning models. Whether you are a beginner or an expert, there’s space for everyone to explore the capabilities of the Vision API, exchange insights, and collaborate in expanding the frontiers of visual AI.
Getting Started
To begin experimenting with the OpenAI API, you will need to obtain an API key, which you can get here.
Limitations
- Each API key has a limit of 100 requests per day.
- The API cannot be used for object detection or image segmentation.
However, you can tackle this limitation by combining GPT-4V with foundational models like GroundingDINO or Segment Anything (SAM). For guidance, please refer to the example here and check out our blog post here.
Experiments
Check out the following fascinating experiments:
-
WebcamGPT: Chat with a video stream
-
HotDogGPT: Simple image classification application
-
Zero-shot image classifier with GPT-4V:
-
Zero-shot object detection with GroundingDINO + GPT-4V:
-
GPT-4V vs. CLIP:
-
GPT-4V with Set-of-Mark (SoM):
-
GPT-4V on Web:
-
Automated voiceover of NBA game:
Must Read Papers
- Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V by Jianwei Yang et al.
- The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) by Zhengyuan Yang et al.
- GPT-4 System Card by OpenAI
Blogs
- How CLIP and GPT-4V Compare for Classification
- Experiments with GPT-4V for Object Detection
- Distilling GPT-4 for Classification with an API
- DINO-GPT4-V: Use GPT-4V in a Two-Stage Detection Model
- First Impressions with GPT-4V(ision)
Contributions and Collaboration
We welcome your input to make this repository shine even brighter! If you’re interested in adding a new experiment or have improvement suggestions, feel free to open an issue or a pull request. If you’re ready to dive in and contribute a new experiment, please refer to our contribution guide for invaluable information.
Troubleshooting
While experimenting with the OpenAI Vision API, you might face some challenges. Here are some troubleshooting ideas to keep you on track:
- Check if you’ve exceeded the 100 requests per day limit. If so, consider optimizing your requests.
- If encountering issues with object detection or image segmentation, explore integrating foundational models as discussed earlier.
- Refer to the relevant blog posts and GitHub repositories to find solutions to specific experiment challenges.
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.