How to Use CleanVision for Image Dataset Quality Auditing

Aug 18, 2024 | Data Science

In the realm of computer vision, the quality of the data fed into algorithms is paramount. CleanVision, an open-source tool developed by Cleanlab, simplifies the process of identifying potential issues within image datasets. This article will guide you through the steps to set up and use CleanVision to enhance your data quality, ensuring that your machine learning models are built on robust foundations.

Installation Steps

Before diving into auditing your image datasets, you need to install CleanVision. Follow these simple steps:

  1. Open your terminal.
  2. Execute the following command:
  3. pip install cleanvision
  4. This installs CleanVision and its dependencies.

Quickstart Guide

Now that CleanVision is installed, let’s get started with auditing an image dataset.

Step 1: Set Up Your Dataset

You can begin by downloading an example dataset or using your own collection of images. If you wish to download a sample dataset, run:

wget -nc https://cleanlab-public.s3.amazonaws.com/CleanVision/image_files.zip

Step 2: Run CleanVision

With your images at the ready, execute the following code in Python:

from cleanvision import Imagelab

# Specify path to folder containing the image files in your dataset
imagelab = Imagelab(data_path='FOLDER_WITH_IMAGES')

# Automatically check for a predefined list of issues within your dataset
imagelab.find_issues()

# Produce a neat report of the issues found in your dataset
imagelab.report()

Understanding the Code

Think of CleanVision as a quality control inspector for a factory. Each image in your dataset is a product rolling off the assembly line. CleanVision meticulously examines each “product” to identify flaws that might compromise quality.

  • Imagelab(data_path=’FOLDER_WITH_IMAGES’): This line defines the area where your products (images) are stored.
  • find_issues(): CleanVision inspects each product, searching for defects like blurry images or duplicates, similar to how an inspector checks for broken parts or inconsistencies.
  • report(): Finally, it generates a comprehensive report detailing the identified issues, analogous to an inspector summarizing their findings in a quality report.

Specific Issue Audits

While CleanVision can automatically check for numerous common issues, there’s an option to focus on specific types. Here’s how:

issue_types = {'dark': '', 'blurry': ''}
imagelab.find_issues(issue_types=issue_types)
imagelab.report(issue_types=issue_types)

Troubleshooting Common Issues

If you encounter issues while using CleanVision, consider the following troubleshooting tips:

  • Ensure that your Python version is 3.7 or higher.
  • Make sure the directory path for your images is correct.
  • Check the formats of your images; most common formats are supported, but some may not be.
  • Reinstall CleanVision if errors persist by running pip uninstall cleanvision followed by the installation command again.

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

In summary, CleanVision is an essential tool for enhancing the quality of image datasets in computer vision projects. By detecting various issues such as duplicates and blurriness, it plays a crucial role in pre-model training and ensures that your data is clean and reliable.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

Stay Informed with the Newest F(x) Insights and Blogs

Tech News and Blog Highlights, Straight to Your Inbox