How to Perform Concept Modeling with Images

Mar 27, 2024 | Data Science

Welcome to our comprehensive guide on Concept, a package that combines CLIP with BERTopic-based techniques to perform Concept Modeling on images. Where traditional topic modeling groups text documents into topics, Concept groups images into concepts: clusters of visually similar images that can also be described with words. Let’s dive right in!

Installation

Before using Concept, install it with pip, which also pulls in the libraries it depends on:

pip install concept

The labeling step later in this guide also uses NLTK. If it is not already installed, add it with:

pip install nltk

Quick Start

We begin by downloading and extracting a dataset of 25,000 Unsplash images, which we will use throughout this Concept Modeling example. The quick start consists of four steps:

1. Downloading and Extracting Images

Here’s how you can achieve this:

import os
import glob
import zipfile
from tqdm import tqdm
from sentence_transformers import util

# Folder that will hold the extracted images
img_folder = 'photos'
photo_filename = 'unsplash-25k-photos.zip'

# Only download and extract if the folder is missing or empty
if not os.path.exists(img_folder) or len(os.listdir(img_folder)) == 0:
    os.makedirs(img_folder, exist_ok=True)

    # Download the dataset if the zip file is not already present
    if not os.path.exists(photo_filename):
        util.http_get('https://sbert.net/datasets/' + photo_filename, photo_filename)

    # Extract all images into the folder, with a progress bar
    with zipfile.ZipFile(photo_filename, 'r') as zf:
        for member in tqdm(zf.infolist(), desc='Extracting'):
            zf.extract(member, img_folder)

# Collect the paths of all extracted JPEG images
img_names = list(glob.glob(os.path.join(img_folder, '*.jpg')))

Think of this step as tidying a room before unpacking: you check whether the room (the photos directory) exists and has anything in it, creating it if necessary; you fetch a box of toys (the zip file) from the store (the internet) if you don’t already have one; and finally you unpack the box into the room (extract the images).
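The same "check, fetch, unpack" logic can be isolated in a small function. The sketch below is a hypothetical helper (extract_images_once is our own name, not part of Concept or sentence-transformers) that unpacks an archive only when the target folder is missing or empty. It builds a tiny made-up zip file so it runs without the real download.

```python
import glob
import os
import tempfile
import zipfile
from typing import List


def extract_images_once(zip_path: str, img_folder: str) -> List[str]:
    """Unpack zip_path into img_folder unless the folder already has files.

    Returns the sorted paths of all .jpg files in img_folder.
    """
    # The "is the room empty?" check: only unpack into a missing or empty folder.
    if not os.path.isdir(img_folder) or not os.listdir(img_folder):
        os.makedirs(img_folder, exist_ok=True)
        with zipfile.ZipFile(zip_path, "r") as zf:
            zf.extractall(img_folder)
    # Sorting makes the image order reproducible across runs and platforms.
    return sorted(glob.glob(os.path.join(img_folder, "*.jpg")))


# Usage with a tiny, made-up archive instead of the real Unsplash download.
with tempfile.TemporaryDirectory() as tmp:
    toy_zip = os.path.join(tmp, "toy-photos.zip")
    with zipfile.ZipFile(toy_zip, "w") as zf:
        zf.writestr("a.jpg", b"not really a jpeg")
        zf.writestr("b.jpg", b"not really a jpeg")
    names = extract_images_once(toy_zip, os.path.join(tmp, "photos"))
    print([os.path.basename(n) for n in names])  # → ['a.jpg', 'b.jpg']
```

Using zf.extractall here is a simplification; the per-member loop in the guide exists mainly to show a tqdm progress bar while a 25,000-image archive is unpacked.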

2. Applying Concept Modeling

With the images ready, the next step is to fit a ConceptModel on the list of image paths:

from concept import ConceptModel

# Create a Concept model with default settings
concept_model = ConceptModel()

# Embed and cluster the images; returns the concept assigned to each image
concepts = concept_model.fit_transform(img_names)

Now, let’s visualize the resulting concepts:

concept_model.visualize_concepts()
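fit_transform hides the modeling pipeline behind a single call. According to the project’s README, Concept embeds every image with CLIP, reduces the embeddings with UMAP, clusters them with HDBSCAN, and represents each cluster with exemplar images. The toy sketch below illustrates only the cluster-then-pick-exemplars idea and makes several substitutions: scikit-learn’s k-means replaces UMAP + HDBSCAN, synthetic vectors replace CLIP embeddings, and exemplars are simply the images nearest each centroid, which is our assumption rather than Concept’s exact selection rule. The function name cluster_and_pick_exemplars is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_and_pick_exemplars(embeddings, n_concepts, n_exemplars=3, seed=0):
    """Toy concept modeling on precomputed embeddings (k-means, not HDBSCAN).

    Returns (labels, exemplars): one concept ID per image, and a dict that maps
    each concept ID to the indices of the images closest to its centroid.
    """
    # L2-normalise so Euclidean distance tracks cosine similarity,
    # the usual way CLIP embeddings are compared.
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    km = KMeans(n_clusters=n_concepts, n_init=10, random_state=seed).fit(X)
    exemplars = {}
    for c in range(n_concepts):
        members = np.flatnonzero(km.labels_ == c)
        # Distance of each member to its own cluster centroid.
        dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
        exemplars[c] = members[np.argsort(dists)[:n_exemplars]].tolist()
    return km.labels_, exemplars


# Synthetic stand-in for CLIP vectors: three tight groups of 20 "images" each.
rng = np.random.default_rng(42)
centers = np.eye(8)[:3] * 5.0
fake_embeddings = np.vstack([c + rng.normal(scale=0.1, size=(20, 8)) for c in centers])
labels, exemplars = cluster_and_pick_exemplars(fake_embeddings, n_concepts=3)
print({c: len(idx) for c, idx in exemplars.items()})  # → {0: 3, 1: 3, 2: 3}
```

K-means needs the number of concepts up front. A density-based method such as HDBSCAN instead infers the number of clusters and can leave outliers unassigned, which suits large, messy photo collections better.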

3. Labeling Concept Clusters with Topics

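By default, each concept is represented by images. We can also describe the concept clusters with words: here we pass a random sample of 50,000 English nouns from WordNet, which Concept embeds and uses to label each concept: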

import random
import nltk
nltk.download('wordnet')
from nltk.corpus import wordnet as wn

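# Collect single-word noun lemmas from WordNet (multi-word lemmas contain '_')
all_nouns = [word for synset in wn.all_synsets('n') for word in synset.lemma_names() if '_' not in word]

# Randomly sample 50,000 nouns to serve as candidate labels
selected_nouns = random.sample(all_nouns, 50_000)

# Refit the model, passing the nouns as candidate words for the concepts
concept_model = ConceptModel()
concepts = concept_model.fit_transform(img_names, docs=selected_nouns)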
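Because CLIP maps images and text into a shared embedding space, words can be compared directly with concept embeddings. The sketch below shows the core idea under simplifying assumptions: each concept is a single vector (for example, an average of its exemplar image embeddings), hand-made 2-D vectors stand in for CLIP embeddings, and words are ranked by plain cosine similarity. Concept’s actual word-selection procedure may involve further steps, and top_words_per_concept is a hypothetical name.

```python
import numpy as np


def top_words_per_concept(concept_vecs, word_vecs, words, top_n=3):
    """For each concept vector, return the top_n words by cosine similarity."""
    # Normalise rows so that a dot product equals cosine similarity.
    C = concept_vecs / np.linalg.norm(concept_vecs, axis=1, keepdims=True)
    W = word_vecs / np.linalg.norm(word_vecs, axis=1, keepdims=True)
    sims = C @ W.T  # shape: (n_concepts, n_words)
    # Sort each row from most to least similar and keep the first top_n.
    best = np.argsort(-sims, axis=1)[:, :top_n]
    return [[words[j] for j in row] for row in best]


# Hand-made 2-D vectors standing in for CLIP image and text embeddings.
concept_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
words = ["beach", "ocean", "mountain", "snow"]
word_vecs = np.array([[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.2, 0.95]])
print(top_words_per_concept(concept_vecs, word_vecs, words, top_n=2))
# → [['beach', 'ocean'], ['mountain', 'snow']]
```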

4. Visualizing with Topics

Finally, visualize the concepts along with the generated topics:

concept_model.visualize_concepts()

Searching for Concepts

Concept can also search for concepts: it embeds a search term and finds the concept clusters whose embeddings best match it. For instance, let’s search for the term “beach”:

search_results = concept_model.find_concepts('beach')

This returns the best-matching concepts as (concept ID, similarity score) pairs. We can then visualize those concepts. Concept IDs can differ between runs, so take them from search_results rather than hard-coding them:

concept_model.visualize_concepts(concepts=[concept_id for concept_id, _ in search_results])
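Search works like labeling in reverse: the query is embedded and compared against every concept. This sketch uses a hypothetical find_similar_concepts function and toy 2-D vectors in place of CLIP embeddings; it returns (concept ID, similarity) pairs sorted from most to least similar, mirroring the kind of result described above.

```python
import numpy as np


def find_similar_concepts(query_vec, concept_vecs, concept_ids, top_n=5):
    """Rank concepts by cosine similarity to a query embedding.

    Returns (concept_id, similarity) pairs, most similar first.
    """
    q = query_vec / np.linalg.norm(query_vec)
    C = concept_vecs / np.linalg.norm(concept_vecs, axis=1, keepdims=True)
    sims = C @ q  # one cosine similarity per concept
    best = np.argsort(-sims)[:top_n]
    return [(concept_ids[i], float(sims[i])) for i in best]


# Toy 2-D embeddings: a query vector and three concepts with arbitrary IDs.
query = np.array([1.0, 0.2])
concept_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
results = find_similar_concepts(query, concept_vecs, concept_ids=[10, 20, 30], top_n=2)
print([cid for cid, _ in results])  # → [10, 30]
```

The IDs in such a result list are what visualize_concepts expects in its concepts argument.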

Troubleshooting

In case you encounter any issues while installing or running the code, here are some troubleshooting steps:

  • Ensure that you have Python 3.6 or later installed on your machine.
  • If you face problems with downloading the dataset, check your internet connection.
  • If an import fails, check that the package is installed in the environment you are running, and re-run the installation command if needed.
  • If visualization does not appear, ensure that you are using a supported environment that can render images, such as Jupyter Notebook or Google Colab.
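The version and import checks above can be automated. The sketch below is a hypothetical helper (check_environment and missing_modules are our own names) that compares the running Python version against the 3.6 minimum mentioned above and reports which of the modules imported in this guide cannot be found. It uses only the standard library and does not import the packages themselves.

```python
import importlib.util
import sys
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(min_python=(3, 6)) -> List[str]:
    """Collect human-readable problems; an empty list means nothing was found."""
    problems = []
    if sys.version_info < min_python:
        found = "%d.%d" % sys.version_info[:2]
        problems.append("Python %d.%d+ required, found %s" % (min_python[0], min_python[1], found))
    # Import names used in this guide (sentence_transformers is installed as sentence-transformers).
    for name in missing_modules(["concept", "sentence_transformers", "nltk", "tqdm"]):
        problems.append("missing module: " + name)
    return problems


for problem in check_environment() or ["environment looks OK"]:
    print(problem)
```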


Conclusion

And there you have it: a complete guide to Concept Modeling on images. By grouping images into concepts, labeling those concepts with words, and searching them with text queries, Concept makes large image collections easier to explore and organize, which is valuable for developers and researchers alike. At fxis.ai, we believe such advancements are crucial for the future of AI, and our team continues to explore new methods in this area.
