How to Set Up Active Learning in Python with ALiPy

Nov 13, 2023 | Data Science

Active learning lets a model choose which unlabeled examples are worth sending to an annotator, so you can reach a target accuracy with fewer labels. ALiPy (Active Learning in Python) is a framework that implements more than 20 active learning algorithms. This article walks through installing ALiPy and running your first experiments with it.

Introduction to ALiPy

ALiPy is more than just a library; it’s a comprehensive toolbox designed to make implementing active learning strategies as easy as pie. Imagine each algorithm like a skilled chef, equipped with unique recipes that can be tailored to suit your specific taste (or data). And the best part? It allows you to create your own recipes with minimal restrictions.

Features of ALiPy

  • Model Independent: Use any classification model—be it SVM in scikit-learn or deep models in TensorFlow.
  • Module Independent: Modify any part of the toolbox without disrupting the rest.
  • Custom Implementations: There’s no limitation when it comes to creating your own algorithms.
  • Variant Settings Supported: Handle noisy oracles, multi-label data, and more.
  • Powerful Tools: Save results, recover from breakpoints, run parallel k-fold experiments, and visualize data.

Setup Instructions

Setting up ALiPy is a breeze. Follow these steps to get started:

1. Install ALiPy via Pip

Simply run the following command in your terminal:

pip install alipy

2. Clone and Build ALiPy from Source

If you prefer building from source, clone the repository and install it from the project directory:

git clone https://github.com/NUAA-AL/ALiPy.git
cd ALiPy
python setup.py install

Dependencies

To run ALiPy smoothly, ensure that you have the following dependencies:

  • Python: Version 3.4 or higher
  • Basic Dependencies: numpy, scipy, scikit-learn, matplotlib, prettytable
  • Optional Dependency: cvxpy (needed for specific algorithms)
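
Before installing, it can help to check which of these packages are already importable. The snippet below is a minimal sketch using only the standard library; the helper names missing_modules and python_is_supported are made up for this illustration and are not part of ALiPy. Note that scikit-learn is imported under the name sklearn.

```python
import importlib.util
import sys

# Import names of the packages listed above (assumption: scikit-learn
# is checked under its import name "sklearn").
REQUIRED = ["numpy", "scipy", "sklearn", "matplotlib", "prettytable"]
OPTIONAL = ["cvxpy"]


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(minimum=(3, 4)):
    """Compare the running interpreter with the minimum (major, minor) version."""
    return sys.version_info[:2] >= minimum


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("Missing required:", missing_modules(REQUIRED))
    print("Missing optional:", missing_modules(OPTIONAL))
```

An empty "Missing required" list means the basic dependencies are in place; anything listed can be installed with pip.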

Using ALiPy

ALiPy offers two methods to experiment with active learning:

1. High-Level Encapsulation

You can utilize the AlExperiment class for a straightforward setup:

from sklearn.datasets import load_iris
from alipy.experiment.al_experiment import AlExperiment

X, y = load_iris(return_X_y=True)
al = AlExperiment(X, y, stopping_criteria='num_of_queries', stopping_value=50)
al.split_AL()
al.set_query_strategy(strategy='QueryInstanceUncertainty', measure='least_confident')
al.set_performance_metric('accuracy_score')
al.start_query(multi_thread=True)
al.plot_learning_curve()
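
To make the least_confident measure used above more concrete, here is a minimal NumPy sketch of the textbook definition: an instance's uncertainty is one minus the probability of its most likely class, and the instances with the highest scores are queried first. This illustrates the idea only and is not ALiPy's internal code; the function names are hypothetical.

```python
import numpy as np


def least_confident_scores(proba):
    """Uncertainty per instance: 1 minus the top class probability."""
    proba = np.asarray(proba, dtype=float)
    return 1.0 - proba.max(axis=1)


def select_most_uncertain(proba, batch_size=1):
    """Indices of the `batch_size` instances with the highest uncertainty."""
    scores = least_confident_scores(proba)
    # Negate the scores so that the ascending sort ranks high uncertainty first.
    order = np.argsort(-scores, kind="stable")
    return order[:batch_size].tolist()


# Toy class probabilities for three unlabeled instances (rows sum to 1).
proba = np.array([
    [0.90, 0.05, 0.05],  # confident prediction
    [0.40, 0.35, 0.25],  # very uncertain
    [0.60, 0.30, 0.10],  # somewhat uncertain
])
print(select_most_uncertain(proba, batch_size=2))
```

In practice the probability matrix would come from a fitted classifier, for example the predict_proba output of a scikit-learn model evaluated on the unlabeled pool.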

2. Custom Active Learning Experiments

If you want more control, you can set up your own active learning experiment using the toolkit provided by ALiPy:

import copy
from sklearn.datasets import load_iris
from alipy import ToolBox

X, y = load_iris(return_X_y=True)
alibox = ToolBox(X=X, y=y, query_type='AllLabels', saving_path='.')

# Split data
alibox.split_AL(test_ratio=0.3, initial_label_rate=0.1, split_count=10)

# Get the default model (a logistic regression classifier)
model = alibox.get_default_model()

# Define the stopping criterion: stop after 50 queries
stopping_criterion = alibox.get_stopping_criterion('num_of_queries', 50)

# Use a pre-defined strategy: query by committee (QBC)
QBCStrategy = alibox.get_query_strategy(strategy_name='QueryInstanceQBC')

QBC_result = []
for fold in range(10):
    # Get the data split and the intermediate-results saver for this fold
    train_idx, test_idx, label_ind, unlab_ind = alibox.get_split(fold)
    saver = alibox.get_stateio(fold)

    while not stopping_criterion.is_stop():
        # Select one instance from the unlabeled pool and move it to the labeled set
        select_ind = QBCStrategy.select(label_ind, unlab_ind, model=None, batch_size=1)
        label_ind.update(select_ind)
        unlab_ind.difference_update(select_ind)

        # Retrain on the labeled set and evaluate on the test set
        model.fit(X=X[label_ind.index, :], y=y[label_ind.index])
        pred = model.predict(X[test_idx, :])
        accuracy = alibox.calc_performance_metric(y_true=y[test_idx], y_pred=pred, performance_metric='accuracy_score')

        # Save the intermediate result and report progress to the stopping criterion
        st = alibox.State(select_index=select_ind, performance=accuracy)
        saver.add_state(st)
        saver.save()
        stopping_criterion.update_information(saver)

    # Reset the stopping criterion before the next fold
    stopping_criterion.reset()
    QBC_result.append(copy.deepcopy(saver))

analyser = alibox.get_experiment_analyser(x_axis='num_of_queries')
analyser.add_method(method_name='QBC', method_results=QBC_result)

print(analyser)
analyser.plot_learning_curves(title='Example of AL', std_area=True)
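
The query-by-committee idea behind the strategy used above can be sketched without ALiPy: several models vote on each unlabeled instance, and the instance the committee disagrees on most is queried. The sketch below uses vote entropy, a common textbook disagreement measure; it is an assumption that this matches how ALiPy's QueryInstanceQBC is configured, and the function names are hypothetical.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the committee's predicted labels for one instance."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def pick_most_disputed(committee_votes):
    """Index of the instance whose votes have the highest entropy."""
    entropies = [vote_entropy(votes) for votes in committee_votes]
    return max(range(len(entropies)), key=entropies.__getitem__)


# One row per unlabeled instance, one predicted label per committee member.
committee_votes = [
    ["a", "a", "a"],  # full agreement
    ["a", "b", "a"],  # two-to-one split
    ["a", "b", "c"],  # every member disagrees
]
print(pick_most_disputed(committee_votes))
```

Full agreement gives an entropy of zero, so instances the committee already agrees on are never preferred over disputed ones.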

Troubleshooting

If you encounter any challenges while setting up or using ALiPy, here are a few troubleshooting tips:

  • Ensure that your Python version is 3.4 or higher.
  • Check that all the required libraries (numpy, scipy, scikit-learn, matplotlib, prettytable) are installed correctly.
  • For issues with specific algorithms, make sure the optional dependency cvxpy is installed.
  • Consult the official ALiPy documentation for extensive guidelines.

Concluding Thoughts

ALiPy is an incredible resource for those looking to delve deeper into active learning. Take the time to explore its functionalities and see how it can elevate your machine learning projects.
