HoloClean: A Machine Learning System for Data Enrichment

Nov 20, 2020 | Data Science

Have you ever found yourself drowning in messy data, looking for a lifebuoy? Enter HoloClean, a statistical inference engine that helps you impute, clean, and enrich datasets like a pro! Built on PyTorch and PostgreSQL, HoloClean is a weakly supervised machine learning system, so you can focus on what matters most: deriving insights from your data instead of losing precious time to manual cleaning.

How to Install HoloClean

Ready to dive into the world of data enrichment? Here’s a step-by-step guide for installing HoloClean, whether you’re using PostgreSQL natively or through Docker.

1. Install and Configure PostgreSQL

Option 1: Native Installation of PostgreSQL

A native installation of PostgreSQL can run a bit faster than a Docker container, so if speed is a priority, here’s how to set it up:

a. Installing PostgreSQL
  • On Ubuntu, simply run:
    $ sudo apt-get install postgresql postgresql-contrib
  • For macOS, follow the installation instructions on the PostgreSQL website.

b. Setting up PostgreSQL for HoloClean

Next, you will need to configure PostgreSQL:

  1. Start the psql console from the terminal, replacing <username> with an existing PostgreSQL role:
    $ psql --username <username>
  2. Create a database and a user, then switch to the new database so that the schema change applies to it:
    CREATE DATABASE holo;
    CREATE USER holocleanuser;
    ALTER USER holocleanuser WITH PASSWORD 'abcd1234';
    GRANT ALL PRIVILEGES ON DATABASE holo TO holocleanuser;
    \c holo
    ALTER SCHEMA public OWNER TO holocleanuser;
  3. To connect to the holo database as the new user, run from the terminal (-W forces a password prompt):
    $ psql -U holocleanuser -W holo
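  4. To clear the database, run the following from a psql session connected to a different database, since PostgreSQL will not drop the database you are currently connected to:
    DROP DATABASE holo;
    CREATE DATABASE holo;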

Option 2: Using Docker

If you’re more comfortable with Docker, you can easily spin up a PostgreSQL container by running:

docker run --name pghc -e POSTGRES_DB=holo -e POSTGRES_USER=holocleanuser -e POSTGRES_PASSWORD=abcd1234 -p 5432:5432 -d postgres:11

This command starts a PostgreSQL server with the necessary configurations. You can start or stop the container using docker start pghc and docker stop pghc.

For more information about this Docker image, see the official postgres image page on Docker Hub.
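Both options above end with the same connection details: database holo, user holocleanuser, password abcd1234. As a minimal sketch, the snippet below assembles those values into a libpq-style connection URI. The host localhost and port 5432 are assumptions (a local server on PostgreSQL's default port), and the names SETTINGS and connection_uri are illustrative helpers, not part of HoloClean.

```python
from urllib.parse import quote

# Database, user, and password come from the setup steps above.
# Host and port are assumptions: a local server on the default port.
SETTINGS = {
    "dbname": "holo",
    "user": "holocleanuser",
    "password": "abcd1234",
    "host": "localhost",
    "port": 5432,
}


def connection_uri(settings):
    """Build a URI of the form postgresql://user:password@host:port/dbname."""
    # Percent-encode the parts that may contain reserved characters.
    user = quote(settings["user"], safe="")
    password = quote(settings["password"], safe="")
    dbname = quote(settings["dbname"], safe="")
    return "postgresql://{}:{}@{}:{}/{}".format(
        user, password, settings["host"], settings["port"], dbname
    )


if __name__ == "__main__":
    print(connection_uri(SETTINGS))
```

psql accepts a URI of this form as its connection argument, which makes it a quick way to confirm that the credentials work before moving on.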

2. Setting Up HoloClean

Let’s get HoloClean running smoothly on your machine. We recommend using a virtual environment:

Creating a Virtual Environment for HoloClean

Option 1: Conda Virtual Environment

Download and install Anaconda, then create a new Conda environment:

$ conda create -n hc36 python=3.6

Don’t forget to activate it:

$ conda activate hc36

Option 2: Virtual Environment using Pip and Virtualenv

If you prefer pip and Virtualenv:

$ pip install virtualenv

Create a directory and your environment:

$ mkdir -p hc36
$ virtualenv --python=python3.6 hc36

Activate it:

$ source hc36/bin/activate

Install Required Python Packages

From the root of the HoloClean repository, with your virtual environment still activated, run:

$ pip install -r requirements.txt

Note for macOS Users: You might need to install the Xcode command line tools using xcode-select --install.
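Since the packages should land in the environment you created rather than in the system Python, it can help to check which environment the interpreter is actually running in before installing. The following is a rough sketch using only the standard library; the function names are hypothetical, and the detection relies on the CONDA_DEFAULT_ENV variable set by conda activate and on the sys.prefix / sys.base_prefix difference that virtual environments introduce.

```python
import os
import sys


def active_environment(prefix, base_prefix, has_real_prefix, conda_env):
    """Classify the environment from values found in sys and os.environ."""
    # conda activate exports CONDA_DEFAULT_ENV; "base" is the default install.
    if conda_env and conda_env != "base":
        return "conda:" + conda_env
    # Older virtualenv releases set sys.real_prefix; newer ones (and venv)
    # make sys.prefix differ from sys.base_prefix.
    if has_real_prefix or prefix != base_prefix:
        return "virtualenv"
    return "none"


def current_environment():
    """Apply the check to the interpreter running this script."""
    return active_environment(
        sys.prefix,
        getattr(sys, "base_prefix", sys.prefix),
        hasattr(sys, "real_prefix"),
        os.environ.get("CONDA_DEFAULT_ENV"),
    )


if __name__ == "__main__":
    print("Python {}.{}".format(*sys.version_info[:2]))
    print("Environment:", current_environment())
```

If the script reports none, or a Python version other than the 3.6 used when creating hc36, activate the environment again before running pip.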

Running HoloClean

To see HoloClean in action, navigate to the examples directory and run the example script:

$ cd examples
$ ./start_example.sh

The script sets up the Python path environment needed to run HoloClean.

Troubleshooting

While setting up HoloClean, you may encounter a few hiccups. Here’s a handy troubleshooting guide:

  • If you’re having trouble with database connections, ensure that your PostgreSQL service is up and running.
  • If you receive permission-related errors, double-check that the user holocleanuser has been granted the necessary privileges.
  • Should you run into issues with Python packages not completing their installations, confirm that your virtual environment is activated before running the installation command.
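For the first item in the list above, a quick way to tell whether anything is listening where HoloClean expects PostgreSQL is to attempt a TCP connection. This is only a sketch: the function name is made up for illustration, localhost and 5432 are assumed defaults matching the Docker command earlier, and a successful connection shows that the port is open, not that the credentials or database are correct.

```python
import socket


def postgres_port_open(host="localhost", port=5432, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unresolved
        # host) when nothing reachable is listening at the address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if postgres_port_open():
        print("Something is listening on localhost:5432")
    else:
        print("Nothing reachable on localhost:5432; is PostgreSQL running?")
```

If the check fails with the Docker setup, docker start pghc is the first thing to try; with a native installation, check that the PostgreSQL service has been started.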

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
