Machine learning is a powerful tool, but preparing your data for it can often feel like trying to create a gourmet meal without ever stepping into a kitchen. Fortunately, skrub simplifies this process for you. Formerly known as *dirty_cat*, this Python library is designed to streamline the way you handle your tabular data, making it easier to get to the fun part—building your models!
What Can Skrub Do?
The magic of skrub lies in its ability to bridge the gap between tabular data sources and machine-learning models. Here’s what you can do with skrub:
- Data Joining: High-level tools like Joiner and AggJoiner help you efficiently merge different dataframes.
- Encoding Columns: Use tools including MinHashEncoder and ToCategorical to encode your columns effectively.
- Building Pipelines: Easily create data pipelines using TableVectorizer and tabular_learner.
Getting Started with Skrub
If you’re ready to dive in, let’s take a look at how to use skrub to fetch a dataset, prepare your data, and evaluate a model. First, make sure you have skrub installed. You can do this with either pip or conda!
from skrub.datasets import fetch_employee_salaries
# Fetching the dataset
dataset = fetch_employee_salaries()
df = dataset.X
y = dataset.y
# Inspect the first row of the feature table
print(df.iloc[0])
In the analogy of preparing a fine dining experience, fetch_employee_salaries() is like sourcing the freshest ingredients from a local market. By calling this function, you’re bringing in a rich dataset that is ready to be cooked to perfection!
from sklearn.model_selection import cross_val_score
from skrub import tabular_learner
# Evaluating the model
# tabular_learner takes the task name as a string
cross_val_score(tabular_learner('regressor'), df, y)
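Just like tasting a dish before you serve it, cross_val_score() tells you how well the model is likely to perform on unseen data. Here tabular_learner('regressor') builds a ready-made scikit-learn pipeline for tabular data, and cross_val_score() returns one score per cross-validation fold (five folds by default, scored with R² for a regressor).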
Installation Instructions
Skrub can be installed with pip (pip install skrub) or with conda from the conda-forge channel (conda install -c conda-forge skrub). For more detailed installation instructions, visit skrub-data.org.
Troubleshooting
If you encounter issues while installing or using skrub, here are some troubleshooting tips:
- Ensure your Python environment is correctly set up. Skrub targets recent Python 3 releases, so check the installation page on skrub-data.org for the minimum supported version.
- If installation fails, try installing into a fresh virtual environment or conda environment rather than the system Python.
- Check for any dependencies that might not be satisfied.
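As a quick way to work through the checks above, the following sketch prints the running Python version and the installed versions of a few relevant packages using only the standard library. The helper name installed_version and the list of packages are illustrative choices, not something skrub provides.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the interpreter version, then each package of interest.
print("Python:", sys.version.split()[0])
for name in ("skrub", "scikit-learn", "pandas"):
    print(name, "->", installed_version(name) or "not installed")
```

If skrub shows up as "not installed" even though the install appeared to succeed, the command was probably run against a different environment than the one executing your code.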
Contributing to Skrub
The best way to support skrub’s development is to spread the word! If you’re already using it, we’d love to hear how you’re using it and any challenges you’re facing. Feel free to join the discussions on GitHub.
If you have bugs to report or enhancements to suggest, please open an issue or submit a pull request on GitHub.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

