Unlocking the Power of Data with Desbordante: A User Guide

Jan 5, 2024 | Data Science

Desbordante is a high-performance data profiler that can discover and validate complex patterns in data. Whether you work in science, business, or machine learning, it can change the way you explore and check your data. This post walks you through installing Desbordante, using it, and troubleshooting common issues.

What is Desbordante?

Desbordante is designed to help you define and discover different patterns in your datasets through two primary tasks:

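  • Discovery: This task finds all instances of a given type of pattern (for example, every functional dependency) that hold in a dataset.
  • Validation: Unlike discovery, this task checks whether one specific pattern instance that you supply holds in a dataset and, if it does not, gives detailed feedback on the records that conflict with it.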
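To make validation concrete, here is a toy pure-Python sketch for one pattern type, an exact functional dependency (FD) X → Y, which holds when every pair of rows that agree on X also agree on Y. This is not Desbordante's API: the function name and the sample data are hypothetical. Conflicts are reported as groups of rows that share an X value but disagree on Y, which is roughly the kind of feedback a validation task gives.

```python
from collections import defaultdict


def validate_fd(rows, lhs, rhs):
    """Toy check of whether the exact FD lhs -> rhs holds in rows.

    rows: list of dicts mapping column name -> value (hypothetical format).
    Returns (holds, conflicts), where conflicts maps each LHS value tuple
    that has more than one distinct RHS value to the indices of its rows.
    """
    groups = defaultdict(list)  # LHS value tuple -> [(row index, RHS value), ...]
    for i, row in enumerate(rows):
        key = tuple(row[c] for c in lhs)
        groups[key].append((i, row[rhs]))

    conflicts = {}
    for key, members in groups.items():
        # A group violates the FD if its rows disagree on the RHS column.
        if len({value for _, value in members}) > 1:
            conflicts[key] = [i for i, _ in members]
    return not conflicts, conflicts


rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "94105", "city": "San Francisco"},
    {"zip": "94105", "city": "SF"},
]
holds, conflicts = validate_fd(rows, ["zip"], "city")
```

Here `holds` is `False` and `conflicts` is `{("94105",): [2, 3]}`: the two spellings of the same city are exactly the records a validation report would point you to.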

Some of its algorithms are also dynamic: after a result has been obtained, the table's contents may change (for example, rows are added or removed), and the algorithm updates the result instead of recomputing it from scratch. This is typically faster than rerunning a static algorithm, which saves time and computational resources.
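The idea behind incremental updates can be illustrated with a deliberately small example. The hypothetical class below keeps a per-group summary so that inserting or deleting one row only touches that row's group, and the answer to "does the FD still hold?" is available without rescanning the table. This is only a sketch of the general principle for validating one fixed FD; it is not how Desbordante's dynamic algorithms are implemented.

```python
from collections import Counter, defaultdict


class IncrementalFDChecker:
    """Toy incremental validator for one exact FD lhs -> rhs (hypothetical).

    For every LHS value it keeps a Counter of RHS values, plus the number of
    LHS groups that currently contain more than one distinct RHS value.
    """

    def __init__(self, lhs, rhs):
        self.lhs = tuple(lhs)
        self.rhs = rhs
        self.groups = defaultdict(Counter)  # LHS value tuple -> Counter of RHS values
        self.bad_groups = 0                 # groups that currently violate the FD

    def _update(self, row, delta):
        key = tuple(row[c] for c in self.lhs)
        value = row[self.rhs]
        group = self.groups[key]
        was_bad = len(group) > 1
        group[value] += delta
        if group[value] <= 0:
            del group[value]           # no remaining row uses this RHS value
        if not group:
            del self.groups[key]       # drop empty LHS groups
        is_bad = len(group) > 1
        self.bad_groups += int(is_bad) - int(was_bad)

    def insert(self, row):
        self._update(row, +1)

    def delete(self, row):
        self._update(row, -1)

    def holds(self):
        # O(1): no rescan of the table is needed.
        return self.bad_groups == 0
```

A static approach would rerun a full check such as a scan over all rows after every change; here each insert or delete costs roughly one dictionary lookup.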

Installation Guide

To start using Desbordante, follow these installation steps.

Step 1: Prerequisites

  • Python version 3.7 or higher.
  • GNU GCC (version 10 or later), CMake (version 3.13 or later), and the Boost library (version 1.81.0 or later) are also required.
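Before installing, you can check most of these prerequisites from Python. The sketch below is a hypothetical helper, not part of Desbordante: it compares the running interpreter with 3.7 and asks the `gcc` and `cmake` commands on your PATH for their versions. It does not check Boost, whose version is harder to detect portably, and on macOS `gcc` may be an alias for Apple's clang, in which case the reported number is not a GCC version.

```python
import re
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 7)
MIN_GCC = (10,)
MIN_CMAKE = (3, 13)


def parse_version(text):
    """Extract the first dotted number, e.g. 'cmake version 3.22.1' -> (3, 22, 1)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def tool_version(cmd):
    """Run cmd and parse its version; None if the tool is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_version(out)


def check_prerequisites():
    """Return {'python': bool, 'gcc': bool, 'cmake': bool}."""
    gcc = tool_version(["gcc", "-dumpversion"])
    cmake = tool_version(["cmake", "--version"])
    return {
        "python": sys.version_info[:3] >= MIN_PYTHON,
        "gcc": gcc is not None and gcc >= MIN_GCC,
        "cmake": cmake is not None and cmake >= MIN_CMAKE,
    }
```

Calling `print(check_prerequisites())` shows which requirement, if any, is missing. Tuple comparison does the version logic: `(9, 4, 0) >= (10,)` is `False`, while `(11,) >= (10,)` is `True`.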

Step 2: Installation

Run the following command to install Desbordante:

pip install desbordante

Note: Desbordante's core is written in C++, so the prebuilt package may not work on every system. If you run into installation problems, consider building it from source by following the instructions in the project's README.
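To confirm that the installation worked, it helps to separate two questions: did pip install the package, and does its compiled core actually load? The hypothetical helpers below answer each one using only the standard library. A C++ core that is incompatible with your system typically surfaces as an `ImportError` when the module is imported. Note that `importlib.metadata` is part of the standard library from Python 3.8; on 3.7 you would need the `importlib-metadata` backport.

```python
import importlib
import importlib.metadata


def installed_version(package="desbordante"):
    """Return the installed distribution's version string, or None if it is not installed."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


def can_import(module="desbordante"):
    """Return True if the module (and any compiled extension it loads) imports cleanly."""
    try:
        importlib.import_module(module)
    except ImportError:
        return False
    return True
```

If `installed_version()` returns a version string but `can_import()` returns `False`, the package was installed yet its C++ core failed to load, which is the situation where building from source is worth trying.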

Using Desbordante

Desbordante can be accessed via three main interfaces:

  • Console application: Run simple discovery and validation queries from the command line.
  • Python bindings: Run Desbordante directly from Python programs, so you can preprocess your data with popular libraries such as pandas before profiling it.
  • Web application: An interactive web interface designed for data profiling with a focus on discovery and validation tasks.
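Preprocessing matters because profiling works on exact values: a stray trailing space makes "New York" and "New York " look like two different cities and can hide a dependency. The post mentions pandas; the sketch below uses only Python's built-in `csv` module so that it runs without extra dependencies. It strips surrounding whitespace and drops rows with empty cells, which is just one possible policy for missing values. The function names and file paths are hypothetical.

```python
import csv


def clean_rows(rows):
    """Strip surrounding whitespace from every cell; drop blank rows and rows with empty cells."""
    cleaned = []
    for row in rows:
        cells = [cell.strip() for cell in row]
        if cells and all(cells):
            cleaned.append(cells)
    return cleaned


def preprocess_csv(src_path, dst_path, delimiter=","):
    """Copy src_path to dst_path, keeping the header row and cleaning the data rows."""
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src, delimiter=delimiter)
        header = next(reader)       # assumes the file has a header row
        body = clean_rows(reader)
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst, delimiter=delimiter)
        writer.writerow(header)
        writer.writerows(body)
```

The cleaned file can then be loaded with `table=(dst_path, ',', True)`, the same (path, separator, has_header) form used in the discovery example.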

Getting Started with Code

Let’s imagine you are a librarian, and each book in your library represents a pattern in a dataset. The Discovery feature is akin to helping you find specific types of books (say mystery novels or history books) quickly. Validation, on the other hand, focuses on verifying if a particular book (or pattern instance) can be found and what the reasons might be if it’s not present (like it’s checked out or misplaced).

Here’s how you can discover exact functional dependencies in your dataset using Python:

import desbordante

TABLE = 'path_to_your_table.csv'

# Create the default exact-FD discovery algorithm
algo = desbordante.fd.algorithms.Default()
# table=(path, separator, has_header)
algo.load_data(table=(TABLE, ',', True))
algo.execute()
result = algo.get_fds()
print("FDs:")
for fd in result:
    print(fd)
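To see what `get_fds()` is computing conceptually, here is a naive brute-force sketch. FD discovery algorithms usually report minimal, non-trivial dependencies: the right-hand side is not part of the left-hand side, and no smaller left-hand side would also work. This sketch enumerates every candidate left-hand side, so it is exponential in the number of columns and only suitable for tiny tables; Desbordante's algorithms are far more efficient. The function names and data are hypothetical.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """True if rows that agree on every column in lhs also agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def discover_fds(rows, columns):
    """Brute-force discovery of minimal, non-trivial exact FDs with one RHS column."""
    found = []
    for rhs in columns:
        others = [c for c in columns if c != rhs]
        minimal_lhs = []
        # Try smaller left-hand sides first so that minimality is easy to enforce.
        for size in range(len(others) + 1):
            for lhs in combinations(others, size):
                # Skip supersets of an LHS that already determines rhs.
                if any(set(m) <= set(lhs) for m in minimal_lhs):
                    continue
                if fd_holds(rows, lhs, rhs):
                    minimal_lhs.append(lhs)
        found.extend((lhs, rhs) for lhs in minimal_lhs)
    return found


rows = [
    {"student": "Ann", "course": "DB", "teacher": "Lee"},
    {"student": "Bob", "course": "DB", "teacher": "Lee"},
    {"student": "Ann", "course": "ML", "teacher": "Kim"},
]
fds = discover_fds(rows, ["student", "course", "teacher"])
```

On this table `fds` is `[(("teacher",), "course"), (("course",), "teacher")]`: each course has one teacher and vice versa, while no combination of the other columns determines `student`.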

Troubleshooting Common Issues

Even with great tools, challenges can arise. Here are some common issues and solutions:

  • Error downloading datasets: If you encounter a “Smudge error” while cloning the repository, set the following environment variable and retry the clone:

      export GIT_LFS_SKIP_SMUDGE=1

    This makes Git LFS skip downloading large files, so the datasets are checked out as small pointer files; you can fetch their contents later with git lfs pull.
  • No type hints in your IDE: If type hints aren’t working in Visual Studio Code, install the stubs with:

      pip install desbordante-stubs
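After a clone with `GIT_LFS_SKIP_SMUDGE=1`, a dataset file may still be a Git LFS pointer rather than a real table, and profiling it will fail or produce meaningless results. Pointer files begin with the line `version https://git-lfs.github.com/spec/v1`, so they are easy to detect. The sketch below assumes the datasets are CSV files somewhere under the repository directory; the helper names are hypothetical, and you can change the pattern to match your files.

```python
from pathlib import Path

# First line of every Git LFS pointer file (per the Git LFS pointer spec).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(head: bytes) -> bool:
    """True if the given leading bytes of a file match the LFS pointer header."""
    return head.startswith(LFS_POINTER_PREFIX)


def is_lfs_pointer(path) -> bool:
    """Read just enough of the file to decide whether it is an LFS pointer."""
    with open(path, "rb") as f:
        return looks_like_lfs_pointer(f.read(len(LFS_POINTER_PREFIX)))


def find_lfs_pointers(directory, pattern="*.csv"):
    """List files under directory matching pattern that are still LFS pointers."""
    return sorted(p for p in Path(directory).rglob(pattern) if is_lfs_pointer(p))
```

An empty result from `find_lfs_pointers("path/to/repo")` means the matching files contain real data; any paths it returns still need to be fetched with Git LFS.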

Conclusion

Understanding Desbordante might take some initial effort, especially when navigating through its various patterns and features. But once you grasp the concepts, it becomes a powerful ally in data profiling across different domains.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
