Getting Started with Pingouin: A Comprehensive Guide

Apr 25, 2022 | Data Science

Pingouin is an open-source statistical package for Python that covers a wide range of statistical analyses. Built on popular libraries like Pandas and NumPy, it is designed for users who want simple yet exhaustive statistical functions.

Why Choose Pingouin?

Pingouin stands out for the detail of its output: it reports not only the test statistic but also the supporting numbers you usually need to interpret it. For instance, where SciPy's ttest_ind returns only the T-value and p-value, Pingouin’s ttest function also returns the degrees of freedom, the effect size (Cohen's d), the 95% confidence interval of the difference in means, the statistical power, and the Bayes factor.

Installation of Pingouin

To harness the power of Pingouin, you first need to install it. Here’s how to do it:

  • Using pip: Open your terminal and enter the following command:
      pip install pingouin
  • Using conda: If you prefer conda, run this command:
      conda install -c conda-forge pingouin
  • Keep it updated: To ensure you are using the latest features of Pingouin, periodically run:
      pip install --upgrade pingouin

Quick Start: Examples of Functions

Let’s dive into some quick examples to get the feel of Pingouin using common statistical tests. Think of Pingouin as a Swiss army knife for statistical analysis, where each function is a different tool for a different task.

1. T-test

Imagine you have two groups of apples, and you want to know if one type is generally sweeter than the other. You would perform a T-test. Let’s see how this can be done in Pingouin:

import numpy as np
import pingouin as pg

np.random.seed(123)
mean, cov, n = [4, 5], [(1, .6), (.6, 1)], 30
x, y = np.random.multivariate_normal(mean, cov, n).T

# Perform an independent two-sample T-test and print the results table
print(pg.ttest(x, y))

This would provide you with not just the p-value, but also effect size and statistical power, giving you a complete picture of your data.

2. Pearson’s Correlation

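Next, think of two factors, such as the amount of sunlight and the sweetness of apples. To understand their relationship, we can compute Pearson’s correlation:

# Pearson correlation between x and y (the default method)
print(pg.corr(x, y))

This returns a table with the correlation coefficient r, which measures the strength and direction of the linear relationship between the two variables, along with its 95% confidence interval, p-value, and statistical power.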

3. Normality Test

Before running parametric tests such as the T-test, it’s important to check whether your data follow a normal distribution. That’s where the normality test comes into play:

print(pg.normality(x))

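By default, pg.normality runs the Shapiro-Wilk test and returns the test statistic (W), the p-value, and a boolean column indicating whether the data are consistent with a normal distribution at the chosen significance level (alpha = 0.05 by default). This lets you check whether your data meet the assumptions required for many statistical techniques.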

Troubleshooting Pingouin

If you encounter any issues during installation or while using Pingouin, here are a few troubleshooting tips:

  • Ensure all dependencies are correctly installed: NumPy, SciPy, Pandas, Matplotlib, and Seaborn should be properly configured.
  • For runtime errors, check the compatibility of your Python version. Pingouin works well with Python versions 3.8 to 3.11.
  • If you need further support, ask a question or search existing threads on the project's GitHub Discussions page.
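When troubleshooting, it helps to see exactly which versions are installed. The following sketch uses only the standard library to report the Python version and the versions of the packages mentioned above. The helper name `installed_versions` is hypothetical, chosen just for this example.

```python
import sys
from importlib import metadata

# Packages mentioned in the troubleshooting tips, plus Pingouin itself
packages = ["numpy", "scipy", "pandas", "matplotlib", "seaborn", "pingouin"]


def installed_versions(names):
    """Return a dict mapping each package name to its version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print("Python:", sys.version.split()[0])
for name, version in installed_versions(packages).items():
    print(f"{name}: {version or 'not installed'}")
```

Including this output when asking for help makes it much easier for others to reproduce the problem.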
