Evidently

Oct 22, 2020 | Data Science

An open-source framework to evaluate, test and monitor ML and LLM-powered systems.


Documentation | Discord Community | Blog | Twitter | Evidently Cloud

New release: Evidently 0.4.25 with LLM evaluation. See the Tutorial.

What is Evidently?

Evidently is an open-source Python library designed for ML (Machine Learning) and LLM (Large Language Model) evaluation and observability. It provides tools to evaluate, test, and monitor AI-powered systems and data pipelines, facilitating a seamless transition from experimentation to production.

  • Works with tabular, text data, and embeddings.
  • Supports predictive and generative systems, from classification to retrieval-augmented generation (RAG).
  • Features over 100 built-in metrics, from data drift detection to LLM judges.
  • Provides a Python interface for creating custom metrics and tests.
  • Supports both offline evaluations and live monitoring.
  • Open architecture for easy data export and integration with existing tools.

Evidently’s modular design allows users to start with simple evaluations using Reports or Test Suites in Python, or dive deeper into real-time monitoring with a Dashboard service.

1. Reports

Reports compute various data, ML, and LLM quality metrics. You can start with built-in Presets or customize the metric selection to your needs.

  • Deliver out-of-the-box interactive visuals.
  • Ideal for exploratory analysis and debugging.
  • Results can be rendered in Python, exported as JSON, a Python dictionary, HTML, or a DataFrame, or viewed in the monitoring UI (see the export sketch below).
Report example
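
For instance, once a Report has been computed (as in the Getting Started section below), you can export it in different formats. A minimal sketch, assuming the pre-1.0 Report API; the file name is illustrative:

data_drift_report.save_html("data_drift_report.html")  # standalone interactive HTML file
report_json = data_drift_report.json()                  # JSON string
report_dict = data_drift_report.as_dict()               # plain Python dictionary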

2. Test Suites

Test Suites check defined conditions on metric values, providing a pass or fail result.

  • Perfect for regression testing, CI/CD checks, or data validation pipelines.
  • Zero setup option: auto-generate test conditions from the reference dataset.
  • Utilizes simple syntax for custom test conditions, such as gt (greater than) and lt (less than); see the sketch after this list.
  • Results can be exported in the same formats as Reports.
Test example
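
For example, individual tests with custom conditions could look like this. A minimal sketch, assuming the pre-1.0 API; the column name matches the Iris dataset used in the Getting Started section below:

from evidently.test_suite import TestSuite
from evidently.tests import TestNumberOfMissingValues, TestColumnValueMin

# Fail if any values are missing, or if the column minimum is not above 4
custom_suite = TestSuite(tests=[
    TestNumberOfMissingValues(eq=0),
    TestColumnValueMin(column_name="sepal length (cm)", gt=4),
])
custom_suite.run(current_data=iris_frame.iloc[:60], reference_data=iris_frame.iloc[60:], column_mapping=None)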

3. Monitoring Dashboard

The Monitoring UI helps visualize metrics and test results over time. You have two options:

  • Self-host the open-source version (see the sketch below). Check out the Live demo.
  • Sign up for Evidently Cloud (recommended), which offers a generous free tier and additional features.
Dashboard example
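
To self-host, you can log computed Reports or Test Suites to a local workspace and launch the UI from the terminal. A minimal sketch, assuming the 0.4.x workspace API; the folder and project names are illustrative:

from evidently.ui.workspace import Workspace

ws = Workspace.create("evidently_workspace")   # local folder that stores snapshots
project = ws.create_project("Iris checks")     # illustrative project name
ws.add_report(project.id, data_drift_report)   # log a computed Report as a snapshot

# Then start the dashboard, e.g.: evidently ui --workspace evidently_workspace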

How to Install Evidently

Evidently is available as a PyPI package. To install it using pip, run:

pip install evidently

To install Evidently using conda, run:

conda install -c conda-forge evidently

Getting Started

Here’s how you can start with Evidently using two simple options:

Option 1: Test Suites

This option runs a simple “Hello World” check on the Iris dataset:

import pandas as pd
from sklearn import datasets
from evidently.test_suite import TestSuite
from evidently.test_preset import DataStabilityTestPreset

# Load the Iris dataset as a pandas DataFrame
iris_data = datasets.load_iris(as_frame=True)
iris_frame = iris_data.frame

# Run the Data Stability preset, comparing the first 60 rows ("current")
# against the rest of the dataset ("reference")
data_stability = TestSuite(tests=[DataStabilityTestPreset()])
data_stability.run(current_data=iris_frame.iloc[:60], reference_data=iris_frame.iloc[60:], column_mapping=None)
data_stability  # renders the results inline in a notebook

This splits the Iris dataset into a “current” part (the first 60 rows) and a “reference” part (the rest), auto-generates test conditions from the reference data, and returns a pass or fail result for each test, so you can see at a glance which data quality check, if any, failed.
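
Beyond the notebook view, you can export the results, for example (assuming the same pre-1.0 API; the exact keys of the summary dictionary may vary by version):

data_stability.save_html("data_stability.html")  # shareable interactive HTML file
results = data_stability.as_dict()               # plain Python dictionary
print(results["summary"])                        # aggregate pass/fail counts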

Option 2: Reports

You can also generate Reports. Here’s how:

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Reuses iris_frame from the Test Suite example above
data_drift_report = Report(metrics=[DataDriftPreset()])
data_drift_report.run(current_data=iris_frame.iloc[:60], reference_data=iris_frame.iloc[60:], column_mapping=None)
data_drift_report  # renders the interactive report inline in a notebook

The report compares the feature distributions in the current data against the reference data and flags the columns where the distribution has drifted.

What Can You Evaluate?

Evidently features over 100 built-in evaluations, and you can add custom ones too. Here’s a glimpse:

  • Text descriptors, like length and toxicity (see the sketch after this list).
  • Data quality and distribution drift assessments.
  • Evaluation of ML classification and regression models.
  • Rankings and recommendations metrics.
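
For example, text descriptors can be computed through a preset. A minimal sketch, assuming the 0.4.x TextEvals preset and a hypothetical DataFrame with a "response" text column:

import pandas as pd
from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import TextEvals
from evidently.descriptors import TextLength

# Hypothetical data: generated responses to evaluate
df = pd.DataFrame({"response": ["Thanks, happy to help!", "I cannot answer that."]})

text_report = Report(metrics=[
    TextEvals(column_name="response", descriptors=[TextLength()]),
])
text_report.run(current_data=df, reference_data=None,
                column_mapping=ColumnMapping(text_columns=["response"]))
text_report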

Troubleshooting Tips

If you run into anything tricky while using Evidently, consider the following:

  • Always check the syntax of your Python code; small typos can lead to errors.
  • Review the data formats to ensure compatibility with Evidently.
  • Consult the official documentation for additional guidance.

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
