Welcome to the world of TrustLLM, a comprehensive toolkit designed to help researchers and developers assess the trustworthiness of large language models (LLMs). In this guide, we’ll walk you through the entire process of getting started with TrustLLM, from installation to evaluation.
About TrustLLM
TrustLLM is not just a toolkit; it’s a framework for studying the trustworthiness of large language models. It provides principles, surveys, benchmarks, and analysis tools that span multiple dimensions of trustworthiness, making it well suited to evaluating LLMs such as ChatGPT. With TrustLLM, you have access to a suite of tools for assessing key aspects including truthfulness, safety, fairness, and robustness.
Before Evaluation
To effectively evaluate LLMs using TrustLLM, follow these initial steps:
Installation
- Installation via GitHub (recommended):
  git clone git@github.com:HowieHwong/TrustLLM.git
- Installation via pip:
  pip install trustllm
- Installation via conda:
  conda install -c conda-forge trustllm
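Whichever route you choose, a quick way to confirm the install succeeded is to query the package metadata with the standard library (a minimal check; it assumes the package was installed via pip or conda rather than only cloned):

import importlib.metadata

# Raises PackageNotFoundError if trustllm is not installed
print(importlib.metadata.version("trustllm"))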
Dataset Download
Start by downloading the TrustLLM dataset:
from trustllm.dataset_download import download_dataset

# Download the benchmark data to a local directory of your choice
save_path = 'TrustLLM_dataset'
download_dataset(save_path=save_path)
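Once the download finishes, you may want to confirm what was fetched. A minimal sketch using the standard library (the directory layout and file names it prints are properties of the download, not something this snippet assumes in advance):

import os

# Walk the download directory and list the JSON task files
for root, _, files in os.walk('TrustLLM_dataset'):
    for name in files:
        if name.endswith('.json'):
            print(os.path.join(root, name))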
Generation
Generation capabilities were introduced in version 0.2.0. With them, you can produce model responses for each test section as follows:
from trustllm.generation.generation import LLMGeneration

llm_gen = LLMGeneration(
    model_path='your model name',        # local path or hub id of the model under test
    test_type='test section',            # which trustworthiness section to generate for
    data_path='your dataset file path',  # path to the downloaded TrustLLM data
    model_name='',
    online_model=False,                  # set True for API-served models
    use_deepinfra=False,
    use_replicate=False,
    repetition_penalty=1.0,
    num_gpus=1,
    max_new_tokens=512,
    debug=False,
    device='cuda:0'
)
llm_gen.generation_results()
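For concreteness, a filled-in call might look like the sketch below; the model identifier, test section, and dataset path are illustrative placeholders rather than values prescribed by TrustLLM:

# All concrete values here are hypothetical examples
llm_gen = LLMGeneration(
    model_path='meta-llama/Llama-2-7b-chat-hf',  # example local/hub model id
    test_type='safety',                          # example test section
    data_path='TrustLLM_dataset/safety',         # example dataset location
    num_gpus=1,
    max_new_tokens=512,
    device='cuda:0'
)
llm_gen.generation_results()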
Evaluation
TrustLLM provides an efficient way to evaluate the trustworthiness of LLMs. To run a truthfulness evaluation, for example, use the following:
from trustllm.task.pipeline import run_truthfulness

# Each argument points to the generation results for one truthfulness sub-task
truthfulness_results = run_truthfulness(
    internal_path='path_to_internal_consistency_data.json',
    external_path='path_to_external_consistency_data.json',
    hallucination_path='path_to_hallucination_data.json',
    sycophancy_path='path_to_sycophancy_data.json',
    advfact_path='path_to_advfact_data.json'
)
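The pipeline returns a results object you will likely want to persist for later comparison. A minimal sketch, assuming the return value is a JSON-serializable dict of scores:

import json

# Save the scores so runs can be compared across models
with open('truthfulness_results.json', 'w') as f:
    json.dump(truthfulness_results, f, indent=2)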
Troubleshooting
If you encounter any issues during installation or evaluation, try these troubleshooting tips; a short script that automates the basic checks follows the list:
- Ensure that your Python version is compatible (3.9 is recommended).
- Double-check that you have the correct dependencies installed.
- If the scripts are not working, verify that the dataset paths are correctly specified.
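A minimal sketch of those checks (the dataset path is the hypothetical location used earlier in this guide):

import importlib.util
import os
import sys

# Python 3.9 is the recommended interpreter version
print(f"Python {sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}")

# Confirm that trustllm is importable
if importlib.util.find_spec("trustllm") is None:
    print("trustllm is not installed; try `pip install trustllm`")

# Confirm that the dataset directory exists (adjust to your own location)
dataset_path = 'TrustLLM_dataset'
if not os.path.isdir(dataset_path):
    print(f"Dataset directory not found: {dataset_path}")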
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
Conclusion
TrustLLM equips you with the tools to critically evaluate the trustworthiness of large language models. With its straightforward setup and evaluation pipeline, you can deepen your understanding of how different models perform across trustworthiness dimensions. Dive in and explore the fascinating world of LLMs with TrustLLM!

