How to Slash Your LLM API Costs by 10x and Boost Speed by 100x with GPTCache

In the realm of large language models (LLMs), API costs can spiral out of control alongside their incredible capabilities. Enter GPTCache, a library that builds a semantic cache for LLM queries. By using GPTCache, you can significantly cut expenses while also speeding up responses. This guide walks you through getting started with GPTCache, how it works, and troubleshooting tips.

What is GPTCache?

GPTCache is a smart caching solution that stores responses from LLMs, so repeated (or semantically similar) requests can be served from the cache instead of triggering a new API call, cutting costs and improving response times. Imagine you own a bakery that becomes wildly popular. The first few customers get their pastries right from the oven, but as the rush grows, each customer has to wait longer, and you might lose sales. Instead, you prepare extra pastries ahead of the rush and serve customers quickly instead of making them wait. This is how GPTCache helps your applications handle high traffic without excessive running costs.
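Conceptually, the flow looks like the sketch below (a simplified illustration, not GPTCache's actual API): check the cache first, and only call the LLM on a miss.

llm_cache = {}  # stand-in for GPTCache's storage layer

def cached_answer(query, call_llm):
    # Cache hit: return the stored answer with no API call.
    if query in llm_cache:
        return llm_cache[query]
    # Cache miss: pay for one API call, then store the result for next time.
    answer = call_llm(query)
    llm_cache[query] = answer
    return answer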

Getting Started with GPTCache

Follow these simple steps to get your caching solution up and running:

Quick Install

  • Ensure you have Python version 3.8.1 or higher by running python --version.
  • If you need to upgrade pip, execute: python -m pip install --upgrade pip.
  • Install GPTCache with the command: pip install gptcache.
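To confirm the install worked, a quick import check is enough (a minimal sanity check; it simply verifies the package can be loaded):

# If this runs without an ImportError, GPTCache is installed correctly.
from gptcache import cache
print("GPTCache imported successfully")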

Setting Up GPTCache Experiments

If you want to run the project's experiments or work from the latest source, clone the GPTCache repository (dev branch) and install it from source:

git clone -b dev https://github.com/zilliztech/GPTCache.git
cd GPTCache
pip install -r requirements.txt
python setup.py install

How to Use GPTCache Effectively

Here are a couple of examples to demonstrate how GPTCache enhances your existing LLM implementations.

Example 1: Exact Match Cache

This code caches the result the first time a question is asked; when the same question is asked again, the response comes straight from the cache with no additional API call:

import time

from gptcache import cache
from gptcache.adapter import openai

# With no arguments, cache.init() sets up an exact-match cache:
# an identical question is answered from the cache instead of the API.
cache.init()
cache.set_openai_key()  # picks up your key from the OPENAI_API_KEY environment variable

question = "whats github"
for _ in range(2):
    start = time.time()
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
    )
    print(f"Time: {time.time() - start:.2f}s")
    print(response["choices"][0]["message"]["content"])

Example 2: Similar Search Cache

Similarity caching goes a step further: differently worded questions that mean the same thing are matched to the same cached answer. For that to work, the cache must be initialized with an embedding model, a vector store, and a similarity evaluator:

from gptcache import cache
from gptcache.adapter import openai
from gptcache.embedding import Onnx
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

# Embed each question with an ONNX model and store the vectors in Faiss so
# that semantically similar questions can hit the same cache entry.
onnx = Onnx()
data_manager = get_data_manager(CacheBase("sqlite"), VectorBase("faiss", dimension=onnx.dimension))
cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
)
cache.set_openai_key()

questions = [
    "whats github",
    "can you explain what GitHub is?",
    "can you tell me more about GitHub?",
    "what is the purpose of GitHub?"
]
for question in questions:
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
    )
    print(response["choices"][0]["message"]["content"])

Only the first question triggers an OpenAI API call; the remaining three are close enough in meaning to be answered from the cache.

Troubleshooting Tips

If you encounter any issues while setting up or using GPTCache, here are some troubleshooting ideas to help you out:

  • Ensure the environment variable for your OpenAI API key is set correctly. Run echo $OPENAI_API_KEY in your terminal to check (see the quick check after this list).
  • If the cache doesn’t appear to be working, try clearing it using cache.clear().
  • Check if any updates to both Python and pip might be needed—outdated software can sometimes cause unexpected issues.
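Here is a quick way to confirm the key is visible to your Python process before initializing the cache (a minimal sketch; cache.set_openai_key() reads the key from this environment variable):

import os

# Fail early with a clear message if the key is missing, since
# cache.set_openai_key() expects OPENAI_API_KEY to be set.
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; export it before running your script.")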

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

With GPTCache, handling high traffic with LLMs no longer has to come at a steep cost. It optimizes performance while drastically reducing expenses—setting you up for success in the world of AI.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
