Cherche: Your Guide to Neural Search

May 31, 2024 | Data Science

Cherche enables the development of a neural search pipeline that employs retrievers and pre-trained language models to both retrieve and rank results. Its primary advantage lies in its capability to construct end-to-end pipelines, perfect for offline semantic search and batch computation. Let’s dive in and explore how to use it!

Features of Cherche

Cherche is packed with features, including:

  • Lexical retrievers such as TfIdf and BM25 for fast candidate filtering.
  • Semantic rankers built on pre-trained language models such as Sentence Transformers.
  • Pipeline composition, so a retriever and a ranker combine into an end-to-end search pipeline.
  • CPU and GPU support, well suited to offline semantic search and batch computation.

Installation

Installing Cherche is straightforward. Choose the command that matches your requirements:

  • For a simple retriever on CPU (like TfIdf):
    pip install cherche
  • For any semantic retriever or ranker on CPU:
    pip install cherche[cpu]
  • For any semantic retriever or ranker on GPU:
    pip install cherche[gpu]

Each option installs only the dependencies needed for your use case.
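
Once installed, you can confirm the package is available. This is a minimal sketch using only the Python standard library (not a Cherche-specific API):

from importlib.metadata import version

# Print the installed Cherche version to confirm the installation succeeded.
print(version("cherche"))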

Documentation

The official Cherche documentation provides details about retrievers, rankers, pipelines, and worked examples.

Getting Started with Cherche

Documents

Cherche searches over a plain list of Python dictionaries and returns the most relevant ones for a query. Here is an example using the built-in towns dataset:

from cherche import data

documents = data.load_towns()
documents[:3]

# Output:
# [
#     {'id': 0, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'Paris is the capital and most populous city of France.'},
#     {'id': 1, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': "Since the 17th century, Paris has been one of Europe's major centres of science and arts."},
#     {'id': 2, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'The City of Paris is the centre and seat of government of the region and province of Île-de-France.'}
# ]

Retriever + Ranker

Think of Cherche as a kitchen preparing a gourmet dish. The retriever is the sous-chef who quickly gathers candidate ingredients (documents), while the ranker is the head chef who keeps only the best of them based on taste (the semantic similarity between the query and each document). Working together, they produce the final dish: relevant search results.

Below is how you can set up a neural search pipeline:

from cherche import data, retrieve, rank
from sentence_transformers import SentenceTransformer

# Load documents
documents = data.load_towns()

# Retrieve candidate documents with BM25
retriever = retrieve.BM25(
    key='id',
    on=['title', 'article'],
    documents=documents,
    k=30
)

# Rank documents with a semantic model
ranker = rank.Encoder(
    key='id',
    on=['title', 'article'],
    encoder=SentenceTransformer('sentence-transformers/all-mpnet-base-v2').encode,
    k=3,
)

# Create search pipeline
search = retriever + ranker
search.add(documents=documents)

# Execute search with queries
search(['Bordeaux', 'Paris', 'Toulouse'])
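
The pipeline returns one list of matches per query, ordered by the ranker. The exact ids and scores depend on your data and model; the values below are purely illustrative and only show the shape of the output:

# [
#     [{'id': 57, 'similarity': 0.70}, {'id': 63, 'similarity': 0.65}, ...],  # results for 'Bordeaux'
#     [{'id': 0, 'similarity': 0.72}, {'id': 2, 'similarity': 0.66}, ...],    # results for 'Paris'
#     [{'id': 16, 'similarity': 0.68}, ...],                                  # results for 'Toulouse'
# ]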

Retrieve & Rank

Retrieve: Cherche provides several retrievers, such as TfIdf and BM25, that quickly narrow the document collection down to candidates matching a query.

Rank: rankers then re-order the retriever's candidates by semantic similarity between the query and each document, improving the relevance of the final results. A standalone retriever sketch follows below.
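
To make the split concrete, here is a minimal sketch of a retriever used on its own, without a ranker. It reuses the towns dataset and the same constructor arguments (key, on, documents, k) as the pipeline above:

from cherche import data, retrieve

documents = data.load_towns()

# A purely lexical retriever: scores documents by TF-IDF overlap with the query.
retriever = retrieve.TfIdf(
    key='id',
    on=['title', 'article'],
    documents=documents,
    k=10,
)

# Returns, for each query, the ids of the best-matching documents with their scores.
retriever(['capital of France'])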

Troubleshooting

If you happen to run into issues while using Cherche, here are some helpful troubleshooting tips:

  • Ensure you have the correct dependencies installed based on your requirements (CPU or GPU).
  • Check your data format and structure; it must align with Cherche’s expected input (see the sketch after this list).
  • If you’re facing unexpected output, re-evaluate your retriever and ranker settings.
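
As a reference for the data-format tip above, here is a minimal sketch of the input Cherche expects: a plain list of dictionaries sharing a unique key field (the field names here are only an example):

# Each document is a dict; the field passed as `key` must uniquely identify it.
documents = [
    {'id': 0, 'title': 'Paris', 'article': 'Paris is the capital of France.'},
    {'id': 1, 'title': 'Toulouse', 'article': 'Toulouse is a city in the south of France.'},
]

# The same field names are then passed to retrievers and rankers, e.g.
# retrieve.TfIdf(key='id', on=['title', 'article'], documents=documents)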

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

Cherche is an incredible tool that simplifies the process of implementing neural search pipelines. By following the steps and leveraging the functionalities outlined in this blog, you’ll be well on your way to mastering Cherche.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
