Introduction
FlagEmbedding is a robust tool designed to map text into low-dimensional dense vectors, allowing for effortless tasks such as retrieval, classification, clustering, and semantic search. It’s particularly useful when integrated with vector databases for large language models (LLMs).
Updates
Here are some recent updates for FlagEmbedding:
- 10/12/2023: Released LLM-Embedder, a unified embedding model tailored for diverse retrieval-augmentation needs.
- 09/15/2023: Released the BGE technical report alongside massive training data.
- 09/12/2023: Released new models, including cross-encoder rerankers for improved ranking accuracy.
How to Use FlagEmbedding
To leverage FlagEmbedding, you will first need to install the library and then follow the methods appropriate for your use case. Let’s break down the process with an analogy:
Understanding FlagEmbedding with an Analogy
Imagine you are a librarian, and the books in your library represent pieces of information. FlagEmbedding is like a magical categorization system that organizes these books by their content. Instead of leaving books scattered on shelves, the system assigns each book a unique identification number (the low-dimensional dense vector) that encodes the book’s content and topic. Whenever a reader (user) approaches with a question (query), the system quickly finds the most relevant books, classifying and clustering them according to their similarities. Just as a well-organized library makes information easier to find, FlagEmbedding streamlines retrieval and classification for text.
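To ground the analogy, here is a toy sketch of how a query gets matched to the most similar “book”. It uses plain NumPy with made-up vectors, not real model output:
import numpy as np
# Two hand-made document vectors, L2-normalized so the dot product
# equals cosine similarity.
docs = np.array([[0.9, 0.1], [0.1, 0.9]])
docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
query = np.array([0.8, 0.2])
query = query / np.linalg.norm(query)
scores = docs @ query            # one similarity score per document
print(scores, scores.argmax())   # index of the most relevant document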
Usage
Here’s how you can use FlagEmbedding:
Using the FlagEmbedding Library
pip install -U FlagEmbedding
Once installed, you can import and utilize it like so:
from FlagEmbedding import FlagModel

sentences_1 = ["sample data 1", "sample data 2"]
sentences_2 = ["sample data 3", "sample data 4"]
# Note the "BAAI/" namespace in the model ID; use_fp16=True speeds up
# encoding at a slight cost in precision.
model = FlagModel("BAAI/bge-large-zh-v1.5", query_instruction_for_retrieval="Generate representation for searching related articles:", use_fp16=True)
embeddings_1 = model.encode(sentences_1)
embeddings_2 = model.encode(sentences_2)
# Embeddings are normalized by default, so the inner product is cosine similarity.
similarity = embeddings_1 @ embeddings_2.T
print(similarity)
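For short-query-to-long-passage retrieval, queries should carry the retrieval instruction while passages should not. The FlagEmbedding README handles this split via encode_queries, which prepends query_instruction_for_retrieval to each query automatically; the query and passage strings below are placeholders:
queries = ["sample query 1", "sample query 2"]
passages = ["sample passage 1", "sample passage 2"]
# encode_queries adds the retrieval instruction to each query;
# encode leaves the passages unchanged.
q_embeddings = model.encode_queries(queries)
p_embeddings = model.encode(passages)
scores = q_embeddings @ p_embeddings.T
print(scores)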
Using with Sentence-Transformers
from sentence_transformers import SentenceTransformer

sentences_1 = ["sample data 1", "sample data 2"]
sentences_2 = ["sample data 3", "sample data 4"]
# Again, the model ID needs the "BAAI/" namespace.
model = SentenceTransformer("BAAI/bge-large-zh-v1.5")
embeddings_1 = model.encode(sentences_1, normalize_embeddings=True)
embeddings_2 = model.encode(sentences_2, normalize_embeddings=True)
similarity = embeddings_1 @ embeddings_2.T
print(similarity)
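Sentence-Transformers knows nothing about BGE’s retrieval instruction, so for query-to-passage retrieval you prepend it to the queries yourself. A minimal sketch, reusing the instruction string from the FlagModel example above and placeholder queries:
queries = ["sample query 1", "sample query 2"]
instruction = "Generate representation for searching related articles:"
# Only the queries get the instruction; passages are encoded as-is.
q_embeddings = model.encode([instruction + q for q in queries], normalize_embeddings=True)
scores = q_embeddings @ embeddings_2.T
print(scores)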
Using with LangChain
from langchain.embeddings import HuggingFaceBgeEmbeddings

# The model ID needs the "BAAI/" namespace; encode_kwargs normalizes
# embeddings so similarity scores are cosine similarities.
model = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-large-en-v1.5", model_kwargs={"device": "cuda"}, encode_kwargs={"normalize_embeddings": True})
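Like any LangChain embeddings class, HuggingFaceBgeEmbeddings exposes embed_query and embed_documents; embed_query also prepends the model’s query instruction for you. A short usage sketch with placeholder text:
# embed_query returns one vector; embed_documents returns one per document.
query_embedding = model.embed_query("sample query")
doc_embeddings = model.embed_documents(["sample data 3", "sample data 4"])
print(len(query_embedding), len(doc_embeddings))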
Troubleshooting
If you encounter any issues while using FlagEmbedding, here are some troubleshooting tips:
- Ensure that you have the latest version installed by running pip install -U FlagEmbedding.
- Check GPU availability if you’re using a GPU for computation. You can select which GPUs are visible via os.environ["CUDA_VISIBLE_DEVICES"] (see the sketch after this list).
- If a model fails to load, make sure you are connected to the internet, or consider downloading the model files manually in advance.
- Review the installation instructions and check the official GitHub repository for any updates or new methods.
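As a minimal sketch of the GPU tip above: set CUDA_VISIBLE_DEVICES before any CUDA-using library is imported, otherwise the setting may be ignored.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only GPU 0
# Import torch only after setting the variable.
import torch
print(torch.cuda.is_available(), torch.cuda.device_count())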
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
Conclusion
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
License
FlagEmbedding is licensed under the MIT License. The released models can be used for commercial purposes free of charge.