
Introduction
Dmeta-embedding is a cross-domain, out-of-the-box embedding model designed for applications such as search engines, question answering (QA), intelligent customer service, and more. It excels at embedding tasks and, at the time of writing, ranks second on the MTEB Chinese leaderboard.
How to Use Dmeta-embedding
The Dmeta-embedding model can be easily integrated into your projects using popular frameworks like Sentence-Transformers, Langchain, and Hugging Face Transformers. Here’s a step-by-step guide on how to implement it with these frameworks:
1. Using Sentence-Transformers
To load and perform inference using Dmeta-embedding via Sentence-Transformers, follow these steps:
pip install -U sentence-transformers
from sentence_transformers import SentenceTransformer

texts1 = ["example text 1", "example text 2"]
texts2 = ["another example text 1", "another example text 2", "another example text 3"]

# Load the model; normalized embeddings make the dot product below a cosine similarity
model = SentenceTransformer('DMetaSoul/dmeta-embedding')
embs1 = model.encode(texts1, normalize_embeddings=True)
embs2 = model.encode(texts2, normalize_embeddings=True)

# Pairwise similarity matrix between the two batches
similarity = embs1 @ embs2.T
print(similarity)

# For each text in texts1, rank the texts in texts2 by similarity
for i in range(len(texts1)):
    scores = []
    for j in range(len(texts2)):
        scores.append([texts2[j], similarity[i][j]])
    scores = sorted(scores, key=lambda x: x[1], reverse=True)
    print(f"{texts1[i]}: {scores}")
2. Using Langchain
To integrate with Langchain, install langchain along with sentence-transformers, which Langchain's HuggingFaceEmbeddings wrapper uses under the hood:
pip install -U langchain sentence-transformers
import torch
import numpy as np
from langchain.embeddings import HuggingFaceEmbeddings

model_name = 'DMetaSoul/dmeta-embedding'
model_kwargs = {"device": "cuda" if torch.cuda.is_available() else "cpu"}
encode_kwargs = {"normalize_embeddings": True}  # normalized vectors make the dot product a cosine similarity

model = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs, encode_kwargs=encode_kwargs)

texts1 = ["example text 1", "example text 2"]
texts2 = ["another example text 1", "another example text 2", "another example text 3"]

# embed_documents returns plain Python lists; convert to NumPy arrays for matrix math
embs1 = model.embed_documents(texts1)
embs2 = model.embed_documents(texts2)
embs1, embs2 = np.array(embs1), np.array(embs2)

similarity = embs1 @ embs2.T
print(similarity)

# Rank texts2 by similarity for each entry in texts1
for i in range(len(texts1)):
    scores = []
    for j in range(len(texts2)):
        scores.append([texts2[j], similarity[i][j]])
    scores = sorted(scores, key=lambda x: x[1], reverse=True)
    print(f"{texts1[i]}: {scores}")
3. Using Hugging Face Transformers
This method gives you more fine-grained control over tokenization, pooling, and normalization:
pip install -U transformers torch
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, ignoring padding tokens via the attention mask
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

texts1 = ["example text 1", "example text 2"]
texts2 = ["another example text 1", "another example text 2", "another example text 3"]

tokenizer = AutoTokenizer.from_pretrained('DMetaSoul/dmeta-embedding')
model = AutoModel.from_pretrained('DMetaSoul/dmeta-embedding')
model.eval()

with torch.no_grad():
    inputs1 = tokenizer(texts1, padding=True, truncation=True, return_tensors='pt')
    inputs2 = tokenizer(texts2, padding=True, truncation=True, return_tensors='pt')
    model_output1 = model(**inputs1)
    model_output2 = model(**inputs2)
    embs1 = mean_pooling(model_output1, inputs1['attention_mask'])
    embs2 = mean_pooling(model_output2, inputs2['attention_mask'])
    # L2-normalize so the dot product below matches the cosine similarity of the previous examples
    embs1 = F.normalize(embs1, p=2, dim=1)
    embs2 = F.normalize(embs2, p=2, dim=1)

similarity = embs1 @ embs2.T
print(similarity)

# Rank texts2 by similarity for each entry in texts1
for i in range(len(texts1)):
    scores = []
    for j in range(len(texts2)):
        scores.append([texts2[j], similarity[i][j].item()])
    scores = sorted(scores, key=lambda x: x[1], reverse=True)
    print(f"{texts1[i]}: {scores}")
Understanding Dmeta-embedding
Dmeta-embedding is like a Swiss army knife for language tasks: one model that serves use cases ranging from search engines to QA systems. Think of preparing a buffet where each dish represents a different type of data. Rather than reaching for a separate tool for every dish, a Swiss army knife lets you switch between tools quickly and effectively. In the same way, Dmeta-embedding removes the complexity of juggling separate models for different data types and formats by offering one unified solution that adapts to your needs.
Troubleshooting
If you encounter issues while using Dmeta-embedding, consider the following troubleshooting tips:
- Ensure that your environment is properly set up with the necessary libraries installed (a quick version check is sketched after this list).
- Check for compatibility with the framework versions you are using (e.g., Sentence-Transformers, Langchain).
- Verify the inputs you’re feeding into the model—incorrect formats can lead to errors.
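As a starting point for the first two checks, the short snippet below (an illustrative sketch, not part of the official documentation) prints the installed versions of the libraries used in this guide so you can compare them against your target setup:

import importlib.metadata

# Print installed versions of the libraries used in the examples above
for package in ["sentence-transformers", "langchain", "transformers", "torch"]:
    try:
        print(f"{package}: {importlib.metadata.version(package)}")
    except importlib.metadata.PackageNotFoundError:
        print(f"{package}: not installed")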
If problems persist, feel free to reach out via the discussion forum or email support. For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

