How to Get Started with the microsoft/phi-2 MongoDB Query Model

May 1, 2024 | Educational


In this guide, we’ll look at how to use a fine-tuned version of the microsoft/phi-2 model to generate MongoDB queries from plain English. We’ll walk through setting up and using the model, which was fine-tuned by Chirayu Tripathi.

Understanding the Model

The model you’re about to interact with, referred to as phi-2-mongodb, is a fine-tuned version of microsoft/phi-2. It was trained on a specially curated dataset to transform plain English instructions into MongoDB queries. Think of it as a translator that takes your everyday language and turns it into the precise query syntax MongoDB understands.
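To make the translation concrete, here is a minimal sketch of the kind of query the instruction used later in this guide could map to. The pipeline is hand-written for illustration and is not actual model output. The `group_count` helper and the sample documents are hypothetical stand-ins that mimic the `$group` stage in plain Python, so the example runs without a database.

```python
from collections import Counter

instruction = "Find the count of shipwrecks for each unique combination of latdec and londec"

# Hand-written aggregation pipeline (illustrative; not produced by the model).
pipeline = [
    {"$group": {
        "_id": {"latdec": "$latdec", "londec": "$londec"},
        "count": {"$sum": 1},
    }}
]

def group_count(docs, fields):
    """Mimic the $group stage above: count documents per unique field combination."""
    counts = Counter(tuple(doc.get(f) for f in fields) for doc in docs)
    return [
        {"_id": dict(zip(fields, key)), "count": n}
        for key, n in counts.items()
    ]

# Hypothetical sample documents standing in for the shipwrecks collection.
sample_docs = [
    {"latdec": 9.34, "londec": -79.93},
    {"latdec": 9.34, "londec": -79.93},
    {"latdec": 17.1, "londec": -61.8},
]

result = group_count(sample_docs, ["latdec", "londec"])
print(result)
```

Against a real database, the same `pipeline` list would be passed to `collection.aggregate(pipeline)` in pymongo.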

Model Details

Fine-tuned by: Chirayu Tripathi
Developed by: Microsoft
Language: English
License: MIT
Finetuned from model: microsoft/phi-2

Getting Started with the Model

To get started, the example below loads the base model with 4-bit quantization, attaches the fine-tuned adapter, and generates a query from a prompt:

```python
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
)
import torch
from peft import PeftModel

# MongoDB schema definition for the target collection
db_schema = {
    "collections": [{
        "name": "shipwrecks",
        # One dict per index; repeating "key" inside a single dict
        # would silently keep only the last entry.
        "indexes": [
            {"key": {"_id": 1}},
            {"key": {"feature_type": 1}},
            {"key": {"chart": 1}},
            {"key": {"latdec": 1, "londec": 1}},
        ],
        "uniqueIndexes": [],
        "document": {
            "properties": {
                "_id": {"bsonType": "string"},
                # ... (remaining properties elided)
            }
        },
        "version": 1
    }]
}

text = "Find the count of shipwrecks for each unique combination of latdec and londec"
prompt = "Your task is to create a MongoDB query..."  # full prompt elided

# Device configuration: use the GPU when available, otherwise the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the tokenizer for the base model
base_model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(base_model_id, use_fast=True)
compute_dtype = torch.float16

# Configure 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    # ... (remaining options elided)
)

# Load the base model
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    # ... (remaining arguments elided)
)

# Attach the fine-tuned adapter to the base model
adapter = "Chirayu/phi-2-mongodb"
model = PeftModel.from_pretrained(model, adapter).to(device)

# Tokenize the prompt and generate
model_inputs = tokenizer(prompt, return_tensors="pt").to(device)
output = model.generate(
    **model_inputs,
    # ... (generation arguments elided)
)

# Decode only the newly generated tokens: take the first sequence in the
# batch, then skip the prompt tokens at the start of it.
prompt_length = model_inputs["input_ids"].shape[1]
query = tokenizer.decode(output[0][prompt_length:], skip_special_tokens=False)
print(query.strip())
```
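The slice in the decoding step is easy to get wrong, so here is a small sketch of the indexing on its own. For a decoder-only model, `generate` returns the prompt tokens followed by the new tokens, with one row per input sequence. The token IDs below are made up, and plain lists stand in for tensors; only the indexing pattern carries over to the real code.

```python
# Hypothetical token IDs; plain lists stand in for the tensors in the real code.
prompt_ids = [[101, 7, 42, 9]]            # shape (1, 4): one prompt of 4 tokens
generated = [[101, 7, 42, 9, 55, 66, 2]]  # shape (1, 7): prompt + 3 new tokens

prompt_length = len(prompt_ids[0])        # tensor equivalent: input_ids.shape[1]

# Select the first (only) sequence in the batch, then drop the prompt tokens.
new_token_ids = generated[0][prompt_length:]
print(new_token_ids)  # [55, 66, 2]

# Slicing the batch dimension instead leaves nothing, because the batch has one row.
wrong = generated[prompt_length:]
print(wrong)  # []
```

Indexing the batch dimension first is what keeps the decoded text limited to the generated query instead of echoing the prompt or coming back empty.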

Decoding the Code: An Analogy

Let’s break down the example code using a simple analogy involving a coffee shop:

Base Model (microsoft/phi-2): This is like the coffee blend you choose. It forms the foundation of everything that follows.
Tokenization (AutoTokenizer): Think of this as grinding coffee beans. It’s essential to prepare (or “tokenize”) the input accurately before brewing your coffee (producing queries).
Device Configuration: This could be compared to choosing the right brewing method, such as an espresso machine or a French press. The code uses a CUDA GPU when one is available and falls back to the CPU otherwise.
Quantization: This acts like choosing the right grind size for your coffee. Loading the weights in 4-bit precision reduces the memory the model needs, so it can run on the hardware at your disposal.
Generating the Query: Just as you’d wait for your coffee to brew, you wait for the model to generate your output (MongoDB query) based on the input (prompt).
Decoding the Output: Finally, this step is akin to pouring your freshly brewed coffee into a cup. You will have your finished query ready to be served!
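To put a number on the quantization step, here is a back-of-the-envelope estimate of weight memory at different precisions. It assumes the roughly 2.7 billion parameters reported for Phi-2 and counts the weights only. Activations, the key-value cache, and quantization overhead such as scaling constants are ignored, so real usage will be higher. The function name is illustrative.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

PHI2_PARAMS = 2.7e9  # approximate parameter count reported for Phi-2

for label, bits in [("float32", 32), ("float16", 16), ("4-bit", 4)]:
    print(f"{label:>8}: ~{weight_memory_gb(PHI2_PARAMS, bits):.2f} GB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the weight footprint to a quarter, which is why the example sets `load_in_4bit=True`.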

Troubleshooting Tips

If things don’t go as planned, here are some troubleshooting ideas:

Model Not Loading: Ensure that the required libraries (torch, transformers, peft, and bitsandbytes for 4-bit loading) are installed, and check your internet connection, since the model weights are downloaded on first use.
Incompatible Data Types: Double-check that your input schema matches the expected MongoDB schema.
Device Errors: Confirm that your device is set up correctly for either CPU or CUDA processing.
Unexpected Output: Revisit your prompt to ensure it’s clear and conforms to the model’s requirements.
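For the first and third tips, a quick environment check can narrow things down before loading any weights. This is a minimal sketch: the package list is taken from the imports in the example code (with `bitsandbytes` assumed as the backend for `BitsAndBytesConfig`), and `check_packages` is a hypothetical helper name.

```python
import importlib.util

def check_packages(names):
    """Return {package_name: True/False} depending on whether each can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

# Packages the example code relies on; bitsandbytes backs BitsAndBytesConfig.
required = ["torch", "transformers", "peft", "bitsandbytes"]
status = check_packages(required)

for name, installed in status.items():
    print(f"{name}: {'installed' if installed else 'MISSING'}")

# Only query CUDA if torch is actually present.
if status["torch"]:
    import torch
    print("CUDA available:", torch.cuda.is_available())
```

Any package reported as missing can be installed with pip before rerunning the main example.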

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

As you delve into the world of generating MongoDB queries using the microsoft/phi-2 model, you stand at the forefront of bridging natural language with powerful database interactions. At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
