How to Use Qwen2-Boundless: A Comprehensive Guide

Welcome to the world of Qwen2-Boundless, a fine-tuned model designed to handle a wide variety of questions, including complex topics related to ethics and legality. In this guide, we will walk you through how to use the model effectively, troubleshoot issues you may encounter, and offer a creative analogy to make the code easier to understand.

Overview of Qwen2-Boundless

Qwen2-Boundless is fine-tuned from Qwen2-1.5B-Instruct and is optimized for interactions in Chinese. It is built to hold open-ended conversations, and any use of it must comply with ethical guidelines and local regulations. Always remember that this model is intended for research and testing purposes only.

Getting Started with Qwen2-Boundless

To use Qwen2-Boundless, load the model and tokenizer with Python. Below is a minimal script that loads the model from the script's directory and generates a single response.

python
from transformers import AutoModelForCausalLM, AutoTokenizer
import os

device = "cuda"  # the device to load the model onto
current_directory = os.path.dirname(os.path.abspath(__file__))

# load the model with automatic dtype selection and device placement
model = AutoModelForCausalLM.from_pretrained(
    current_directory,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(current_directory)

prompt = "Hello?"
messages = [
    {"role": "system", "content": ""},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)

# keep only the newly generated tokens, dropping the prompt tokens
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

The Magic Behind the Code: An Analogy

Think of using Qwen2-Boundless as navigating a library filled with books (the model) that respond to conversations. When you walk into this library, you first check which books (model and tokenizer) are available by peeking around the shelves (loading the model). Then, you whisper a question (input text) to one of the librarians (the model), who swiftly finds the most relevant information and whispers an answer back to you. This interaction continues as you ask more questions, building a continuous dialogue with the librarian.

Extending Conversations

To maintain a seamless, multi-turn exchange in which the model remembers earlier messages, you can set up a continuous conversation loop with the following code:

python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import os

device = "cuda"  # the device to load the model onto
current_directory = os.path.dirname(os.path.abspath(__file__))

# load the model with automatic dtype selection and device placement
model = AutoModelForCausalLM.from_pretrained(
    current_directory,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(current_directory)

messages = [
    {"role": "system", "content": ""}
]

while True:
    user_input = input("User: ")
    messages.append({"role": "user", "content": user_input})

    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    model_inputs = tokenizer([text], return_tensors="pt").to(device)
    generated_ids = model.generate(
        model_inputs.input_ids,
        max_new_tokens=512
    )

    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(f"Assistant: {response}")

    messages.append({"role": "assistant", "content": response})

Handling Streaming Responses

For applications that need to stream responses in real time, you can use the following setup:

python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer
from transformers.trainer_utils import set_seed
from threading import Thread
import random
import os

DEFAULT_CKPT_PATH = os.path.dirname(os.path.abspath(__file__))

def _load_model_tokenizer(checkpoint_path, cpu_only):
    tokenizer = AutoTokenizer.from_pretrained(checkpoint_path, resume_download=True)
    device_map = "cpu" if cpu_only else "auto"
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint_path,
        torch_dtype="auto",
        device_map=device_map,
        resume_download=True,
    ).eval()
    model.generation_config.max_new_tokens = 512
    return model, tokenizer

def _get_input() -> str:
    while True:
        try:
            message = input("User: ").strip()
        except UnicodeDecodeError:
            print("[ERROR] Encoding error in input")
            continue
        except KeyboardInterrupt:
            exit(1)
        if message:
            return message
        print("[ERROR] Query is empty")

def _chat_stream(model, tokenizer, query, history):
    conversation = [
        {"role": "system", "content": ""},
    ]
    
    for query_h, response_h in history:
        conversation.append({"role": "user", "content": query_h})
        conversation.append({"role": "assistant", "content": response_h})

    conversation.append({"role": "user", "content": query})
    inputs = tokenizer.apply_chat_template(
        conversation,
        add_generation_prompt=True,
        return_tensors="pt",
    )
    inputs = inputs.to(model.device)
    streamer = TextIteratorStreamer(tokenizer=tokenizer, skip_prompt=True, timeout=60.0, skip_special_tokens=True)
    generation_kwargs = dict(input_ids=inputs, streamer=streamer)
    thread = Thread(target=model.generate, kwargs=generation_kwargs)
    thread.start()
    for new_text in streamer:
        yield new_text

def main():
    checkpoint_path = DEFAULT_CKPT_PATH
    seed = random.randint(0, 2**32 - 1)
    set_seed(seed)
    cpu_only = False
    history = []
    model, tokenizer = _load_model_tokenizer(checkpoint_path, cpu_only)
    while True:
        query = _get_input()
        print(f"\nUser: {query}")
        print("Assistant: ", end="")
        try:
            partial_text = ""
            for new_text in _chat_stream(model, tokenizer, query, history):
                print(new_text, end="", flush=True)
                partial_text += new_text
            print()
            history.append((query, partial_text))
        except KeyboardInterrupt:
            print("Generation interrupted")
            continue

if __name__ == "__main__":
    main()

Dataset Information

The Qwen2-Boundless model was fine-tuned on a dataset named bad_data.json, which covers a diverse spectrum of topics. Because the dataset is primarily in Chinese, the model performs best on inputs in that language.
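
If you would like to peek at the dataset yourself, the short sketch below loads it with Python's standard json module. It assumes bad_data.json is a JSON array sitting next to your script; the actual location and field names may differ, so check the repository before relying on them.

python
import json
import os

# Assumption: bad_data.json sits next to this script; adjust the path if you keep it elsewhere.
dataset_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "bad_data.json")

with open(dataset_path, "r", encoding="utf-8") as f:
    records = json.load(f)

print(f"Loaded {len(records)} records")
print(records[0])  # inspect one record to see which fields it carries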

GitHub Repository

For more in-depth information regarding the model and any ongoing updates, you can visit the GitHub repository: ystemsrx/Qwen2-Boundless.
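
If you prefer to download the weights directly rather than loading them from a local directory, a minimal sketch follows. It assumes the model is published on the Hugging Face Hub under the id ystemsrx/Qwen2-Boundless, matching the project name; confirm the exact id on the repository page before using it.

python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the model is published under this Hub id; verify it on the project's repository page.
model_id = "ystemsrx/Qwen2-Boundless"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")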

Troubleshooting Tips

If you encounter any issues while using the Qwen2-Boundless model, consider the following troubleshooting tips:

  • Device Compatibility: Ensure that your hardware supports CUDA and has the required libraries installed; a quick device check is sketched after this list.
  • Input Errors: If you receive encoding issues, double-check the input format to avoid Unicode errors.
  • Empty Queries: Always ensure that your input is not blank, as the model will not respond to empty queries.
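
As a quick sanity check for the first tip, the sketch below (assuming PyTorch is installed) selects CUDA when a GPU is available and falls back to the CPU otherwise; you can reuse the resulting device string in the loading code above.

python
import torch

# Pick CUDA when a compatible GPU and a CUDA-enabled PyTorch build are available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

if device == "cuda":
    print(f"GPU: {torch.cuda.get_device_name(0)}")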

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
