Qwen2-Boundless is a fine-tuned model designed to answer a wide range of questions, including complex topics related to ethics and legality. This guide walks through loading and using the model, offers an analogy for how the code works, and covers troubleshooting for common issues.
Overview of Qwen2-Boundless
Qwen2-Boundless is built on Qwen2-1.5B-Instruct and is optimized for interactions in Chinese. Make sure your use of the model complies with ethical guidelines and local regulations; it is intended for research and testing purposes only.
Getting Started with Qwen2-Boundless
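To use Qwen2-Boundless, load the model in Python with the Hugging Face transformers library. The script below loads the weights and tokenizer from the directory that contains the script itself, so place it alongside the model files.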
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import os

device = "cuda"  # the device to load the model onto

# Directory that contains this script (and the model files)
current_directory = os.path.dirname(os.path.abspath(__file__))

model = AutoModelForCausalLM.from_pretrained(
    current_directory,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(current_directory)

prompt = "Hello?"
messages = [
    {"role": "system", "content": ""},
    {"role": "user", "content": prompt}
]

# Render the messages into a single prompt string using the chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
# Strip the prompt tokens so only the newly generated tokens remain
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
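The list comprehension near the end of the script is its least obvious step: `model.generate` returns the prompt tokens followed by the new tokens, so the prompt-length prefix is sliced off before decoding. A minimal sketch of that slicing with plain Python lists, where the token IDs are made-up values rather than real tokenizer output:

```python
# Hypothetical token IDs for illustration only; a real run uses tensors
# produced by the tokenizer and by model.generate.
prompt_batch = [[101, 7592, 102]]                 # one prompt, three tokens
generated_batch = [[101, 7592, 102, 2023, 2003]]  # prompt followed by two new tokens

# Drop the first len(prompt) tokens of each output, keeping only the reply.
reply_batch = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(prompt_batch, generated_batch)
]

print(reply_batch)  # [[2023, 2003]]
```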
The Magic Behind the Code: An Analogy
Think of using Qwen2-Boundless as visiting a library. When you walk in, you first find out where the books and the catalogue are kept (loading the model and tokenizer). Then you whisper a question (the input text) to the librarian (the model), who finds the most relevant information and whispers an answer back. The interaction continues as you ask more questions, building a continuous dialogue with the librarian.
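In concrete terms, the "whisper" is a single prompt string built from the message list. Qwen2 models use a ChatML-style layout, and the sketch below imitates that layout by hand to show roughly what `apply_chat_template` produces. The `render_chatml` function is a hypothetical stand-in written for this guide; the real template ships with the tokenizer and may differ in details.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Hand-written imitation of a ChatML-style chat template (illustrative only)."""
    parts = []
    for message in messages:
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "Hello?"},
]
prompt_text = render_chatml(messages)
print(prompt_text)
```

The trailing open assistant turn is what `add_generation_prompt=True` requests: it tells the model that the next text to produce is the assistant's reply.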
Extending Conversations
To hold a multi-turn conversation, keep appending each user message and each model reply to the message list, as in the following code:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import os

device = "cuda"  # the device to load the model onto

# Directory that contains this script (and the model files)
current_directory = os.path.dirname(os.path.abspath(__file__))

model = AutoModelForCausalLM.from_pretrained(
    current_directory,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(current_directory)

# The conversation history starts with an empty system message
messages = [
    {"role": "system", "content": ""}
]

while True:
    user_input = input("User: ")
    messages.append({"role": "user", "content": user_input})

    # Re-render the full history into a prompt on every turn
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    model_inputs = tokenizer([text], return_tensors="pt").to(device)

    generated_ids = model.generate(
        model_inputs.input_ids,
        max_new_tokens=512
    )
    # Strip the prompt tokens so only the newly generated tokens remain
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]

    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(f"Assistant: {response}")

    # Store the reply so the next turn sees it as context
    messages.append({"role": "assistant", "content": response})
```
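Because every turn is appended to `messages`, the prompt grows with each exchange and will eventually exceed the model's context window. One possible mitigation, which is not part of the original script, is to keep the system message and only the most recent messages. The helper name `trim_history` and the `max_messages` parameter are assumptions made for this sketch, and it counts messages rather than tokens:

```python
def trim_history(messages, max_messages=8):
    """Keep the leading system message plus at most `max_messages` recent messages."""
    system = [m for m in messages[:1] if m["role"] == "system"]
    rest = messages[len(system):]
    recent = rest[-max_messages:] if max_messages > 0 else []
    # Drop leading messages until the window starts with a user turn.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]
    return system + recent


history = [{"role": "system", "content": ""}]
for i in range(6):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})
history.append({"role": "user", "content": "question 6"})  # pending user turn

trimmed = trim_history(history, max_messages=4)
print([m["content"] for m in trimmed])
```

In a chat loop, a call such as `messages = trim_history(messages)` could run right after the user input is appended, before the chat template is applied.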
Handling Streaming Responses
For applications that need to stream responses in real time, you can use the following setup:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer
from transformers.trainer_utils import set_seed
from threading import Thread
import random
import os

# Directory that contains this script (and the model files)
DEFAULT_CKPT_PATH = os.path.dirname(os.path.abspath(__file__))


def _load_model_tokenizer(checkpoint_path, cpu_only):
    tokenizer = AutoTokenizer.from_pretrained(checkpoint_path, resume_download=True)

    device_map = "cpu" if cpu_only else "auto"

    model = AutoModelForCausalLM.from_pretrained(
        checkpoint_path,
        torch_dtype="auto",
        device_map=device_map,
        resume_download=True,
    ).eval()
    model.generation_config.max_new_tokens = 512

    return model, tokenizer


def _get_input() -> str:
    # Keep prompting until the user enters a non-empty, decodable message
    while True:
        try:
            message = input("User: ").strip()
        except UnicodeDecodeError:
            print("[ERROR] Encoding error in input")
            continue
        except KeyboardInterrupt:
            exit(1)
        if message:
            return message
        print("[ERROR] Query is empty")


def _chat_stream(model, tokenizer, query, history):
    # Rebuild the full conversation from the (query, response) history
    conversation = [
        {"role": "system", "content": ""},
    ]
    for query_h, response_h in history:
        conversation.append({"role": "user", "content": query_h})
        conversation.append({"role": "assistant", "content": response_h})
    conversation.append({"role": "user", "content": query})
    inputs = tokenizer.apply_chat_template(
        conversation,
        add_generation_prompt=True,
        return_tensors="pt",
    )
    inputs = inputs.to(model.device)
    streamer = TextIteratorStreamer(tokenizer=tokenizer, skip_prompt=True, timeout=60.0, skip_special_tokens=True)
    generation_kwargs = dict(input_ids=inputs, streamer=streamer)
    # Run generation in a background thread so the text can be consumed as it arrives
    thread = Thread(target=model.generate, kwargs=generation_kwargs)
    thread.start()

    for new_text in streamer:
        yield new_text


def main():
    checkpoint_path = DEFAULT_CKPT_PATH
    seed = random.randint(0, 2**32 - 1)
    set_seed(seed)
    cpu_only = False

    history = []

    model, tokenizer = _load_model_tokenizer(checkpoint_path, cpu_only)

    while True:
        query = _get_input()

        print(f"\nUser: {query}")
        print("Assistant: ", end="")
        try:
            partial_text = ""
            for new_text in _chat_stream(model, tokenizer, query, history):
                print(new_text, end="", flush=True)
                partial_text += new_text
            print()
            history.append((query, partial_text))
        except KeyboardInterrupt:
            print("Generation interrupted")
            continue


if __name__ == "__main__":
    main()
```
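The streaming script relies on a producer-consumer pattern: `model.generate` runs in a background thread and pushes text into the streamer, while the main thread iterates over the streamer and prints each piece as it arrives. The sketch below reproduces that pattern with the standard library only, so it runs without a model. `QueueStreamer` and `fake_generate` are hypothetical stand-ins written for this guide, not transformers APIs:

```python
import queue
from threading import Thread

_DONE = object()  # sentinel marking the end of the stream


class QueueStreamer:
    """Minimal iterator fed from another thread (a stand-in for a text streamer)."""

    def __init__(self, timeout=5.0):
        self._queue = queue.Queue()
        self._timeout = timeout

    def put(self, text):
        self._queue.put(text)

    def end(self):
        self._queue.put(_DONE)

    def __iter__(self):
        return self

    def __next__(self):
        # Blocks until text arrives; raises queue.Empty if the timeout expires.
        item = self._queue.get(timeout=self._timeout)
        if item is _DONE:
            raise StopIteration
        return item


def fake_generate(streamer, pieces):
    """Pretend to be a model: emit text pieces one by one, then signal completion."""
    for piece in pieces:
        streamer.put(piece)
    streamer.end()


streamer = QueueStreamer()
worker = Thread(target=fake_generate, args=(streamer, ["Hel", "lo", "!"]))
worker.start()

collected = ""
for new_text in streamer:
    print(new_text, end="", flush=True)
    collected += new_text
print()
worker.join()
```

The timeout plays the same role as `timeout=60.0` in the script: if the producer stalls, the consumer raises an error instead of waiting forever.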
Dataset Information
The Qwen2-Boundless model was fine-tuned on a dataset named bad_data.json, which covers a diverse range of topics. Because the dataset is primarily in Chinese, the model performs best on inputs in that language.
GitHub Repository
For more information about the model and any ongoing updates, see the project's GitHub repository: ystemsrx/Qwen2-Boundless.
Troubleshooting Tips
If you encounter any issues while using the Qwen2-Boundless model, consider the following troubleshooting tips:
- Device Compatibility: Ensure that your hardware supports CUDA and has the required libraries installed.
- Input Errors: If you see encoding errors, check your terminal's input encoding. The streaming script catches UnicodeDecodeError, prints an error message, and asks for the input again.
- Empty Queries: Make sure your input is not blank. The streaming script rejects empty queries and prompts you again.
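For the device compatibility item, a small check before loading can avoid a crash on machines without a working CUDA setup. The sketch below relies on PyTorch's `torch.cuda.is_available()` and falls back to the CPU when PyTorch or CUDA is missing. The `pick_device` helper is a hypothetical name, not part of the original scripts:

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; the model scripts would not run either.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The result could replace the hard-coded device in the first two scripts, and the same check could set `cpu_only` in the streaming script.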