How to Efficiently Fine-tune Quantized LLMs for Finance with FIN-LLAMA

Aug 11, 2023 | Educational

In the world of finance, understanding and predicting market trends is paramount. Thanks to advances in AI, particularly Large Language Models (LLMs), these models can now power a wide range of financial applications. In this guide, we’ll walk through how to efficiently fine-tune quantized LLMs using the FIN-LLAMA model. Whether you’re looking to assess market values or strategize for financial growth, this article has you covered!

Installation Steps

Before you dive into using the FIN-LLAMA model, you’ll need to set up your environment. The following steps are essential for loading models in 4-bit precision with the Transformers and bitsandbytes libraries.

  • Make sure you have accelerate and transformers installed from source.
  • Ensure you’re using the latest version of the bitsandbytes library (0.39.0 at the time of writing).

Here’s how you can install the necessary components:

pip3 install -r requirements.txt
pip3 install -q -U bitsandbytes
pip3 install -q -U git+https://github.com/huggingface/transformers.git
pip3 install -q -U git+https://github.com/huggingface/peft.git
pip3 install -q -U git+https://github.com/huggingface/accelerate.git
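
Once the installs finish, a quick check from Python confirms the environment is ready. This is just a sanity check, not part of the official setup:

import bitsandbytes as bnb
import transformers

# 4-bit loading requires bitsandbytes 0.39.0 or newer.
print('bitsandbytes:', bnb.__version__)
print('transformers:', transformers.__version__)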

Setting Up for Fine-tuning

If you intend to fine-tune the model on a new instance, running the setup script is crucial. Proceed with the following command:

bash scripts/setup.sh

Fine-tuning Process

For fine-tuning, you’ll need to use the provided script:

bash finetune.sh
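
For context, a QLoRA-style fine-tuning script such as finetune.sh typically loads the base model in 4-bit and trains small LoRA adapter matrices on top of the frozen quantized weights. The sketch below illustrates that setup with the peft library; the base-model id and hyperparameters here are placeholder assumptions for illustration, not necessarily what finetune.sh actually uses:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 4-bit (model id is a placeholder).
base_model = AutoModelForCausalLM.from_pretrained(
    'huggyllama/llama-30b',
    device_map='auto',
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type='nf4',
    ),
)

# Prepare the quantized model for training (casts layer norms and
# embeddings to a stable dtype, among other fixes).
base_model = prepare_model_for_kbit_training(base_model)

# Attach LoRA adapters; only these small matrices receive gradients.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],
    task_type='CAUSAL_LM',
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()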

Usage of the Model

The usage of the model is determined by specific quantization parameters. These parameters manage how the model loads and processes data. Think of these as various tools in a mechanic’s toolbox:

  • load_in_4bit: Loads the model weights in 4-bit precision.
  • bnb_4bit_compute_dtype: Sets the data type used for the linear layers’ computations.
  • bnb_4bit_use_double_quant: Enables nested (double) quantization, which also quantizes the quantization constants themselves to save additional memory.
  • bnb_4bit_quant_type: Specifies the 4-bit quantization data type; fp4 and nf4 are the two available options.

Now you can write and run your Python code to utilize the model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

pretrained_model_name_or_path = "bavest/fin-llama-33b-merge"
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path,
    device_map='auto',
    torch_dtype=torch.bfloat16,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type='nf4'
    ),
)
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path)
question = "What is the market cap of apple?"
context = ""  # extra context for the question, if needed

# Alpaca-style prompt; the question and context are interpolated
# so the model actually sees them.
prompt = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions.\n"
    f"### Instruction:\n{question}\n\n"
    f"### Input:\n{context}\n\n"
    "### Response:"
)

input_ids = tokenizer.encode(prompt, return_tensors='pt').to('cuda:0')
with torch.no_grad():
    generated_ids = model.generate(
        input_ids,
        do_sample=True,
        top_p=0.9,
        temperature=0.8,
        max_length=128
    )
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(generated_text)
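
As a quick sanity check that 4-bit loading took effect, you can inspect the model’s memory footprint; the figures in the comment below are rough estimates:

# A 33B-parameter model in 4-bit should occupy roughly 17-18 GB,
# versus about 66 GB in fp16.
print(f'Memory footprint: {model.get_memory_footprint() / 1e9:.1f} GB')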

Dataset for FIN-LLAMA

The dataset for training the FIN-LLAMA models is available as bavest/fin-llama-dataset.
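
If you want to inspect the training data yourself, it can be pulled with the datasets library; this sketch assumes the dataset is hosted on the Hugging Face Hub under that name:

from datasets import load_dataset

# Assumes the dataset is published on the Hugging Face Hub
# as 'bavest/fin-llama-dataset'.
dataset = load_dataset('bavest/fin-llama-dataset')
print(dataset)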

Troubleshooting Known Issues

While working with FIN-LLAMA, you might encounter some challenges. Here are a few known issues and potential fixes:

  • 4-bit inference can sometimes be slow. Currently, the 4-bit implementation does not yet fully leverage 4-bit matrix multiplication.
  • Using bnb_4bit_compute_dtype=torch.float16 can lead to instabilities. It is recommended to stick with torch.bfloat16.
  • Ensure that tokenizer.bos_token_id = 1 to avoid generation issues, as shown in the snippet below.
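
A minimal way to apply that last check, assuming the tokenizer has been loaded as in the usage example above:

# LLaMA tokenizers use BOS token id 1; a wrong or missing value
# can derail generation.
if tokenizer.bos_token_id != 1:
    tokenizer.bos_token_id = 1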

If you face any additional issues not listed here, feel free to report them or check for help on the community forums or the QLoRA GitHub page.

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

Conclusion

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
