How to Effectively Use OneKE for Knowledge Extraction

May 6, 2024 | Educational

OneKE (One Knowledge Extraction) is a bilingual knowledge extraction framework built to pull structured information out of English and Chinese text. Developed by Ant Group and Zhejiang University, it combines a large language model with a schema-driven approach to extraction.

What is OneKE?

OneKE acts as a bridge, linking unstructured data to structured knowledge, enabling applications across diverse fields such as healthcare, finance, and public administration. With its bilingual capabilities, it can extract knowledge from both English and Chinese sources, significantly enhancing the creation of knowledge graphs while addressing challenges such as data fragmentation and ambiguity.

How is OneKE Trained?

The training process of OneKE involves meticulous data handling—normalization, cleaning, and the strategic collection of negative samples. Think of it like preparing a gourmet dish: the chefs (data scientists) must carefully select and prepare their ingredients (data) to achieve the perfect flavor (result). This leads to schema-generalizable extraction, allowing OneKE to operate efficiently across varying datasets.
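
To make the negative-sample idea concrete, here is a minimal sketch. It is not OneKE's actual training pipeline. It assumes that a training example pairs a schema with a JSON answer, and that a "negative" is a schema label with no match in the text, whose expected answer is an empty list. The helper name `build_training_example` and the label pool are hypothetical.

```python
import json
import random

def build_training_example(text, gold, label_pool, num_negatives=2, seed=0):
    """Pair a text with a schema that mixes gold labels and sampled negatives.

    gold: dict mapping label -> list of entity strings found in the text.
    label_pool: every label known to the dataset (hypothetical).
    """
    rng = random.Random(seed)
    # Labels that do not occur in this text are candidate negatives.
    candidates = sorted(set(label_pool) - set(gold))
    negatives = rng.sample(candidates, min(num_negatives, len(candidates)))
    schema = sorted(gold) + negatives
    rng.shuffle(schema)  # avoid teaching the model a fixed label order
    # Negative labels map to empty lists, so the model learns to abstain.
    answer = {label: gold.get(label, []) for label in schema}
    return {
        "schema": schema,
        "input": text,
        "output": json.dumps(answer, ensure_ascii=False),
    }

example = build_training_example(
    text="Robert Allenby ( Australia ) won at first play-off hole",
    gold={"person": ["Robert Allenby"], "location": ["Australia"]},
    label_pool=["person", "location", "organization", "date", "else"],
)
print(example["schema"])
print(example["output"])
```

Training on examples like this, where some requested labels have nothing to extract, is one plausible way a model learns to return an empty list instead of inventing entities.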

Getting Started with OneKE

Quick Start

Before diving into OneKE, ensure your hardware is adequately equipped with at least **20GB of VRAM** for effective training and inference.
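
A rough way to see where a figure like 20GB comes from is to estimate the memory taken by the weights alone. The sketch below assumes a model of about 13 billion parameters; that count is an assumption, not something stated in this article, so check the model card. It also ignores activations, the KV cache, and framework overhead, all of which add to the total.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory for the model weights only, in GiB."""
    return num_params * bits_per_param / 8 / 2**30

ASSUMED_PARAMS = 13e9  # assumption: roughly 13B parameters

for name, bits in [("bf16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{name}: {weight_memory_gib(ASSUMED_PARAMS, bits):.1f} GiB")
```

Under this assumption, unquantized bf16 weights alone would exceed 20GB, while 4-bit weights leave room for activations and the generation cache. That is consistent with the Quick Start code loading the model through a 4-bit `BitsAndBytesConfig`.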

import torch
from transformers import (
    AutoConfig,
    AutoTokenizer,
    AutoModelForCausalLM,
    GenerationConfig,
    BitsAndBytesConfig,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_path = 'zjunlp/OneKE'

config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# 4-bit Quantized OneKE
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    llm_int8_has_fp16_weight=False,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    config=config,
    device_map="auto",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

model.eval()
# Llama-2-style system block; the <<SYS>> markers are part of the chat template
system_prompt = '<<SYS>>\nYou are a helpful assistant. 你是一个乐于助人的助手。\n<</SYS>>\n\n'
sintruct = "{\"instruction\": \"You are an expert in named entity recognition. Please extract entities that match the schema definition from the input. Return an empty list if the entity type does not exist. Please respond in the format of a JSON string.\", \"schema\": [\"person\", \"organization\", \"else\", \"location\"], \"input\": \"284 Robert Allenby ( Australia ) 69 71 71 73 , Miguel Angel Martin ( Spain ) 75 70 71 68 ( Allenby won at first play-off hole )\"}"
sintruct = '[INST] ' + system_prompt + sintruct + '[/INST]'

input_ids = tokenizer.encode(sintruct, return_tensors="pt").to(device)
input_length = input_ids.size(1)
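# max_new_tokens alone bounds the answer length; setting max_length as well only triggers a warning
generation_output = model.generate(
    input_ids=input_ids,
    generation_config=GenerationConfig(max_new_tokens=512, return_dict_in_generate=True),
)
# Drop the prompt tokens so that only the newly generated answer is decoded
generation_output = generation_output.sequences[0]
generation_output = generation_output[input_length:]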
output = tokenizer.decode(generation_output, skip_special_tokens=True)
print(output)

The code snippet above loads a 4-bit quantized OneKE, wraps a JSON instruction in the `[INST] ... [/INST]` chat template, and prints only the newly generated tokens. The `input` field holds the unstructured text you want to extract from, and `schema` lists the entity types to look for.
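
Hand-escaping the instruction string is error-prone, so it can help to build it with `json.dumps` instead. The helper below is a sketch, not part of OneKE: `build_prompt` is a hypothetical name, and it assumes the Llama-2 chat markers `<<SYS>>` and `<</SYS>>` around the system prompt.

```python
import json

# Assumed Llama-2-style system block.
SYSTEM_PROMPT = "<<SYS>>\nYou are a helpful assistant. 你是一个乐于助人的助手。\n<</SYS>>\n\n"

def build_prompt(instruction, schema, text):
    """Serialize the request as JSON and wrap it in [INST] ... [/INST]."""
    payload = json.dumps(
        {"instruction": instruction, "schema": schema, "input": text},
        ensure_ascii=False,  # keep Chinese text readable instead of \uXXXX escapes
    )
    return "[INST] " + SYSTEM_PROMPT + payload + "[/INST]"

prompt = build_prompt(
    instruction=(
        "You are an expert in named entity recognition. Please extract entities "
        "that match the schema definition from the input. Return an empty list if "
        "the entity type does not exist. Please respond in the format of a JSON string."
    ),
    schema=["person", "organization", "else", "location"],
    text="284 Robert Allenby ( Australia ) 69 71 71 73",
)
print(prompt)
```

The resulting string can be passed to `tokenizer.encode` in place of the hand-written `sintruct`.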

Advanced Use of OneKE

Once you’re comfortable with the basics, explore the advanced functionality. Each request is a JSON-style string with three fields: `instruction` (the task description), `schema` (the labels to extract), and `input` (the source text). The same layout serves Named Entity Recognition (NER), Relation Extraction (RE), and Event Extraction (EE); only the instruction text and the schema change.

OneKE Instruction Format and Examples

  • Named Entity Recognition (NER)
    Example:
    {
        "instruction": "You are an expert specializing in entity extraction. Please extract entities that comply with the schema definition from the input; return an empty list for non-existent entity types. Please respond in JSON string format.",
        "schema": ["Person Name", "Education", "Position", "Nationality"],
        "input": "Mr. Liu Zhijian: Born in 1956, Chinese nationality, no permanent residency abroad, member of the Communist Party, associate degree, senior economist."
    }
  • Relation Extraction (RE)
    Example:
    {
        "instruction": "You are an expert specializing in relation extraction. Please extract relationship triples that comply with the schema definition from the input; return an empty list for non-existent relationships. Please respond in JSON string format.",
        "schema": ["Father", "Husband", "Postal Code", "Mother"],
        "input": "Ding Long took out his life savings of $12,000..."
    }
  • Event Extraction (EE)
    Example:
    {
        "instruction": "You are an expert specializing in event extraction. Please extract events that match the defined schema from the input.",
        "schema": [{"event_type": "Finance/Trading - Interest Rate Hike", "trigger": true, "arguments": ["Time"]}],
        "input": "AI risk control solution provider Vezetech secures tens of millions of dollars in Series C+ funding."
    }
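
The three examples differ mainly in the shape of `schema`: a list of strings for NER and RE, and a list of objects with `event_type`, `trigger`, and `arguments` for EE. A small check like the one below can catch a malformed request before it reaches the model. It is a sketch based only on the examples shown here; `validate_request` is a hypothetical helper, not part of OneKE.

```python
def validate_request(request, task):
    """Return a list of problems found in an instruction dict (empty if none)."""
    problems = []
    for key in ("instruction", "schema", "input"):
        if key not in request:
            problems.append(f"missing key: {key}")
    schema = request.get("schema", [])
    if not isinstance(schema, list) or not schema:
        problems.append("schema must be a non-empty list")
        return problems
    if task in ("NER", "RE"):
        # NER and RE schemas are flat lists of label names.
        if not all(isinstance(item, str) for item in schema):
            problems.append(f"{task} schema entries must be strings")
    elif task == "EE":
        # EE schemas are lists of objects describing each event type.
        for item in schema:
            if not isinstance(item, dict):
                problems.append("EE schema entries must be objects")
                continue
            missing = {"event_type", "trigger", "arguments"} - set(item)
            if missing:
                problems.append(f"EE entry missing: {sorted(missing)}")
    else:
        problems.append(f"unknown task: {task}")
    return problems

ner_request = {
    "instruction": "You are an expert specializing in entity extraction.",
    "schema": ["Person Name", "Education", "Position", "Nationality"],
    "input": "Mr. Liu Zhijian: Born in 1956, Chinese nationality.",
}
bad_ee_request = {
    "instruction": "You are an expert specializing in event extraction.",
    "schema": [{"event_type": "Finance/Trading - Interest Rate Hike"}],
    "input": "Vezetech secures Series C+ funding.",
}
print(validate_request(ner_request, "NER"))
print(validate_request(bad_ee_request, "EE"))
```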

Troubleshooting Tips

  • If model loading fails, check the model path (`zjunlp/OneKE` in the Quick Start code) and confirm you have internet access so the weights can be downloaded.
  • If you hit out-of-memory errors or very slow generation, confirm that your GPU meets the VRAM requirement and keep 4-bit quantization enabled.
  • If outputs are unclear, compare your prompt against the instruction formats above, then try adjusting the instruction text or the schema.
  • For further assistance, visit the OneKE Documentation or engage with the community at GitHub for shared experiences and solutions.
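
When outputs look unclear, it often helps to parse them defensively rather than assume clean JSON. The sketch below assumes the reply is a JSON object keyed by schema label, which is the shape the NER instruction asks for; the exact output format is an assumption, and `parse_extraction` is a hypothetical helper. It only handles schemas made of plain string labels.

```python
import json

def parse_extraction(raw, schema):
    """Parse a model reply into {label: values}, tolerating text around the JSON."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None  # no JSON object found in the reply
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None  # reply contained braces but was not valid JSON
    if not isinstance(data, dict):
        return None
    # Keep only the requested labels; fill absent ones with empty lists.
    return {label: data.get(label, []) for label in schema}

reply = 'Sure: {"person": ["Robert Allenby"], "location": ["Australia", "Spain"]}'
print(parse_extraction(reply, ["person", "organization", "location"]))
print(parse_extraction("no structured answer here", ["person"]))
```

A `None` result is a signal to log the raw reply and revisit the prompt, rather than silently treating the extraction as empty.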

Conclusion

Used well, OneKE turns unstructured English and Chinese text into structured, schema-conformant output for NER, RE, and EE, which makes it a practical building block for knowledge graphs in domains such as healthcare, finance, and public administration.
