Are you looking to implement fuzzy string matching in your projects? DeezyMatch is a Python library for fuzzy string matching that supports tasks such as candidate ranking, query expansion, and toponym matching. This post walks through its installation, setup, and basic operation.
Installation and Setup
To get started, it is strongly recommended to install DeezyMatch inside a dedicated Anaconda environment:
- Create and activate a new environment, then install DeezyMatch with pip:
conda create -n py39deezy python=3.9
conda activate py39deezy
pip install DeezyMatch
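After installation, a quick check from Python can confirm that the package landed in the active environment. The sketch below uses only the standard library. The helper name `installed_version` is made up for this post, and it assumes the distribution is registered under the same name used in the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "DeezyMatch" is assumed to be the distribution name, as in `pip install DeezyMatch`.
    version = installed_version("DeezyMatch")
    if version is None:
        print("DeezyMatch is not installed in this environment")
    else:
        print(f"DeezyMatch {version} is installed")
```

Run it with the py39deezy environment active; if it reports that the package is missing, the pip command was most likely run in a different environment.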
If you prefer to install from the source code, clone the repository, then install the dependencies and the package itself:
git clone https://github.com/Living-with-machines/DeezyMatch.git
cd DeezyMatch
pip install -r requirements.txt
pip install -e .
Understanding the Code with an Analogy
Imagine you’re building a new toy (your application) and need a toolkit (DeezyMatch) with all the essential tools (commands and functions) to make it. Each tool allows you to perform specific tasks:
- Train a New Model: Like a sculptor forming a statue, you can train a new model tailored to your specific requirements using:
from DeezyMatch import train as dm_train
dm_train(input_file_path='./inputs/input_dfm.yaml',
         dataset_path='dataset/dataset-string-matching_train.txt',
         model_name='test001')
- Fine-tune a Pretrained Model: Adapt an existing model to a new dataset, starting from a saved model and vocabulary, using:
from DeezyMatch import finetune as dm_finetune
dm_finetune(input_file_path='./inputs/input_dfm.yaml',
            dataset_path='dataset/dataset-string-matching_finetune.txt',
            model_name='finetuned_test001',
            pretrained_model_path='./models/test001/test001.model',
            pretrained_vocab_path='./models/test001/test001.vocab')
- Run Inference: Apply a trained or fine-tuned model to a test dataset using:
from DeezyMatch import inference as dm_inference
dm_inference(input_file_path='./inputs/input_dfm.yaml',
             dataset_path='dataset/dataset-string-matching_test.txt',
             pretrained_model_path='./models/finetuned_test001/finetuned_test001.model',
             pretrained_vocab_path='./models/finetuned_test001/finetuned_test001.vocab')
- Rank Candidates: For each query, retrieve the closest matching candidates using:
from DeezyMatch import candidate_ranker
candidates_pd = candidate_ranker(query_scenario='./combined/queries_test',
                                 candidate_scenario='./combined/candidates_test',
                                 ranking_metric='faiss',
                                 selection_threshold=5.,
                                 num_candidates=2,
                                 search_size=2,
                                 output_path='ranker_results/test_candidates_deezymatch',
                                 pretrained_model_path='./models/finetuned_test001/finetuned_test001.model',
                                 pretrained_vocab_path='./models/finetuned_test001/finetuned_test001.vocab',
                                 number_test_rows=20)
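The ranking call above combines a distance metric, a cut-off, and a cap on the number of results. The toy sketch below illustrates that general idea in plain Python. It is not DeezyMatch's implementation and does not use faiss: the vectors, the place names, and the helper `rank_candidates` are made up for illustration. Reading `selection_threshold` as a maximum distance and `num_candidates` as a cap on returned matches is an assumption that should be confirmed against the DeezyMatch documentation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_candidates(
    query_vec: Sequence[float],
    candidate_vecs: Dict[str, Sequence[float]],
    selection_threshold: float,
    num_candidates: int,
) -> List[Tuple[str, float]]:
    """Return up to num_candidates (name, distance) pairs, closest first.

    Only candidates whose distance is <= selection_threshold are kept.
    """
    scored = [(name, l2_distance(query_vec, vec)) for name, vec in candidate_vecs.items()]
    kept = [pair for pair in scored if pair[1] <= selection_threshold]
    kept.sort(key=lambda pair: pair[1])
    return kept[:num_candidates]


# Hypothetical 2-D vectors standing in for learned string embeddings.
candidates = {
    "London": (0.0, 1.0),
    "Londres": (0.5, 1.0),
    "Paris": (9.0, 9.0),
}
query = (0.1, 1.0)

top = rank_candidates(query, candidates, selection_threshold=5.0, num_candidates=2)
print([name for name, _ in top])  # prints ['London', 'Londres']
```

In this toy setup, lowering the threshold drops distant candidates before the cap is applied, so a query can come back with fewer matches than `num_candidates`.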
Troubleshooting
If you encounter issues such as a ModuleNotFoundError when running scripts, try reinstalling the necessary packages or consulting the relevant GitHub repository. For example, if you get this error with candidateRanker.py, you can solve it with:
pip install faiss-cpu --no-cache
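Before reinstalling, it can help to see which modules Python cannot locate at all. The sketch below is a minimal standard-library check. The helper `missing_modules` and the mapping from import names to pip package names are assumptions made for this post, and the check only detects a module that is absent, not one that is installed but broken.

```python
import importlib.util
from typing import Dict, List

# Assumed mapping from import name to the pip package that provides it.
PIP_NAMES: Dict[str, str] = {"faiss": "faiss-cpu", "DeezyMatch": "DeezyMatch"}


def missing_modules(module_names: List[str]) -> List[str]:
    """Return the top-level modules that cannot be located in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


for name in missing_modules(list(PIP_NAMES)):
    print(f"{name} not found; try: pip install {PIP_NAMES[name]} --no-cache")
```

If nothing is printed, both modules can be located, and the error probably has a different cause.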
Refer to additional troubleshooting resources on GitHub.