How to Utilize DeezyMatch: A Flexible Deep Neural Network Approach to Fuzzy String Matching

Nov 29, 2020 | Data Science

Looking to add fuzzy string matching to your projects? DeezyMatch is a deep neural network library for tasks such as candidate ranking, query expansion, and toponym matching. This post walks through installing DeezyMatch, training and fine-tuning a model, running inference, and ranking candidates.

Installation and Setup

To get started, we strongly recommend installing DeezyMatch inside a dedicated Anaconda environment. Follow these steps:

Create a new environment for DeezyMatch:

conda create -n py39deezy python=3.9

Activate the environment:

conda activate py39deezy

Install DeezyMatch from PyPI (recommended):

pip install DeezyMatch

If you prefer to install from source, clone the repository, move into the cloned folder, install the dependencies, and then install the package itself:

git clone https://github.com/Living-with-machines/DeezyMatch.git
cd DeezyMatch
pip install -r requirements.txt
pip install .

Understanding the Code with an Analogy

Imagine you’re building a new toy (your application) and need a toolkit (DeezyMatch) with all the essential tools (commands and functions) to make it. Each tool allows you to perform specific tasks:

Train a New Model: Like a sculptor forming a statue, you can train a new model tailored to your specific requirements using:

from DeezyMatch import train as dm_train

# Train a new model; settings come from the YAML input file
dm_train(input_file_path='./inputs/input_dfm.yaml',
         dataset_path='dataset/dataset-string-matching_train.txt',
         model_name='test001')
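The training call above reads string pairs from a plain-text dataset. As a minimal sketch, assuming the dataset holds one pair per line as tab-separated columns (first string, second string, and a TRUE/FALSE label), a toy file could be written as follows. The helper name write_pairs, the file name toy_train.txt, and the example rows are hypothetical, and the separator is an assumption that should be checked against your input YAML file.

```python
import csv
import tempfile
from pathlib import Path


def write_pairs(path, pairs):
    """Write (s1, s2, is_match) tuples as tab-separated lines.

    Assumed layout (not taken from the article): s1 <TAB> s2 <TAB> TRUE|FALSE.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        for s1, s2, is_match in pairs:
            writer.writerow([s1, s2, "TRUE" if is_match else "FALSE"])
    return path


# Toy rows for illustration only; a real model needs far more data.
toy_pairs = [
    ("London", "Londres", True),
    ("London", "Paris", False),
]

# Write into a temporary folder so the demo leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = write_pairs(Path(tmp) / "dataset" / "toy_train.txt", toy_pairs)
    demo_text = demo_path.read_text(encoding="utf-8")

print(demo_text)
```

The resulting file path would then be passed as dataset_path to dm_train.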

Fine-tune a Pretrained Model: Just like adjusting a recipe, you can fine-tune a pretrained model for better performance:

from DeezyMatch import finetune as dm_finetune

# Fine-tune the model trained above on a new dataset
dm_finetune(input_file_path='./inputs/input_dfm.yaml',
            dataset_path='dataset/dataset-string-matching_finetune.txt',
            model_name='finetuned_test001',
            pretrained_model_path='./models/test001/test001.model',
            pretrained_vocab_path='./models/test001/test001.vocab')

Model Inference: Similar to using a toy after assembling it, inference applies a trained model to a test dataset to make predictions:

from DeezyMatch import inference as dm_inference

# Run the fine-tuned model on the test dataset
dm_inference(input_file_path='./inputs/input_dfm.yaml',
             dataset_path='dataset/dataset-string-matching_test.txt',
             pretrained_model_path='./models/finetuned_test001/finetuned_test001.model',
             pretrained_vocab_path='./models/finetuned_test001/finetuned_test001.vocab')
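The fine-tuning and inference calls both point at a .model and a .vocab file stored in a folder named after the model. As a small convenience sketch, the helper below builds both paths from a model name so they are not typed by hand each time. The function name model_artifacts is hypothetical, and the folder layout is inferred only from the paths shown in this post.

```python
from pathlib import PurePosixPath


def model_artifacts(model_name, models_dir="./models"):
    """Return keyword arguments for the model and vocab paths.

    Assumes the layout <models_dir>/<model_name>/<model_name>.model and
    <models_dir>/<model_name>/<model_name>.vocab, as in the examples above.
    """
    base = PurePosixPath(models_dir) / model_name
    return {
        "pretrained_model_path": str(base / f"{model_name}.model"),
        "pretrained_vocab_path": str(base / f"{model_name}.vocab"),
    }


paths = model_artifacts("finetuned_test001")
print(paths["pretrained_model_path"])
print(paths["pretrained_vocab_path"])

# Possible use with the inference call (requires DeezyMatch to be installed):
# dm_inference(input_file_path='./inputs/input_dfm.yaml',
#              dataset_path='dataset/dataset-string-matching_test.txt',
#              **model_artifacts('finetuned_test001'))
```

Note that pathlib drops the leading "./", so the returned paths are relative to the current working directory, just like the originals.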

Candidate Ranking: This is akin to sorting toys by size or type, where DeezyMatch ranks candidates based on input queries:

from DeezyMatch import candidate_ranker

# For each query, find the closest candidates using the faiss metric
candidates_pd = candidate_ranker(query_scenario='./combined/queries_test',
                                 candidate_scenario='./combined/candidates_test',
                                 ranking_metric='faiss',
                                 selection_threshold=5.,
                                 num_candidates=2,
                                 search_size=2,
                                 output_path='ranker_results/test_candidates_deezymatch',
                                 pretrained_model_path='./models/finetuned_test001/finetuned_test001.model',
                                 pretrained_vocab_path='./models/finetuned_test001/finetuned_test001.vocab',
                                 number_test_rows=20)
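To build some intuition for what num_candidates and selection_threshold do, here is a conceptual sketch in plain Python. It is not DeezyMatch's implementation: it assumes the faiss metric behaves like an L2 (Euclidean) distance between vector representations, and the function name rank_by_l2 and the two-dimensional toy vectors are made up for illustration.

```python
import math


def rank_by_l2(query_vec, candidates, num_candidates=2, selection_threshold=5.0):
    """Rank candidates by Euclidean distance to the query vector.

    candidates maps a candidate name to its vector. Candidates farther
    than selection_threshold are dropped; the closest num_candidates
    of the remainder are returned as (name, distance) tuples.
    """
    scored = []
    for name, vec in candidates.items():
        dist = math.dist(query_vec, vec)
        if dist <= selection_threshold:
            scored.append((name, dist))
    scored.sort(key=lambda item: item[1])  # smallest distance first
    return scored[:num_candidates]


# Toy vectors standing in for learned string representations.
toy_candidates = {
    "a": [3.0, 4.0],   # distance 5.0 from the origin
    "b": [1.0, 0.0],   # distance 1.0
    "c": [6.0, 8.0],   # distance 10.0, beyond the threshold
    "d": [0.0, 2.0],   # distance 2.0
}

top = rank_by_l2([0.0, 0.0], toy_candidates, num_candidates=2, selection_threshold=5.0)
print(top)
```

In this toy setup, candidate "c" is filtered out by the threshold, and only the two nearest of the remaining candidates are kept.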

Troubleshooting

If you hit a ModuleNotFoundError when running a script, try reinstalling the package the error names, or check the DeezyMatch GitHub repository for known issues. For example, an error raised from candidateRanker.py, which relies on faiss, can usually be resolved by reinstalling faiss-cpu:

pip install faiss-cpu --no-cache
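Before running a longer script, it can help to check up front which modules Python can actually find. The sketch below uses only the standard library; the helper name missing_modules is hypothetical, and it assumes the packages are imported under the top-level names DeezyMatch and faiss.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Check the two modules this post relies on; an empty list means both are found.
missing = missing_modules(["DeezyMatch", "faiss"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

Any name printed as missing points to the package to reinstall in the active environment.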

Further troubleshooting notes are available in the DeezyMatch repository on GitHub.

