How to Translate SMILES to IUPAC with SMILES2IUPAC-canonical-base

Feb 15, 2024 | Educational

Understanding chemical structures can be daunting, especially when they are described in different notations such as SMILES (Simplified Molecular Input Line Entry System) and IUPAC nomenclature (from the International Union of Pure and Applied Chemistry). But fear not! The SMILES2IUPAC-canonical-base model bridges that gap by translating SMILES strings, which encode a molecule's structure as a single line of text, into IUPAC names.

What is SMILES2IUPAC-canonical-base?

This model is based on the MT5 architecture, modified to use separate tokenizers for the encoder (which reads the SMILES input) and the decoder (which generates the IUPAC name). It aims to produce accurate translations while keeping inference efficient.
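The post does not say which tokenizers the model uses, so the following is only an illustration of why SMILES input can benefit from a tokenizer of its own. A generic text tokenizer has no reason to keep "Cl" (chlorine) together or to treat "[NH4+]" as one atom; a regex-based, atom-level tokenizer, similar to the pattern popularized by the Molecular Transformer work, does. The names `SMILES_TOKEN_PATTERN` and `tokenize_smiles` are hypothetical and not part of chemical-converters.

```python
import re

# Hypothetical atom-level SMILES pattern: bracket atoms, two-letter halogens,
# organic-subset atoms (aliphatic and aromatic), bonds, branches,
# ring-closure digits (including %nn) and a few other SMILES symbols.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into atom-level tokens (illustrative helper)."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # findall silently skips characters the pattern does not cover, so check coverage.
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognised characters in SMILES: {smiles!r}")
    return tokens


print(tokenize_smiles("CCO"))           # ['C', 'C', 'O']
print(tokenize_smiles("c1ccccc1Cl"))    # ['c', '1', 'c', 'c', 'c', 'c', 'c', '1', 'Cl']
print(tokenize_smiles("[NH4+].[Cl-]"))  # ['[NH4+]', '.', '[Cl-]']
```

An IUPAC name such as "buta-1,3-diene" has a very different vocabulary (locants, hyphens, English-like syllables), which is one plausible reason to give the decoder its own tokenizer.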

Features of the Model

  • Developed by: Knowledgator Engineering
  • Model Type: Encoder-Decoder with attention mechanism
  • Supported Languages: SMILES, IUPAC (English)
  • License: Apache License 2.0

Quickstart Guide

To get started, you’ll first need to install the required library. Open your command line interface and type:

pip install chemical-converters

Using the Model

Once installed, you can use the model to translate SMILES to IUPAC in a few simple steps.

Basic Translation Example

Here’s how to perform a straightforward translation:

from chemicalconverters import NamesConverter
converter = NamesConverter(model_name="knowledgator/SMILES2IUPAC-canonical-base")
print(converter.smiles_to_iupac("CCO"))  # Output: ethanol
# Optional style tokens (<SYST>, <TRAD>, <BASE>) prefixed to the SMILES select the naming style
print(converter.smiles_to_iupac(["<SYST>CCO", "<TRAD>CCO", "<BASE>CCO"]))  # Output: ['ethanol', 'ethanol', 'ethanol']

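Just as a translator reads a whole sentence before choosing each word, the model reads the full SMILES string “CCO” (two carbon atoms and an oxygen joined by single bonds, with hydrogens implicit) and generates the IUPAC name “ethanol”. The second call shows that the same molecule can be requested in different naming styles by prefixing a style token; for a molecule as simple as ethanol, all three styles give the same name.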
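If you want one naming style for a whole list of molecules, a small helper can add the prefix for you. This is a minimal sketch that assumes the style tokens are written in angle brackets (`<SYST>`, `<TRAD>`, `<BASE>`) directly in front of each SMILES string; `with_style` and `STYLE_TOKENS` are hypothetical names, not part of chemical-converters.

```python
# Hypothetical: the three style tokens as used in this post.
STYLE_TOKENS = ("SYST", "TRAD", "BASE")


def with_style(smiles_list: list[str], style: str = "BASE") -> list[str]:
    """Prefix every SMILES string with a style token such as "<BASE>"."""
    if style not in STYLE_TOKENS:
        raise ValueError(f"Unknown style {style!r}; expected one of {STYLE_TOKENS}")
    prefix = f"<{style}>"
    return [prefix + smiles for smiles in smiles_list]


inputs = with_style(["CCO", "C=CC=C"], style="SYST")
print(inputs)  # ['<SYST>CCO', '<SYST>C=CC=C']
# The prefixed list could then be passed to converter.smiles_to_iupac(inputs).
```

Validating the style name up front turns a typo like "SYS" into an immediate error instead of an unrecognised token reaching the model.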

Processing in Batches

If you have multiple SMILES inputs, you can process them in batches:

from chemicalconverters import NamesConverter
converter = NamesConverter(model_name="knowledgator/SMILES2IUPAC-canonical-base")
# Ten copies of buta-1,3-diene (C=CC=C) with the <BASE> style token; num_beams=1 disables beam search
print(converter.smiles_to_iupac(["<BASE>C=CC=C" for _ in range(10)],
                                num_beams=1, process_in_batch=True, batch_size=1000))

With process_in_batch=True, the inputs are run through the model together in groups of up to batch_size instead of one at a time, just as a chef prepares a whole catering order at once instead of cooking one dish at a time.
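To picture what a batch_size parameter controls, here is a plain-Python sketch of splitting a list into fixed-size chunks. It is not chemical-converters' internal code, and `iter_batches` is a hypothetical helper; with the batch_size=1000 used above, all ten inputs would fit into a single batch.

```python
from collections.abc import Iterator, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


smiles = ["<BASE>C=CC=C"] * 10
batches = list(iter_batches(smiles, batch_size=4))
print([len(batch) for batch in batches])  # [4, 4, 2]
```

If a very large input list does not fit in memory at once, a loop like this could feed each chunk to converter.smiles_to_iupac separately; smaller chunks trade some throughput for a lower peak memory footprint.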

Validating Translations

To check accuracy, you can validate a translation by converting the predicted IUPAC name back into SMILES and comparing the molecular fingerprints of the original and the round-tripped structures:

from chemicalconverters import NamesConverter

# Automatic validation: returns the predicted name together with a Tanimoto similarity score
converter = NamesConverter(model_name="knowledgator/SMILES2IUPAC-canonical-base")
print(converter.smiles_to_iupac("CCO", validate=True))  # Output: ethanol, plus the similarity score

# Manual validation: the reverse (IUPAC-to-SMILES) model converts the predicted name back to SMILES
validation_model = NamesConverter(model_name="knowledgator/IUPAC2SMILES-canonical-base")
print(NamesConverter.validate_iupac(input_sequence="CCO", predicted_sequence="ethanol", validation_model=validation_model))

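The Tanimoto similarity score, which ranges from 0 to 1 with 1 meaning identical fingerprints, indicates how closely the structure recovered from the predicted name matches the input molecule, confirming that your “recipe” survives the trip between naming systems.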
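The score itself is simple to compute once you have two fingerprints. Treating each fingerprint as the set of bit positions that are switched on, the Tanimoto coefficient is T(A, B) = |A ∩ B| / (|A| + |B| - |A ∩ B|). The sketch below uses made-up toy bit sets; in practice the fingerprints would come from a cheminformatics toolkit such as RDKit, and which fingerprint type chemical-converters uses is not stated in this post.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit indices."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Toy bit sets standing in for real fingerprints (values are invented for illustration).
input_fp = {1, 5, 9, 42}
roundtrip_fp = {1, 5, 9, 42}
print(tanimoto(input_fp, roundtrip_fp))     # 1.0
print(tanimoto({1, 5, 9, 42}, {1, 5, 7}))   # 0.4
```

Note that a score of 1.0 means the fingerprints match, which is strong but not absolute evidence that the molecules are identical, since different structures can occasionally share a fingerprint.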

Troubleshooting

While using the SMILES2IUPAC-canonical-base model, you may run into some common issues:

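  • Limited accuracy with large molecules: translations become less reliable as structures grow larger and more complex, so validate results for big molecules.
  • Isomeric and isotopic SMILES: stereochemistry markers (such as @, / and \) and isotope labels (such as [13C]) are not currently supported, so remove them from your inputs or treat the resulting names with caution.
  • Performance issues: when processing large datasets in batches, make sure your system has enough memory for the chosen batch_size, and lower it if you run out.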
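Because isomeric and isotopic SMILES are not supported, it can help to flag such inputs before translation. This is a minimal pure-Python sketch with a hypothetical `unsupported_features` helper: it looks for "@" (tetrahedral chirality), "/" and "\" (double-bond geometry), and a mass number right after "[" (isotope labels such as [13C]). If RDKit is available, an alternative is to re-write each molecule with `Chem.MolToSmiles(mol, isomericSmiles=False)`, which drops stereochemistry and isotope labels.

```python
import re

# Stereo markers: '@' inside bracket atoms, '/' or '\' for cis/trans bonds.
_STEREO = re.compile(r"[@/\\]")
# Isotope labels: a mass number directly after '[' in a bracket atom, e.g. [13C] or [2H].
_ISOTOPE = re.compile(r"\[\d+")


def unsupported_features(smiles: str) -> list[str]:
    """List the SMILES features this post says the model does not support (hypothetical helper)."""
    issues = []
    if _STEREO.search(smiles):
        issues.append("stereochemistry")
    if _ISOTOPE.search(smiles):
        issues.append("isotope")
    return issues


for s in ["CCO", "C[C@H](N)C(=O)O", "[13CH3]O", "F/C=C/F"]:
    print(s, unsupported_features(s))
# CCO []
# C[C@H](N)C(=O)O ['stereochemistry']
# [13CH3]O ['isotope']
# F/C=C/F ['stereochemistry']
```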

Conclusion

With the SMILES2IUPAC-canonical-base model, converting SMILES strings into IUPAC names has never been easier! Remember, choosing the right style token can improve your results, and validating translations is a great way to check their accuracy.

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.