Awesome Biomolecule-Language Cross Modeling

Apr 30, 2024 | Data Science


[![](https://img.shields.io/badge/paper-arxiv:2403.01528-red?style=plastic&logo=GitBook)](https://arxiv.org/abs/2403.01528)
[![Awesome](https://awesome.re/badge.svg)](https://awesome.re)
![Stars](https://img.shields.io/github/stars/QizhiPei/Awesome-Biomolecule-Language-Cross-Modeling?color=yellow&label=Stars&labelColor=555555)
![Forks](https://img.shields.io/github/forks/QizhiPei/Awesome-Biomolecule-Language-Cross-Modeling?color=blue&label=Fork&labelColor=555555)

This is the repository for the survey "Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey". It collects related models, datasets, benchmarks, and other resource links, and it will be kept up to date. If you have a paper or resource you’d like to add, feel free to submit a pull request, open an issue, or email the author at qizhipei@ruc.edu.cn.

Models
Datasets & Benchmarks
Related Resources
Acknowledgements

Models

Think of the models as the superheroes of the biomolecule and natural language realm, each with its own set of powers:

BioBERT: The text-mining savior for biomedicine.
SciBERT: The specialist pre-trained on scientific papers.
ClinicalBERT: The clinical notes wizard, predicting hospital readmissions.
GatorTron: The large clinical language model that unlocks patient information from unstructured electronic health records.

Imagine a superhero team where each character specializes in a different ability, similar to how these models function in their respective domains. Using strengths from each model allows researchers to tackle complex problems across multiple disciplines.
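As a rough illustration of putting one of these models to work, the sketch below maps two of the names above to Hugging Face Hub checkpoints and loads an encoder. The checkpoint identifiers and the helper names `resolve_checkpoint` and `load_encoder` are assumptions made for this example, not something the survey repository prescribes; check the Hub for the checkpoint you actually want.

```python
# Checkpoint names are assumptions for illustration, not taken from the
# survey repository; verify them on the Hugging Face Hub before use.
CHECKPOINTS = {
    "BioBERT": "dmis-lab/biobert-v1.1",
    "SciBERT": "allenai/scibert_scivocab_uncased",
}


def resolve_checkpoint(model_name: str) -> str:
    """Return the assumed Hub checkpoint for a model named in the list above."""
    if model_name not in CHECKPOINTS:
        known = ", ".join(sorted(CHECKPOINTS))
        raise ValueError(f"Unknown model {model_name!r}; known models: {known}")
    return CHECKPOINTS[model_name]


def load_encoder(model_name: str):
    """Load a tokenizer and encoder (hypothetical helper).

    Requires the `transformers` and `torch` packages plus network access
    the first time a checkpoint is downloaded.
    """
    # Imported lazily so the lookup above works without the heavy dependency.
    from transformers import AutoModel, AutoTokenizer

    checkpoint = resolve_checkpoint(model_name)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    return tokenizer, model


# Example usage (downloads weights, so it is left commented out):
# tokenizer, model = load_encoder("BioBERT")
# inputs = tokenizer("Aspirin inhibits cyclooxygenase.", return_tensors="pt")
# hidden = model(**inputs).last_hidden_state  # shape: (batch, tokens, hidden_size)
```

Keeping the name-to-checkpoint mapping in one dictionary makes it easy to swap models and compare how each one encodes the same text.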

Datasets & Benchmarks

The backbone of training these models is a rich set of datasets. Here’s a glance at some key datasets you can use:

| Dataset | Usage | Modality | Link |
| --- | --- | --- | --- |
| PubMed | Pre-training | Text | Link |
| ZINC | Pre-training | Molecule | Link, Link |
| UniProt | Pre-training | Protein | Link |

Related Surveys & Evaluations

Acknowledgements

This repository is maintained and updated by QizhiPei and Lijun Wu. If you have questions, don’t hesitate to open an issue or email QizhiPei at qizhipei@ruc.edu.cn or Lijun Wu at lijun_wu@outlook.com. We are happy to hear from you!

Troubleshooting Ideas

If you encounter any issues while exploring the models, here are a few troubleshooting tips:

Make sure you have the required libraries installed for the models.
Check the documentation for each model to ensure correct usage.
If a model isn’t performing as expected, consider adjusting the dataset or fine-tuning the model.
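The first tip can be checked programmatically. The snippet below is a minimal sketch that reports which packages cannot be imported in the current environment. The package list is a hypothetical example; replace it with whatever the documentation of your chosen model requires. Note that the name you import is not always the name you pass to `pip install`.

```python
from importlib.util import find_spec

# Hypothetical requirement list for illustration; replace it with the
# packages named in the documentation of the model you are running.
REQUIRED_PACKAGES = ["torch", "transformers"]


def missing_packages(packages):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in packages if find_spec(name) is None]


def report(packages=REQUIRED_PACKAGES):
    """Build a one-line summary of the environment check."""
    missing = missing_packages(packages)
    if missing:
        return "Missing packages: " + ", ".join(missing)
    return "All required packages are installed."


print(report())
```

Running this before loading a model separates environment problems from usage problems, which narrows down where to look next.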

