Awesome Biomolecule-Language Cross Modeling

Apr 30, 2024 | Data Science

[![](https://img.shields.io/badge/paper-arxiv:2403.01528-red?style=plastic&logo=GitBook)](https://arxiv.org/abs/2403.01528)
[![Awesome](https://awesome.re/badge.svg)](https://awesome.re)
![Stars](https://img.shields.io/github/stars/QizhiPei/Awesome-Biomolecule-Language-Cross-Modeling?color=yellow&label=Stars&labelColor=555555)
![Forks](https://img.shields.io/github/forks/QizhiPei/Awesome-Biomolecule-Language-Cross-Modeling?color=blue&label=Fork&labelColor=555555)

This repository accompanies the paper Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey. It collects related models, datasets, benchmarks, and other resource links, and we will keep it updated. If you have a paper or resource you’d like to add, feel free to submit a pull request, open an issue, or email the author at qizhipei@ruc.edu.cn.

Models

Think of the models as the superheroes of the biomolecule and natural language realm, each with its own set of powers:

  • BioBERT: The text-mining savior for biomedicine.
  • SciBERT: Powerful enough to tackle scientific papers.
  • ClinicalBERT: The clinical notes wizard, predicting hospital readmissions.
  • GatorTron: The large clinical language model that unlocks patient information from unstructured electronic health records.

Like members of a superhero team, each model specializes in its own domain. Combining their strengths lets researchers tackle complex problems across multiple disciplines.

Datasets & Benchmarks

The backbone of training these models is a rich set of datasets. Here’s a glance at some key datasets you can use:

| Dataset | Usage | Modality | Link |
|---|---|---|---|
| PubMed | Pre-training | Text | Link |
| ZINC | Pre-training | Molecule | Link, Link |
| UniProt | Pre-training | Protein | Link |

Related Surveys & Evaluations

Acknowledgements

This repository is maintained and updated by QizhiPei and Lijun Wu. If you have questions, don’t hesitate to open an issue or email QizhiPei at qizhipei@ruc.edu.cn or Lijun Wu at lijun_wu@outlook.com. We are happy to hear from you!

Troubleshooting Ideas

If you encounter any issues while exploring the models, here are a few troubleshooting tips:

  • Make sure you have the required libraries installed for the models.
  • Check the documentation for each model to ensure correct usage.
  • If a model isn’t performing as expected, consider adjusting the dataset or fine-tuning the model.
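
The first tip can be checked with a few lines of Python. The sketch below is only an illustration: the helper name `missing_packages` and the example requirement list (`torch`, `transformers`) are assumptions, not something this repository specifies, so substitute whatever the documentation of your chosen model asks for.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical requirement list; replace it with the packages
    # named in the documentation of the model you want to run.
    required = ["torch", "transformers"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only confirms that a package can be located, not that its version matches what a model expects, so the model's own documentation remains the reference for version requirements.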
