[![Awesome](https://awesome.re/badge.svg)](https://awesome.re)
![Stars](https://img.shields.io/github/stars/QizhiPei/Awesome-Biomolecule-Language-Cross-Modeling?color=yellow&label=Stars&labelColor=555555)
![Forks](https://img.shields.io/github/forks/QizhiPei/Awesome-Biomolecule-Language-Cross-Modeling?color=blue&label=Fork&labelColor=555555)
This repository accompanies the survey Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey and collects related models, datasets, benchmarks, and other resource links. We will keep it updated. If you have a paper or resource you’d like to add, feel free to submit a pull request, open an issue, or email the author at qizhipei@ruc.edu.cn.
Models
Think of the models as the superheroes of the biomolecule and natural language realm, each with its own set of powers:
- BioBERT: The text-mining savior for biomedicine.
- SciBERT: Powerful enough to tackle scientific papers.
- ClinicalBERT: The clinical notes wizard, predicting hospital readmissions from clinical text.
- GatorTron: The large clinical language model that helps unlock patient information from electronic health records.
Like a superhero team, each model specializes in its own domain, and combining their strengths lets researchers tackle complex problems that span multiple disciplines.
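To make the "pick the right hero" idea concrete, here is a minimal sketch of a lookup from the model names above to Hugging Face Hub checkpoints. The Hub identifiers are assumptions (commonly used community checkpoints, not taken from this repository), and `resolve_model_id` and `load_encoder` are hypothetical helper names. Loading requires the `transformers` package and network access.

```python
# Hub identifiers are assumptions: widely used community checkpoints,
# not something this repository specifies. Verify them before relying on them.
MODEL_IDS = {
    "BioBERT": "dmis-lab/biobert-v1.1",
    "SciBERT": "allenai/scibert_scivocab_uncased",
    "ClinicalBERT": "emilyalsentzer/Bio_ClinicalBERT",
    "GatorTron": "UFNLP/gatortron-base",
}


def resolve_model_id(name: str) -> str:
    """Return the assumed Hub identifier for a model name (case-insensitive)."""
    for key, hub_id in MODEL_IDS.items():
        if key.lower() == name.lower():
            return hub_id
    raise KeyError(f"Unknown model {name!r}; known models: {sorted(MODEL_IDS)}")


def load_encoder(name: str):
    """Load the tokenizer and encoder for a model name.

    Needs the `transformers` package and downloads weights on first use.
    """
    # Imported lazily so the lookup above works without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    hub_id = resolve_model_id(name)
    return AutoTokenizer.from_pretrained(hub_id), AutoModel.from_pretrained(hub_id)
```

For example, `tokenizer, model = load_encoder("SciBERT")` would fetch the SciBERT checkpoint, assuming the identifier above is still valid on the Hub.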
Datasets & Benchmarks
The backbone of training these models is a rich set of datasets. Here’s a glance at some key datasets you can use:
| Dataset | Usage | Modality | Link |
|---|---|---|---|
| PubMed | Pre-training | Text | Link |
| ZINC | Pre-training | Molecule | Link, Link |
| UniProt | Pre-training | Protein | Link |
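If you want to select datasets programmatically, the table can be mirrored as a small in-memory catalog. This is only a sketch: `DatasetEntry`, `CATALOG`, and `by_modality` are hypothetical names, and the entries simply restate the three rows above without download links.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    name: str
    usage: str
    modality: str


# Rows copied from the table above; links are intentionally left out.
CATALOG = [
    DatasetEntry("PubMed", "Pre-training", "Text"),
    DatasetEntry("ZINC", "Pre-training", "Molecule"),
    DatasetEntry("UniProt", "Pre-training", "Protein"),
]


def by_modality(modality: str) -> list[str]:
    """Return the names of catalog entries matching a modality (case-insensitive)."""
    return [d.name for d in CATALOG if d.modality.lower() == modality.lower()]
```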
Related Resources
Related Surveys & Evaluations
- A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery
- Bioinformatics and Biomedical Informatics with ChatGPT: Year One Review
Acknowledgements
This repository is maintained by QizhiPei and Lijun Wu. If you have questions, don’t hesitate to open an issue or contact QizhiPei at qizhipei@ruc.edu.cn or Lijun Wu at lijun_wu@outlook.com. We are happy to hear from you!
Troubleshooting Ideas
If you encounter any issues while exploring the models, here are a few troubleshooting tips:
- Make sure you have the required libraries installed for the models.
- Check the documentation for each model to ensure correct usage.
- If a model isn’t performing as expected, check that your data matches the domain it was pre-trained on, or consider fine-tuning it on task-specific data.
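The first tip can be checked quickly from Python. The sketch below uses only the standard library; `missing_packages` is a hypothetical helper, and the package list is an assumption, so replace it with whatever the model you are using actually requires.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level import names that cannot be found in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed requirements; adjust to the model's own documentation.
    required = ["torch", "transformers"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Pass top-level import names only (for example `transformers`, not a dotted submodule path), since looking up a submodule of a missing package raises an error instead of returning `None`.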