🔎 Methods and Experiments
Each of the following is an experimental project and/or workflow developed with some degree of Monarch involvement.
| Name | Description | Links |
|---|---|---|
| AutoMAxO | AutoMAxO uses Large Language Models (LLMs) to streamline the biocuration of medical actions for rare diseases. By automating the annotation of clinical management data, AutoMAxO improves the efficiency and scalability of curation, making it easier for researchers and healthcare professionals to access and use this information. | GitHub; Docs |
| Consistent Performance of GPT-4o in Rare Disease Diagnosis Across Nine Languages and 4967 Cases | The differential diagnostic performance of GPT-4o across a comprehensive corpus of rare-disease cases was consistent across the nine languages tested. This suggests that LLMs such as GPT-4o may have utility in non-English clinical settings. | medRxiv |
| CurateGPT | Curation support with LLMs and RAG, backed by structured knowledge. A prototype web application and framework for performing general-purpose AI-guided curation and curation-related operations over collections of objects. Uses DRAGON-AI and RAG. Does not do grounding or mapping (it operates only on JSON objects). Resources are stored in a vector database (ChromaDB, DuckDB, etc.). | arXiv; Draft; GitHub; Docs |
| DRAGON-AI | Ontology generation method employed by CurateGPT. Uses LLMs and Retrieval Augmented Generation (RAG). | JBMS |
| ELDER | ELDER is an algorithm that uses text embeddings for differential diagnosis. It takes phenotype terms as input and queries a vector database of diseases to find the most similar diseases. | GitHub; Draft |
| GPT for Cell Type Summaries | Experiments with LLMs and OntoGPT in generating summary descriptions of cell types. | Doc |
| MapperGPT | An approach that uses LLMs to review and refine mapping relationships as a post-processing step, in concert with existing high-recall methods that are based on lexical and structural heuristics. | arXiv; Manubot Draft |
| On the limitations of large language models in clinical diagnosis | This manuscript examines how the way clinical data are presented affects clinical diagnosis (e.g., as full clinical case reports, as sets of observed signs/symptoms, etc.). | medRxiv; Draft |
| Phenomics Assistant: An Interface for LLM-based Biomedical Knowledge Graph Exploration | The methods underlying the Phenomics Assistant tool. Includes comparative evaluations of the tool's ability to correctly answer gene–disease association and gene alias queries. | bioRxiv |
| pheval.llm | A project to evaluate LLMs' capability at performing differential diagnosis for rare genetic diseases through medical-vignette-like prompts created with phenopacket2prompt. | Docs |
| SPIRES | Structured Prompt Interrogation and Recursive Extraction of Semantics. Recursively extracts structured knowledge and is strong at reusing existing vocabularies. Performs no term expansion or preprocessing and does not discover its own synonyms (e.g., "heart attack" vs. "myocardial infarction"). This means less NLP-style guessing, which may be beneficial. | Bioinformatics |
| Systematic benchmarking demonstrates large language models have not reached the diagnostic accuracy of traditional rare-disease decision support tools | Large language models (LLMs) show promise in supporting differential diagnosis, but their performance is challenging to evaluate due to the unstructured nature of their responses. To assess the current capabilities of LLMs to diagnose genetic diseases, we benchmarked these models on 5,213 case reports using the Phenopacket Schema, the Human Phenotype Ontology and Mondo disease ontology. Prompts generated from each phenopacket were sent to three generative pretrained transformer (GPT) models. The same phenopackets were used as input to a widely used diagnostic tool, Exomiser, in phenotype-only mode. The best LLM ranked the correct diagnosis first in 23.6% of cases, whereas Exomiser did so in 35.5% of cases. | medRxiv |
| TALISMAN | Uses GPT models to perform gene set function summarization as a complement to standard enrichment analysis. Formerly named SPINDOCTOR. | arXiv; Draft |
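The ELDER entry above describes the core mechanism several of these projects share: embed phenotype terms, then rank diseases by similarity of their vectors. A minimal sketch of that idea, using hypothetical toy vectors in place of real LLM embeddings and a plain dictionary in place of a vector database:

```python
from math import sqrt

# Toy stand-ins for LLM text embeddings. ELDER uses real embedding models
# and a vector database; these names and 3-d vectors are illustrative only.
DISEASE_VECTORS = {
    "Marfan syndrome": [0.9, 0.1, 0.3],
    "Ehlers-Danlos syndrome": [0.8, 0.2, 0.4],
    "Cystic fibrosis": [0.1, 0.9, 0.2],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_diseases(phenotype_vectors, disease_vectors=DISEASE_VECTORS):
    """Average the phenotype-term embeddings into one query vector,
    then rank diseases by cosine similarity to that query."""
    dim = len(phenotype_vectors[0])
    n = len(phenotype_vectors)
    query = [sum(vec[i] for vec in phenotype_vectors) / n for i in range(dim)]
    scored = [(disease, cosine(query, vec))
              for disease, vec in disease_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Two phenotype-term embeddings that lie near the connective-tissue diseases
ranking = rank_diseases([[0.85, 0.15, 0.3], [0.9, 0.2, 0.35]])
```

In the real systems the dictionary lookup is replaced by an approximate nearest-neighbor query against a vector store, but the ranking logic is the same.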