🔎 Methods and Experiments
Each of the following is an experimental project and/or workflow developed with some degree of Monarch involvement.
| Name | Description | Links |
|---|---|---|
| AutoMAxO | AutoMAxO uses Large Language Models (LLMs) to streamline the biocuration of medical actions for rare diseases. By automating the annotation of clinical management data, AutoMAxO improves the efficiency and scalability of curation, making it easier for researchers and healthcare professionals to access and use this information. | GitHub; Docs |
| Consistent Performance of GPT-4o in Rare Disease Diagnosis Across Nine Languages and 4967 Cases | The differential diagnostic performance of GPT-4o across a comprehensive corpus of rare-disease cases was consistent across the nine languages tested. This suggests that LLMs such as GPT-4o may have utility in non-English clinical settings. | medRxiv |
| CurateGPT | Curation support with LLMs and RAG, backed by structured knowledge. A prototype web application and framework for performing general-purpose AI-guided curation and curation-related operations over collections of objects. Uses DRAGON-AI and RAG. Does not do grounding or mapping (it operates only on JSON objects). Resources are stored in a vector database (ChromaDB, DuckDB, etc.). | arXiv; Draft; GitHub; Docs |
| DRAGON-AI | Ontology generation method employed by CurateGPT. Uses LLMs and Retrieval Augmented Generation (RAG). | JBMS |
| ELDER | ELDER is an algorithm that uses text embeddings for differential diagnosis. It takes phenotype terms as input and queries a vector database of diseases to find the most similar diseases. | GitHub; Draft |
| GPT for Cell Type Summaries | Experiments with LLMs and OntoGPT in generating summary descriptions of cell types. | Doc |
| MapperGPT | An approach that uses LLMs to review and refine mapping relationships as a post-processing step, in concert with existing high-recall methods that are based on lexical and structural heuristics. | arXiv; Manubot Draft |
| On the limitations of large language models in clinical diagnosis | This manuscript examines how the way clinical data are presented affects clinical diagnosis (e.g., as full clinical case reports, as sets of observed signs/symptoms, etc.). | medRxiv; Draft |
| Phenomics Assistant: An Interface for LLM-based Biomedical Knowledge Graph Exploration | The methods underlying the Phenomics Assistant tool. Includes comparative evaluations of the tool's ability to correctly answer gene–disease association and gene alias queries. | bioRxiv |
| pheval.llm | A project to evaluate LLMs' capability at performing differential diagnosis for rare genetic diseases through medical-vignette-like prompts created with phenopacket2prompt. | Docs |
| SPIRES | Structured Prompt Interrogation and Recursive Extraction of Semantics. Recursively extracts structured knowledge and is strong at reusing existing vocabularies. Performs no term expansion or preprocessing and does not discover its own synonyms (e.g., "heart attack" vs. "myocardial infarction"). This means less NLP-style guessing, which may be beneficial. | Bioinformatics |
| Systematic benchmarking demonstrates large language models have not reached the diagnostic accuracy of traditional rare-disease decision support tools | Large language models (LLMs) show promise in supporting differential diagnosis, but their performance is challenging to evaluate due to the unstructured nature of their responses. To assess the current capabilities of LLMs to diagnose genetic diseases, we benchmarked these models on 5,213 case reports using the Phenopacket Schema, the Human Phenotype Ontology and Mondo disease ontology. Prompts generated from each phenopacket were sent to three generative pretrained transformer (GPT) models. The same phenopackets were used as input to a widely used diagnostic tool, Exomiser, in phenotype-only mode. The best LLM ranked the correct diagnosis first in 23.6% of cases, whereas Exomiser did so in 35.5% of cases. | medRxiv |
| TALISMAN | Uses GPT models to perform gene set function summarization as a complement to standard enrichment analysis. Formerly named SPINDOCTOR. | arXiv; Draft |
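The ELDER entry above describes the core mechanism several of these projects share: embed phenotype terms, then rank diseases by similarity of their vectors. A minimal sketch of that idea, using hypothetical toy vectors in place of real LLM embeddings and a plain dictionary in place of a vector database:

```python
from math import sqrt

# Toy stand-ins for LLM text embeddings. ELDER uses real embedding models
# and a vector database; these names and 3-d vectors are illustrative only.
DISEASE_VECTORS = {
    "Marfan syndrome": [0.9, 0.1, 0.3],
    "Ehlers-Danlos syndrome": [0.8, 0.2, 0.4],
    "Cystic fibrosis": [0.1, 0.9, 0.2],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_diseases(phenotype_vectors, disease_vectors=DISEASE_VECTORS):
    """Average the phenotype-term embeddings into one query vector,
    then rank diseases by cosine similarity to that query."""
    dim = len(phenotype_vectors[0])
    n = len(phenotype_vectors)
    query = [sum(vec[i] for vec in phenotype_vectors) / n for i in range(dim)]
    scored = [(disease, cosine(query, vec))
              for disease, vec in disease_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Two phenotype-term embeddings that lie near the connective-tissue diseases
ranking = rank_diseases([[0.85, 0.15, 0.3], [0.9, 0.2, 0.35]])
```

In the real systems the dictionary lookup is replaced by an approximate nearest-neighbor query against a vector store, but the ranking logic is the same.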