Monarch Documentation

The Monarch Initiative and our collaborators develop a wide range of tools and ontologies to tackle global challenges such as rare disease. Here we document the software and data infrastructure across the entire Monarch ecosystem. If you would like specific documentation to be added, please use our Monarch-wide issue tracker.

Standards documentation

| Repository | Description | Tracker |
| --- | --- | --- |
| Phenopackets | The Phenopacket Schema represents an open standard for sharing disease and phenotype information to improve our ability to understand, diagnose, and treat both rare and common diseases. A Phenopacket links detailed phenotype descriptions with disease, patient, and genetic information, enabling clinicians, biologists, and disease and drug researchers to build more complete models of disease. The standard is designed to encourage wide adoption and synergy between the people, organizations and systems that comprise the joint effort to address human disease and biological understanding. | Issue Tracker |
| Simple Standard for Ontology Mappings | The Simple Standard for Sharing Ontological Mappings (SSSOM) provides: (1) a TSV-based representation for ontology term mappings; (2) a comprehensive set of standard metadata elements to describe mappings; and (3) a standard translation between the TSV and the Web Ontology Language (OWL). The SSSOM TSV format in particular is geared towards the needs of the wider bioinformatics community as a way to safely exchange mappings in an easily readable yet semantically well-specified manner. | Issue Tracker |
| Human Phenotype Ontology | The Human Phenotype Ontology (HPO) is a structured vocabulary used to describe human health conditions and symptoms in a standardized way. It helps doctors, researchers, and computer systems consistently record and analyze patient characteristics, especially in the context of rare diseases. By organizing thousands of clinical features into a searchable hierarchy, HPO supports better diagnosis, data sharing, and discovery in genetic medicine. | Issue Tracker |
| obographs | The obographs ontology format aims to provide a simple graph-oriented JSON format for ontologies. | Issue Tracker |
| Mondo Disease Ontology | The Mondo Disease Ontology (Mondo) aims to harmonize disease definitions across the world, with a particular focus on rare diseases. Mondo goes beyond loose database cross-references. It curates precise 1:1 equivalence axioms connecting to other resources, validated by OWL reasoning. This means it is safe to propagate data across these equivalences from Online Mendelian Inheritance in Man (OMIM), ORDO (Orphanet), Experimental Factor Ontology (EFO), Disease Ontology (DOID) and the neoplasm branch of National Cancer Institute Thesaurus (NCIt). | Issue Tracker |
| Genotype Ontology | The Genotype Ontology (GENO) is an OWL2 ontology that represents the levels of genetic variation specified in genotypes, to support genotype-to-phenotype (G2P) data aggregation and analysis across diverse research communities and sources. The core of the ontology is a graph decomposing a genotype into smaller components of variation, from a complete genotype specifying sequence variation across an entire genome, down to specific allelic variants and sequence alterations. Structuring genotype instance data according to this model supports a primary use case of GENO: enabling integrated analysis of G2P data where phenotype annotations are made at different levels of granularity in this genotype partonomy. GENO also enables description of various attributes of genotypes and genetic variants. These attributes include zygosity, genomic position, expression, dominance, and functional dependencies or consequences of a given variant. In addition to heritable variation in genomic sequence specified by traditional genotypes, GENO also represents transient variation in gene expression, as seen in experiments where genes are targeted by knockdown reagents or overexpressed by DNA constructs at the time a phenotype is assessed. This variation in gene expression is represented in terms of the targeted genes themselves, to parallel representation of sequence variation and facilitate integrated description and analysis of data about any genetic contribution to a measured phenotype. Finally, GENO also supports modelling of G2P associations, focusing on the interplay between genotype, phenotype, and environment. GENO describes provenance and experimental evidence for these associations using the Scientific Evidence and Provenance Information Ontology (SEPIO) model. | Issue Tracker |
| Scientific Evidence and Provenance Information Ontology | The Scientific Evidence and Provenance Information Ontology (SEPIO) was developed to support description of evidence and provenance information for scientific claims. The core model represents the relationships between claims, their evidence lines, the information items that comprise these lines of evidence, and the methods, tools, and agents involved in the creation of these entities. Use cases driving SEPIO development include integration of scientific claims and their associated evidence/provenance metadata, and support for the discovery, analysis, and evaluation of claims based on this metadata. | Issue Tracker |
| Vertebrate Breed Ontology | The Vertebrate Breed Ontology (VBO) is an open, community-driven ontology representing over 19,500 livestock and companion animal breed concepts covering 49 species. Breeds are classified based on community and expert conventions (e.g., cattle breed) and supported by relations to the breed's genus and species indicated by National Center for Biotechnology Information (NCBI) Taxonomy terms. Relationships between VBO terms (e.g., relating breeds to their foundation stock) provide additional context to support advanced data analytics. VBO term metadata includes synonyms, breed identifiers/codes, and attributed cross-references to other databases. | Issue Tracker |
| The Unified Phenotype Ontology | The Unified Phenotype Ontology (uPheno) framework is a community effort to provide an integration layer over domain-specific phenotype ontologies, as a single, unified, logical representation. uPheno comprises: (1) a system for consistent computational definition of phenotype terms using ontology design patterns, maintained as a community library; (2) a hierarchical vocabulary of species-neutral phenotype terms under which their species-specific counterparts are grouped; and (3) mapping tables between species-specific ontologies. | Issue Tracker |
| Medical Action Ontology | | Issue Tracker |
| Ontology of Biological Attributes | The Ontology of Biological Attributes (OBA) is a formalised, species-independent collection of interoperable phenotypic trait categories that is intended to fulfil a data integration role. OBA is a standardised representational framework for observable attributes that are characteristics of biological entities, organisms, or parts of organisms. It is a collection of biological attributes (traits) covering all kingdoms of life and is interoperable with other ontologies in the Open Biological and Biomedical Ontologies Foundry. | Issue Tracker |
| Uberon Anatomy Ontology | Uberon is an anatomical ontology that represents body parts, organs, and tissues in a variety of animal species, with a focus on vertebrates. It has been constructed to integrate seamlessly with other ontologies, such as the Cell Ontology workflows, the Gene Ontology, Trait and Phenotype ontologies, as well as other anatomical ontologies. The ontology includes comprehensive relationships to taxon-specific anatomical ontologies, allowing integration of functional, phenotype and expression data. Currently, Uberon consists of over 13,000 classes representing structures that are shared across a variety of metazoans. As one of the main uses of Uberon is translational science, we have extensive coverage of structures shared between humans and other species. However, thanks to the involvement of many collaborators, we have deep coverage of broad areas of anatomy across diverse taxa. We also make available an ontology called composite-metazoan, which brings in subsets of federated ontologies, with a total of over 40,000 classes. | Issue Tracker |
| Phenomics Integrated Ontology | The Phenomics Integrated Ontology (Phenio) is an ontology for accessing and comparing knowledge concerning phenotypes across species and genetic backgrounds. | Issue Tracker |
| Environmental Conditions, Treatments and Exposures Ontology | The Environmental Conditions, Treatments and Exposures Ontology (ECTO) describes exposures to experimental treatments of plants and model organisms (e.g., modifications of diet, lighting levels, or temperature); exposures of humans or any other organisms to stressors through a variety of routes, for purposes such as public health and environmental monitoring; and stimuli, natural and experimental, covering any kind of environmental condition or change in condition that can be experienced by an organism or population of organisms on Earth. The scope is very general and can include, for example, plant treatment regimens as well as human clinical exposures (although these may be better handled by a more specialized ontology). | Issue Tracker |
| Dead simple OWL design pattern (DOS-DP) exchange format | | Issue Tracker |
| Knowledge Graph Change Language | | Issue Tracker |
| Babelon - A simple standard for managing ontology translations and language profiles | | Issue Tracker |
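The SSSOM entry above describes a TSV representation that carries standard metadata alongside the mappings. As a rough illustration only, the sketch below reads a tiny, made-up SSSOM-style file with the Python standard library. It assumes the common layout in which metadata lines are prefixed with `#` ahead of the tab-separated table, and it uses the column names `subject_id`, `predicate_id`, `object_id`, and `mapping_justification`. The sample rows, the `EX:`/`EXB:` identifiers, and the helper name `read_sssom_rows` are hypothetical; for real files the SSSOM Toolkit is likely the better choice.

```python
import csv
import io

# Hypothetical sample: '#'-prefixed metadata header followed by a TSV table.
SAMPLE = (
    "#mapping_set_id: https://example.org/mappings/demo\n"
    "#license: https://creativecommons.org/publicdomain/zero/1.0/\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\n"
    "EX:0001\tskos:exactMatch\tEXB:9001\tsemapv:ManualMappingCuration\n"
    "EX:0002\tskos:broadMatch\tEXB:9002\tsemapv:LexicalMatching\n"
)


def read_sssom_rows(text):
    """Split '#'-prefixed metadata lines from the TSV body and parse both."""
    metadata_lines = []
    table_lines = []
    for line in text.splitlines():
        if line.startswith("#"):
            metadata_lines.append(line[1:].strip())
        elif line.strip():
            table_lines.append(line)
    reader = csv.DictReader(io.StringIO("\n".join(table_lines)), delimiter="\t")
    return metadata_lines, list(reader)


metadata, rows = read_sssom_rows(SAMPLE)

# Keep only the exact matches, the kind of mapping that is safest to reuse.
exact = [r for r in rows if r["predicate_id"] == "skos:exactMatch"]
for row in exact:
    print(row["subject_id"], "->", row["object_id"])
```

Keeping the metadata lines separate from the table mirrors the split between mapping-set metadata and individual mappings, so either part can be inspected on its own.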

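The obographs entry in the standards table describes a graph-oriented JSON format for ontologies. The sketch below shows, under stated assumptions, how such a document might be walked with the standard library alone. The field names (`graphs`, `nodes`, `edges`, `sub`, `pred`, `obj`, `lbl`) follow the obographs JSON layout; the two-node document, its `example.org` identifiers, and the helper name `parents_by_label` are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical two-node document using the obographs field names.
DOC = json.loads("""
{
  "graphs": [
    {
      "id": "http://example.org/demo.json",
      "nodes": [
        {"id": "http://example.org/EX_0001", "lbl": "parent term", "type": "CLASS"},
        {"id": "http://example.org/EX_0002", "lbl": "child term", "type": "CLASS"}
      ],
      "edges": [
        {"sub": "http://example.org/EX_0002", "pred": "is_a", "obj": "http://example.org/EX_0001"}
      ]
    }
  ]
}
""")


def parents_by_label(doc):
    """Map each node label to the labels of its is_a parents."""
    result = defaultdict(list)
    for graph in doc.get("graphs", []):
        # Fall back to the node id when a node has no label.
        labels = {n["id"]: n.get("lbl", n["id"]) for n in graph.get("nodes", [])}
        for edge in graph.get("edges", []):
            if edge["pred"] == "is_a":
                child = labels.get(edge["sub"], edge["sub"])
                parent = labels.get(edge["obj"], edge["obj"])
                result[child].append(parent)
    return dict(result)


parents = parents_by_label(DOC)
print(parents)
```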
Tools documentation

| Repository | Description | Tracker |
| --- | --- | --- |
| Exomiser | Exomiser is a Java program that finds potential disease-causing variants from whole-exome or whole-genome sequencing data. Starting from a VCF file and a set of phenotypes encoded using the Human Phenotype Ontology (HPO), it will annotate, filter and prioritise likely causative variants. The program does this based on criteria such as a variant's predicted pathogenicity, its frequency of occurrence in a population, and how closely the given phenotype matches the phenotypic features of disease genes from human and model organism data. The functional annotation of variants is handled by Jannovar and uses any of UCSC, RefSeq or Ensembl KnownGene transcript definitions and hg19 or hg38 genomic coordinates. Variants are prioritised according to user-defined criteria on variant frequency, pathogenicity, quality, inheritance pattern, and model organism phenotype data. Predicted pathogenicity data is extracted from the dbNSFP resource. Variant frequency data is taken from the 1000 Genomes, ESP, TOPMed, UK10K, ExAC and gnomAD datasets. Subsets of these frequency and pathogenicity data can be defined to further tune the analysis. Cross-species phenotype comparisons come from our PhenoDigm tool powered by the OWLTools OWLSim algorithm. | Issue Tracker |
| Monarch Knowledge Graph & Services | Integrates gene, disease, and phenotype data across species. | Issue Tracker |
| Monarch App | A web application for exploring the Monarch Knowledge Graph. | Issue Tracker |
| Monarch API | The Monarch API is used to access information from the Monarch Knowledge Graph programmatically. | Issue Tracker |
| Ontology Development Kit | Manage your ontology's life cycle with the Ontology Development Kit (ODK)! The ODK is: (1) a toolbox of various ontology-related tools such as ROBOT, owltools, dosdp-tools and many more, bundled as a Docker image; and (2) a set of executable workflows for managing your ontology's continuous integration, quality control, releases and dynamic imports. | Issue Tracker |
| Monarch OLS | A browsable collection of ontologies provided by the Monarch Initiative using OLS4 (developed by EMBL-EBI). This instance of OLS is tailored to the needs of the Monarch Initiative and provides access to a range of ontologies relevant to genomics, phenomics, and biomedical research, which are not available through the standard OLS. It also provides development snapshots of various unreleased ontologies for early access by the wider community. | Issue Tracker |
| Ontology Access Kit | The Ontology Access Kit (OAK) is for both technical and non-technical users of ontologies. Non-technical users can use the command line interface to query ontologies in a variety of ways. Technical users can write Python code that uses the OAK library to perform ontology-related tasks. OAK provides a collection of interfaces for various ontology operations, including: looking up basic features of an ontology element, such as its label, definition, relationships, or aliases; searching an ontology for a term; validating an ontology; modifying or deleting terms; generating and visualizing subgraphs; identifying lexical matches and exporting them as SSSOM mapping tables; and performing more advanced operations, such as graph traversal, OWL axiom processing, or text annotation. These interfaces are separated from any particular backend, for which there are a number of different adapters. This means the same Python API and command line can be used regardless of whether the ontology (1) is served by a remote API such as OLS or BioPortal; (2) is present locally on the filesystem in owl, obo, obojson, or sqlite formats; (3) is to be downloaded from a remote repository such as the OBO library; or (4) is queried from a remote database, including SPARQL endpoints (Ontobee/Ubergraph), a SQL database, or a Solr/ES endpoint. | Issue Tracker |
| SSSOM Toolkit | A Python toolkit for processing SSSOM mapping files. | Issue Tracker |
| LIkelihood Ratio Interpretation of Clinical AbnormaLities (LIRICAL) | LIkelihood Ratio Interpretation of Clinical AbnormaLities performs phenotype-driven prioritization of candidate diseases and genes in the setting of genomic diagnostics. | Issue Tracker |
| PhEval | The absence of standardised benchmarks and data standardisation for Variant and Gene Prioritisation Algorithms (VGPAs) presents a significant challenge in the field of genomic research. PhEval addresses this gap. It is a novel framework designed to streamline the evaluation of VGPAs that incorporate phenotypic data. PhEval offers several key benefits: (1) Automated Processes: reduces manual effort by automating various evaluation tasks, thus enhancing efficiency; (2) Standardisation: ensures consistency and comparability in evaluation methodologies, leading to more reliable and standardised assessments; (3) Reproducibility: facilitates reproducibility in research by providing a standardised platform, allowing for consistent validation of algorithms; (4) Comprehensive Benchmarking: enables thorough benchmarking of algorithms, providing well-founded comparisons and deeper insights into their performance. PhEval is a valuable tool for researchers looking to improve the accuracy and reliability of VGPA evaluations through a structured and standardised approach. | Issue Tracker |
| obographviz | | Issue Tracker |
| robot | | Issue Tracker |
| OntoGPT | OntoGPT is a Python package for extracting structured information from text with large language models (LLMs), instruction prompts, and ontology-based grounding. It works well with OpenAI's GPT models as well as a selection of other LLMs. OntoGPT's output can be used for general-purpose natural language tasks (e.g., named entity recognition and relation extraction), summarization, knowledge base and knowledge graph construction, and more. | Issue Tracker |
| Aurelian | Aurelian is an Agentic Universal Research Engine for Literature, Integration, Annotation, and Navigation. | Issue Tracker |
| Knowledge Graph Registry | A simple registry for knowledge graphs, including Knowledge Graph Hub (KG-Hub) projects. | Issue Tracker |
| Knowledge Graph Hub | The Knowledge Graph Hub (KG-Hub) is a platform that provides software development patterns for the standardized construction, exchange, and reuse of knowledge graphs. Features include a simple, modular extract-transform-load (ETL) pattern for ingest of upstream data in a Biolink-model compliant manner, cached downloads of upstream data, versioned and automatically updated builds, web-browsable storage of KG artifacts on cloud infrastructure, easy reuse of transformed subgraphs of upstream data across different projects, and easy integration of any OBO ontology. | Issue Tracker |
| LinkML Registry | A collection of Linked Data Modeling Language (LinkML) schemas. | Issue Tracker |
| TALISMAN | TALISMAN is a Python package for summarizing gene set functions using large language models (LLMs). | Issue Tracker |
| CurateGPT | CurateGPT is a framework for performing general purpose AI-guided curation and curation-related operations over collections of objects, including ontologies and biological datasets. CurateGPT melds the power of generative AI together with trusted knowledge bases and literature sources. CurateGPT streamlines the curation process, enhancing collaboration and efficiency in common workflows. Compared to direct interaction with an LLM, CurateGPT's agents enable access to information beyond that in the LLM's training data and they provide direct links to the data supporting each claim. This helps curators, researchers, and engineers scale up curation efforts to keep pace with the ever-increasing volume of scientific data. CurateGPT also contains other tools and features, such as a prototype web application that allows LLMs to answer user questions using knowledge bases. | Issue Tracker |
| Monarch Python Toolkit | | Issue Tracker |
| Babelon Toolkit | | Issue Tracker |
| Fenominal | | Issue Tracker |
| Koza | | Issue Tracker |
| Linked Data Modeling Language | LinkML (Linked data Modeling Language; linkml.io) is an open and extensible data modeling framework that provides a simple and structured way to describe and validate data. It includes a set of tools that make it easy to author, validate, and share data. LinkML can describe a range of data structures, from flat, list-based models to complex, interrelated, and normalized models that utilize polymorphism and compound inheritance. It offers an approachable syntax that is not tied to any one technical architecture, and can be integrated seamlessly with many existing frameworks. Its low barrier to entry allows people from different backgrounds and levels of technical expertise to collaborate to form a shared understanding of the underlying data semantics, providing an easy-to-understand basis for interdisciplinary collaboration. Its metamodel syntax allows modelers to focus on creating well-defined, stable, ontology-based data structures instead of a mixture of customized models and free text. LinkML helps reduce heterogeneity, complexity, and the proliferation of single-use data models while simultaneously enabling compliance with FAIR data standards. | Issue Tracker |
| Monarch Ingest | Monarch Ingest pulls in data from a wide variety of biomedical data sources and generates Biolink-compliant KGX files that are used to build the Monarch KG. | Issue Tracker |
| Monarch Mapping Registry | Simple Standard for Ontology Mappings (SSSOM) mappings collected and curated by the Monarch Initiative. | Issue Tracker |
| HPO Language Translations | Infrastructure to collect and coordinate HPO language translations. | Issue Tracker |
| Phenomizer | A web application for clinical diagnostics in human genetics using semantic similarity searches in ontologies. | Issue Tracker |
| Phenogrid | Phenogrid is a web component that visualizes semantic similarity calculations provided by OWLSim, as provided through APIs from the Monarch Initiative. | Issue Tracker |
| monarchr | R package for easy access, manipulation, and analysis of the Monarch Initiative or other KGX-formatted knowledge graphs. | Issue Tracker |
| Phenomics Assistant | An LLM retrieval augmented generation (RAG) agent for the Monarch Initiative. | Issue Tracker |
| pheval.llm | LLM benchmarking in differential diagnostics. | Issue Tracker |
| phenopacket2prompt | Creation of prompts from phenopackets, in multiple languages, intended for prompting LLMs. | Issue Tracker |
| AutoMAxO | Streamline the creation of ontology annotations of MAxO via LLMs. | Issue Tracker |
| pheval.exomiser | Exomiser plugin for PhEval. | Issue Tracker |
| pheval.ai_marrvel | A PhEval plugin for integrating AI-MARRVEL, enabling variant prioritisation based on phenotypic data in a standardised pipeline. | Issue Tracker |
| pheval.phen2gene | A PhEval plugin for integrating Phen2Gene, enabling gene prioritisation based on phenotypic data in a standardised pipeline. | Issue Tracker |
| pheval.gado | A PhEval plugin for integrating GADO, enabling gene prioritisation based on phenotypic data in a standardised pipeline. | Issue Tracker |
| pheval.svanna | A PhEval plugin for integrating SvAnna, enabling structural variant prioritisation in a standardised pipeline. | Issue Tracker |
| pheval.lirical | A PhEval plugin integrating LIRICAL for efficient phenotype and variant prioritisation in a standardised pipeline. | Issue Tracker |
| pheval.phenogenius | A PhEval plugin for integrating PhenoGenius, enabling gene prioritisation based on phenotypic data in a standardised pipeline. | Issue Tracker |
| phenotype2phenopacket | Phenotype2Phenopacket is a command-line tool that converts a phenotype annotation into GA4GH Phenopackets, facilitating standardised phenotypic data representation. | Issue Tracker |
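The Ontology Access Kit entry above notes that its interfaces are separated from any particular backend. The sketch below illustrates that idea in miniature: a helper written against a single `label` lookup works with any object that offers it. `LabelLookup`, `InMemoryAdapter`, `describe`, and the `EX:` identifiers are hypothetical names made up for this illustration. The `open_hpo_adapter` function assumes the `oaklib` package is installed and uses its `get_adapter` selector syntax; it is defined but not called here, since calling it would download HPO.

```python
from typing import Optional, Protocol


class LabelLookup(Protocol):
    """The one method this sketch needs from any ontology backend."""

    def label(self, curie: str) -> Optional[str]: ...


class InMemoryAdapter:
    """Hypothetical stand-in backend holding labels in a plain dict."""

    def __init__(self, labels):
        self._labels = dict(labels)

    def label(self, curie):
        return self._labels.get(curie)


def describe(adapter: LabelLookup, curie: str) -> str:
    """Format a term using only the backend-neutral `label` call."""
    found = adapter.label(curie)
    return f"{curie} ! {found}" if found is not None else f"{curie} ! (no label)"


def open_hpo_adapter():
    """Assumes `oaklib` is installed; this selector fetches HPO as SQLite."""
    from oaklib import get_adapter  # lazy import so the sketch runs without oaklib

    return get_adapter("sqlite:obo:hp")


demo = InMemoryAdapter({"EX:0001": "example term"})
print(describe(demo, "EX:0001"))
print(describe(demo, "EX:9999"))
```

Because `describe` depends only on `label`, passing it the object returned by `open_hpo_adapter()` instead of the in-memory stand-in should require no change to the helper itself.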

Data documentation

| Repository | Description | Tracker |
| --- | --- | --- |
| HPOA - Disease Annotations | | Issue Tracker |
| HPOA - Phenotype to Gene Associations | | Issue Tracker |
| HPOA - Gene to Phenotype Associations | | Issue Tracker |
| HPOA Knowledge Graph Ingestible | | Issue Tracker |
| Alliance Knowledge Graph Ingestible | | Issue Tracker |

Other documentation and training materials

| Repository | Description | Tracker |
| --- | --- | --- |
| OBO Academy | A resource for self-paced training of Semantic Engineers. | Issue Tracker |
| Best practices for collaborative open source coding | The best practice / house style guide maintained by Monarch's LBNL team members. | Issue Tracker |