Monarch Ingest

Overview

Monarch Ingest generates KGX-formatted files conforming to the Biolink Model from a wide variety of biomedical data sources.

The final output of the Monarch Ingest process is the Monarch KG (knowledge graph).
The latest version can be found at data.monarchinitiative.org.

See also the folder monarch-kg-dev/latest

Monarch Ingest is built using Poetry, which will create its own virtual environment.

Installation

monarch-ingest is a Python 3.8+ package, installable via Poetry.
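
Before installing, it can help to confirm that the local interpreter meets the 3.8+ requirement. The following is a minimal sketch, not part of monarch-ingest: `version_ge` is a hypothetical helper, and the check assumes that `python3` on your `PATH` is the interpreter Poetry will pick up, which may not hold on every system.

```shell
# version_ge HAVE NEED: succeed when HAVE (e.g. 3.10.4) is at least NEED (e.g. 3.8).
# Hypothetical helper; only the major and minor components are compared.
version_ge() {
    have_major=${1%%.*}
    have_rest=${1#*.}
    have_minor=${have_rest%%.*}
    need_major=${2%%.*}
    need_rest=${2#*.}
    need_minor=${need_rest%%.*}
    if [ "$have_major" -ne "$need_major" ]; then
        [ "$have_major" -gt "$need_major" ]
    else
        [ "$have_minor" -ge "$need_minor" ]
    fi
}

# Report on whichever python3 is first on PATH, if there is one.
if command -v python3 >/dev/null 2>&1; then
    py_version=$(python3 -c 'import platform; print(platform.python_version())')
    if version_ge "$py_version" 3.8; then
        echo "Python $py_version meets the 3.8+ requirement"
    else
        echo "Python $py_version is older than 3.8" >&2
    fi
fi
```

If the check reports an older interpreter, install a newer Python before continuing with the steps below.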

  1. Install Poetry, if you don't already have it:

    curl -sSL https://install.python-poetry.org | python3 -
    
    # Optional: Have poetry create its venvs in your project directories
    poetry config virtualenvs.in-project true
    

  2. Clone the repo:

    git clone git@github.com:monarch-initiative/monarch-ingest.git
    

  3. Install monarch-ingest:

    cd monarch-ingest
    poetry install
    

  4. (Optional) Activate the virtual environment:

    # This step removes the need to prefix all commands with `poetry run`
    poetry shell
    

Usage

For a detailed tutorial on ingests and how to make one, see the Create an Ingest tab.

CLI usage is available in the CLI tab, or by running ingest --help.

Run the whole pipeline!
  • Download the source data:

    ingest download --all
    

  • Run all transforms:

    ingest transform --all
    

  • Merge all transformed output into a tar.gz containing one node and one edge file:

    ingest merge
    

  • Upload the results to the Monarch Ingest Google bucket:

    ingest release
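
The four steps above can also be chained in a small wrapper script so that a failure in one stage stops the rest. This is a sketch under stated assumptions rather than anything shipped with monarch-ingest: the `INGEST_CMD` and `RUN_PIPELINE` variables and the `run_pipeline` function are hypothetical names introduced here, and only the four `ingest` subcommands come from the steps above.

```shell
#!/bin/sh
# Sketch: run the four pipeline stages in order, stopping at the first failure.
set -eu

# INGEST_CMD is a hypothetical override, not part of monarch-ingest itself.
# The default assumes the Poetry environment has not been activated.
INGEST_CMD="${INGEST_CMD:-poetry run ingest}"

run_pipeline() {
    # $INGEST_CMD is unquoted on purpose so "poetry run ingest" splits into words.
    $INGEST_CMD download --all
    $INGEST_CMD transform --all
    $INGEST_CMD merge
    $INGEST_CMD release
}

# RUN_PIPELINE is also hypothetical: a guard so sourcing this file has no side effects.
if [ "${RUN_PIPELINE:-0}" = "1" ]; then
    run_pipeline
fi
```

Saved under a name of your choice, the script would run the full pipeline with `RUN_PIPELINE=1` set in the environment. Setting `INGEST_CMD="echo ingest"` turns it into a dry run that only prints the commands, which is a cheap way to check the order before the release step uploads anything.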