Monarch Ingest
Overview
Monarch Ingest generates KGX-formatted files conforming to the Biolink Model from a wide variety of biomedical data sources.
The eventual output of the Monarch Ingest process is the Monarch KG.
The latest version of the Monarch KG can be found at data.monarchinitiative.org (see also the folder monarch-kg-dev/latest).
Monarch Ingest is built using Poetry, which will create its own virtual environment.
Installation
monarch-ingest is a Python 3.8+ package, installable via Poetry.
- Install Poetry, if you don't already have it:
curl -sSL https://install.python-poetry.org | python3 -
# Optional: Have poetry create its venvs in your project directories
poetry config virtualenvs.in-project true
- Clone the repo:
git clone git@github.com:monarch-initiative/monarch-ingest.git
- Install monarch-ingest:
cd monarch-ingest
poetry install
- (Optional) Activate the virtual environment:
# This step removes the need to prefix all commands with `poetry run`
poetry shell
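Before starting the steps above, it can help to confirm the two prerequisites they rely on: a Python 3.8+ interpreter and a `poetry` executable on the PATH. The following is a minimal sketch of such a check, not part of monarch-ingest itself; the function name `missing_prerequisites` and its parameters are hypothetical and exist only for illustration.

```python
import shutil
import sys
from typing import Callable, List, Optional, Sequence

# Minimum version taken from the "Python 3.8+" requirement stated above.
MIN_PYTHON = (3, 8)


def missing_prerequisites(
    python_version: Optional[Sequence[int]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of problems; an empty list means both checks passed.

    Hypothetical helper: `python_version` and `which` are injectable so the
    check can be exercised without changing the real environment.
    """
    if python_version is None:
        # Default to the (major, minor) version of the running interpreter.
        python_version = sys.version_info[:2]

    problems = []
    if tuple(python_version) < MIN_PYTHON:
        problems.append("Python 3.8 or newer is required")
    if which("poetry") is None:
        # shutil.which returns None when the executable is not on the PATH.
        problems.append("poetry not found on PATH")
    return problems
```

Calling `missing_prerequisites()` with no arguments inspects the current interpreter and PATH; printing the returned list shows what still needs to be installed.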
Usage
For a detailed tutorial on ingests and how to make one, see the Create an Ingest tab.
CLI usage is available in the CLI tab, or by running ingest --help.
Run the whole pipeline!
- Download the source data:
ingest download --all
- Run all transforms:
ingest transform --all
- Merge all transformed output into a tar.gz containing one node file and one edge file:
ingest merge
- Upload the results to the Monarch Ingest Google bucket:
ingest release
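The four steps above depend on each other, so a failed download or transform should stop the run before anything is merged or uploaded. One way to script that is sketched below. It uses only the four commands listed above and makes two assumptions: that `ingest` is on the PATH (for example inside `poetry shell`) and that each command exits with a non-zero status on failure. The names `PIPELINE` and `run_pipeline` are hypothetical, not part of monarch-ingest.

```python
import subprocess
from typing import Callable, List, Sequence

# The four stages, in the order listed above.
# Assumption: `ingest` is on the PATH (e.g. inside `poetry shell`).
PIPELINE: List[List[str]] = [
    ["ingest", "download", "--all"],
    ["ingest", "transform", "--all"],
    ["ingest", "merge"],
    ["ingest", "release"],
]


def run_pipeline(
    stages: Sequence[Sequence[str]] = PIPELINE,
    runner: Callable[..., object] = subprocess.run,
) -> List[str]:
    """Run each stage in order and return the names of the completed stages.

    With the default runner, check=True makes subprocess.run raise
    CalledProcessError on a non-zero exit status, so later stages are
    skipped. This assumes the CLI reports failure through its exit status.
    """
    completed = []
    for argv in stages:
        runner(list(argv), check=True)
        completed.append(argv[1])  # the subcommand, e.g. "download"
    return completed
```

Calling `run_pipeline()` runs all four stages. Because `release` uploads to a shared bucket, a local dry run might pass `stages=PIPELINE[:3]` to stop after the merge.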