closurizer
Details
| GitHub | monarch-initiative/closurizer |
| Language | Python |
| Description | Add closure expansion fields to kgx files following the Golr pattern |
Dependencies
External Dependencies
| Package | Version |
|---|---|
| python | ^3.8 |
| click | ^8 |
| SQLAlchemy | ^1.4.37 |
| duckdb | * |
Documentation
Monarch Closurizer
Closurizer adds expansion fields to kgx files following the Golr pattern
Usage
As a module
Archive input (tar.gz file):
from closurizer.closurizer import add_closure
add_closure(
closure_file="my-relations-non-redundant.tsv",
nodes_output_file="output/nodes-with-closures.tsv",
edges_output_file="output/edges-denormalized.tsv",
kg_archive="my-kg.tar.gz",
database_path="working-database.duckdb", # database where processing occurs
edge_fields=["subject", "object"]
)
Database input (existing DuckDB):
from closurizer.closurizer import add_closure
add_closure(
closure_file="my-relations-non-redundant.tsv",
nodes_output_file="output/nodes-with-closures.tsv",
edges_output_file="output/edges-denormalized.tsv",
database_path="existing-kg.duckdb", # existing database with nodes/edges tables
edge_fields=["subject", "object"]
)
As a command line tool
Archive input:
closurizer --kg my-kg.tar.gz --database working.duckdb --closure relations.tsv --nodes-output nodes.tsv --edges-output edges.tsv
Database input:
closurizer --database existing.duckdb --closure relations.tsv --nodes-output nodes.tsv --edges-output edges.tsv
Note: If --kg is provided, the archive will be loaded into the specified database. If --kg is not provided, the database must already exist and contain nodes and edges tables.
Example
Closurizer will produce a denormalized edge file including subject namespace and category along with ID and label closures
| subject_category | subject_closure | subject_closure_label | subject_namespace | subject | predicate | object | object_namespace | object_closure_label | object_closure | object_category |
|---|---|---|---|---|---|---|---|---|---|---|
| biolink:Gene | HGNC | HGNC:4851 | biolink:gene_associated_with_condition | MONDO:0007739 | MONDO | Huntington disease and related disorders, movement disorder | MONDO:0000167, MONDO:0005395 | biolink:Disease |
Example source KG
Nodes:
| category | id | name | in_taxon |
|----------------|----------------|----- |-----------|
| biolink:Gene | HGNC:4851 | HTT | NCBITaxon:9606 |
| biolink:Disease | MONDO:0007739 | Huntington disease | |
Edges: | subject | predicate | object | |---------------|----------------------------------------|---------------| | HGNC:4851 | biolink:gene_associated_with_condition | MONDO:0007739 |
and a Relation Graph closure tsv file with:
| subject | predicate | object |
|---|---|---|
| MONDO:0007739 | rdfs:subClassOf | MONDO:0000167 |
| MONDO:0007739 | rdfs:subClassOf | MONDO:0005395 |