This function calls the Monarch-hosted semantic similarity API to compare two graphs, via the same endpoints as the Monarch Phenotype Explorer: https://monarchinitiative.org/explore#phenotype-explorer.
monarch_semsim(
  query_graph,
  target_graph,
  metric = "ancestor_information_content",
  include_reverse = FALSE,
  keep_unmatched = FALSE
)

query_graph: A tbl_kgx graph.
target_graph: A tbl_kgx graph.
metric: The semantic similarity metric to use. Default is "ancestor_information_content". Also available are "jaccard_similarity" and "phenodigm_score".
include_reverse: Whether to include the best matches from the target graph to the query graph. Default is FALSE.
keep_unmatched: Whether to keep nodes in the target graph that do not have a match. Default is FALSE.
A tbl_kgx graph with "computed:best_matches" edges between the nodes of the two input graphs. Each of these edges carries the columns monarch_semsim_metric, monarch_semsim_score, and monarch_semsim_ancestor_id.
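As a rough illustration of how these edge columns might be used, the sketch below ranks matches by score in base R. The data frame is a hand-made stand-in for the edge table: the column names come from the description above, but every ID and score is invented, and the threshold of 5 is arbitrary.

```r
# Hypothetical stand-in for the edge data of a monarch_semsim() result.
# All IDs and scores below are invented for illustration.
edges <- data.frame(
  subject = c("EX:0001", "EX:0002", "EX:0003"),
  predicate = "computed:best_matches",
  object = c("EX:0101", "EX:0002", "EX:0103"),
  monarch_semsim_metric = "ancestor_information_content",
  monarch_semsim_score = c(4.2, 11.7, 7.9),
  monarch_semsim_ancestor_id = c("EX:0900", "EX:0002", "EX:0901"),
  stringsAsFactors = FALSE
)

# Order matches from highest to lowest score.
ranked <- edges[order(edges$monarch_semsim_score, decreasing = TRUE), ]

# Keep only matches at or above an arbitrary, illustrative threshold.
strong <- ranked[ranked$monarch_semsim_score >= 5, ]
strong$subject
```

With a real result, the edge table would first have to be pulled out of the graph. Since the result prints as a tbl_graph, tidygraph's activate(edges) followed by a conversion to a tibble is one likely route, though that is an assumption rather than something this page documents.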
The API returns the best matches between the nodes of the two graphs, based on a specified knowledge-graph-based metric. The default is "ancestor_information_content"; "jaccard_similarity" and "phenodigm_score" are also available. The result is returned as a graph, with "computed:best_matches" edges between the nodes of the two input graphs.
By default, the function returns only the best matches from the query graph to the target graph, and removes any target-graph nodes that do not have a match; set keep_unmatched = TRUE to retain them. If include_reverse = TRUE, the function also returns the best matches from the target graph to the query graph.
The engine attached to the returned graph is that of the query graph.
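For some intuition about what a metric such as "jaccard_similarity" measures, here is a toy base R sketch. It assumes the score compares the sets of ontology ancestors of two terms, which is the usual definition but is not confirmed on this page. The ancestor sets are invented, and the helper jaccard() is hypothetical: the real computation happens server-side in the Monarch API.

```r
# Toy illustration only: these ancestor sets are made up, not real ontology data.
# jaccard() is a hypothetical helper, not part of any package.
jaccard <- function(a, b) {
  # Size of the shared set divided by the size of the combined set.
  length(intersect(a, b)) / length(union(a, b))
}

anc_a <- c("T:1", "T:2", "T:3", "T:4")  # invented ancestors of term A
anc_b <- c("T:1", "T:2", "T:5")         # invented ancestors of term B

jaccard(anc_a, anc_b)
#> [1] 0.4
```

The two sets share 2 terms out of 5 distinct terms overall, which gives 0.4. Identical sets score 1 and disjoint sets score 0.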
g1 <- monarch_engine() |>
  fetch_nodes(query_ids = "MONDO:0007947") |>
  expand(categories = "biolink:PhenotypicFeature")
#> Trying to connect to https://neo4j.monarchinitiative.org
#> Connected to https://neo4j.monarchinitiative.org
#> Fetching; counting matching nodes...
#> total: 1.
#> Fetching; fetched1of1
#> Expanding; counting matching edges...
#> total: 139.
#> Expanding; fetched139of139edges.
g2 <- monarch_engine() |>
  fetch_nodes(query_ids = "MONDO:0007522") |>
  expand(categories = "biolink:PhenotypicFeature")
#> Trying to connect to https://neo4j.monarchinitiative.org
#> Connected to https://neo4j.monarchinitiative.org
#> Fetching; counting matching nodes...
#> total: 1.
#> Fetching; fetched1of1
#> Expanding; counting matching edges...
#> total: 66.
#> Expanding; fetched66of66edges.
sim <- monarch_semsim(g1, g2)
#> Trying to connect to https://neo4j.monarchinitiative.org
#> Connected to https://neo4j.monarchinitiative.org
#> Joining with `by = join_by(id, pcategory, name, description, synonym, category,
#> iri, xref, namespace, provided_by, file_source, exact_synonym, subsets,
#> related_synonym, broad_synonym, narrow_synonym)`
print(sim)
#> # A tbl_graph: 132 nodes and 107 edges
#> #
#> # A directed multigraph with 37 components
#> #
#> # Node Data: 132 × 16 (active)
#> id pcategory name description synonym category iri xref namespace
#> <chr> <chr> <chr> <chr> <list> <list> <chr> <list> <chr>
#> 1 MONDO:00… biolink:… Marf… A disorder… <list> <chr> http… <list> MONDO
#> 2 HP:00015… biolink:… Disp… A tall and… <list> <chr> http… <list> HP
#> 3 HP:00015… biolink:… Slen… Asthenic h… <list> <chr> http… <list> HP
#> 4 HP:00017… biolink:… Pes … A foot whe… <list> <chr> http… <list> HP
#> 5 HP:00021… biolink:… Spon… Pneumothor… <list> <chr> http… <list> HP
#> 6 HP:00026… biolink:… Aort… An abnorma… <list> <chr> http… <list> HP
#> 7 HP:00049… biolink:… Aort… Aortic dil… <list> <chr> http… <list> HP
#> 8 HP:00124… biolink:… Chro… Subjective… <list> <chr> http… <list> HP
#> 9 HP:00002… biolink:… Narr… Bizygomati… <list> <chr> http… <list> HP
#> 10 HP:00005… biolink:… Visu… Visual imp… <list> <chr> http… <list> HP
#> # ℹ 122 more rows
#> # ℹ 7 more variables: provided_by <chr>, file_source <chr>,
#> # exact_synonym <list>, subsets <chr>, related_synonym <list>,
#> # broad_synonym <list>, narrow_synonym <list>
#> #
#> # Edge Data: 107 × 9
#> from to subject predicate object primary_knowledge_so…¹
#> <int> <int> <chr> <chr> <chr> <chr>
#> 1 107 127 HP:0000006 computed:best_matches HP:0004947 computed:monarch_sems…
#> 2 33 33 HP:0000023 computed:best_matches HP:0000023 computed:monarch_sems…
#> 3 86 116 HP:0000098 computed:best_matches HP:0000286 computed:monarch_sems…
#> # ℹ 104 more rows
#> # ℹ abbreviated name: ¹primary_knowledge_source
#> # ℹ 3 more variables: monarch_semsim_metric <chr>, monarch_semsim_score <dbl>,
#> # monarch_semsim_ancestor_id <chr>
# also include the unmatched targets
sim <- monarch_semsim(g1, g2, keep_unmatched = TRUE)
#> Trying to connect to https://neo4j.monarchinitiative.org
#> Connected to https://neo4j.monarchinitiative.org
#> Joining with `by = join_by(id, pcategory, name, description, synonym, category,
#> iri, xref, namespace, provided_by, file_source, exact_synonym, subsets,
#> related_synonym, broad_synonym, narrow_synonym)`
print(sim)
#> # A tbl_graph: 162 nodes and 107 edges
#> #
#> # A directed multigraph with 67 components
#> #
#> # Node Data: 162 × 16 (active)
#> id pcategory name description synonym category iri xref namespace
#> <chr> <chr> <chr> <chr> <list> <list> <chr> <list> <chr>
#> 1 MONDO:00… biolink:… Marf… A disorder… <list> <chr> http… <list> MONDO
#> 2 HP:00015… biolink:… Disp… A tall and… <list> <chr> http… <list> HP
#> 3 HP:00015… biolink:… Slen… Asthenic h… <list> <chr> http… <list> HP
#> 4 HP:00017… biolink:… Pes … A foot whe… <list> <chr> http… <list> HP
#> 5 HP:00021… biolink:… Spon… Pneumothor… <list> <chr> http… <list> HP
#> 6 HP:00026… biolink:… Aort… An abnorma… <list> <chr> http… <list> HP
#> 7 HP:00049… biolink:… Aort… Aortic dil… <list> <chr> http… <list> HP
#> 8 HP:00124… biolink:… Chro… Subjective… <list> <chr> http… <list> HP
#> 9 HP:00002… biolink:… Narr… Bizygomati… <list> <chr> http… <list> HP
#> 10 HP:00005… biolink:… Visu… Visual imp… <list> <chr> http… <list> HP
#> # ℹ 152 more rows
#> # ℹ 7 more variables: provided_by <chr>, file_source <chr>,
#> # exact_synonym <list>, subsets <chr>, related_synonym <list>,
#> # broad_synonym <list>, narrow_synonym <list>
#> #
#> # Edge Data: 107 × 9
#> from to subject predicate object primary_knowledge_so…¹
#> <int> <int> <chr> <chr> <chr> <chr>
#> 1 107 112 HP:0000006 computed:best_matches HP:0001073 computed:monarch_sems…
#> 2 33 33 HP:0000023 computed:best_matches HP:0000023 computed:monarch_sems…
#> 3 86 126 HP:0000098 computed:best_matches HP:0000286 computed:monarch_sems…
#> # ℹ 104 more rows
#> # ℹ abbreviated name: ¹primary_knowledge_source
#> # ℹ 3 more variables: monarch_semsim_metric <chr>, monarch_semsim_score <dbl>,
#> # monarch_semsim_ancestor_id <chr>
# include reverse matches
sim <- monarch_semsim(g1, g2, include_reverse = TRUE)
#> Trying to connect to https://neo4j.monarchinitiative.org
#> Connected to https://neo4j.monarchinitiative.org
#> Joining with `by = join_by(id, pcategory, name, description, synonym, category,
#> iri, xref, namespace, provided_by, file_source, exact_synonym, subsets,
#> related_synonym, broad_synonym, narrow_synonym)`
print(sim)
#> # A tbl_graph: 162 nodes and 174 edges
#> #
#> # A directed multigraph with 31 components
#> #
#> # Node Data: 162 × 16 (active)
#> id pcategory name description synonym category iri xref namespace
#> <chr> <chr> <chr> <chr> <list> <list> <chr> <list> <chr>
#> 1 MONDO:00… biolink:… Marf… A disorder… <list> <chr> http… <list> MONDO
#> 2 HP:00015… biolink:… Disp… A tall and… <list> <chr> http… <list> HP
#> 3 HP:00015… biolink:… Slen… Asthenic h… <list> <chr> http… <list> HP
#> 4 HP:00017… biolink:… Pes … A foot whe… <list> <chr> http… <list> HP
#> 5 HP:00021… biolink:… Spon… Pneumothor… <list> <chr> http… <list> HP
#> 6 HP:00026… biolink:… Aort… An abnorma… <list> <chr> http… <list> HP
#> 7 HP:00049… biolink:… Aort… Aortic dil… <list> <chr> http… <list> HP
#> 8 HP:00124… biolink:… Chro… Subjective… <list> <chr> http… <list> HP
#> 9 HP:00002… biolink:… Narr… Bizygomati… <list> <chr> http… <list> HP
#> 10 HP:00005… biolink:… Visu… Visual imp… <list> <chr> http… <list> HP
#> # ℹ 152 more rows
#> # ℹ 7 more variables: provided_by <chr>, file_source <chr>,
#> # exact_synonym <list>, subsets <chr>, related_synonym <list>,
#> # broad_synonym <list>, narrow_synonym <list>
#> #
#> # Edge Data: 174 × 9
#> from to subject predicate object primary_knowledge_so…¹
#> <int> <int> <chr> <chr> <chr> <chr>
#> 1 107 147 HP:0000006 computed:best_matches HP:0004944 computed:monarch_sems…
#> 2 33 33 HP:0000023 computed:best_matches HP:0000023 computed:monarch_sems…
#> 3 86 6 HP:0000098 computed:best_matches HP:0002616 computed:monarch_sems…
#> # ℹ 171 more rows
#> # ℹ abbreviated name: ¹primary_knowledge_source
#> # ℹ 3 more variables: monarch_semsim_metric <chr>, monarch_semsim_score <dbl>,
#> # monarch_semsim_ancestor_id <chr>