This function calls the Monarch-hosted semantic similarity API to compare two graphs, via the same endpoints as the Monarch Phenotype Explorer: https://monarchinitiative.org/explore#phenotype-explorer.

monarch_semsim(
  query_graph,
  target_graph,
  metric = "ancestor_information_content",
  include_reverse = FALSE,
  keep_unmatched = FALSE
)

Arguments

query_graph

A tbl_kgx graph.

target_graph

A tbl_kgx graph.

metric

The semantic similarity metric to use. Default is "ancestor_information_content". Also available are "jaccard_similarity" and "phenodigm_score".

include_reverse

Whether to include the best matches from the target graph to the query graph. Default is FALSE.

keep_unmatched

Whether to keep nodes in the target graph that do not have a match. Default is FALSE.

Value

A tbl_kgx graph with "computed:best_matches" edges between the nodes of the two input graphs.

Details

The API returns the best matches between the nodes of the two graphs, based on the specified knowledge-graph-based metric: the default is "ancestor_information_content"; "jaccard_similarity" and "phenodigm_score" are also available. The result is returned as a graph, with "computed:best_matches" edges between the nodes of the two input graphs.
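
To give a feel for what the three metric names refer to, here is a minimal base-R sketch on a toy ontology. The definitions used (Jaccard overlap of ancestor sets, information content of the most informative common ancestor, and the square root of their product for a Phenodigm-style score) are the commonly cited ones and are an assumption here; the Monarch API may compute them differently. The `ancestors` and `freq` objects, the helper functions, and all term names are hypothetical.

```r
# Toy ontology (hypothetical): each term maps to its reflexive ancestor set.
ancestors <- list(
  root = "root",
  A = c("A", "root"),
  B = c("B", "A", "root"),
  C = c("C", "A", "root"),
  D = c("D", "root")
)

# Hypothetical annotation frequencies; information content is -log2(frequency).
freq <- c(root = 1, A = 0.5, B = 0.25, C = 0.25, D = 0.5)
ic <- -log2(freq)

# Jaccard similarity: shared ancestors divided by all ancestors of either term.
jaccard_sim <- function(x, y) {
  ax <- ancestors[[x]]
  ay <- ancestors[[y]]
  length(intersect(ax, ay)) / length(union(ax, ay))
}

# Information content of the most informative common ancestor.
mica_ic <- function(x, y) {
  common <- intersect(ancestors[[x]], ancestors[[y]])
  max(ic[common])
}

# Phenodigm-style score (assumed form): sqrt(Jaccard * ancestor IC).
phenodigm_like <- function(x, y) {
  sqrt(jaccard_sim(x, y) * mica_ic(x, y))
}

jaccard_sim("B", "C")    # 2 shared of 4 total ancestors: 0.5
mica_ic("B", "C")        # best shared ancestor is A, with IC 1
phenodigm_like("B", "C") # sqrt(0.5 * 1)
mica_ic("B", "D")        # only the root is shared, so IC is 0
```

Under these assumed definitions, siblings B and C score well on every metric because they share the specific ancestor A, while B and D share only the root and so get an information-content score of zero.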

By default, the function only returns the best matches from the first (query) graph to the second (target) graph, and removes any target-graph nodes that do not have a match; set keep_unmatched = TRUE to retain them. If include_reverse = TRUE, the function also returns the best matches from the second graph to the first graph.
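
The difference between forward-only and forward-plus-reverse matching can be illustrated with a small base-R sketch. It does not call the API: the `scores` matrix and the term names are made up, and picking the highest score per row or column is an assumed simplification of what "best match" means.

```r
# Hypothetical pairwise scores: rows are query terms, columns are target terms.
scores <- matrix(
  c(0.9, 0.2,
    0.8, 0.1,
    0.3, 0.7),
  nrow = 3, byrow = TRUE,
  dimnames = list(query = c("q1", "q2", "q3"), target = c("t1", "t2"))
)

# Forward direction: one best target per query term (row-wise maximum).
forward <- data.frame(
  subject = rownames(scores),
  object = colnames(scores)[apply(scores, 1, which.max)],
  score = unname(apply(scores, 1, max)),
  stringsAsFactors = FALSE
)

# Reverse direction: one best query term per target (column-wise maximum).
reverse <- data.frame(
  subject = colnames(scores),
  object = rownames(scores)[apply(scores, 2, which.max)],
  score = unname(apply(scores, 2, max)),
  stringsAsFactors = FALSE
)

forward$object  # "t1" "t1" "t2"
reverse$object  # "q1" "q3"

# Analogue of include_reverse = TRUE: both edge sets together.
all_matches <- rbind(forward, reverse)
nrow(all_matches)  # 5
```

Best matching is not symmetric: q2's best match is t1, but t1's best match is q1. Including the reverse direction therefore adds edges rather than duplicating existing ones, which is consistent with the edge count rising from 107 to 174 in the Examples.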

The engine attached to the returned graph is that of query_graph.

Examples


g1 <- monarch_engine() |>
  fetch_nodes(query_ids = "MONDO:0007947") |>
  expand(categories = "biolink:PhenotypicFeature")
#> Trying to connect to https://neo4j.monarchinitiative.org
#> Connected to https://neo4j.monarchinitiative.org
#> Fetching; counting matching nodes... 
#>  total: 1.
#> Fetching; fetched 1 of 1
#> Expanding; counting matching edges... 
#>  total: 106.
#> Expanding; fetched 106 of 106 edges.

g2 <- monarch_engine() |>
  fetch_nodes(query_ids = "MONDO:0007522") |>
  expand(categories = "biolink:PhenotypicFeature")
#> Trying to connect to https://neo4j.monarchinitiative.org
#> Connected to https://neo4j.monarchinitiative.org
#> Fetching; counting matching nodes... 
#>  total: 1.
#> Fetching; fetched 1 of 1
#> Expanding; counting matching edges... 
#>  total: 66.
#> Expanding; fetched 66 of 66 edges.

sim <- monarch_semsim(g1, g2)
#> Trying to connect to https://neo4j.monarchinitiative.org
#> Connected to https://neo4j.monarchinitiative.org
#> Joining with `by = join_by(id, pcategory, name, description, synonym, category,
#> iri, xref, namespace, provided_by)`
print(sim)
#> # A tbl_graph: 136 nodes and 107 edges
#> #
#> # A directed multigraph with 41 components
#> #
#> # Node Data: 136 × 10 (active)
#>    id         pcategory name  description synonym category iri   xref  namespace
#>    <chr>      <chr>     <chr> <chr>       <named> <list>   <chr> <nam> <chr>    
#>  1 MONDO:000… biolink:… Marf… A disorder… <chr>   <chr>    http… <chr> MONDO    
#>  2 HP:0430043 biolink:… Thor… Thoracic l… <lgl>   <chr>    http… <lgl> HP       
#>  3 HP:0000483 biolink:… Asti… A type of … <chr>   <chr>    http… <chr> HP       
#>  4 HP:0001377 biolink:… Limi… Limited ab… <chr>   <chr>    http… <chr> HP       
#>  5 HP:0000486 biolink:… Stra… A misalign… <chr>   <chr>    http… <chr> HP       
#>  6 HP:0005136 biolink:… Mitr… Mitral ann… <chr>   <chr>    http… <chr> HP       
#>  7 HP:0001371 biolink:… Flex… A flexion … <chr>   <chr>    http… <chr> HP       
#>  8 HP:0003199 biolink:… Decr… NA          <chr>   <chr>    http… <chr> HP       
#>  9 HP:0025586 biolink:… Hype… A type of … <lgl>   <chr>    http… <lgl> HP       
#> 10 HP:0000518 biolink:… Cata… A cataract… <chr>   <chr>    http… <chr> HP       
#> # ℹ 126 more rows
#> # ℹ 1 more variable: provided_by <named list>
#> #
#> # Edge Data: 107 × 8
#>    from    to subject    predicate             object   metric score ancestor_id
#>   <int> <int> <chr>      <chr>                 <chr>    <chr>  <dbl> <chr>      
#> 1   107   119 HP:0000006 computed:best_matches HP:0001… ances…  4.33 HP:0000001 
#> 2    84    84 HP:0000023 computed:best_matches HP:0000… ances… 18.5  HP:0000023 
#> 3    24   116 HP:0000098 computed:best_matches HP:0000… ances…  6.25 UPHENO:000…
#> # ℹ 104 more rows

# also include the unmatched targets
sim <- monarch_semsim(g1, g2, keep_unmatched = TRUE)
#> Trying to connect to https://neo4j.monarchinitiative.org
#> Connected to https://neo4j.monarchinitiative.org
#> Joining with `by = join_by(id, pcategory, name, description, synonym, category,
#> iri, xref, namespace, provided_by)`
print(sim)
#> # A tbl_graph: 162 nodes and 107 edges
#> #
#> # A directed multigraph with 67 components
#> #
#> # Node Data: 162 × 10 (active)
#>    id         pcategory name  description synonym category iri   xref  namespace
#>    <chr>      <chr>     <chr> <chr>       <named> <list>   <chr> <nam> <chr>    
#>  1 MONDO:000… biolink:… Marf… A disorder… <chr>   <chr>    http… <chr> MONDO    
#>  2 HP:0430043 biolink:… Thor… Thoracic l… <lgl>   <chr>    http… <lgl> HP       
#>  3 HP:0000483 biolink:… Asti… A type of … <chr>   <chr>    http… <chr> HP       
#>  4 HP:0001377 biolink:… Limi… Limited ab… <chr>   <chr>    http… <chr> HP       
#>  5 HP:0000486 biolink:… Stra… A misalign… <chr>   <chr>    http… <chr> HP       
#>  6 HP:0005136 biolink:… Mitr… Mitral ann… <chr>   <chr>    http… <chr> HP       
#>  7 HP:0001371 biolink:… Flex… A flexion … <chr>   <chr>    http… <chr> HP       
#>  8 HP:0003199 biolink:… Decr… NA          <chr>   <chr>    http… <chr> HP       
#>  9 HP:0025586 biolink:… Hype… A type of … <lgl>   <chr>    http… <lgl> HP       
#> 10 HP:0000518 biolink:… Cata… A cataract… <chr>   <chr>    http… <chr> HP       
#> # ℹ 152 more rows
#> # ℹ 1 more variable: provided_by <named list>
#> #
#> # Edge Data: 107 × 8
#>    from    to subject    predicate             object   metric score ancestor_id
#>   <int> <int> <chr>      <chr>                 <chr>    <chr>  <dbl> <chr>      
#> 1   107   144 HP:0000006 computed:best_matches HP:0003… ances…  4.33 HP:0000001 
#> 2    84    84 HP:0000023 computed:best_matches HP:0000… ances… 18.5  HP:0000023 
#> 3    24   126 HP:0000098 computed:best_matches HP:0000… ances…  6.25 UPHENO:000…
#> # ℹ 104 more rows

# include reverse matches
sim <- monarch_semsim(g1, g2, include_reverse = TRUE)
#> Trying to connect to https://neo4j.monarchinitiative.org
#> Connected to https://neo4j.monarchinitiative.org
#> Joining with `by = join_by(id, pcategory, name, description, synonym, category,
#> iri, xref, namespace, provided_by)`
print(sim)
#> # A tbl_graph: 162 nodes and 174 edges
#> #
#> # A directed multigraph with 28 components
#> #
#> # Node Data: 162 × 10 (active)
#>    id         pcategory name  description synonym category iri   xref  namespace
#>    <chr>      <chr>     <chr> <chr>       <named> <list>   <chr> <nam> <chr>    
#>  1 MONDO:000… biolink:… Marf… A disorder… <chr>   <chr>    http… <chr> MONDO    
#>  2 HP:0430043 biolink:… Thor… Thoracic l… <lgl>   <chr>    http… <lgl> HP       
#>  3 HP:0000483 biolink:… Asti… A type of … <chr>   <chr>    http… <chr> HP       
#>  4 HP:0001377 biolink:… Limi… Limited ab… <chr>   <chr>    http… <chr> HP       
#>  5 HP:0000486 biolink:… Stra… A misalign… <chr>   <chr>    http… <chr> HP       
#>  6 HP:0005136 biolink:… Mitr… Mitral ann… <chr>   <chr>    http… <chr> HP       
#>  7 HP:0001371 biolink:… Flex… A flexion … <chr>   <chr>    http… <chr> HP       
#>  8 HP:0003199 biolink:… Decr… NA          <chr>   <chr>    http… <chr> HP       
#>  9 HP:0025586 biolink:… Hype… A type of … <lgl>   <chr>    http… <lgl> HP       
#> 10 HP:0000518 biolink:… Cata… A cataract… <chr>   <chr>    http… <chr> HP       
#> # ℹ 152 more rows
#> # ℹ 1 more variable: provided_by <named list>
#> #
#> # Edge Data: 174 × 8
#>    from    to subject    predicate             object   metric score ancestor_id
#>   <int> <int> <chr>      <chr>                 <chr>    <chr>  <dbl> <chr>      
#> 1   107   161 HP:0000006 computed:best_matches HP:0001… ances…  4.33 HP:0000001 
#> 2    84    84 HP:0000023 computed:best_matches HP:0000… ances… 18.5  HP:0000023 
#> 3    24   126 HP:0000098 computed:best_matches HP:0000… ances…  6.25 UPHENO:000…
#> # ℹ 171 more rows
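
Because the printed result is a tbl_graph, the match edges can presumably be pulled out as a plain table with tidygraph and dplyr verbs. The sketch below builds a small stand-in graph rather than calling the API, so it runs offline; the node IDs and scores are hypothetical, and it assumes a tbl_kgx graph responds to `activate()` and `as_tibble()` the same way an ordinary tbl_graph does.

```r
library(tidygraph)
library(dplyr)

# Hypothetical stand-in for a monarch_semsim() result.
nodes <- tibble::tibble(id = c("HP:A", "HP:B", "HP:C"))
edges <- tibble::tibble(
  from = c(1L, 2L),
  to = c(3L, 3L),
  subject = c("HP:A", "HP:B"),
  predicate = "computed:best_matches",
  object = c("HP:C", "HP:C"),
  metric = "ancestor_information_content",
  score = c(4.33, 18.5)
)
toy <- tbl_graph(nodes = nodes, edges = edges, directed = TRUE)

# Extract the best-match edges as a tibble, highest score first.
matches <- toy |>
  activate(edges) |>
  as_tibble() |>
  filter(predicate == "computed:best_matches") |>
  arrange(desc(score))

matches$subject  # "HP:B" "HP:A"
```

The same pipeline, applied to `sim` from the examples above instead of `toy`, should give a table of subject, object, and score columns suitable for further filtering or plotting.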