Given an optional KG engine (e.g., file_engine(), neo4j_engine(), or monarch_engine()) and a query tbl_kgx() graph, fetches additional nodes and edges from the KG, expanding the query graph according to the given criteria. If an engine is supplied, that engine is used; otherwise, the engine most recently associated with the query graph is used.

expand(
  graph,
  engine = NULL,
  direction = "both",
  predicates = NULL,
  categories = NULL,
  transitive = FALSE,
  drop_unused_query_nodes = FALSE,
  ...
)

Arguments

graph

A query tbl_kgx() graph to query with.

engine

(Optional) An engine to use for fetching query graph edges. If not provided, the graph's most recent engine is used.

direction

The direction of associations to fetch. Can be "in", "out", or "both". Default is "both".

predicates

A vector of relationship predicates, indicating which edges to consider in the neighborhood. If NULL (default), all edges are considered.

categories

A vector of node categories, indicating which nodes in the larger KG may be fetched. If NULL (default), nodes of any category may be fetched.

transitive

If TRUE, include the transitive closure of the neighborhood. Default is FALSE. Useful in combination with predicates like biolink:subclass_of.

drop_unused_query_nodes

If TRUE, remove query nodes from the result, unless they are at the neighborhood boundary, i.e., required for connecting to the result nodes. Default is FALSE.

...

Other parameters passed to methods.

Value

A tbl_kgx() graph

Examples

## Using Monarch (hosted)
phenos <- monarch_engine() |>
          fetch_nodes(query_ids = "MONDO:0007525") |>
          expand(predicates = "biolink:has_phenotype",
                 categories = "biolink:PhenotypicFeature")
#> Trying to connect to https://neo4j.monarchinitiative.org
#> Connected to https://neo4j.monarchinitiative.org
#> Fetching; counting matching nodes... 
#>  total: 1.
#> Fetching; fetched 1 of 1
#> Expanding; counting matching edges... 
#>  total: 48.
#> Expanding; fetched 48 of 48 edges.

print(phenos)
#> # A tbl_graph: 49 nodes and 48 edges
#> #
#> # A rooted tree
#> #
#> # Node Data: 49 × 10 (active)
#>    id         pcategory name  description synonym category iri   xref  namespace
#>    <chr>      <chr>     <chr> <chr>       <list>  <list>   <chr> <lis> <chr>    
#>  1 MONDO:000… biolink:… Ehle… Arthrochal… <chr>   <chr>    http… <chr> MONDO    
#>  2 HP:0000963 biolink:… Thin… Reduction … <chr>   <chr>    http… <chr> HP       
#>  3 HP:0000974 biolink:… Hype… A conditio… <chr>   <chr>    http… <chr> HP       
#>  4 HP:0001001 biolink:… Abno… NA          <chr>   <chr>    http… <chr> HP       
#>  5 HP:0001252 biolink:… Hypo… Hypotonia … <chr>   <chr>    http… <chr> HP       
#>  6 HP:0001373 biolink:… Join… Displaceme… <chr>   <chr>    http… <chr> HP       
#>  7 HP:0001385 biolink:… Hip … The presen… <chr>   <chr>    http… <chr> HP       
#>  8 HP:0001387 biolink:… Join… Joint stif… <chr>   <chr>    http… <chr> HP       
#>  9 HP:0002300 biolink:… Muti… Complete l… <chr>   <chr>    http… <chr> HP       
#> 10 HP:0002381 biolink:… Apha… An acquire… <chr>   <chr>    http… <chr> HP       
#> # ℹ 39 more rows
#> # ℹ 1 more variable: provided_by <list>
#> #
#> # Edge Data: 48 × 23
#>    from    to subject    predicate object primary_knowledge_so…¹ knowledge_level
#>   <int> <int> <chr>      <chr>     <chr>  <chr>                  <chr>          
#> 1     1     2 MONDO:000… biolink:… HP:00… infores:orphanet       knowledge_asse…
#> 2     1     3 MONDO:000… biolink:… HP:00… infores:omim           knowledge_asse…
#> 3     1     4 MONDO:000… biolink:… HP:00… infores:orphanet       knowledge_asse…
#> # ℹ 45 more rows
#> # ℹ abbreviated name: ¹​primary_knowledge_source
#> # ℹ 16 more variables: negated <lgl>, frequency_qualifier <chr>,
#> #   original_subject <chr>, agent_type <chr>, knowledge_source <chr>,
#> #   aggregator_knowledge_source <list>, has_evidence <list>,
#> #   provided_by <list>, id <chr>, category <list>, has_total <chr>,
#> #   has_quotient <chr>, has_count <chr>, has_percentage <chr>, …


## Using example KGX file packaged with monarchr
filename <- system.file("extdata", "eds_marfan_kg.tar.gz", package = "monarchr")
phenos <- file_engine(filename) |>
          fetch_nodes(query_ids = "MONDO:0007525") |>
          expand(predicates = "biolink:has_phenotype",
                 categories = "biolink:PhenotypicFeature")

print(phenos)
#> # A tbl_graph: 49 nodes and 48 edges
#> #
#> # A rooted tree
#> #
#> # Node Data: 49 × 16 (active)
#>    id         pcategory name  symbol in_taxon_label description synonym category
#>    <chr>      <chr>     <chr> <chr>  <chr>          <chr>       <list>  <list>  
#>  1 MONDO:000… biolink:… Ehle… NA     NA             Arthrochal… <chr>   <chr>   
#>  2 HP:0000974 biolink:… Hype… NA     NA             A conditio… <chr>   <chr>   
#>  3 HP:0001382 biolink:… Join… NA     NA             The abilit… <chr>   <chr>   
#>  4 HP:0000023 biolink:… Ingu… NA     NA             Protrusion… <chr>   <chr>   
#>  5 HP:0000963 biolink:… Thin… NA     NA             Reduction … <chr>   <chr>   
#>  6 HP:0000978 biolink:… Brui… NA     NA             An ecchymo… <chr>   <chr>   
#>  7 HP:0001027 biolink:… Soft… NA     NA             A skin tex… <chr>   <chr>   
#>  8 HP:0001058 biolink:… Poor… NA     NA             A reduced … <chr>   <chr>   
#>  9 HP:0001075 biolink:… Atro… NA     NA             Scars that… <chr>   <chr>   
#> 10 HP:0001373 biolink:… Join… NA     NA             Displaceme… <chr>   <chr>   
#> # ℹ 39 more rows
#> # ℹ 8 more variables: iri <chr>, xref <list>, namespace <chr>,
#> #   provided_by <chr>, in_taxon <chr>, full_name <chr>, type <list>,
#> #   has_gene <chr>
#> #
#> # Edge Data: 48 × 25
#>    from    to subject       predicate   object primary_knowledge_so…¹ agent_type
#>   <int> <int> <chr>         <chr>       <chr>  <chr>                  <chr>     
#> 1     1     8 MONDO:0007525 biolink:ha… HP:00… infores:hpo-annotatio… manual_ag…
#> 2     1    29 MONDO:0007525 biolink:ha… HP:00… infores:hpo-annotatio… manual_ag…
#> 3     1    20 MONDO:0007525 biolink:ha… HP:00… infores:hpo-annotatio… manual_ag…
#> # ℹ 45 more rows
#> # ℹ abbreviated name: ¹​primary_knowledge_source
#> # ℹ 18 more variables: knowledge_level <chr>, knowledge_source <chr>,
#> #   aggregator_knowledge_source <chr>, provided_by <chr>, id <chr>,
#> #   category <chr>, original_object <chr>, original_subject <chr>,
#> #   frequency_qualifier <chr>, has_evidence <chr>, has_total <dbl>,
#> #   has_quotient <dbl>, has_count <dbl>, has_percentage <dbl>, …
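

## Sketch (not part of the original examples): restricting direction and
## passing the engine explicitly. Assumes the packaged KGX file used above and
## that "out" follows edges leading away from the query nodes.
library(monarchr)
eds_file <- system.file("extdata", "eds_marfan_kg.tar.gz", package = "monarchr")
eds_engine <- file_engine(eds_file)
eds <- fetch_nodes(eds_engine, query_ids = "MONDO:0007525")
# engine = is optional here; without it, the graph's most recent engine is used
outgoing <- expand(eds,
                   engine = eds_engine,
                   direction = "out",
                   predicates = "biolink:has_phenotype")

print(outgoing)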



## Using MONDO KGX file (remote) as an example
phenos <- file_engine("https://kghub.io/kg-obo/mondo/2024-03-04/mondo_kgx_tsv.tar.gz") |>
          fetch_nodes(query_ids = "MONDO:0007525") |>
          expand(predicates = "biolink:has_phenotype",
                 categories = "biolink:PhenotypicFeature")

print(phenos)
#> # A tbl_graph: 1 nodes and 0 edges
#> #
#> # A rooted tree
#> #
#> # Node Data: 1 × 11 (active)
#>   id        pcategory name  description synonym category xref  provided_by iri  
#>   <chr>     <chr>     <chr> <chr>       <list>  <list>   <lis> <chr>       <chr>
#> 1 MONDO:00… biolink:… Ehle… Arthrochal… <chr>   <chr>    <chr> mondo.json  http…
#> # ℹ 2 more variables: same_as <list>, subsets <list>
#> #
#> # Edge Data: 0 × 5
#> # ℹ 5 variables: from <int>, to <int>, subject <chr>, predicate <chr>,
#> #   object <chr>

file.remove("mondo_kgx_tsv.tar.gz") # cleanup - remove the downloaded file
#> [1] TRUE
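

## Sketch (not part of the original examples): transitive expansion along
## biolink:subclass_of. Assumes the packaged KGX file contains subclass_of
## edges for this disease; if it does not, only the query node is returned.
library(monarchr)
eds_file <- system.file("extdata", "eds_marfan_kg.tar.gz", package = "monarchr")
ancestors <- file_engine(eds_file) |>
             fetch_nodes(query_ids = "MONDO:0007525") |>
             expand(predicates = "biolink:subclass_of",
                    direction = "out",
                    transitive = TRUE)

print(ancestors)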