Given a query tbl_kgx() graph and an optional KG engine (e.g. a file_engine(), neo4j_engine(), or monarch_engine()), fetches additional nodes and edges from the KG, expanding the query graph according to the specified criteria. If an engine is supplied, that engine is used; otherwise, the most recent engine associated with the query graph is used.

expand(
  graph,
  engine = NULL,
  direction = "both",
  predicates = NULL,
  categories = NULL,
  transitive = FALSE,
  drop_unused_query_nodes = FALSE,
  ...
)

Arguments

graph

A query tbl_kgx() graph to query with.

engine

(Optional) An engine to use for fetching query graph edges. If not provided, the graph's most recent engine is used.
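
As a minimal sketch of supplying the engine explicitly, the following passes the same engine that produced the query graph via the engine argument. It assumes the example KGX file packaged with monarchr is available; the variable names eng, query, and expanded are illustrative only.

```r
library(monarchr)

# Example KGX file shipped with monarchr
filename <- system.file("extdata", "eds_marfan_kg.tar.gz", package = "monarchr")

# Keep a handle on the engine so it can be passed explicitly later
eng <- file_engine(filename)

# Build the query graph from a single node
query <- eng |>
  fetch_nodes(query_ids = "MONDO:0007525")

# Name the engine explicitly instead of relying on the graph's most recent engine
expanded <- expand(query,
                   engine = eng,
                   predicates = "biolink:has_phenotype")
```

When engine is left as NULL, the call falls back to the engine most recently associated with the graph, so the explicit form is mainly useful when a graph should be expanded against a different KG than the one it was fetched from.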

direction

The direction of associations to fetch. Can be "in", "out", or "both". Default is "both".
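
The effect of direction can be sketched by expanding the same query graph twice. This assumes the example KGX file packaged with monarchr, and it assumes that "out" follows edges in which the query nodes are the subject while "in" follows edges in which they are the object; the variable names are illustrative.

```r
library(monarchr)

# Example KGX file shipped with monarchr
filename <- system.file("extdata", "eds_marfan_kg.tar.gz", package = "monarchr")

# Query graph containing a single disease node
eds <- file_engine(filename) |>
  fetch_nodes(query_ids = "MONDO:0007525")

# Assumed: edges leaving the query node (query node as subject)
outgoing <- eds |> expand(direction = "out")

# Assumed: edges arriving at the query node (query node as object)
incoming <- eds |> expand(direction = "in")
```

Printing outgoing and incoming side by side is a quick way to check which edges each setting returns for a given KG.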

predicates

A vector of relationship predicates (nodes in graph are subjects in the KG), indicating which edges to consider in the neighborhood. If NULL (default), all edges are considered.

categories

A vector of node categories, indicating which nodes in the larger KG may be fetched. If NULL (default), all nodes in the larger KG may be fetched.

transitive

If TRUE, include the transitive closure of the neighborhood. Default is FALSE. Useful in combination with predicates like biolink:subclass_of.
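
A hedged sketch of a transitive expansion along biolink:subclass_of, again using the example KGX file packaged with monarchr. Restricting the call to one predicate and one direction is an assumption made here to keep the closure well defined, not a documented requirement, and the variable name ancestors is illustrative.

```r
library(monarchr)

# Example KGX file shipped with monarchr
filename <- system.file("extdata", "eds_marfan_kg.tar.gz", package = "monarchr")

# Follow biolink:subclass_of edges repeatedly from the query node.
# A single predicate and a single direction are used here as a conservative
# choice (an assumption, not a rule stated in this documentation).
ancestors <- file_engine(filename) |>
  fetch_nodes(query_ids = "MONDO:0007525") |>
  expand(predicates = "biolink:subclass_of",
         direction = "out",
         transitive = TRUE)
```

With transitive = FALSE the same call would stop after one step along the predicate, so comparing the two results shows what the closure adds.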

drop_unused_query_nodes

If TRUE, remove query nodes from the result, unless they are at the neighborhood boundary, i.e., required for connecting to the result nodes. Default is FALSE.

...

Other parameters passed to methods.

Value

A tbl_kgx() graph

Examples

## Using Monarch (hosted)
phenos <- monarch_engine() |>
          fetch_nodes(query_ids = "MONDO:0007525") |>
          expand(predicates = "biolink:has_phenotype",
                 categories = "biolink:PhenotypicFeature")
#> Trying to connect to https://neo4j.monarchinitiative.org
#> Connected to https://neo4j.monarchinitiative.org
#> Fetching; counting matching nodes... 
#>  total: 1.
#> Fetching; fetched 1 of 1
#> Expanding; counting matching edges... 
#>  total: 48.
#> Expanding; fetched 48 of 48 edges.

print(phenos)
#> # A tbl_graph: 49 nodes and 48 edges
#> #
#> # A rooted tree
#> #
#> # Node Data: 49 × 10 (active)
#>    id         pcategory name  description synonym category iri   xref  namespace
#>    <chr>      <chr>     <chr> <chr>       <named> <list>   <chr> <nam> <chr>    
#>  1 MONDO:000… biolink:… Ehle… Arthrochal… <chr>   <chr>    http… <chr> MONDO    
#>  2 HP:0000963 biolink:… Thin… Reduction … <chr>   <chr>    http… <chr> HP       
#>  3 HP:0000974 biolink:… Hype… A conditio… <chr>   <chr>    http… <chr> HP       
#>  4 HP:0001001 biolink:… Abno… NA          <chr>   <chr>    http… <chr> HP       
#>  5 HP:0001252 biolink:… Hypo… Hypotonia … <chr>   <chr>    http… <chr> HP       
#>  6 HP:0001373 biolink:… Join… Displaceme… <chr>   <chr>    http… <chr> HP       
#>  7 HP:0001385 biolink:… Hip … The presen… <chr>   <chr>    http… <chr> HP       
#>  8 HP:0001387 biolink:… Join… Joint stif… <chr>   <chr>    http… <chr> HP       
#>  9 HP:0002300 biolink:… Muti… Complete l… <chr>   <chr>    http… <chr> HP       
#> 10 HP:0002381 biolink:… Apha… An acquire… <chr>   <chr>    http… <chr> HP       
#> # ℹ 39 more rows
#> # ℹ 1 more variable: provided_by <named list>
#> #
#> # Edge Data: 48 × 23
#>    from    to subject       predicate             object knowledge_level negated
#>   <int> <int> <chr>         <chr>                 <chr>  <chr>           <lgl>  
#> 1     1     2 MONDO:0007525 biolink:has_phenotype HP:00… knowledge_asse… TRUE   
#> 2     1     3 MONDO:0007525 biolink:has_phenotype HP:00… knowledge_asse… TRUE   
#> 3     1     4 MONDO:0007525 biolink:has_phenotype HP:00… knowledge_asse… TRUE   
#> # ℹ 45 more rows
#> # ℹ 16 more variables: primary_knowledge_source <chr>,
#> #   frequency_qualifier <chr>, original_subject <chr>, agent_type <chr>,
#> #   knowledge_source <chr>, aggregator_knowledge_source <named list>,
#> #   has_evidence <named list>, provided_by <named list>, id <chr>,
#> #   category <named list>, has_total <chr>, has_quotient <chr>,
#> #   has_count <chr>, has_percentage <chr>, publications <named list>, …


## Using example KGX file packaged with monarchr
filename <- system.file("extdata", "eds_marfan_kg.tar.gz", package = "monarchr")
phenos <- file_engine(filename) |>
          fetch_nodes(query_ids = "MONDO:0007525") |>
          expand(predicates = "biolink:has_phenotype",
                 categories = "biolink:PhenotypicFeature")

print(phenos)
#> # A tbl_graph: 49 nodes and 48 edges
#> #
#> # A rooted tree
#> #
#> # Node Data: 49 × 16 (active)
#>    id         pcategory name  symbol in_taxon_label description synonym category
#>    <chr>      <chr>     <chr> <chr>  <chr>          <chr>       <list>  <list>  
#>  1 MONDO:000… biolink:… Ehle… NA     NA             Arthrochal… <chr>   <chr>   
#>  2 HP:0000974 biolink:… Hype… NA     NA             A conditio… <chr>   <chr>   
#>  3 HP:0001382 biolink:… Join… NA     NA             The abilit… <chr>   <chr>   
#>  4 HP:0000023 biolink:… Ingu… NA     NA             Protrusion… <chr>   <chr>   
#>  5 HP:0000963 biolink:… Thin… NA     NA             Reduction … <chr>   <chr>   
#>  6 HP:0000978 biolink:… Brui… NA     NA             An ecchymo… <chr>   <chr>   
#>  7 HP:0001027 biolink:… Soft… NA     NA             A skin tex… <chr>   <chr>   
#>  8 HP:0001058 biolink:… Poor… NA     NA             A reduced … <chr>   <chr>   
#>  9 HP:0001075 biolink:… Atro… NA     NA             Scars that… <chr>   <chr>   
#> 10 HP:0001373 biolink:… Join… NA     NA             Displaceme… <chr>   <chr>   
#> # ℹ 39 more rows
#> # ℹ 8 more variables: iri <chr>, xref <list>, namespace <chr>,
#> #   provided_by <chr>, in_taxon <chr>, full_name <chr>, type <list>,
#> #   has_gene <chr>
#> #
#> # Edge Data: 48 × 25
#>    from    to subject       predicate          object agent_type knowledge_level
#>   <int> <int> <chr>         <chr>              <chr>  <chr>      <chr>          
#> 1     1     8 MONDO:0007525 biolink:has_pheno… HP:00… manual_ag… knowledge_asse…
#> 2     1    29 MONDO:0007525 biolink:has_pheno… HP:00… manual_ag… knowledge_asse…
#> 3     1    20 MONDO:0007525 biolink:has_pheno… HP:00… manual_ag… knowledge_asse…
#> # ℹ 45 more rows
#> # ℹ 18 more variables: knowledge_source <chr>,
#> #   aggregator_knowledge_source <chr>, primary_knowledge_source <chr>,
#> #   provided_by <chr>, id <chr>, category <chr>, original_object <chr>,
#> #   original_subject <chr>, frequency_qualifier <chr>, has_evidence <chr>,
#> #   has_total <dbl>, has_quotient <dbl>, has_count <dbl>, has_percentage <dbl>,
#> #   onset_qualifier <chr>, publications <chr>, qualifiers <chr>, …



## Using MONDO KGX file (remote) as an example
phenos <- file_engine("https://kghub.io/kg-obo/mondo/2024-03-04/mondo_kgx_tsv.tar.gz") |>
          fetch_nodes(query_ids = "MONDO:0007525") |>
          expand(predicates = "biolink:has_phenotype",
                 categories = "biolink:PhenotypicFeature")

print(phenos)
#> # A tbl_graph: 1 nodes and 0 edges
#> #
#> # A rooted tree
#> #
#> # Node Data: 1 × 11 (active)
#>   id        pcategory name  description synonym category xref  provided_by iri  
#>   <chr>     <chr>     <chr> <chr>       <list>  <list>   <lis> <chr>       <chr>
#> 1 MONDO:00… biolink:… Ehle… Arthrochal… <chr>   <chr>    <chr> mondo.json  http…
#> # ℹ 2 more variables: same_as <list>, subsets <list>
#> #
#> # Edge Data: 0 × 5
#> # ℹ 5 variables: from <int>, to <int>, subject <chr>, predicate <chr>,
#> #   object <chr>

file.remove("mondo_kgx_tsv.tar.gz") # cleanup - remove the downloaded file
#> [1] TRUE