This function fetches nodes (but no edges) from a knowledge graph engine, selected either by a set of conditions or by a set of identifiers. If query_ids is provided, the function fetches the nodes with those identifiers. If query_ids is NULL, it fetches the nodes matching the conditions supplied in .... Only a limited set of condition expressions is supported; see Details.

fetch_nodes(engine, ..., query_ids = NULL, limit = NULL)

Arguments

engine

A graph engine object

...

A set of conditions identifying the nodes to fetch; used only if query_ids is NULL.

query_ids

A character vector of node identifiers to fetch, e.g. c("MONDO:0007525", "MONDO:0007526").

limit

An integer specifying the maximum number of nodes to fetch. Defaults to NULL (no limit).
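To make the interplay of query_ids, ... and limit concrete, here is a minimal base-R sketch that applies the same rules to an in-memory data frame. fetch_nodes_sketch() and the toy nodes table are hypothetical stand-ins, not monarchr code; the real fetch_nodes() sends the query to the engine and returns a tbl_kgx.

```r
# Toy, hypothetical node table standing in for a graph engine.
nodes <- data.frame(
  id = c("MONDO:0007525", "MONDO:0007526", "HGNC:1246"),
  in_taxon_label = c(NA, NA, "Homo sapiens"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of monarchr) mimicking the documented rules:
# - a non-NULL query_ids selects nodes by id and the conditions in ... are ignored;
# - otherwise every condition in ... is evaluated against the table and combined with &;
# - a non-NULL limit caps the number of rows returned.
fetch_nodes_sketch <- function(nodes, ..., query_ids = NULL, limit = NULL) {
  if (!is.null(query_ids)) {
    keep <- nodes$id %in% query_ids
  } else {
    conds <- eval(substitute(alist(...)))  # capture conditions unevaluated
    keep <- rep(TRUE, nrow(nodes))
    for (cond in conds) {
      keep <- keep & eval(cond, nodes, parent.frame())
    }
  }
  result <- nodes[which(keep), , drop = FALSE]  # which() treats NA as no match
  if (!is.null(limit)) result <- head(result, limit)
  result
}

# Selects by id; the condition is ignored because query_ids is given.
fetch_nodes_sketch(nodes, in_taxon_label == "Homo sapiens", query_ids = "MONDO:0007526")
# Selects by condition, then caps the result at one row.
fetch_nodes_sketch(nodes, in_taxon_label == "Homo sapiens", limit = 1)
```

Conditions are captured unevaluated (as fetch_nodes() must do in order to translate them into an engine query), which is why they are never evaluated at all when query_ids is supplied.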

Value

A tbl_kgx object containing the fetched nodes (and no edges).

Details

If query_ids is provided, the function fetches the nodes with the specified identifiers and ignores any conditions. If query_ids is NULL, it fetches the nodes matching the condition expression(s) given in .... The following features are supported:

  • Matching node property values with comparison expressions, e.g. in_taxon_label == "Homo sapiens".

  • Matching multi-valued properties with %in_list%, e.g. "biolink:Gene" %in_list% category. NOTE: %in_list% cannot test a property against a vector of values (e.g. in_taxon_label %in_list% c("Homo sapiens", "Mus musculus")), and it does not support multi-valued left-hand sides, so c("biolink:Disease", "biolink:Gene") %in_list% category will not work.

  • Boolean connectives with |, &, and !, e.g. in_taxon_label == "Homo sapiens" | "biolink:Gene" %in_list% category.
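As a mental model for %in_list%, the sketch below gives a hypothetical base-R reading of the operator: for each node, it asks whether the single left-hand value appears among that node's values of a multi-valued property. This is an illustration under that assumption, not monarchr's definition (the package translates conditions into engine queries), and the toy category list is made up. Run it in a fresh session rather than alongside monarchr, since it defines its own %in_list%.

```r
# Hypothetical base-R reading of `lhs %in_list% rhs` (not monarchr's implementation):
# rhs is a list with one element per node, each holding that node's values;
# the result is TRUE for each node whose values contain the single value lhs.
`%in_list%` <- function(lhs, rhs) {
  # Mirror the documented restriction: only a single left-hand value is supported.
  if (length(lhs) != 1) stop("%in_list% supports only a single left-hand value")
  vapply(rhs, function(values) lhs %in% values, logical(1))
}

# Toy multi-valued `category` property for three nodes.
category <- list(
  c("biolink:Gene", "biolink:NamedThing"),
  "biolink:Disease",
  c("biolink:Disease", "biolink:NamedThing")
)

"biolink:Gene" %in_list% category     # TRUE FALSE FALSE
"biolink:Disease" %in_list% category  # FALSE TRUE TRUE

# Instead of a multi-valued left-hand side, join single-value tests with |.
"biolink:Disease" %in_list% category | "biolink:Gene" %in_list% category  # TRUE TRUE TRUE
```

Because user-defined %op% operators bind more tightly than | and &, the last line needs no parentheses; per the connectives bullet above, the same |-joined form should be accepted by fetch_nodes().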

If more than one condition is supplied in ..., they are combined with &; for example, fetch_nodes(engine, in_taxon_label == "Homo sapiens", "biolink:Gene" %in_list% category) is equivalent to fetch_nodes(engine, in_taxon_label == "Homo sapiens" & "biolink:Gene" %in_list% category).
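The &-combination rule can be pictured as folding the captured condition expressions into a single call. The sketch below is a hypothetical base-R illustration: combine_conditions() is not a monarchr function, and monarchr's internals may differ. It only builds the expression and never evaluates it, so %in_list% does not need to be defined.

```r
# Hypothetical helper (not part of monarchr): capture the unevaluated conditions
# passed through ... and left-fold them into one expression joined with &.
combine_conditions <- function(...) {
  conds <- eval(substitute(alist(...)))  # list of unevaluated expressions
  Reduce(function(lhs, rhs) call("&", lhs, rhs), conds)
}

combined <- combine_conditions(
  in_taxon_label == "Homo sapiens",
  "biolink:Gene" %in_list% category
)

# The folded call is the same expression as the single &-joined condition.
identical(combined, quote(in_taxon_label == "Homo sapiens" & "biolink:Gene" %in_list% category))
```

A left fold reproduces how R itself parses a chain such as a & b & c, so passing three conditions yields the same expression as writing them joined by & in one argument.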

Examples

library(tidygraph)
#> 
#> Attaching package: ‘tidygraph’
#> The following object is masked from ‘package:stats’:
#> 
#>     filter
library(dplyr)
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union

monarch_engine() |>
  fetch_nodes(query_ids = c("MONDO:0007525", "MONDO:0007526"))
#> Trying to connect to https://neo4j.monarchinitiative.org
#> Connected to https://neo4j.monarchinitiative.org
#> Fetching; counting matching nodes... 
#>  total: 2.
#> Fetching; fetched 2 of 2
#> # A tbl_graph: 2 nodes and 0 edges
#> #
#> # A rooted forest with 2 trees
#> #
#> # Node Data: 2 × 10 (active)
#>   id          category pcategory name  description synonym iri   xref  namespace
#>   <chr>       <list>   <chr>     <chr> <chr>       <named> <chr> <nam> <chr>    
#> 1 MONDO:0007… <chr>    biolink:… Ehle… Arthrochal… <chr>   http… <chr> MONDO    
#> 2 MONDO:0007… <chr>    biolink:… Ehle… A form of … <chr>   http… <chr> MONDO    
#> # ℹ 1 more variable: provided_by <named list>
#> #
#> # Edge Data: 0 × 5
#> # ℹ 5 variables: from <int>, to <int>, subject <chr>, predicate <chr>,
#> #   object <chr>

# a large query
monarch_engine() |>
  fetch_nodes("biolink:Disease" %in_list% category)
#> Trying to connect to https://neo4j.monarchinitiative.org
#> Connected to https://neo4j.monarchinitiative.org
#> Fetching; counting matching nodes... 
#>  total: 29990.
#> Fetching; fetched 5000 of 29990
#> Fetching; fetched 10000 of 29990
#> Fetching; fetched 15000 of 29990
#> Fetching; fetched 20000 of 29990
#> Fetching; fetched 25000 of 29990
#> Fetching; fetched 29990 of 29990
#> # A tbl_graph: 29990 nodes and 0 edges
#> #
#> # A rooted forest with 29990 trees
#> #
#> # Node Data: 29,990 × 11 (active)
#>    id         category pcategory name  description synonym iri   xref  namespace
#>    <chr>      <list>   <chr>     <chr> <chr>       <named> <chr> <nam> <chr>    
#>  1 MONDO:000… <chr>    biolink:… dise… A disease … <chr>   http… <chr> MONDO    
#>  2 MONDO:000… <chr>    biolink:… obso… NA          <lgl>   http… <lgl> MONDO    
#>  3 MONDO:000… <chr>    biolink:… obso… NA          <lgl>   http… <lgl> MONDO    
#>  4 MONDO:000… <chr>    biolink:… adre… An endocri… <chr>   http… <chr> MONDO    
#>  5 MONDO:000… <chr>    biolink:… alop… NA          <lgl>   http… <chr> MONDO    
#>  6 MONDO:000… <chr>    biolink:… obso… NA          <lgl>   http… <lgl> MONDO    
#>  7 MONDO:000… <chr>    biolink:… obso… NA          <lgl>   http… <lgl> MONDO    
#>  8 MONDO:000… <chr>    biolink:… obso… NA          <lgl>   http… <lgl> MONDO    
#>  9 MONDO:000… <chr>    biolink:… inhe… NA          <chr>   http… <chr> MONDO    
#> 10 MONDO:000… <chr>    biolink:… obso… NA          <lgl>   http… <lgl> MONDO    
#> # ℹ 29,980 more rows
#> # ℹ 2 more variables: provided_by <named list>, deprecated <chr>
#> #
#> # Edge Data: 0 × 5
#> # ℹ 5 variables: from <int>, to <int>, subject <chr>, predicate <chr>,
#> #   object <chr>

# file_engine supports the same features as neo4j_engine
# (using the example KGX file packaged with monarchr)
filename <- system.file("extdata", "eds_marfan_kg.tar.gz", package = "monarchr")

file_engine(filename) |>
  fetch_nodes(query_ids = c("MONDO:0007525", "MONDO:0007526"))
#> # A tbl_graph: 2 nodes and 0 edges
#> #
#> # A rooted forest with 2 trees
#> #
#> # Node Data: 2 × 16 (active)
#>   id    pcategory name  symbol in_taxon_label description synonym category iri  
#>   <chr> <chr>     <chr> <chr>  <chr>          <chr>       <list>  <list>   <chr>
#> 1 MOND… biolink:… Ehle… NA     NA             Arthrochal… <chr>   <chr>    http…
#> 2 MOND… biolink:… Ehle… NA     NA             A form of … <chr>   <chr>    http…
#> # ℹ 7 more variables: xref <list>, namespace <chr>, provided_by <chr>,
#> #   in_taxon <chr>, full_name <chr>, type <list>, has_gene <chr>
#> #
#> # Edge Data: 0 × 25
#> # ℹ 25 variables: from <int>, to <int>, subject <chr>, predicate <chr>,
#> #   object <chr>, agent_type <chr>, knowledge_level <chr>,
#> #   knowledge_source <chr>, aggregator_knowledge_source <chr>,
#> #   primary_knowledge_source <chr>, provided_by <chr>, id <chr>,
#> #   category <chr>, original_object <chr>, original_subject <chr>,
#> #   frequency_qualifier <chr>, has_evidence <chr>, has_total <dbl>,
#> #   has_quotient <dbl>, has_count <dbl>, has_percentage <dbl>, …

# grab all Homo sapiens genes
file_engine(filename) |>
  fetch_nodes(in_taxon_label == "Homo sapiens" & "biolink:Gene" %in_list% category)
#> # A tbl_graph: 23 nodes and 0 edges
#> #
#> # A rooted forest with 23 trees
#> #
#> # Node Data: 23 × 16 (active)
#>    id         pcategory name  symbol in_taxon_label description synonym category
#>    <chr>      <chr>     <chr> <chr>  <chr>          <chr>       <list>  <list>  
#>  1 HGNC:11976 biolink:… TNXB  TNXB   Homo sapiens   NA          <chr>   <chr>   
#>  2 HGNC:1246  biolink:… C1R   C1R    Homo sapiens   NA          <chr>   <chr>   
#>  3 HGNC:1247  biolink:… C1S   C1S    Homo sapiens   NA          <chr>   <chr>   
#>  4 HGNC:14631 biolink:… ADAM… ADAMT… Homo sapiens   NA          <chr>   <chr>   
#>  5 HGNC:17978 biolink:… B3GA… B3GAL… Homo sapiens   NA          <chr>   <chr>   
#>  6 HGNC:18625 biolink:… FKBP… FKBP14 Homo sapiens   NA          <chr>   <chr>   
#>  7 HGNC:20859 biolink:… SLC3… SLC39… Homo sapiens   NA          <chr>   <chr>   
#>  8 HGNC:21144 biolink:… DSE   DSE    Homo sapiens   NA          <chr>   <chr>   
#>  9 HGNC:218   biolink:… ADAM… ADAMT… Homo sapiens   NA          <chr>   <chr>   
#> 10 HGNC:2188  biolink:… COL1… COL12… Homo sapiens   NA          <chr>   <chr>   
#> # ℹ 13 more rows
#> # ℹ 8 more variables: iri <chr>, xref <list>, namespace <chr>,
#> #   provided_by <chr>, in_taxon <chr>, full_name <chr>, type <list>,
#> #   has_gene <chr>
#> #
#> # Edge Data: 0 × 25
#> # ℹ 25 variables: from <int>, to <int>, subject <chr>, predicate <chr>,
#> #   object <chr>, agent_type <chr>, knowledge_level <chr>,
#> #   knowledge_source <chr>, aggregator_knowledge_source <chr>,
#> #   primary_knowledge_source <chr>, provided_by <chr>, id <chr>,
#> #   category <chr>, original_object <chr>, original_subject <chr>,
#> #   frequency_qualifier <chr>, has_evidence <chr>, has_total <dbl>,
#> #   has_quotient <dbl>, has_count <dbl>, has_percentage <dbl>, …