This function fetches nodes (and no edges) from a knowledge graph engine, selected either by a set of identifiers or by a set of conditions. If query_ids is provided, the function fetches the nodes with the specified identifiers. If query_ids is NULL, it fetches the nodes that match the conditions provided. Only a limited set of condition expressions is supported; see the details below.
fetch_nodes(engine, ..., query_ids = NULL, limit = NULL)
A tbl_kgx object containing the nodes
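The example output on this page prints the result as a tbl_graph with active node data, so the returned nodes can presumably be pulled out as an ordinary tibble with tidygraph verbs. The following is a minimal sketch under that assumption, using the example KGX file packaged with monarchr; the names g and node_tbl are illustrative only.

```r
library(monarchr)
library(tidygraph)

# Example KGX file packaged with monarchr
filename <- system.file("extdata", "eds_marfan_kg.tar.gz", package = "monarchr")

# Fetch two nodes by identifier; the result holds nodes only, no edges
g <- file_engine(filename) |>
  fetch_nodes(query_ids = c("MONDO:0007525", "MONDO:0007526"))

# Assumption: tidygraph's activate() and as_tibble() work on the returned object
node_tbl <- g |>
  activate(nodes) |>
  as_tibble()

node_tbl$id
```

Working on a plain tibble can be convenient when only node properties are needed and the graph structure is not.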
If query_ids is provided, the function will fetch nodes with the specified identifiers. If query_ids is NULL, the function will fetch nodes based on a condition expression. The following features are supported:
Matching node properties with comparison operators, e.g. in_taxon_label == "Homo sapiens".
Matching multi-valued properties with %in_list%, e.g. "biolink:Gene" %in_list% category. NOTE: using %in_list% against vector queries, e.g. in_taxon_label %in_list% c("Homo sapiens", "Mus musculus"), is not supported. Nor does %in_list% support multi-valued left-hand sides; c("biolink:Disease", "biolink:Gene") %in_list% category will not work.
Boolean connectives with |, &, and !, e.g. in_taxon_label == "Homo sapiens" | "biolink:Gene" %in_list% category.
If more than one condition parameter is specified, they are combined with &; for example, fetch_nodes(engine, in_taxon_label == "Homo sapiens", "biolink:Gene" %in_list% category) is equivalent to fetch_nodes(engine, in_taxon_label == "Homo sapiens" & "biolink:Gene" %in_list% category).
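Since %in_list% accepts neither a vector on the right nor a multi-valued left-hand side, one way to express "any of these values" is to write one supported test per value and join them with |. The following is a sketch of that workaround against the example KGX file packaged with monarchr. It assumes the OR-ed expression selects the nodes the unsupported form was meant to select; the variable names are illustrative only.

```r
library(monarchr)

# Example KGX file packaged with monarchr
filename <- system.file("extdata", "eds_marfan_kg.tar.gz", package = "monarchr")
engine <- file_engine(filename)

# Unsupported: in_taxon_label %in_list% c("Homo sapiens", "Mus musculus")
# Alternative sketch: one == comparison per value, joined with |
either_taxon <- fetch_nodes(
  engine,
  in_taxon_label == "Homo sapiens" | in_taxon_label == "Mus musculus"
)

# Unsupported: c("biolink:Disease", "biolink:Gene") %in_list% category
# Alternative sketch: one %in_list% test per category, joined with |
disease_or_gene <- fetch_nodes(
  engine,
  "biolink:Disease" %in_list% category | "biolink:Gene" %in_list% category
)
```

Both calls pass a single condition. Separate comma-delimited conditions are combined with &, so they would not give the intended "or" semantics.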
library(tidygraph)
#>
#> Attaching package: ‘tidygraph’
#> The following object is masked from ‘package:stats’:
#>
#> filter
library(dplyr)
#>
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#>
#> filter, lag
#> The following objects are masked from ‘package:base’:
#>
#> intersect, setdiff, setequal, union
monarch_engine() |>
  fetch_nodes(query_ids = c("MONDO:0007525", "MONDO:0007526"))
#> Trying to connect to https://neo4j.monarchinitiative.org
#> Connected to https://neo4j.monarchinitiative.org
#> Fetching; counting matching nodes...
#> total: 2.
#> Fetching; fetched 2 of 2
#> # A tbl_graph: 2 nodes and 0 edges
#> #
#> # A rooted forest with 2 trees
#> #
#> # Node Data: 2 × 10 (active)
#> id category pcategory name description synonym iri xref namespace
#> <chr> <list> <chr> <chr> <chr> <named> <chr> <nam> <chr>
#> 1 MONDO:0007… <chr> biolink:… Ehle… Arthrochal… <chr> http… <chr> MONDO
#> 2 MONDO:0007… <chr> biolink:… Ehle… A form of … <chr> http… <chr> MONDO
#> # ℹ 1 more variable: provided_by <named list>
#> #
#> # Edge Data: 0 × 5
#> # ℹ 5 variables: from <int>, to <int>, subject <chr>, predicate <chr>,
#> # object <chr>
# a large query
monarch_engine() |>
  fetch_nodes("biolink:Disease" %in_list% category)
#> Trying to connect to https://neo4j.monarchinitiative.org
#> Connected to https://neo4j.monarchinitiative.org
#> Fetching; counting matching nodes...
#> total: 29990.
#> Fetching; fetched 5000 of 29990
#> Fetching; fetched 10000 of 29990
#> Fetching; fetched 15000 of 29990
#> Fetching; fetched 20000 of 29990
#> Fetching; fetched 25000 of 29990
#> Fetching; fetched 29990 of 29990
#> # A tbl_graph: 29990 nodes and 0 edges
#> #
#> # A rooted forest with 29990 trees
#> #
#> # Node Data: 29,990 × 11 (active)
#> id category pcategory name description synonym iri xref namespace
#> <chr> <list> <chr> <chr> <chr> <named> <chr> <nam> <chr>
#> 1 MONDO:000… <chr> biolink:… dise… A disease … <chr> http… <chr> MONDO
#> 2 MONDO:000… <chr> biolink:… obso… NA <lgl> http… <lgl> MONDO
#> 3 MONDO:000… <chr> biolink:… obso… NA <lgl> http… <lgl> MONDO
#> 4 MONDO:000… <chr> biolink:… adre… An endocri… <chr> http… <chr> MONDO
#> 5 MONDO:000… <chr> biolink:… alop… NA <lgl> http… <chr> MONDO
#> 6 MONDO:000… <chr> biolink:… obso… NA <lgl> http… <lgl> MONDO
#> 7 MONDO:000… <chr> biolink:… obso… NA <lgl> http… <lgl> MONDO
#> 8 MONDO:000… <chr> biolink:… obso… NA <lgl> http… <lgl> MONDO
#> 9 MONDO:000… <chr> biolink:… inhe… NA <chr> http… <chr> MONDO
#> 10 MONDO:000… <chr> biolink:… obso… NA <lgl> http… <lgl> MONDO
#> # ℹ 29,980 more rows
#> # ℹ 2 more variables: provided_by <named list>, deprecated <chr>
#> #
#> # Edge Data: 0 × 5
#> # ℹ 5 variables: from <int>, to <int>, subject <chr>, predicate <chr>,
#> # object <chr>
# file_engine supports the same features as neo4j_engine
# (using the example KGX file packaged with monarchr)
filename <- system.file("extdata", "eds_marfan_kg.tar.gz", package = "monarchr")
file_engine(filename) |>
  fetch_nodes(query_ids = c("MONDO:0007525", "MONDO:0007526"))
#> # A tbl_graph: 2 nodes and 0 edges
#> #
#> # A rooted forest with 2 trees
#> #
#> # Node Data: 2 × 16 (active)
#> id pcategory name symbol in_taxon_label description synonym category iri
#> <chr> <chr> <chr> <chr> <chr> <chr> <list> <list> <chr>
#> 1 MOND… biolink:… Ehle… NA NA Arthrochal… <chr> <chr> http…
#> 2 MOND… biolink:… Ehle… NA NA A form of … <chr> <chr> http…
#> # ℹ 7 more variables: xref <list>, namespace <chr>, provided_by <chr>,
#> # in_taxon <chr>, full_name <chr>, type <list>, has_gene <chr>
#> #
#> # Edge Data: 0 × 25
#> # ℹ 25 variables: from <int>, to <int>, subject <chr>, predicate <chr>,
#> # object <chr>, agent_type <chr>, knowledge_level <chr>,
#> # knowledge_source <chr>, aggregator_knowledge_source <chr>,
#> # primary_knowledge_source <chr>, provided_by <chr>, id <chr>,
#> # category <chr>, original_object <chr>, original_subject <chr>,
#> # frequency_qualifier <chr>, has_evidence <chr>, has_total <dbl>,
#> # has_quotient <dbl>, has_count <dbl>, has_percentage <dbl>, …
# grab all Homo sapiens genes
file_engine(filename) |>
  fetch_nodes(in_taxon_label == "Homo sapiens" & "biolink:Gene" %in_list% category)
#> # A tbl_graph: 23 nodes and 0 edges
#> #
#> # A rooted forest with 23 trees
#> #
#> # Node Data: 23 × 16 (active)
#> id pcategory name symbol in_taxon_label description synonym category
#> <chr> <chr> <chr> <chr> <chr> <chr> <list> <list>
#> 1 HGNC:11976 biolink:… TNXB TNXB Homo sapiens NA <chr> <chr>
#> 2 HGNC:1246 biolink:… C1R C1R Homo sapiens NA <chr> <chr>
#> 3 HGNC:1247 biolink:… C1S C1S Homo sapiens NA <chr> <chr>
#> 4 HGNC:14631 biolink:… ADAM… ADAMT… Homo sapiens NA <chr> <chr>
#> 5 HGNC:17978 biolink:… B3GA… B3GAL… Homo sapiens NA <chr> <chr>
#> 6 HGNC:18625 biolink:… FKBP… FKBP14 Homo sapiens NA <chr> <chr>
#> 7 HGNC:20859 biolink:… SLC3… SLC39… Homo sapiens NA <chr> <chr>
#> 8 HGNC:21144 biolink:… DSE DSE Homo sapiens NA <chr> <chr>
#> 9 HGNC:218 biolink:… ADAM… ADAMT… Homo sapiens NA <chr> <chr>
#> 10 HGNC:2188 biolink:… COL1… COL12… Homo sapiens NA <chr> <chr>
#> # ℹ 13 more rows
#> # ℹ 8 more variables: iri <chr>, xref <list>, namespace <chr>,
#> # provided_by <chr>, in_taxon <chr>, full_name <chr>, type <list>,
#> # has_gene <chr>
#> #
#> # Edge Data: 0 × 25
#> # ℹ 25 variables: from <int>, to <int>, subject <chr>, predicate <chr>,
#> # object <chr>, agent_type <chr>, knowledge_level <chr>,
#> # knowledge_source <chr>, aggregator_knowledge_source <chr>,
#> # primary_knowledge_source <chr>, provided_by <chr>, id <chr>,
#> # category <chr>, original_object <chr>, original_subject <chr>,
#> # frequency_qualifier <chr>, has_evidence <chr>, has_total <dbl>,
#> # has_quotient <dbl>, has_count <dbl>, has_percentage <dbl>, …
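The usage signature also lists a limit argument that none of the examples above exercise. The sketch below assumes that limit caps the number of nodes returned and that file_engine honors it in the same way as neo4j_engine. Neither point is stated on this page, so treat the behavior as something to verify.

```r
library(monarchr)

# Example KGX file packaged with monarchr
filename <- system.file("extdata", "eds_marfan_kg.tar.gz", package = "monarchr")

# Assumption: limit = 5 returns at most 5 of the 23 matching Homo sapiens genes
few_genes <- file_engine(filename) |>
  fetch_nodes(
    in_taxon_label == "Homo sapiens" & "biolink:Gene" %in_list% category,
    limit = 5
  )

few_genes
```

A small limit may be useful for previewing the columns of a large query, such as the biolink:Disease example, before fetching every match.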