R/file_engine.R
file_engine.Rd
Creates a knowledge graph engine backed by a KGX-based tab-separated file. The filename argument must be a file path or URL pointing to a .tar.gz archive containing a *_nodes.tsv file and a *_edges.tsv file. If a URL is provided, the file will be downloaded to the user's current working directory.
file_engine(filename, preferences = NULL, ...)
An object of class file_engine
Engines store preference information specifying how data are fetched and manipulated. For example, while node category is multi-valued (nodes may have multiple categories, for example "biolink:Gene" and "biolink:NamedThing"), typically a single category is used to represent the node in a graph, and it is returned as the node's pcategory. A preference list of categories to use for pcategory is stored in the engine's preferences. A default set of preferences is stored in the package for use with KGX (BioLink-compatible) graphs (see https://github.com/biolink/kgx/blob/master/specification/kgx-format.md), but these can be overridden by the user.
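The idea of picking one pcategory from a multi-valued category can be sketched in base R. This is only an illustration of the concept, not monarchr's implementation: pick_pcategory is a hypothetical helper, the priority vector is made up for the example, and falling back to the first listed category when nothing matches is an assumption.

```r
# Hypothetical helper (not part of monarchr): choose a single primary
# category for one node from its multi-valued category vector.
pick_pcategory <- function(categories, priority) {
  # A node with no categories has no primary category
  if (length(categories) == 0) {
    return(NA_character_)
  }
  # Keep the priority entries the node actually has, in priority order
  ranked <- priority[priority %in% categories]
  # Assumption: fall back to the first listed category if none match
  if (length(ranked) > 0) ranked[[1]] else categories[[1]]
}

# Illustrative priority list; the package's real defaults may differ
priority <- c("biolink:Gene", "biolink:Disease", "biolink:NamedThing")

pick_pcategory(c("biolink:NamedThing", "biolink:Gene"), priority)
#> [1] "biolink:Gene"
pick_pcategory("biolink:Pathway", priority)
#> [1] "biolink:Pathway"
```

Because the result depends on the order of the priority vector, reordering it changes which category represents a node that carries several of them.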
library(tidygraph)
library(dplyr)
# Using example KGX file packaged with monarchr
filename <- system.file("extdata", "eds_marfan_kg.tar.gz", package = "monarchr")
engine <- file_engine(filename)
res <- engine |> fetch_nodes(query_ids = c("MONDO:0007522", "MONDO:0007947"))
print(res)
#> # A tbl_graph: 2 nodes and 0 edges
#> #
#> # A rooted forest with 2 trees
#> #
#> # Node Data: 2 × 16 (active)
#>   id    pcategory name  symbol in_taxon_label description synonym category iri
#>   <chr> <chr>     <chr> <chr>  <chr>          <chr>       <list>  <list>   <chr>
#> 1 MOND… biolink:… Ehle… NA     NA             Ehlers-Dan… <chr>   <chr>    http…
#> 2 MOND… biolink:… Marf… NA     NA             A disorder… <chr>   <chr>    http…
#> # ℹ 7 more variables: xref <list>, namespace <chr>, provided_by <chr>,
#> # in_taxon <chr>, full_name <chr>, type <list>, has_gene <chr>
#> #
#> # Edge Data: 0 × 25
#> # ℹ 25 variables: from <int>, to <int>, subject <chr>, predicate <chr>,
#> # object <chr>, agent_type <chr>, knowledge_level <chr>,
#> # knowledge_source <chr>, aggregator_knowledge_source <chr>,
#> # primary_knowledge_source <chr>, provided_by <chr>, id <chr>,
#> # category <chr>, original_object <chr>, original_subject <chr>,
#> # frequency_qualifier <chr>, has_evidence <chr>, has_total <dbl>,
#> # has_quotient <dbl>, has_count <dbl>, has_percentage <dbl>, …