Creates a knowledge graph engine backed by a KGX-based tab-separated file. The file must be given as a filename or URL pointing to a .tar.gz archive that contains a *_nodes.tsv file and a *_edges.tsv file. If a URL is provided, the archive is downloaded to the user's current working directory.

file_engine(filename, preferences = NULL, ...)

Arguments

filename

A character string giving the filename or URL of the KGX-based .tar.gz archive (containing the *_nodes.tsv and *_edges.tsv files).

preferences

A named list of preferences for the engine.

...

Additional arguments (unused).
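
Before passing an archive to file_engine(), it can help to confirm that it has the expected layout. The following is a minimal sketch in base R, not part of monarchr: the helper name check_kgx_archive() and the toy node and edge rows are hypothetical, and the check only looks for member names ending in _nodes.tsv and _edges.tsv.

```r
# Hypothetical helper (not a monarchr function): TRUE if the .tar.gz archive
# lists at least one *_nodes.tsv member and at least one *_edges.tsv member.
check_kgx_archive <- function(path) {
  members <- basename(utils::untar(path, list = TRUE, tar = "internal"))
  any(grepl("_nodes\\.tsv$", members)) && any(grepl("_edges\\.tsv$", members))
}

# Build a tiny illustrative archive in a temporary directory.
# The identifiers and rows below are made-up placeholder data.
src_dir <- tempfile("kgx_demo_")
dir.create(src_dir)

nodes <- data.frame(
  id = c("EX:1", "EX:2"),
  category = c("biolink:NamedThing", "biolink:NamedThing"),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  subject = "EX:1",
  predicate = "biolink:related_to",
  object = "EX:2",
  stringsAsFactors = FALSE
)

write.table(nodes, file.path(src_dir, "demo_nodes.tsv"),
            sep = "\t", quote = FALSE, row.names = FALSE)
write.table(edges, file.path(src_dir, "demo_edges.tsv"),
            sep = "\t", quote = FALSE, row.names = FALSE)

# Create the archive from inside src_dir so members are stored without paths.
archive <- tempfile("demo_kg_", fileext = ".tar.gz")
old_wd <- setwd(src_dir)
utils::tar(archive, files = c("demo_nodes.tsv", "demo_edges.tsv"),
           compression = "gzip", tar = "internal")
setwd(old_wd)

archive_ok <- check_kgx_archive(archive)
print(archive_ok)
```

If the check passes, the same path could then be passed as the filename argument.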

Value

An object of class file_engine.

Details

Engines store preferences that specify how data are fetched and manipulated. For example, node category is multi-valued: a node may have several categories, such as "biolink:Gene" and "biolink:NamedThing". Typically a single category represents the node in a graph, and it is returned as the node's pcategory. The engine's preferences store the preference list of categories used to choose pcategory. The package ships a default set of preferences for KGX (BioLink-compatible) graphs (see https://github.com/biolink/kgx/blob/master/specification/kgx-format.md), but the user can override them.
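
The idea of choosing one pcategory from a multi-valued category can be sketched in base R. This is only an illustration under stated assumptions, not monarchr's implementation: the helper pick_pcategory(), the priority vector, and the fallback to the first listed category are all hypothetical, and the name of the corresponding entry in the engine's preferences list is not shown here because the text above does not give it.

```r
# Hypothetical helper: return the first entry of `priority` that appears in
# the node's categories. Assumed fallbacks: the node's first category when
# nothing matches, and NA when the node has no categories at all.
pick_pcategory <- function(categories, priority) {
  if (length(categories) == 0) {
    return(NA_character_)
  }
  hits <- priority[priority %in% categories]
  if (length(hits) > 0) hits[[1]] else categories[[1]]
}

# Assumed preference order, most preferred first.
priority <- c("biolink:Gene", "biolink:Disease", "biolink:NamedThing")

# Multi-valued categories for three illustrative nodes.
node_categories <- list(
  c("biolink:NamedThing", "biolink:Gene"),
  c("biolink:NamedThing", "biolink:Disease"),
  c("biolink:PhenotypicFeature")
)

pcategory <- vapply(node_categories, pick_pcategory, character(1),
                    priority = priority)
print(pcategory)
```

Under this sketch, the first two nodes resolve to their more specific category because it ranks above "biolink:NamedThing" in the priority vector, while the third node keeps its only category because none of the preferred categories apply.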

Examples

library(tidygraph)
library(dplyr)

# Using example KGX file packaged with monarchr
filename <- system.file("extdata", "eds_marfan_kg.tar.gz", package = "monarchr")
engine <- file_engine(filename)

# Fetch two nodes by ID; the result is a tbl_graph containing only those nodes
res <- engine |> fetch_nodes(query_ids = c("MONDO:0007522", "MONDO:0007947"))
print(res)
#> # A tbl_graph: 2 nodes and 0 edges
#> #
#> # A rooted forest with 2 trees
#> #
#> # Node Data: 2 × 16 (active)
#>   id    pcategory name  symbol in_taxon_label description synonym category iri  
#>   <chr> <chr>     <chr> <chr>  <chr>          <chr>       <list>  <list>   <chr>
#> 1 MOND… biolink:… Ehle… NA     NA             Ehlers-Dan… <chr>   <chr>    http…
#> 2 MOND… biolink:… Marf… NA     NA             A disorder… <chr>   <chr>    http…
#> # ℹ 7 more variables: xref <list>, namespace <chr>, provided_by <chr>,
#> #   in_taxon <chr>, full_name <chr>, type <list>, has_gene <chr>
#> #
#> # Edge Data: 0 × 25
#> # ℹ 25 variables: from <int>, to <int>, subject <chr>, predicate <chr>,
#> #   object <chr>, agent_type <chr>, knowledge_level <chr>,
#> #   knowledge_source <chr>, aggregator_knowledge_source <chr>,
#> #   primary_knowledge_source <chr>, provided_by <chr>, id <chr>,
#> #   category <chr>, original_object <chr>, original_subject <chr>,
#> #   frequency_qualifier <chr>, has_evidence <chr>, has_total <dbl>,
#> #   has_quotient <dbl>, has_count <dbl>, has_percentage <dbl>, …