Given two KGX graphs, returns a new KGX graph that is the union of the two input graphs,
with any edge between two nodes repeated for other nodes that share the same subject and object id.
The engine of the first graph is used for the new graph.
kg_join(graph1, graph2, ...)
A tbl_kgx() graph.
This function first computes new node and edge data by taking the full natural join of the
node and edge data from the two input graphs and then keeping unique rows. Note that nodes with
the same id that differ in any shared column are kept as separate nodes, taken to represent
the same entity in different contexts. (However, a node with an additional property will be
merged with a node without that property, as defined by the natural join.) When nodes with the
same id are kept separate, any edge that connects to one of them is also valid for the others,
and so the method repeats edges across nodes with the same id.
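The join behavior described above can be illustrated on toy data with dplyr alone. This is a minimal sketch, not the monarchr implementation: the tables `nodes1`, `nodes2`, and `edges`, their values, and the helper `node_index` are all hypothetical, and the edge-repetition step is only an assumption about how repeating edges across same-id nodes might look. It assumes dplyr 1.1.0 or later for the `relationship` argument.

```r
library(dplyr)

# Hypothetical toy node tables; the ids, names, and symbols are made up.
nodes1 <- tibble(id = c("A", "B"), name = c("alpha", "beta"))
nodes2 <- tibble(id = c("A", "B"),
                 name = c("alpha", "beta-2"),
                 symbol = c("a1", "b2"))

# Full natural join: join on every column the two tables share,
# then keep unique rows.
shared <- intersect(names(nodes1), names(nodes2))
joined_nodes <- full_join(nodes1, nodes2, by = shared) |>
  distinct()
# "A" agrees on id and name, so it is merged and gains `symbol`.
# "B" differs in `name`, so both versions are kept as separate rows.

# Assumed edge handling (illustrative only): match each edge endpoint
# to every node row with that id, so an edge touching "B" is repeated
# for both "B" rows.
node_index <- joined_nodes |>
  mutate(idx = row_number()) |>
  select(id, idx)

edges <- tibble(subject = "B", predicate = "related_to", object = "A")

expanded_edges <- edges |>
  inner_join(node_index, by = c(subject = "id"),
             relationship = "many-to-many") |>
  rename(from = idx) |>
  inner_join(node_index, by = c(object = "id"),
             relationship = "many-to-many") |>
  rename(to = idx)
```

In this sketch `joined_nodes` has three rows (one merged "A" and two distinct "B" rows), and `expanded_edges` has two rows, one for each "B" node pointing at "A".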
## Using example KGX file packaged with monarchr
filename <- system.file("extdata", "eds_marfan_kg.tar.gz", package = "monarchr")
engine <- file_engine(filename)

eds_and_phenos <- engine |>
  fetch_nodes(query_ids = "MONDO:0007525") |>
  expand(predicates = "biolink:has_phenotype",
         categories = "biolink:PhenotypicFeature")

marfan_and_phenos <- engine |>
  fetch_nodes(query_ids = "MONDO:0007947") |>
  expand(predicates = "biolink:has_phenotype",
         categories = "biolink:PhenotypicFeature")

combined <- kg_join(eds_and_phenos, marfan_and_phenos)
#> Joining with `by = join_by(id, pcategory, name, symbol, in_taxon_label,
#> description, synonym, category, iri, xref, namespace, provided_by, in_taxon,
#> full_name, type, has_gene)`
#> Joining with `by = join_by(subject, predicate, object, agent_type,
#> knowledge_level, knowledge_source, aggregator_knowledge_source,
#> primary_knowledge_source, provided_by, id, category, original_object,
#> original_subject, frequency_qualifier, has_evidence, has_total, has_quotient,
#> has_count, has_percentage, onset_qualifier, publications, qualifiers,
#> original_predicate)`
print(combined)
#> # A tbl_graph: 143 nodes and 152 edges
#> #
#> # A directed acyclic simple graph with 1 component
#> #
#> # Node Data: 143 × 16 (active)
#> id pcategory name symbol in_taxon_label description synonym category
#> <chr> <chr> <chr> <chr> <chr> <chr> <list> <list>
#> 1 MONDO:000… biolink:… Ehle… NA NA Arthrochal… <chr> <chr>
#> 2 HP:0000974 biolink:… Hype… NA NA A conditio… <chr> <chr>
#> 3 HP:0001382 biolink:… Join… NA NA The abilit… <chr> <chr>
#> 4 HP:0000023 biolink:… Ingu… NA NA Protrusion… <chr> <chr>
#> 5 HP:0000963 biolink:… Thin… NA NA Reduction … <chr> <chr>
#> 6 HP:0000978 biolink:… Brui… NA NA An ecchymo… <chr> <chr>
#> 7 HP:0001027 biolink:… Soft… NA NA A skin tex… <chr> <chr>
#> 8 HP:0001058 biolink:… Poor… NA NA A reduced … <chr> <chr>
#> 9 HP:0001075 biolink:… Atro… NA NA Scars that… <chr> <chr>
#> 10 HP:0001373 biolink:… Join… NA NA Displaceme… <chr> <chr>
#> # ℹ 133 more rows
#> # ℹ 8 more variables: iri <chr>, xref <list>, namespace <chr>,
#> # provided_by <chr>, in_taxon <chr>, full_name <chr>, type <list>,
#> # has_gene <chr>
#> #
#> # Edge Data: 152 × 25
#> from to subject predicate object agent_type knowledge_level
#> <int> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 1 8 MONDO:0007525 biolink:has_pheno… HP:00… manual_ag… knowledge_asse…
#> 2 1 29 MONDO:0007525 biolink:has_pheno… HP:00… manual_ag… knowledge_asse…
#> 3 1 20 MONDO:0007525 biolink:has_pheno… HP:00… manual_ag… knowledge_asse…
#> # ℹ 149 more rows
#> # ℹ 18 more variables: knowledge_source <chr>,
#> # aggregator_knowledge_source <chr>, primary_knowledge_source <chr>,
#> # provided_by <chr>, id <chr>, category <chr>, original_object <chr>,
#> # original_subject <chr>, frequency_qualifier <chr>, has_evidence <chr>,
#> # has_total <dbl>, has_quotient <dbl>, has_count <dbl>, has_percentage <dbl>,
#> # onset_qualifier <chr>, publications <chr>, qualifiers <chr>, …