Join two KGX graphs by their nodes and edges.

Given two KGX graphs, returns a new KGX graph that is the union of the two input graphs, with any edges between nodes repeated for other nodes with the same subject and object id. The engine of the first graph is used for the new graph.

kg_join(graph1, graph2, ...)

Arguments

graph1: A tbl_kgx() graph.
graph2: A tbl_kgx() graph.
...: Other parameters (not used).

Value

A tbl_kgx() graph.

Details

This function first computes new node and edge data by taking the full natural join of the node data and of the edge data from the two input graphs, and then keeping unique rows. Nodes with the same id that differ in any shared column are kept as separate rows and are taken to represent the same entity in different contexts. (However, a node with an additional property will be merged with a node without that property, as defined by the natural join.) In these cases, any edge that connects to one of these nodes is also valid for the other node, and so the method repeats edges across nodes with the same id.
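
The node-merging rule described above can be illustrated with a minimal sketch using dplyr on toy tibbles. This is not the package's actual implementation; the tables nodes1, nodes2, and nodes3 and their contents are hypothetical, and only the full natural join followed by de-duplication is shown (the repetition of edges is not).

```r
library(dplyr)

# Hypothetical node tables standing in for the node data of two graphs.
nodes1 <- tibble(id = c("A", "B"), name = c("alpha", "beta"))
nodes2 <- tibble(id = c("B", "C"), name = c("beta", "gamma"),
                 symbol = c("b", NA))

# Full natural join: match on all shared columns (id, name), keep every
# row from both sides, then drop duplicate rows.
merged <- full_join(nodes1, nodes2) |> distinct()

# "B" has no symbol column in nodes1, so it merges with the "B" row in
# nodes2 that carries the extra property.
print(merged)

# Hypothetical table in which "B" disagrees on a shared column (name).
nodes3 <- tibble(id = "B", name = "BETA")

# The two "B" rows do not match on every shared column, so both are kept.
conflicting <- full_join(nodes1, nodes3) |> distinct()
print(conflicting)
```

In the first join the result has one row per id, while in the second the id "B" appears twice, which is the situation in which kg_join() repeats edges across nodes with the same id.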

Examples

## Using example KGX file packaged with monarchr
filename <- system.file("extdata", "eds_marfan_kg.tar.gz", package = "monarchr")
engine <- file_engine(filename)

eds_and_phenos <- engine |>
                  fetch_nodes(query_ids = "MONDO:0007525") |>
                  expand(predicates = "biolink:has_phenotype",
                         categories = "biolink:PhenotypicFeature")

marfan_and_phenos <- engine |>
                     fetch_nodes(query_ids = "MONDO:0007947") |>
                     expand(predicates = "biolink:has_phenotype",
                            categories = "biolink:PhenotypicFeature")

combined <- kg_join(eds_and_phenos, marfan_and_phenos)
#> Joining with `by = join_by(id, pcategory, name, symbol, in_taxon_label,
#> description, synonym, category, iri, xref, namespace, provided_by, in_taxon,
#> full_name, type, has_gene)`
#> Joining with `by = join_by(subject, predicate, object,
#> primary_knowledge_source, agent_type, knowledge_level, knowledge_source,
#> aggregator_knowledge_source, provided_by, id, category, original_object,
#> original_subject, frequency_qualifier, has_evidence, has_total, has_quotient,
#> has_count, has_percentage, onset_qualifier, publications, qualifiers,
#> original_predicate)`
print(combined)
#> # A tbl_graph: 143 nodes and 152 edges
#> #
#> # A directed acyclic simple graph with 1 component
#> #
#> # Node Data: 143 × 16 (active)
#>    id         pcategory name  symbol in_taxon_label description synonym category
#>    <chr>      <chr>     <chr> <chr>  <chr>          <chr>       <list>  <list>  
#>  1 MONDO:000… biolink:… Ehle… NA     NA             Arthrochal… <chr>   <chr>   
#>  2 HP:0000974 biolink:… Hype… NA     NA             A conditio… <chr>   <chr>   
#>  3 HP:0001382 biolink:… Join… NA     NA             The abilit… <chr>   <chr>   
#>  4 HP:0000023 biolink:… Ingu… NA     NA             Protrusion… <chr>   <chr>   
#>  5 HP:0000963 biolink:… Thin… NA     NA             Reduction … <chr>   <chr>   
#>  6 HP:0000978 biolink:… Brui… NA     NA             An ecchymo… <chr>   <chr>   
#>  7 HP:0001027 biolink:… Soft… NA     NA             A skin tex… <chr>   <chr>   
#>  8 HP:0001058 biolink:… Poor… NA     NA             A reduced … <chr>   <chr>   
#>  9 HP:0001075 biolink:… Atro… NA     NA             Scars that… <chr>   <chr>   
#> 10 HP:0001373 biolink:… Join… NA     NA             Displaceme… <chr>   <chr>   
#> # ℹ 133 more rows
#> # ℹ 8 more variables: iri <chr>, xref <list>, namespace <chr>,
#> #   provided_by <chr>, in_taxon <chr>, full_name <chr>, type <list>,
#> #   has_gene <chr>
#> #
#> # Edge Data: 152 × 25
#>    from    to subject       predicate   object primary_knowledge_so…¹ agent_type
#>   <int> <int> <chr>         <chr>       <chr>  <chr>                  <chr>     
#> 1     1     8 MONDO:0007525 biolink:ha… HP:00… infores:hpo-annotatio… manual_ag…
#> 2     1    29 MONDO:0007525 biolink:ha… HP:00… infores:hpo-annotatio… manual_ag…
#> 3     1    20 MONDO:0007525 biolink:ha… HP:00… infores:hpo-annotatio… manual_ag…
#> # ℹ 149 more rows
#> # ℹ abbreviated name: ¹primary_knowledge_source
#> # ℹ 18 more variables: knowledge_level <chr>, knowledge_source <chr>,
#> #   aggregator_knowledge_source <chr>, provided_by <chr>, id <chr>,
#> #   category <chr>, original_object <chr>, original_subject <chr>,
#> #   frequency_qualifier <chr>, has_evidence <chr>, has_total <dbl>,
#> #   has_quotient <dbl>, has_count <dbl>, has_percentage <dbl>, …