R/summary.file_engine.R
summary.file_engine.Rd
Given a KGX file-based KG engine, provides summary information in the form of
node counts, category counts across nodes, relationship type counts, and
available properties.
The returned summary object prints a readable console report and also contains
data frames with this information. It also includes cats, preds, and props
entries: lists of the available categories, predicates, and properties,
provided for convenient auto-completion in RStudio.
# S3 method for class 'file_engine'
summary(object, ..., quiet = FALSE)
A classed list of data frames and named lists.
When applied to a file_engine, the list also includes node-specific and
edge-specific properties.
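The returned list can be inspected like any other R list. The following is a minimal sketch rather than a definitive recipe: it assumes monarchr is installed, that quiet = TRUE suppresses the console report (the argument is not described above), and that the cats, preds, and props entries are reachable with `$`. The names of the data frame entries are not listed here, so the sketch discovers them with names() instead of guessing.

```r
library(monarchr)

# Example KGX file-backed engine packaged with monarchr
data(eds_marfan_kg)

# Assumption: quiet = TRUE returns the summary without printing the report
res <- summary(eds_marfan_kg, quiet = TRUE)

# List the entries actually present in the summary object
names(res)

# Entries named in the description above, intended for auto-completion
cats  <- res$cats   # available node categories
preds <- res$preds  # available edge predicates
props <- res$props  # available properties
```

In RStudio, typing `res$cats$` should then offer the available categories as completions, which avoids retyping long Biolink identifiers by hand.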
# Using example KGX file packaged with monarchr
data(eds_marfan_kg)
# prints a readable summary and returns a list of data frames
res <- eds_marfan_kg |> summary()
print(res)
#>
#> A KGX file-backed knowledge graph engine.
#> Total nodes: 3000
#> Total edges: 7148
#>
#> Node category counts:
#> category count
#> 1 biolink:Entity 3000
#> 2 biolink:NamedThing 3000
#> 3 biolink:BiologicalEntity 2977
#> 4 biolink:ThingWithTaxon 2977
#> 5 biolink:DiseaseOrPhenotypicFeature 2846
#> 6 biolink:PhenotypicFeature 2736
#> 7 biolink:PhysicalEssence 145
#> 8 biolink:PhysicalEssenceOrOccurrent 145
#> 9 biolink:GenomicEntity 131
#> 10 biolink:OntologyClass 131
#> 11 biolink:Disease 110
#> 12 biolink:SequenceVariant 81
#> 13 biolink:ChemicalEntityOrGeneOrGeneProduct 37
#> 14 biolink:Genotype 27
#> 15 biolink:Gene 23
#> 16 biolink:GeneOrGeneProduct 23
#> 17 biolink:MacromolecularMachineMixin 23
#> 18 biolink:ChemicalEntity 14
#> 19 biolink:ChemicalEntityOrProteinOrPolypeptide 14
#> 20 biolink:ChemicalOrDrugOrTreatment 14
#> 21 biolink:MolecularEntity 13
#>
#> Edge type counts:
#> predicate count
#> 1 biolink:subclass_of 5244
#> 2 biolink:has_phenotype 1709
#> 3 biolink:causes 56
#> 4 biolink:associated_with_increased_likelihood_of 38
#> 5 biolink:gene_associated_with_condition 28
#> 6 biolink:model_of 27
#> 7 biolink:has_mode_of_inheritance 26
#> 8 biolink:genetically_associated_with 11
#> 9 biolink:related_to 8
#> 10 biolink:treats_or_applied_or_studied_to_treat 1
#>
#> Node property counts:
#> property count
#> 16 pcategory 3000
#> 11 provided_by 3000
#> 7 category 3000
#> 1 id 3000
#> 2 name 2999
#> 10 namespace 2992
#> 8 iri 2868
#> 5 description 2668
#> 6 synonym 2320
#> 9 xref 1469
#> 12 in_taxon 120
#> 4 in_taxon_label 120
#> 15 has_gene 69
#> 14 type 33
#> 13 full_name 23
#> 3 symbol 23
#>
#> Edge property counts:
#> property count
#> 13 category 7148
#> 12 id 7148
#> 11 provided_by 7148
#> 10 primary_knowledge_source 7148
#> 9 aggregator_knowledge_source 7148
#> 8 knowledge_source 7148
#> 7 knowledge_level 7148
#> 6 agent_type 7148
#> 5 object 7148
#> 4 predicate 7148
#> 3 subject 7148
#> 2 to 7148
#> 1 from 7148
#> 15 original_subject 1782
#> 17 has_evidence 1729
#> 16 frequency_qualifier 1080
#> 23 publications 537
#> 18 has_total 463
#> 21 has_percentage 453
#> 20 has_count 453
#> 19 has_quotient 453
#> 25 original_predicate 81
#> 14 original_object 80
#> 24 qualifiers 62
#> 22 onset_qualifier 21
#>
#> For more information about Biolink node (Class) and edge
#> (Association) properties, see https://biolink.github.io/biolink-model/.
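To make the counts in the report easier to interpret, the sketch below reproduces the same kind of tallies on a toy table in base R. It is an illustration only, not monarchr's implementation: the `nodes` data frame and its values are hypothetical, and it assumes that each node carries several category labels and that a property count is the number of non-missing values for that property.

```r
# Hypothetical toy node table; values are invented for illustration
nodes <- data.frame(
  id     = c("A", "B", "C"),
  name   = c("a", NA, "c"),
  symbol = c(NA, NA, "S"),
  stringsAsFactors = FALSE
)

# Each node can carry several category labels (a list column)
nodes$category <- list(
  c("biolink:Gene", "biolink:NamedThing"),
  c("biolink:Disease", "biolink:NamedThing"),
  c("biolink:Gene", "biolink:NamedThing")
)

# Category counts: number of nodes carrying each label
tab <- table(unlist(nodes$category))
category_counts <- data.frame(
  category = names(tab),
  count    = as.integer(tab),
  stringsAsFactors = FALSE
)

# Sort by descending count, then reset row names
ord <- order(-category_counts$count, category_counts$category)
category_counts <- category_counts[ord, ]
rownames(category_counts) <- NULL

# Property counts: number of non-missing values per property column
property_counts <- colSums(!is.na(nodes[, c("id", "name", "symbol")]))
```

Because one node contributes to every category it carries, category counts can sum to more than the total number of nodes, as they do in the report above.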