HPO classifier
When testing for presence or absence of an HPO term, the HpoClassifier
leverages the True path rule to take advantage of the HPO hierarchy.
As a result, an individual annotated with a term is implicitly annotated with all its ancestors.
For instance, an individual annotated with Ectopia lentis
is also annotated with Abnormal lens morphology,
Abnormal anterior eye segment morphology,
Abnormal eye morphology, …
Similarly, all descendants of a term, whose presence was specifically excluded in an individual, are implicitly excluded.
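The two directions of propagation can be illustrated without loading the HPO. The sketch below is a toy model, not the HpoClassifier implementation: it assumes a hand-written child-to-parent map that chains the four terms named above in the order listed, and the helpers `ancestors` and `descendants` are hypothetical names that do not exist in GPSEA or hpotk.

```python
# Toy child -> parent map built from the terms named above.
# Assumption: a simple chain. The real HPO is a DAG in which a term
# may have several parents and intermediate terms may exist.
PARENT = {
    "Ectopia lentis": "Abnormal lens morphology",
    "Abnormal lens morphology": "Abnormal anterior eye segment morphology",
    "Abnormal anterior eye segment morphology": "Abnormal eye morphology",
}


def ancestors(term):
    """Return all ancestors of `term`, nearest first (hypothetical helper)."""
    found = []
    while term in PARENT:
        term = PARENT[term]
        found.append(term)
    return found


def descendants(term):
    """Return all descendants of `term` (hypothetical helper)."""
    found = []
    stack = [term]
    while stack:
        current = stack.pop()
        for child, parent in PARENT.items():
            if parent == current:
                found.append(child)
                stack.append(child)
    return found


# An observed term: presence propagates up to every ancestor.
implied_present = ancestors("Ectopia lentis")

# An excluded term: exclusion propagates down to every descendant.
implied_excluded = descendants("Abnormal anterior eye segment morphology")
```

Under these assumptions, observing Ectopia lentis implies the presence of the three more general terms, while excluding Abnormal anterior eye segment morphology implies the exclusion of Abnormal lens morphology and Ectopia lentis.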
Example
Here we show how to set up HpoClassifier
to test for the presence of Abnormal lens morphology.
We first need to load a MinimalOntology
with HPO data to access the HPO hierarchy:
>>> import hpotk
>>> store = hpotk.configure_ontology_store()
>>> hpo = store.load_minimal_hpo(release='v2024-07-01')
and now we can set up the classifier to test for presence of Abnormal lens morphology:
>>> from gpsea.analysis.clf import HpoClassifier
>>> query = hpotk.TermId.from_curie('HP:0000517')
>>> pheno_clf = HpoClassifier(
...     hpo=hpo,
...     query=query,
... )
>>> pheno_clf.name
'HPO Classifier'
>>> pheno_clf.description
'Test for presence of Abnormal lens morphology [HP:0000517]'
>>> pheno_clf.class_labels
('Yes', 'No')
missing_implies_phenotype_excluded
In many cases, published reports of clinical data about individuals with rare diseases describe phenotypic features that were observed, but do not provide a comprehensive list of features that were explicitly excluded. By default, GPSEA will only include features that are recorded as observed or excluded in a phenopacket.
However, setting missing_implies_excluded=True
will cause “n/a” entries to be set to “excluded”.
We provide this option for exploration but do not recommend its use
for the final analysis unless the assumption behind it is known to be true.
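The effect of this assumption on a single individual can be sketched with a toy function. This is only an illustration, not GPSEA's implementation: `classify` is a hypothetical helper that ignores the HPO hierarchy, works on term labels rather than term IDs, and reuses the 'Yes' and 'No' class labels shown above. It returns `None` where an individual would be left unclassified.

```python
def classify(observed, excluded, query, missing_implies_excluded=False):
    """Toy stand-in for a phenotype classifier (hypothetical helper)."""
    if query in observed:
        return "Yes"
    if query in excluded:
        return "No"
    # The term was not recorded at all ("n/a").
    return "No" if missing_implies_excluded else None


# An individual with one observed feature and nothing explicitly excluded.
observed = {"Abnormal lens morphology"}
excluded = set()

# Default: an unrecorded term leaves the individual unclassified.
default_result = classify(observed, excluded, "Ectopia lentis")

# With the option enabled, the unrecorded term is treated as excluded.
assumed_result = classify(
    observed, excluded, "Ectopia lentis", missing_implies_excluded=True
)
```

Here `default_result` is `None` and `assumed_result` is `'No'`, so the option changes the outcome only for terms that the report never mentions.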
Classifiers for all cohort phenotypes
Constructing phenotype classifiers for all HPO terms of a cohort sounds a bit tedious.
The prepare_classifiers_for_terms_of_interest()
function cuts down the tedium.
Example
For a phenopacket collection (e.g. 156 patients with mutations in the TBX5 gene included in Phenopacket Store version 0.1.18)
>>> from ppktstore.registry import configure_phenopacket_registry
>>> registry = configure_phenopacket_registry()
>>> with registry.open_phenopacket_store(release='0.1.18') as ps:
...     phenopackets = tuple(ps.iter_cohort_phenopackets('TBX5'))
>>> len(phenopackets)
156
processed into a cohort
>>> from gpsea.preprocessing import configure_caching_cohort_creator, load_phenopackets
>>> cohort_creator = configure_caching_cohort_creator(hpo)
>>> cohort, _ = load_phenopackets(phenopackets, cohort_creator)
Individuals Processed: ...
we can create HPO classifiers for testing all 369 HPO terms used in the cohort:
>>> from gpsea.analysis.clf import prepare_classifiers_for_terms_of_interest
>>> pheno_clfs = prepare_classifiers_for_terms_of_interest(
...     cohort=cohort,
...     hpo=hpo,
... )
>>> len(pheno_clfs)
369
and subject the classifiers to further analysis, such as HpoTermAnalysis.