Counting phenotype scorer
The CountingPhenotypeScorer assigns a phenotype score that is equivalent to the count of observed phenotypes (HPO terms) that are either an exact match to one of the query terms or a descendant of one.
Typically, the query terms will comprise abnormalities in different organ systems.
For instance, we may want to count whether an individual has brain, liver, kidney, and skin abnormalities.
In this case, the query would include the corresponding terms (e.g., Abnormal brain morphology HP:0012443).
An individual can then have between 0 and 4 phenotype group abnormalities. The scorer does not double count if the individual has multiple
observed abnormalities in one of the organ systems (i.e., multiple descendants of one of the query terms). Each individual can thus have a score
between 0 (no relevant abnormalities) and the number of categories (if the individual has an abnormality in each of the categories).
The genotype groups are then compared with respect to the distribution of counts using the Mann-Whitney U test.
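To make the comparison step concrete, the following is a minimal sketch of the Mann-Whitney U statistic computed by brute force over all pairs of scores. The function name mann_whitney_u and the score lists are made up for illustration; this is not the library's implementation, and it yields only the statistic, not a p value. In practice, a statistics library routine such as scipy.stats.mannwhitneyu would also provide the p value.

```python
def mann_whitney_u(scores_a, scores_b):
    """Return the U statistic of group A versus group B.

    Each (a, b) pair contributes 1 if a > b and 0.5 if a == b.
    """
    u = 0.0
    for a in scores_a:
        for b in scores_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical phenotype scores (0 to 5 categories) for two genotype groups.
group_a = [0, 1, 1, 2, 3]
group_b = [2, 3, 3, 4, 4]

u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)

# The two statistics always sum to len(group_a) * len(group_b).
print(u_a, u_b)
```

A small U for one group relative to the other indicates that its scores tend to be lower, which is the kind of shift in the count distribution the test is meant to detect.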
Example
Here we use CountingPhenotypeScorer to score the individuals based on the number of structural defects from the following 5 categories:
Brain anomalies
Eye anomalies
Congenital heart defects
Renal anomalies
Sensorineural hearing loss
For example, an individual with a congenital heart defect would be assigned a score of 1, an individual with a congenital heart defect and a renal anomaly would be assigned a score of 2, and so on. If an individual had two heart defects (e.g., atrial septal defect and ventricular septal defect), a score of 1 (not 2) would be assigned for the heart defect category.
The CountingPhenotypeScorer automates this scoring method by encoding the categories into HPO terms:
>>> structural_defects = (
... 'HP:0012443', # Abnormal brain morphology (Brain anomalies)
... 'HP:0012372', # Abnormal eye morphology (Eye anomalies)
... 'HP:0001627', # Abnormal heart morphology (Congenital heart defects)
... 'HP:0012210', # Abnormal renal morphology (Renal anomalies)
... 'HP:0000407', # Sensorineural hearing impairment (Sensorineural hearing loss)
... )
and then tests each individual for the presence of at least one HPO term that either matches the structural defect exactly (e.g. Abnormal brain morphology) or is its descendant (e.g. Cerebellar atrophy).
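The matching rule can be sketched in plain Python, without the library. The sketch below assumes a tiny hand-written child-to-parent map (TOY_PARENTS) that stands in for the ontology; it is not the real HPO structure, where a term can have several parents and many intermediate levels. The helper names ancestors_and_self and count_score are hypothetical and chosen only for this illustration.

```python
# Hypothetical toy hierarchy (child -> parent), NOT the real HPO graph.
TOY_PARENTS = {
    "Atrial septal defect": "Abnormal heart morphology",
    "Ventricular septal defect": "Abnormal heart morphology",
    "Cerebellar atrophy": "Abnormal brain morphology",
    "Renal cyst": "Abnormal renal morphology",
}

# The five query categories from the example, written as term labels.
QUERY = frozenset({
    "Abnormal brain morphology",
    "Abnormal eye morphology",
    "Abnormal heart morphology",
    "Abnormal renal morphology",
    "Sensorineural hearing impairment",
})


def ancestors_and_self(term):
    """Return the term together with all of its ancestors in the toy map."""
    seen = {term}
    while term in TOY_PARENTS:
        term = TOY_PARENTS[term]
        seen.add(term)
    return seen


def count_score(observed, query=QUERY):
    """Count the query terms matched exactly or via a descendant.

    Collecting matches in a set means that several observed terms
    under the same query term are counted once.
    """
    matched = set()
    for term in observed:
        matched |= ancestors_and_self(term) & query
    return len(matched)


print(count_score(["Atrial septal defect"]))
print(count_score(["Atrial septal defect", "Ventricular septal defect"]))
print(count_score(["Atrial septal defect", "Renal cyst"]))
```

Under these assumptions, the three calls reproduce the cases described above: one heart defect scores 1, two heart defects still score 1, and a heart defect plus a renal anomaly scores 2.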
The counting scorer requires the HPO hierarchy to resolve descendants. We can load HPO using the HPO toolkit:
>>> import hpotk
>>> store = hpotk.configure_ontology_store()
>>> hpo = store.load_minimal_hpo(release='v2024-07-01')
Then, we construct the scorer with the from_query_curies() function:
>>> from gpsea.analysis.pscore import CountingPhenotypeScorer
>>> pheno_scorer = CountingPhenotypeScorer.from_query_curies(
... hpo=hpo,
... query=structural_defects,
... )
>>> pheno_scorer.description
'Assign a phenotype score that is equivalent to the count of present phenotypes that are either an exact match to the query terms or their descendants'