gpsea.analysis.pscore package
- class gpsea.analysis.pscore.PhenotypeScorer[source]
Bases: object
PhenotypeScorer assigns a phenotype score to a patient.
The score can be math.nan if it is not possible to compute the score for a patient.
The scorer can be created by wrapping a scoring function (see wrap_scoring_function()).
- static wrap_scoring_function(func: Callable[[Patient], float]) PhenotypeScorer [source]
Create a PhenotypeScorer by wrapping the provided scoring function func.
The function must take exactly one argument of type Patient and return a float with the corresponding phenotype score.
Example
>>> from gpsea.analysis.pscore import PhenotypeScorer
>>> def f(p): return 123.4
>>> phenotype_scorer = PhenotypeScorer.wrap_scoring_function(f)
phenotype_scorer will assign all patients a score of 123.4.
- Parameters:
func – the scoring function.
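The wrapping idea, including the math.nan convention for patients that cannot be scored, can be sketched without GPSEA installed. This is a minimal illustration and not the library's implementation: SketchPatient, SketchScorer, the score method, and the age_of_onset field are hypothetical stand-ins introduced only for this example.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SketchPatient:
    """Hypothetical stand-in for a patient; holds only what the sketch needs."""
    label: str
    age_of_onset: Optional[float]  # None when the value was not recorded


class SketchScorer:
    """Hypothetical stand-in for a scorer built from a plain function."""

    def __init__(self, func: Callable[[SketchPatient], float]):
        self._func = func

    def score(self, patient: SketchPatient) -> float:
        # Delegate to the wrapped function and coerce the result to float.
        return float(self._func(patient))


def onset_score(patient: SketchPatient) -> float:
    # Return math.nan when the score cannot be computed for this patient.
    if patient.age_of_onset is None:
        return math.nan
    return patient.age_of_onset


scorer = SketchScorer(onset_score)
known = scorer.score(SketchPatient("patient_1", 4.5))
unknown = scorer.score(SketchPatient("patient_2", None))
```

Because math.nan never compares equal to itself, downstream code should check for missing scores with math.isnan rather than with ==.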
- class gpsea.analysis.pscore.PhenotypeScoreAnalysis(score_statistic: PhenotypeScoreStatistic)[source]
Bases: object
PhenotypeScoreAnalysis tests the association between two or more genotype groups and a phenotype score.
The genotype groups are created by a GenotypePolyPredicate and the phenotype score is computed with a PhenotypeScorer.
The association is tested with a PhenotypeScoreStatistic and the results are reported as a PhenotypeScoreAnalysisResult.
- compare_genotype_vs_phenotype_score(cohort: Iterable[Patient], gt_predicate: GenotypePolyPredicate, pheno_scorer: PhenotypeScorer) PhenotypeScoreAnalysisResult [source]
Compute the association between genotype groups and phenotype score.
- Parameters:
cohort – the cohort to analyze.
gt_predicate – a predicate for assigning an individual into a genotype group.
pheno_scorer – the scorer to compute phenotype score.
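The flow described above (assign a genotype group, compute a score, compare the groups) can be sketched in plain Python. This is an illustration under stated assumptions, not GPSEA's code: the tuple layout, score_by_group, and mann_whitney_u are hypothetical, the sketch assumes that patients without a genotype group or with a math.nan score are left out of the test, and it computes only the Mann-Whitney U statistic, not a p value.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical patient record: (patient_id, genotype_group, observed_value).
Row = Tuple[str, Optional[int], float]


def score_by_group(
    cohort: Iterable[Row],
    scorer: Callable[[Row], float],
) -> Dict[int, List[float]]:
    """Collect phenotype scores per genotype group."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for patient in cohort:
        group = patient[1]
        score = scorer(patient)
        # Assumption: unassigned patients and missing scores are skipped.
        if group is None or math.isnan(score):
            continue
        groups[group].append(score)
    return dict(groups)


def mann_whitney_u(xs: List[float], ys: List[float]) -> float:
    """U statistic for xs: pairs with x > y count 1, ties count 0.5."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


cohort = [
    ("patient_1", 0, 1.0),
    ("patient_2", 0, 3.0),
    ("patient_3", None, 2.0),  # no genotype group assigned
    ("patient_4", 1, 2.0),
    ("patient_5", 1, 4.0),
]
groups = score_by_group(cohort, scorer=lambda p: p[2])
u_stat = mann_whitney_u(groups[0], groups[1])
```

In the toy cohort only one of the four cross-group pairs (3.0 versus 2.0) favors group 0, so the statistic is small relative to its maximum of 4.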
- class gpsea.analysis.pscore.PhenotypeScoreAnalysisResult(genotype_phenotype_scores: DataFrame, pval: float)[source]
Bases: object
PhenotypeScoreAnalysisResult is a container for PhenotypeScoreAnalysis results.
- property genotype_phenotype_scores: DataFrame
Get the DataFrame with the genotype group and the phenotype score for each patient.
The DataFrame has the following structure:
patient_id   genotype   phenotype
patient_1    0          1
patient_2    0          3
patient_3    None       2
patient_4    1          2
…            …          …
The DataFrame index contains the patient IDs. The two columns hold the genotype group ID (cat_id) and the phenotype score. A genotype value may be missing if the patient cannot be assigned to any genotype category.
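A table with this layout can be summarized with ordinary pandas operations. The sketch below builds stand-in data that mirrors the rows shown above; it is not real analysis output, and the column names genotype and phenotype are taken from the table header as an assumption about the actual DataFrame.

```python
import pandas as pd

# Stand-in data mirroring the documented layout; not real analysis output.
scores = pd.DataFrame(
    {
        "genotype": [0, 0, None, 1],  # None marks a patient with no group
        "phenotype": [1, 3, 2, 2],
    },
    index=pd.Index(
        ["patient_1", "patient_2", "patient_3", "patient_4"],
        name="patient_id",
    ),
)

# Drop patients that could not be assigned to a genotype group.
assigned = scores.dropna(subset=["genotype"])

# Median phenotype score per genotype group.
medians = assigned.groupby("genotype")["phenotype"].median()
```

Note that a column mixing integers with a missing value is stored by pandas as floating point, so the group labels appear as 0.0 and 1.0 in this stand-in.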
- plot_boxplots(gt_predicate: GenotypePolyPredicate, ax, colors=['darksalmon', 'honeydew'])[source]
Draw box plots with the distributions of phenotype scores for the genotype groups.
- class gpsea.analysis.pscore.CountingPhenotypeScorer(hpo: MinimalOntology, query: Iterable[TermId])[source]
Bases: PhenotypeScorer
CountingPhenotypeScorer assigns the patient a phenotype score equal to the count of present phenotypes that are either an exact match to the query terms or their descendants.
For instance, we may want to count whether an individual has brain, liver, kidney, and skin abnormalities. In that case, the query would include the corresponding terms (e.g., Abnormal brain morphology HP:0012443). An individual can then have between 0 and 4 phenotype group abnormalities. This scorer is intended to be used with the Mann-Whitney U test.
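The counting logic can be illustrated on a toy hierarchy. This is a sketch under assumptions, not the library's implementation: the PARENTS map uses made-up labels rather than real HPO identifiers, the helper names are hypothetical, and the sketch assumes each query term contributes at most one point (consistent with the 0 to 4 range described above).

```python
from typing import Dict, Iterable, List, Set

# Toy child -> parents map; labels are made up, not real HPO identifiers.
PARENTS: Dict[str, List[str]] = {
    "brain": ["root"],
    "liver": ["root"],
    "kidney": ["root"],
    "skin": ["root"],
    "cerebellum": ["brain"],
    "cerebellar_vermis": ["cerebellum"],
    "renal_cyst": ["kidney"],
}


def ancestors_and_self(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def count_matched_groups(
    present: Iterable[str],
    query: Iterable[str],
    parents: Dict[str, List[str]],
) -> int:
    """Count query terms matched exactly or via a descendant."""
    covered: Set[str] = set()
    for term in present:
        covered |= ancestors_and_self(term, parents)
    # Each query term counts once, however many descendants are present.
    return sum(1 for q in query if q in covered)


query = ["brain", "liver", "kidney", "skin"]
present = ["cerebellar_vermis", "cerebellum", "renal_cyst"]
score = count_matched_groups(present, query, PARENTS)
```

Here two of the present terms fall under the same brain group and one under the kidney group, so the individual is counted as having two of the four group abnormalities.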
- static from_query_curies(hpo: MinimalOntology, query: Iterable[TermId | str])[source]
Create a scorer to test for the number of phenotype terms that fall into the phenotype groups.
- Parameters:
hpo – HPO as represented by MinimalOntology of HPO toolkit.
query – an iterable of the top-level terms, either represented as CURIEs (str) or as term IDs.
- class gpsea.analysis.pscore.DeVriesPhenotypeScorer(hpo: MinimalOntology)[source]
Bases: PhenotypeScorer
DeVriesPhenotypeScorer computes the “adapted De Vries Score” as described in Feenstra et al.