gpsea.analysis.pscore package

class gpsea.analysis.pscore.PhenotypeScorer

Bases: object

PhenotypeScorer assigns a phenotype score to a patient.

The score can be math.nan if it is not possible to compute the score for a patient.

The scorer can be created by wrapping a scoring function (see wrap_scoring_function()).

static wrap_scoring_function(func: Callable[[Patient], float]) → PhenotypeScorer

Create a PhenotypeScorer by wrapping the provided scoring function func.

The function must take exactly one argument of type Patient and return a float with the corresponding phenotype score.

Example

>>> from gpsea.analysis.pscore import PhenotypeScorer
>>> def f(p): return 123.4
>>> phenotype_scorer = PhenotypeScorer.wrap_scoring_function(f)

phenotype_scorer will assign all patients a score of 123.4.

Parameters:

func – the scoring function.
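A scoring function may also return math.nan when no score can be computed for a patient. The sketch below illustrates the wrapping idea with stand-in classes so that it runs without gpsea installed. ToyPatient, ToyScorer, and count_or_nan are hypothetical names used only for illustration, and the real PhenotypeScorer may be implemented differently.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyPatient:
    """Hypothetical stand-in for gpsea's Patient (illustration only)."""
    label: str
    present_terms: List[str] = field(default_factory=list)


class ToyScorer:
    """Hypothetical adapter that exposes a plain function via score()."""

    def __init__(self, func: Callable[[ToyPatient], float]):
        self._func = func

    def score(self, patient: ToyPatient) -> float:
        # Delegate to the wrapped function and coerce the result to float.
        return float(self._func(patient))


def count_or_nan(patient: ToyPatient) -> float:
    # Return NaN when there is nothing to score, otherwise the term count.
    if not patient.present_terms:
        return math.nan
    return float(len(patient.present_terms))


scorer = ToyScorer(count_or_nan)
with_terms = scorer.score(ToyPatient("A", ["HP:0012443"]))
without_terms = scorer.score(ToyPatient("B"))
```

Callers that aggregate scores should check for NaN with math.isnan, because NaN never compares equal to itself.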

score(patient: Patient) → float

Compute the score for the patient.

class gpsea.analysis.pscore.PhenotypeScoreAnalysis(score_statistic: PhenotypeScoreStatistic)

Bases: object

PhenotypeScoreAnalysis tests the association between two or more genotype groups and a phenotype score.

The genotype groups are created by a GenotypePolyPredicate and the phenotype score is computed with a PhenotypeScorer.

The association is tested with a PhenotypeScoreStatistic and the results are reported as a PhenotypeScoreAnalysisResult.

compare_genotype_vs_phenotype_score(cohort: Iterable[Patient], gt_predicate: GenotypePolyPredicate, pheno_scorer: PhenotypeScorer) → PhenotypeScoreAnalysisResult

Compute the association between genotype groups and phenotype score.

Parameters:
  • cohort – the cohort to analyze.

  • gt_predicate – a predicate for assigning an individual to a genotype group.

  • pheno_scorer – the scorer for computing the phenotype score.
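The flow of this method can be sketched without gpsea: assign each individual to a genotype group, compute the score, and compare the scores between groups. This is a minimal sketch under stated assumptions, not the library's implementation. All function names and the dict-based cohort are hypothetical, skipping individuals with no group or a NaN score is an assumption, and the sketch computes only a Mann-Whitney U statistic by direct pairwise comparison, whereas the real analysis uses whichever PhenotypeScoreStatistic it was given.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical row layout: (patient_id, genotype group or None, phenotype score).
Row = Tuple[str, Optional[int], float]


def build_rows(
    cohort: List[dict],
    assign_group: Callable[[dict], Optional[int]],
    score: Callable[[dict], float],
) -> List[Row]:
    # One row per individual: identifier, genotype group, phenotype score.
    return [(p["id"], assign_group(p), score(p)) for p in cohort]


def split_by_group(rows: List[Row]) -> Dict[int, List[float]]:
    groups: Dict[int, List[float]] = {}
    for _, group, value in rows:
        # Assumption: skip individuals with no group or an undefined score.
        if group is None or math.isnan(value):
            continue
        groups.setdefault(group, []).append(value)
    return groups


def mann_whitney_u(x: List[float], y: List[float]) -> float:
    # U for sample x: each pair where x exceeds y counts 1, each tie counts 0.5.
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


# Toy cohort represented as plain dicts (hypothetical structure).
cohort = [
    {"id": "patient_1", "group": 0, "n_terms": 1},
    {"id": "patient_2", "group": 0, "n_terms": 3},
    {"id": "patient_3", "group": None, "n_terms": 2},
    {"id": "patient_4", "group": 1, "n_terms": 2},
]
rows = build_rows(cohort, lambda p: p["group"], lambda p: float(p["n_terms"]))
groups = split_by_group(rows)
u_stat = mann_whitney_u(groups[0], groups[1])
```

Turning the U statistic into a p value requires its null distribution, which a statistics library would normally supply.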

class gpsea.analysis.pscore.PhenotypeScoreAnalysisResult(genotype_phenotype_scores: DataFrame, pval: float)

Bases: object

PhenotypeScoreAnalysisResult is a container for PhenotypeScoreAnalysis results.

property genotype_phenotype_scores: DataFrame

Get the DataFrame with the genotype group and the phenotype score for each patient.

The DataFrame has the following structure:

patient_id   genotype   phenotype
patient_1    0          1
patient_2    0          3
patient_3    None       2
patient_4    1          2

The DataFrame is indexed by patient ID and has two columns: the genotype group id (cat_id) and the phenotype score. The genotype value is missing if the patient cannot be assigned to any genotype category.
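As a rough illustration of how such a table might be summarized, the sketch below copies the example rows into plain Python tuples, sets aside patients without a genotype, and computes the median phenotype score per genotype group. It uses only the standard library rather than a DataFrame, and the variable names are hypothetical.

```python
from statistics import median
from typing import Dict, List, Optional, Tuple

# Rows copied from the example table: (patient_id, genotype, phenotype).
table: List[Tuple[str, Optional[int], float]] = [
    ("patient_1", 0, 1),
    ("patient_2", 0, 3),
    ("patient_3", None, 2),
    ("patient_4", 1, 2),
]

# Patients that could not be assigned to any genotype category.
unassigned = [pid for pid, genotype, _ in table if genotype is None]

# Collect the scores of the remaining patients per genotype group.
scores_by_group: Dict[int, List[float]] = {}
for _, genotype, phenotype in table:
    if genotype is not None:
        scores_by_group.setdefault(genotype, []).append(phenotype)

# Median phenotype score for each genotype group.
medians = {group: median(values) for group, values in scores_by_group.items()}
```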

property pval: float

Get the p value of the test.

plot_boxplots(gt_predicate: GenotypePolyPredicate, ax, colors=['darksalmon', 'honeydew'])

Draw box plots with the distributions of phenotype scores for the genotype groups.

class gpsea.analysis.pscore.CountingPhenotypeScorer(hpo: MinimalOntology, query: Iterable[TermId])

Bases: PhenotypeScorer

CountingPhenotypeScorer assigns a patient a phenotype score equal to the number of query terms for which the patient has at least one present phenotype that is either an exact match or a descendant of the query term.

For instance, we may want to count whether an individual has brain, liver, kidney, and skin abnormalities. In that case, the query would include the corresponding terms (e.g., Abnormal brain morphology HP:0012443). An individual can then have between 0 and 4 phenotype group abnormalities. This scorer is intended to be used with the Mann-Whitney U test.

static from_query_curies(hpo: MinimalOntology, query: Iterable[TermId | str])

Create a scorer to test for the number of phenotype terms that fall into the phenotype groups.

Parameters:
  • hpo – HPO as represented by MinimalOntology of HPO toolkit.

  • query – an iterable of the top-level terms, either represented as CURIEs (str) or as term IDs.

get_question() → str

score(patient: Patient) → float

Get the count (number) of terms in the query set that have matching terms (exact matches or descendants) in the patient. Do not double count if the patient has two terms (e.g., two different descendants) of one of the query terms.
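The counting rule can be illustrated with a toy hierarchy. The sketch below is not the library's implementation. The term identifiers are made up (they are not real HPO IDs), and PARENTS, self_and_ancestors, and count_query_hits are hypothetical names. It shows that a query term contributes at most 1 to the score, however many of its descendants the patient has.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy hierarchy (child -> parents); not real HPO identifiers.
PARENTS: Dict[str, Set[str]] = {
    "TOY:brain": {"TOY:root"},
    "TOY:liver": {"TOY:root"},
    "TOY:small_brain": {"TOY:brain"},
    "TOY:brain_cyst": {"TOY:brain"},
    "TOY:big_liver": {"TOY:liver"},
}


def self_and_ancestors(term: str) -> Set[str]:
    # Walk up the hierarchy, collecting the term itself and all ancestors.
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen


def count_query_hits(query: Iterable[str], present_terms: Iterable[str]) -> float:
    # Union of every present term together with its ancestors.
    covered: Set[str] = set()
    for term in present_terms:
        covered |= self_and_ancestors(term)
    # Each query term counts at most once, however many descendants match.
    return float(sum(1 for q in set(query) if q in covered))


query = ["TOY:brain", "TOY:liver"]
two_brain_terms = count_query_hits(query, ["TOY:small_brain", "TOY:brain_cyst"])
brain_and_liver = count_query_hits(query, ["TOY:brain_cyst", "TOY:big_liver"])
```

Here two_brain_terms is 1.0 because both present terms fall under the same query term, while brain_and_liver is 2.0.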

class gpsea.analysis.pscore.DeVriesPhenotypeScorer(hpo: MinimalOntology)

Bases: PhenotypeScorer

DeVriesPhenotypeScorer computes the “adapted De Vries Score” as described in Feenstra et al.

score(patient: Patient) → float

Calculate the score from the observed HPO terms of the patient.

Parameters:

patient – the patient whose observed HPO terms are scored.

Returns: de Vries score between 0 and 10

Subpackages