gpsea.analysis.pscore package

class gpsea.analysis.pscore.PhenotypeScorer[source]

Bases: ContinuousPartitioning

PhenotypeScorer assigns a phenotype score to a patient.

The score can be math.nan if it is not possible to compute the score for a patient.

The scorer can be created by wrapping a scoring function (see wrap_scoring_function()).

abstractmethod score(patient: Patient) → float[source]: Compute the score for the patient.

static wrap_scoring_function(func: Callable[[Patient], float], name: str = 'Custom Scoring Function') → PhenotypeScorer[source]

Create a PhenotypeScorer by wrapping the provided scoring function func.

The function must take exactly one argument of type Patient and return a float with the corresponding phenotype score.

Example

>>> from gpsea.analysis.pscore import PhenotypeScorer
>>> def f(p): return 123.4
>>> phenotype_scorer = PhenotypeScorer.wrap_scoring_function(f)

phenotype_scorer will assign all patients a score of 123.4.

Parameters:

func – the scoring function.
name – a str with the name of the scorer (defaults to 'Custom Scoring Function').
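To illustrate the wrapping idea without depending on GPSEA itself, here is a minimal sketch. The classes Scorer and _FunctionScorer and the function count_or_nan are hypothetical stand-ins (they are not part of the library), and a patient is modelled as a plain dict for brevity. The sketch shows one way an abstract scorer could expose a static factory that adapts a plain function, including a function that returns math.nan when no score can be computed.

```python
import abc
import math
from typing import Callable


class Scorer(abc.ABC):
    """Hypothetical stand-in for PhenotypeScorer (not the GPSEA class)."""

    @abc.abstractmethod
    def score(self, patient) -> float:
        """Compute the score for the patient."""

    @staticmethod
    def wrap_scoring_function(
        func: Callable[[object], float],
        name: str = "Custom Scoring Function",
    ) -> "Scorer":
        # Adapt a plain function to the scorer interface.
        return _FunctionScorer(func, name)


class _FunctionScorer(Scorer):
    """Scorer that delegates to the wrapped function."""

    def __init__(self, func: Callable[[object], float], name: str):
        self._func = func
        self.name = name

    def score(self, patient) -> float:
        return float(self._func(patient))


def count_or_nan(patient: dict) -> float:
    """Toy scoring function: the number of terms, or NaN if unknown."""
    terms = patient.get("terms")
    return math.nan if terms is None else float(len(terms))


scorer = Scorer.wrap_scoring_function(count_or_nan, name="Term count")
print(scorer.score({"terms": ["a", "b"]}))
print(scorer.score({}))
```

The scoring function must explicitly return its value; a function body without a return statement yields None instead of a float.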

class gpsea.analysis.pscore.PhenotypeScoreAnalysis(score_statistic: PhenotypeScoreStatistic)[source]

Bases: object

PhenotypeScoreAnalysis tests the association between two or more genotype classes and a phenotype score.

A genotype class is assigned by a GenotypeClassifier and the phenotype score is computed with a PhenotypeScorer.

The association is tested with a PhenotypeScoreStatistic and the results are reported as a PhenotypeScoreAnalysisResult.

compare_genotype_vs_phenotype_score(cohort: Iterable[Patient], gt_clf: GenotypeClassifier, pheno_scorer: PhenotypeScorer) → PhenotypeScoreAnalysisResult[source]

Compute the association between genotype groups and phenotype score.

Parameters:

cohort – the cohort to analyze.
gt_clf – a classifier for assigning an individual to a genotype class.
pheno_scorer – the scorer that computes the phenotype score.
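The flow described above (classify, score, then test) can be sketched in plain Python. This is only an illustration under stated assumptions: group_scores and mann_whitney_u are hypothetical helpers, patients are modelled as dicts, and the assumption that individuals without a genotype class or with a NaN score are left out of the test is ours. The library's own statistics live in the gpsea.analysis.pscore.stats subpackage; the sketch computes only the Mann-Whitney U statistic, not a p value.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Optional


def group_scores(
    cohort: Iterable[dict],
    classify: Callable[[dict], Optional[Hashable]],
    score: Callable[[dict], float],
) -> Dict[Hashable, List[float]]:
    """Collect phenotype scores per genotype class (hypothetical helper)."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for patient in cohort:
        genotype = classify(patient)
        value = score(patient)
        # Assumption: unclassified patients and NaN scores are not tested.
        if genotype is None or math.isnan(value):
            continue
        groups[genotype].append(value)
    return dict(groups)


def mann_whitney_u(x: List[float], y: List[float]) -> float:
    """U statistic of x versus y: a pair with x > y counts 1, a tie 0.5."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )


# Toy cohort with made-up genotype classes and scores.
cohort = [
    {"id": "patient_1", "gt": 0, "score": 1.0},
    {"id": "patient_2", "gt": 0, "score": math.nan},
    {"id": "patient_3", "gt": None, "score": 2.0},
    {"id": "patient_4", "gt": 1, "score": 2.0},
    {"id": "patient_5", "gt": 1, "score": 3.0},
    {"id": "patient_6", "gt": 0, "score": 2.0},
]

groups = group_scores(cohort, lambda p: p["gt"], lambda p: p["score"])
u_stat = mann_whitney_u(groups[0], groups[1])
print(groups, u_stat)
```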

class gpsea.analysis.pscore.PhenotypeScoreAnalysisResult(gt_clf: GenotypeClassifier, phenotype: PhenotypeScorer, statistic: Statistic, data: DataFrame, statistic_result: StatisticResult)[source]

Bases: MonoPhenotypeAnalysisResult

PhenotypeScoreAnalysisResult is a container for PhenotypeScoreAnalysis results.

The data property provides a data frame with the phenotype score for each tested individual:

patient_id	genotype	phenotype
patient_1	0	1
patient_2	0	nan
patient_3	None	2
patient_4	1	2
…	…	…

The DataFrame index includes the identifiers of the tested individuals and the values are stored in genotype and phenotype columns.

The genotype column contains the genotype category ID (cat_id) or None if the patient cannot be assigned to any genotype category.

The phenotype column contains a float with the phenotype score. A NaN value is used if the phenotype score is impossible to compute.

phenotype_scorer() → PhenotypeScorer[source]: Get the scorer that computed the phenotype score.

plot_boxplots(ax, colors: Sequence[str] = ('#990F0F', '#A72929', '#B64343', '#C45D5D', '#D27676', '#E19090', '#EFAAAA'), median_color: str = '#00aaff', **boxplot_kwargs)[source]

Draw a box plot with distributions of phenotype scores for the genotype groups.

Parameters:

ax – the Matplotlib Axes to draw on.
colors – a sequence with the color palette for the box plot patches.
median_color – a str with the color for the box plot median line.
boxplot_kwargs – keyword arguments to pass to matplotlib.axes.Axes.boxplot().

plot_violins(ax, colors: Sequence[str] = ('#990F0F', '#A72929', '#B64343', '#C45D5D', '#D27676', '#E19090', '#EFAAAA'), **violinplot_kwargs)[source]

Draw a violin plot with distributions of phenotype scores for the genotype groups.

Parameters:

ax – the Matplotlib Axes to draw on.
colors – a sequence with the color palette for the violin patches.
violinplot_kwargs – keyword arguments to pass to matplotlib.axes.Axes.violinplot().

class gpsea.analysis.pscore.CountingPhenotypeScorer(hpo: MinimalOntology, query: Iterable[TermId])[source]

Bases: PhenotypeScorer

CountingPhenotypeScorer assigns a phenotype score to a patient. The score is the number of query terms for which the patient has an observed phenotype that is either an exact match to the query term or one of its descendants.

For instance, we may want to count how many of brain, liver, kidney, and skin abnormalities an individual has. In this case, the query would include the corresponding terms (e.g., Abnormal brain morphology HP:0012443). An individual can then have between 0 and 4 phenotype group abnormalities. This scorer is intended to be used with the Mann-Whitney U test.

property description: str: Get a description of the partitioning.

static from_query_curies(hpo: MinimalOntology, query: Iterable[TermId | str])[source]

Create a scorer to test for the number of phenotype terms that fall into the phenotype groups.

Parameters:

hpo – HPO as represented by MinimalOntology of HPO toolkit.
query – an iterable of the top-level terms, either represented as CURIEs (str) or as term IDs.

property name: str: Get the name of the partitioning.

score(patient: Patient) → float[source]: Get the count (number) of terms in the query set that have matching terms (exact matches or descendants) in the affected individual. A query term is counted only once, even if the individual has several matching terms (e.g., two different descendants of the same query term).
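The counting rule can be sketched with a toy ontology. Everything below is an assumption made for illustration: the PARENTS mapping, the X: identifiers, and the helper functions are made up and do not use the HPO toolkit or the GPSEA implementation. The sketch shows why two descendants of the same query term contribute only one point to the score.

```python
from typing import Dict, Iterable, Set

# Toy ontology as a child -> parents mapping. All identifiers are made up.
PARENTS: Dict[str, Set[str]] = {
    "X:brain": {"X:root"},
    "X:liver": {"X:root"},
    "X:kidney": {"X:root"},
    "X:brain_a": {"X:brain"},
    "X:brain_b": {"X:brain"},
    "X:liver_a": {"X:liver"},
}


def ancestors_and_self(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the term together with all of its ancestors."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def count_matching_query_terms(
    observed: Iterable[str],
    query: Iterable[str],
    parents: Dict[str, Set[str]],
) -> float:
    """Count the query terms matched exactly or via a descendant."""
    query_set = set(query)
    matched: Set[str] = set()
    for term in observed:
        # Using a set means each query term is counted at most once.
        matched |= query_set & ancestors_and_self(term, parents)
    return float(len(matched))


query = ["X:brain", "X:liver", "X:kidney"]
observed = ["X:brain_a", "X:brain_b", "X:liver_a"]
toy_score = count_matching_query_terms(observed, query, PARENTS)
print(toy_score)
```

Here the two brain descendants match the same query term, so the toy individual scores 2 rather than 3.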

property variable_name: str

Get a str with the name of the variable investigated by the partitioning.

For instance Sex, Allele groups, HP:0001250, OMIM:256000

class gpsea.analysis.pscore.DeVriesPhenotypeScorer(hpo: MinimalOntology)[source]

Bases: PhenotypeScorer

DeVriesPhenotypeScorer computes the “adapted De Vries Score” as described in Feenstra et al.

See the De Vries Score section for more details.

property description: str: Get a description of the partitioning.

property name: str: Get the name of the partitioning.

score(patient: Patient) → float[source]

Calculate the score based on the HPO terms observed in the patient.

Parameters: patient – the patient whose observed HPO terms are scored.

Returns: De Vries score between 0 and 10

property variable_name: str

Get a str with the name of the variable investigated by the partitioning.

For instance Sex, Allele groups, HP:0001250, OMIM:256000

class gpsea.analysis.pscore.MeasurementPhenotypeScorer(term_id: str | TermId, label: str)[source]

Bases: PhenotypeScorer

MeasurementPhenotypeScorer uses a value of a measurement as a phenotype score.

For instance, the amount of Testosterone [Mass/volume] in Serum or Plasma.

Example

Create a scorer that uses the level of testosterone represented by the Testosterone [Mass/volume] in Serum or Plasma LOINC code as a phenotype score.

>>> from gpsea.analysis.pscore import MeasurementPhenotypeScorer
>>> pheno_scorer = MeasurementPhenotypeScorer.from_measurement_id(
...     term_id="LOINC:2986-8",
...     label="Testosterone [Mass/volume] in Serum or Plasma",
... )
>>> # use the scorer in the analysis ...

property description: str: Get a description of the partitioning.

static from_measurement_id(term_id: str | TermId, label: str) → MeasurementPhenotypeScorer[source]

Create MeasurementPhenotypeScorer from a measurement identifier.

Parameters:

term_id – a str with CURIE or a TermId representing the term ID of a measurement (e.g. LOINC:2986-8).
label – a str with the measurement label (e.g. Testosterone [Mass/volume] in Serum or Plasma)

property label: str

property name: str: Get the name of the partitioning.

score(patient: Patient) → float[source]: Compute the phenotype score.
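As a rough illustration of using a measurement value as a score, consider the sketch below. ToyMeasurement, ToyPatient, and measurement_score are hypothetical stand-ins rather than GPSEA classes, the numeric value is made up, and both the choice to use the first matching measurement and the choice to return math.nan when the measurement is absent are assumptions (the latter mirrors the note that a score can be math.nan when it cannot be computed).

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyMeasurement:
    """Hypothetical measurement: a test identifier and a numeric result."""
    test_term_id: str
    value: float


@dataclass
class ToyPatient:
    """Hypothetical patient holding a list of measurements."""
    measurements: List[ToyMeasurement] = field(default_factory=list)


def measurement_score(patient: ToyPatient, term_id: str) -> float:
    """Use the value of the first matching measurement as the score."""
    for measurement in patient.measurements:
        if measurement.test_term_id == term_id:
            return measurement.value
    # Assumption: NaN signals that the patient lacks the measurement.
    return math.nan


tested = ToyPatient([ToyMeasurement("LOINC:2986-8", 40.0)])
untested = ToyPatient()
print(measurement_score(tested, "LOINC:2986-8"))
print(measurement_score(untested, "LOINC:2986-8"))
```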

property term_id: TermId

property variable_name: str

Get a str with the name of the variable investigated by the partitioning.

For instance Sex, Allele groups, HP:0001250, OMIM:256000

Subpackages

gpsea.analysis.pscore.stats package