gpsea.analysis.pscore package
- class gpsea.analysis.pscore.PhenotypeScorer[source]
Bases: ContinuousPartitioning
PhenotypeScorer assigns the patient with a phenotype score.
The score can be math.nan if it is not possible to compute the score for a patient.
The scorer can be created by wrapping a scoring function (see wrap_scoring_function()).
- static wrap_scoring_function(func: Callable[[Patient], float], name: str = 'Custom Scoring Function') PhenotypeScorer [source]
Create a PhenotypeScorer by wrapping the provided scoring function func.
The function must take exactly one argument of type Patient and return a float with the corresponding phenotype score.
Example
>>> from gpsea.analysis.pscore import PhenotypeScorer
>>> def f(p):
...     return 123.4
>>> phenotype_scorer = PhenotypeScorer.wrap_scoring_function(f)
phenotype_scorer will assign all patients a score of 123.4.
- Parameters:
func – the scoring function.
name – a str with the name of the scoring function (defaults to 'Custom Scoring Function').
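As a minimal sketch of what a scoring function can look like, the snippet below returns a float when the score is available and math.nan when it is not. ToyPatient and its seizure_count field are hypothetical stand-ins so that the example runs without GPSEA; a real scoring function would receive a GPSEA Patient and would then be passed to wrap_scoring_function().

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyPatient:
    """Hypothetical stand-in for a patient; NOT GPSEA's Patient class."""
    label: str
    seizure_count: Optional[int]  # made-up field used only for illustration


def seizure_score(patient: ToyPatient) -> float:
    """Return a float score, or NaN if the score cannot be computed."""
    if patient.seizure_count is None:
        return math.nan
    return float(patient.seizure_count)


scores = [seizure_score(p) for p in (ToyPatient("A", 3), ToyPatient("B", None))]
```

Returning math.nan rather than raising keeps the function total over the cohort, which matches the contract stated for PhenotypeScorer above.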
- class gpsea.analysis.pscore.PhenotypeScoreAnalysis(score_statistic: PhenotypeScoreStatistic)[source]
Bases: object
PhenotypeScoreAnalysis tests the association between two or more genotype classes and a phenotype score.
A genotype class is assigned by a GenotypeClassifier and the phenotype score is computed with a PhenotypeScorer.
The association is tested with a PhenotypeScoreStatistic and the results are reported as a PhenotypeScoreAnalysisResult.
- compare_genotype_vs_phenotype_score(cohort: Iterable[Patient], gt_clf: GenotypeClassifier, pheno_scorer: PhenotypeScorer) PhenotypeScoreAnalysisResult [source]
Compute the association between genotype groups and phenotype score.
- Parameters:
cohort – the cohort to analyze.
gt_clf – a classifier for assigning an individual to a genotype class.
pheno_scorer – the scorer for computing the phenotype score.
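The statistic used for the comparison is pluggable. To illustrate the kind of computation such a statistic may perform on the scores of two genotype groups, here is a plain-Python sketch of the Mann-Whitney U statistic. It is not GPSEA's implementation, the function name and sample values are made up, and it computes only U, not a p value.

```python
from typing import Sequence


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> float:
    """U statistic of sample `a` versus sample `b`.

    Counts the pairs (x, y), x from `a` and y from `b`, where x > y.
    A tie contributes 0.5.
    """
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a
        for y in b
    )


# Made-up phenotype scores for two genotype groups.
group_0 = [1.0, 2.0, 3.0]
group_1 = [2.0, 4.0, 5.0]

u_0 = mann_whitney_u(group_0, group_1)
u_1 = mann_whitney_u(group_1, group_0)
```

The two U values always sum to the product of the group sizes, which is a handy sanity check on the pair counting.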
- class gpsea.analysis.pscore.PhenotypeScoreAnalysisResult(gt_clf: GenotypeClassifier, phenotype: PhenotypeScorer, statistic: Statistic, data: DataFrame, statistic_result: StatisticResult)[source]
Bases: MonoPhenotypeAnalysisResult
PhenotypeScoreAnalysisResult is a container for PhenotypeScoreAnalysis results.
The data property provides a data frame with phenotype score for each tested individual:

patient_id   genotype   phenotype
patient_1    0          1
patient_2    0          nan
patient_3    None       2
patient_4    1          2
…            …          …

The DataFrame index includes the identifiers of the tested individuals and the values are stored in genotype and phenotype columns.
The genotype includes the genotype category ID (cat_id) or None if the patient cannot be assigned into any genotype category.
The phenotype contains a float with the phenotype score. A NaN value is used if the phenotype score is impossible to compute.
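Rows with a None genotype or a NaN phenotype carry no usable pair of values, so downstream code usually has to skip them. The sketch below shows one way to do that on plain tuples that mirror the example rows; the helper name is hypothetical, and whether GPSEA filters in exactly this way is an assumption.

```python
import math
from collections import defaultdict

# Rows mirror the example table: (patient_id, genotype, phenotype).
rows = [
    ("patient_1", 0, 1.0),
    ("patient_2", 0, math.nan),  # score could not be computed
    ("patient_3", None, 2.0),    # no genotype category assigned
    ("patient_4", 1, 2.0),
]


def usable_scores_by_genotype(rows):
    """Group phenotype scores by genotype, skipping None and NaN entries."""
    groups = defaultdict(list)
    for _patient_id, genotype, phenotype in rows:
        if genotype is None or math.isnan(phenotype):
            continue
        groups[genotype].append(phenotype)
    return dict(groups)


groups = usable_scores_by_genotype(rows)
```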
- phenotype_scorer() PhenotypeScorer [source]
Get the scorer that computed the phenotype score.
- plot_boxplots(ax, colors=('darksalmon', 'honeydew'))[source]
Draw a box plot with the distributions of phenotype scores for the genotype groups.
- Parameters:
ax – the Matplotlib Axes to draw on.
colors – a tuple with colors to use for coloring the box patches of the box plot.
- class gpsea.analysis.pscore.CountingPhenotypeScorer(hpo: MinimalOntology, query: Iterable[TermId])[source]
Bases: PhenotypeScorer
CountingPhenotypeScorer assigns the patient with a phenotype score that is equivalent to the count of observed phenotypes that are either an exact match to the query terms or their descendants.
For instance, we may want to count whether an individual has brain, liver, kidney, and skin abnormalities. In that case, the query would include the corresponding terms (e.g., Abnormal brain morphology HP:0012443). An individual can then have between 0 and 4 phenotype group abnormalities. This scorer is intended to be used with the Mann-Whitney U test.
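The counting logic can be sketched without an ontology library. The example below uses a made-up toy hierarchy with hypothetical TOY: identifiers (not real HPO terms) in which each term has at most one parent; the real HPO is a graph where a term can have several parents, and GPSEA relies on MinimalOntology for that. The score is the number of query terms matched by an observed term or by one of its ancestors.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical child -> parent links of a toy hierarchy (not real HPO IDs).
PARENT: Dict[str, Optional[str]] = {
    "TOY:brain": None,
    "TOY:small_brain": "TOY:brain",
    "TOY:liver": None,
    "TOY:large_liver": "TOY:liver",
    "TOY:kidney": None,
}


def ancestors_and_self(term: str) -> Set[str]:
    """Collect the term and all of its ancestors in the toy hierarchy."""
    found: Set[str] = set()
    current: Optional[str] = term
    while current is not None:
        found.add(current)
        current = PARENT.get(current)
    return found


def count_score(observed: Iterable[str], query: Iterable[str]) -> float:
    """Count query terms that an observed term equals or descends from."""
    reachable: Set[str] = set()
    for term in observed:
        reachable |= ancestors_and_self(term)
    return float(sum(1 for q in query if q in reachable))


query = ["TOY:brain", "TOY:liver", "TOY:kidney"]
score = count_score(["TOY:small_brain", "TOY:large_liver", "TOY:liver"], query)
```

Two observed liver terms still count once, because the score counts matched query groups rather than observed terms.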
- static from_query_curies(hpo: MinimalOntology, query: Iterable[TermId | str])[source]
Create a scorer to test for the number of phenotype terms that fall into the phenotype groups.
- Parameters:
hpo – HPO as represented by MinimalOntology of HPO toolkit.
query – an iterable of the top-level terms, either represented as CURIEs (str) or as term IDs.
- class gpsea.analysis.pscore.DeVriesPhenotypeScorer(hpo: MinimalOntology)[source]
Bases: PhenotypeScorer
DeVriesPhenotypeScorer computes the “adapted De Vries Score” as described in Feenstra et al.
See the De Vries Score section for more details.
- class gpsea.analysis.pscore.MeasurementPhenotypeScorer(term_id: str | TermId, label: str)[source]
Bases: PhenotypeScorer
MeasurementPhenotypeScorer uses the value of a measurement as a phenotype score, for instance, the amount of Testosterone [Mass/volume] in Serum or Plasma.
Example
Create a scorer that uses the level of testosterone represented by the Testosterone [Mass/volume] in Serum or Plasma LOINC code as a phenotype score.
>>> from gpsea.analysis.pscore import MeasurementPhenotypeScorer
>>> pheno_scorer = MeasurementPhenotypeScorer.from_measurement_id(
...     term_id="LOINC:2986-8",
...     label="Testosterone [Mass/volume] in Serum or Plasma",
... )
>>> # use the scorer in the analysis ...
- static from_measurement_id(term_id: str | TermId, label: str) MeasurementPhenotypeScorer [source]
Create MeasurementPhenotypeScorer from a measurement identifier.
- Parameters:
term_id – a str with CURIE or a TermId representing the term ID of a measurement (e.g. LOINC:2986-8).
label – a str with the measurement label (e.g. Testosterone [Mass/volume] in Serum or Plasma).
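Conceptually, a measurement-based score is a lookup by measurement ID. The sketch below models a patient's measurements as a plain mapping from CURIE to value and returns math.nan when the measurement is absent. The mapping, the helper name, and the value 250.0 are made up for illustration, and returning NaN for a missing measurement is an assumption based on the PhenotypeScorer contract rather than a documented behavior of MeasurementPhenotypeScorer.

```python
import math
from typing import Mapping

TESTOSTERONE = "LOINC:2986-8"  # CURIE taken from the example above


def measurement_score(measurements: Mapping[str, float], term_id: str) -> float:
    """Return the measurement value for `term_id`, or NaN if it is absent."""
    return measurements.get(term_id, math.nan)


# Made-up measurement value, used only to exercise the lookup.
with_value = measurement_score({TESTOSTERONE: 250.0}, TESTOSTERONE)
without_value = measurement_score({}, TESTOSTERONE)
```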