gpsea.analysis package
- class gpsea.analysis.AnalysisResult(gt_predicate: GenotypePolyPredicate, statistic: Statistic)
Bases: object
AnalysisResult holds the parts common to the results of all analyses.
- property gt_predicate: GenotypePolyPredicate
Get the genotype predicate used in the analysis that produced this result.
- property statistic: Statistic
Get the statistic which computed the (nominal) p values for this result.
- class gpsea.analysis.MonoPhenotypeAnalysisResult(gt_predicate: GenotypePolyPredicate, phenotype: Partitioning, statistic: Statistic, data: DataFrame, pval: float)
Bases: AnalysisResult
MonoPhenotypeAnalysisResult reports the outcome of an analysis that tested a single genotype-phenotype association.
- GT_COL = 'genotype'
Name of column for storing genotype data.
- PH_COL = 'phenotype'
Name of column for storing phenotype data.
- DATA_COLUMNS = ('genotype', 'phenotype')
The required columns of the data frame returned by the data property.
- property phenotype: Partitioning
Get the Partitioning that produced the phenotype.
- property data: DataFrame
Get the data frame with genotype and phenotype values for each tested individual.
The index of the data frame contains the identifiers of the tested individuals, and the values are stored in the genotype and phenotype columns.
The genotype column includes the genotype category ID (cat_id), or None if the individual could not be assigned to a genotype group. The phenotype column contains the phenotype values, and their data type depends on the analysis. Here are some common phenotype data types:
  - a phenotype score computed in PhenotypeScoreAnalysis is a float
  - survival computed in SurvivalAnalysis is of type Survival
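The layout of the data frame described above can be sketched with pandas. This is a minimal illustration, assuming pandas is installed; the individual IDs, category IDs, and phenotype scores are made up, and the helpers has_required_columns and assigned_only are hypothetical, not part of the GPSEA API.

```python
import pandas as pd

# Column names as documented for MonoPhenotypeAnalysisResult.
GT_COL = "genotype"
PH_COL = "phenotype"
DATA_COLUMNS = (GT_COL, PH_COL)

# Hypothetical data: individuals are the index, genotype holds a category ID
# (None = not assigned to any genotype group), phenotype holds a float score.
# Note: pandas stores None as NaN in this numeric column.
data = pd.DataFrame(
    {
        GT_COL: [0, 1, None, 1],
        PH_COL: [2.0, 3.5, 1.0, 4.0],
    },
    index=pd.Index(["ind-1", "ind-2", "ind-3", "ind-4"], name="individual_id"),
)


def has_required_columns(df: pd.DataFrame) -> bool:
    """Hypothetical check that every column in DATA_COLUMNS is present."""
    return all(col in df.columns for col in DATA_COLUMNS)


def assigned_only(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper keeping individuals assigned to a genotype group."""
    return df[df[GT_COL].notna()]


# Mean phenotype score per genotype category, unassigned individuals dropped.
mean_by_genotype = assigned_only(data).groupby(GT_COL)[PH_COL].mean()
print(mean_by_genotype)
```

Dropping unassigned individuals before grouping makes the exclusion explicit, although groupby would also skip missing keys by default.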
- class gpsea.analysis.MultiPhenotypeAnalysisResult(gt_predicate: GenotypePolyPredicate, pheno_predicates: Iterable[PhenotypePolyPredicate[P]], statistic: Statistic, n_usable: Sequence[int], all_counts: Sequence[DataFrame], pvals: Sequence[float], corrected_pvals: Sequence[float] | None, mtc_correction: str | None)
Bases: Generic[P], AnalysisResult
MultiPhenotypeAnalysisResult reports the outcome of an analysis that tested the association of genotype with two or more phenotypes.
- property pheno_predicates: Sequence[PhenotypePolyPredicate[P]]
Get the phenotype predicates used in the analysis.
- property phenotypes: Sequence[P]
Get the phenotypes that were tested for association with genotype in the analysis.
- property n_usable: Sequence[int]
Get a sequence with, for each phenotype, the number of patients in whom the phenotype was assessable and who are, thus, usable for genotype-phenotype correlation analysis.
- property all_counts: Sequence[DataFrame]
Get a sequence of DataFrame objects, where each DataFrame includes the counts of patients in the genotype and phenotype groups. For example, for a genotype predicate that bins patients into two categories (Yes and No) based on the presence of a missense variant in transcript NM_123456.7, and a phenotype predicate that checks the presence/absence of HP:0001166 (a phenotype term), the counts look like this:
                 Has MISSENSE_VARIANT in NM_123456.7
                 No      Yes
Present  Yes     1       13
         No      7       5
The rows correspond to the phenotype categories, and the columns represent the genotype categories.
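A counts table with this orientation can be produced from per-patient categories with pandas.crosstab. This is a sketch, assuming pandas is installed; the per-patient lists are hypothetical and constructed only to reproduce the example counts above, and GPSEA may build its tables differently.

```python
import pandas as pd

# Hypothetical per-patient categories reproducing the example counts:
# genotype "Yes"/"No" = has a missense variant in NM_123456.7,
# phenotype "Yes"/"No" = HP:0001166 is present.
genotype = ["No"] * 1 + ["Yes"] * 13 + ["No"] * 7 + ["Yes"] * 5
phenotype = ["Yes"] * 14 + ["No"] * 12

# Rows = phenotype categories, columns = genotype categories.
counts = pd.crosstab(
    pd.Series(phenotype, name="Present"),
    pd.Series(genotype, name="Has MISSENSE_VARIANT in NM_123456.7"),
)
# crosstab sorts labels alphabetically; reorder to match the example layout.
counts = counts.reindex(index=["Yes", "No"], columns=["No", "Yes"])
print(counts)
```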
- property pvals: Sequence[float]
Get a sequence of nominal p values for each tested phenotype. The sequence includes a NaN value for each phenotype that was not tested.
- property corrected_pvals: Sequence[float] | None
Get a sequence with p values for each tested phenotype after multiple testing correction or None if the correction was not applied. The sequence includes a NaN value for each phenotype that was not tested.
- n_significant_for_alpha(alpha: float = 0.05) → int | None
Get the number of corrected p values that are less than or equal to alpha.
- Parameters:
alpha – the significance level, as a float.
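The counting rule can be sketched as a standalone function. This is a hypothetical free function mirroring the documented behaviour, not GPSEA's method; returning None when no correction was applied is an assumption inferred from the int | None return type and from corrected_pvals being None in that case.

```python
from typing import Optional, Sequence


def n_significant_for_alpha(
    corrected_pvals: Optional[Sequence[float]], alpha: float = 0.05
) -> Optional[int]:
    """Hypothetical sketch: count corrected p values <= alpha."""
    if corrected_pvals is None:
        # Assumption: no multiple testing correction, nothing to count.
        return None
    # NaN (untested phenotype) compares False with <=, so it is never counted.
    return sum(1 for p in corrected_pvals if p <= alpha)


print(n_significant_for_alpha([0.01, float("nan"), 0.05, 0.2]))
```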
- significant_phenotype_indices(alpha: float = 0.05, pval_kind: Literal['corrected', 'nominal'] = 'corrected') → Sequence[int] | None
Get the indices of phenotypes that attain significance at the provided alpha.
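A sketch of how pval_kind might select between the two p value sequences. Everything here is an assumption for illustration: the function is a hypothetical standalone version, the threshold uses "less than or equal to" by analogy with n_significant_for_alpha, and returning None when corrected values were requested but no correction was applied is inferred from the return type.

```python
from typing import Literal, Optional, Sequence


def significant_phenotype_indices(
    pvals: Sequence[float],
    corrected_pvals: Optional[Sequence[float]],
    alpha: float = 0.05,
    pval_kind: Literal["corrected", "nominal"] = "corrected",
) -> Optional[Sequence[int]]:
    """Hypothetical sketch: indices of phenotypes with p <= alpha."""
    if pval_kind == "corrected":
        if corrected_pvals is None:
            return None  # assumption: correction was not applied
        values = corrected_pvals
    elif pval_kind == "nominal":
        values = pvals
    else:
        raise ValueError(f"unknown pval_kind: {pval_kind!r}")
    # NaN never satisfies <=, so untested phenotypes are skipped.
    return [i for i, p in enumerate(values) if p <= alpha]


nominal = [0.001, float("nan"), 0.03, 0.5]
corrected = [0.003, float("nan"), 0.06, 1.0]
print(significant_phenotype_indices(nominal, corrected))
print(significant_phenotype_indices(nominal, corrected, pval_kind="nominal"))
```

The returned indices line up with the phenotypes and pheno_predicates sequences, so they can be used to look up which phenotypes were significant.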