gpsea.analysis.clf package
- class gpsea.analysis.clf.Classifier[source]
Bases: Generic[C], Partitioning
Classifier partitions a Patient into one of several discrete classes, each represented by a Categorization.
The classes must be exclusive (the individual can be binned into one and only one class) and exhaustive (the classes must cover all possible scenarios).
However, if the individual cannot be assigned into any meaningful class, None can be returned. As a rule of thumb, returning None will exclude the individual from the analysis.
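The contract above can be illustrated with a minimal, self-contained sketch. It does not use gpsea: `ToyCategory`, `classify_by_age`, and the age threshold are hypothetical stand-ins, chosen only to show exclusive and exhaustive classes with None as the value that excludes an individual.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyCategory:
    # Hypothetical stand-in for a category; not gpsea's PatientCategory.
    cat_id: int
    name: str


CHILD = ToyCategory(0, "child")
ADULT = ToyCategory(1, "adult")


def classify_by_age(age_years: Optional[float]) -> Optional[ToyCategory]:
    """Assign exactly one class, or None if no meaningful class applies."""
    if age_years is None or age_years < 0:
        # Unknown or invalid age: the individual would be left out of the analysis.
        return None
    # The two branches are exclusive and together cover every valid age.
    return CHILD if age_years < 18 else ADULT
```

Every valid input lands in exactly one class, and only inputs that cannot be interpreted yield None.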
- abstract get_categorizations() Sequence[C] [source]
Get a sequence of all categories which the classifier can produce.
- get_categories() Iterator[PatientCategory] [source]
Get an iterator with PatientCategory instances that the classifier can produce.
- property class_labels: Collection[str]
Get a collection with names of the PatientCategory items that the classifier can produce.
- summarize(out: TextIO)[source]
Summarize the classifier into the out handle.
The summary includes the name, summary, and the groups the classifier can assign individuals into.
- get_category(cat_id: int) PatientCategory [source]
Get the category for a PatientCategory.cat_id.
- Parameters:
cat_id – an int with the id.
- Raises:
ValueError if no such category was defined.
- get_category_name(cat_id: int) str [source]
Get the category name for a PatientCategory.cat_id.
- Parameters:
cat_id – an int with the id.
- Raises:
ValueError if no such category was defined.
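The lookup-or-raise behavior described for get_category and get_category_name can be sketched as follows. This is not gpsea's implementation: `CATEGORIES` and `lookup_category_name` are hypothetical names, and the category values are made up for illustration.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical (cat_id, name) pairs; not taken from gpsea.
CATEGORIES: Sequence[Tuple[int, str]] = ((0, "No"), (1, "Yes"))


def lookup_category_name(cat_id: int) -> str:
    """Return the name for `cat_id` or raise ValueError if it is unknown."""
    by_id: Dict[int, str] = dict(CATEGORIES)
    try:
        return by_id[cat_id]
    except KeyError:
        # Mirror the documented contract: unknown ids are a ValueError.
        raise ValueError(f"No category with id {cat_id} was defined") from None
```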
- class gpsea.analysis.clf.PatientCategory(cat_id: int, name: str, description: str | None = None)[source]
Bases: object
PatientCategory represents one of several exclusive discrete classes.
A patient category has cat_id, a unique numeric identifier of the class, name with a human-readable class name, and description with an optional verbose description.
- class gpsea.analysis.clf.Categorization(category: PatientCategory)[source]
Bases: object
Categorization represents one of the discrete classes a Patient can be assigned into.
- static from_raw_parts(cat_id: int, name: str, description: str | None = None)[source]
Create Categorization from the cat_id identifier, name, and an optional description.
- property category: PatientCategory
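The relationship between a categorization and the category it wraps, including the from_raw_parts shortcut, can be pictured with the stand-in classes below. `ToyCategory` and `ToyCategorization` are hypothetical and only mirror the documented constructor arguments; the real classes may differ in detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyCategory:
    # Hypothetical stand-in mirroring the documented fields.
    cat_id: int
    name: str
    description: Optional[str] = None


@dataclass(frozen=True)
class ToyCategorization:
    category: ToyCategory

    @staticmethod
    def from_raw_parts(
        cat_id: int, name: str, description: Optional[str] = None
    ) -> "ToyCategorization":
        # Build the wrapped category first, then wrap it.
        return ToyCategorization(ToyCategory(cat_id, name, description))
```

A static factory of this kind saves the caller from constructing the inner category by hand.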
- class gpsea.analysis.clf.GenotypeClassifier[source]
Bases: Classifier[Categorization]
GenotypeClassifier is a base class for all types that assign an individual into a group based on the genotype.
- class gpsea.analysis.clf.AlleleCounter(predicate: VariantPredicate)[source]
Bases: object
AlleleCounter counts the number of alleles of all variants that pass the selection with a given predicate.
- Parameters:
predicate – a VariantPredicate for selecting the target variants.
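The counting idea can be sketched without gpsea. `ToyVariant` and `count_alleles` are hypothetical, and the sketch assumes each variant record already carries the number of alleles observed in the individual.

```python
from typing import Callable, Iterable, NamedTuple


class ToyVariant(NamedTuple):
    # Hypothetical variant record: an effect label and the number of
    # alleles of this variant observed in the individual.
    effect: str
    allele_count: int


def count_alleles(
    variants: Iterable[ToyVariant],
    predicate: Callable[[ToyVariant], bool],
) -> int:
    """Sum the allele counts of the variants that pass `predicate`."""
    return sum(v.allele_count for v in variants if predicate(v))
```

For example, an individual with two heterozygous missense variants and one homozygous frameshift variant has two missense alleles and four alleles in total.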
- gpsea.analysis.clf.sex_classifier() GenotypeClassifier [source]
Get a genotype classifier for categorizing patients by their Sex.
See the Group by sex section for an example.
- gpsea.analysis.clf.diagnosis_classifier(diagnoses: Iterable[TermId | str], labels: Iterable[str] | None = None) GenotypeClassifier [source]
The genotype classifier bins an individual based on the presence of a disease diagnosis, as listed in the diseases attribute.
If the individual is diagnosed with more than one disease from the provided diagnoses, the individual is assigned to no group (None).
See the Group by diagnosis section for an example.
- Parameters:
diagnoses – an iterable with at least 2 disease IDs, either as a str or a TermId, to determine the genotype group.
labels – an iterable with diagnosis names or None if disease IDs should be used instead. The number of labels must match the number of diagnoses.
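The binning rule can be sketched as below. The function `classify_by_diagnosis` and the `DISEASE:n` identifiers are hypothetical. The sketch also returns None when the individual has none of the listed diagnoses, which is an assumption rather than something stated above.

```python
from typing import Iterable, Optional, Sequence


def classify_by_diagnosis(
    patient_diseases: Iterable[str],
    diagnoses: Sequence[str],
    labels: Optional[Sequence[str]] = None,
) -> Optional[str]:
    """Return the label of the single matching diagnosis, else None."""
    names = list(diagnoses) if labels is None else list(labels)
    if len(names) != len(diagnoses):
        raise ValueError("The number of labels must match the number of diagnoses")
    present = set(patient_diseases)
    hits = [name for disease_id, name in zip(diagnoses, names) if disease_id in present]
    # Exactly one match gives a group; zero or several matches give no group.
    return hits[0] if len(hits) == 1 else None
```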
- gpsea.analysis.clf.monoallelic_classifier(a_predicate: VariantPredicate, b_predicate: VariantPredicate | None = None, a_label: str = 'A', b_label: str = 'B') GenotypeClassifier [source]
Monoallelic classifier bins a patient into one of two groups, A and B, based on presence of exactly one allele of a variant that meets the predicate criteria.
See Monoallelic classifier for more information and an example usage.
- Parameters:
a_predicate – predicate to test if the variants meet the criteria of the first group (named A by default).
b_predicate – predicate to test if the variants meet the criteria of the second group (named B by default) or None if the inverse of a_predicate should be used.
a_label – display name of the a_predicate (default "A").
b_label – display name of the b_predicate (default "B").
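One plausible reading of the monoallelic rule is sketched below. It is not gpsea's implementation. The tuple-based variant record and the function names are hypothetical, and the sketch assumes that an individual is assigned to A only with one A allele and no B allele (and the reverse for B), with None otherwise.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical variant record: (effect label, number of alleles in the individual).
Variant = Tuple[str, int]
Predicate = Callable[[Variant], bool]


def count_alleles(variants: Sequence[Variant], predicate: Predicate) -> int:
    return sum(n for effect, n in variants if predicate((effect, n)))


def classify_monoallelic(
    variants: Sequence[Variant],
    a_predicate: Predicate,
    b_predicate: Optional[Predicate] = None,
    a_label: str = "A",
    b_label: str = "B",
) -> Optional[str]:
    # When no B predicate is given, B is the inverse of A.
    b_test = b_predicate if b_predicate is not None else (lambda v: not a_predicate(v))
    a_count = count_alleles(variants, a_predicate)
    b_count = count_alleles(variants, b_test)
    if (a_count, b_count) == (1, 0):
        return a_label
    if (a_count, b_count) == (0, 1):
        return b_label
    # Any other allele combination does not fit the monoallelic design.
    return None
```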
- gpsea.analysis.clf.biallelic_classifier(a_predicate: VariantPredicate, b_predicate: VariantPredicate | None = None, a_label: str = 'A', b_label: str = 'B', partitions: Collection[int | Collection[int]] = (0, 1, 2)) GenotypeClassifier [source]
Biallelic classifier assigns an individual into one of the three classes, AA, AB, and BB, based on presence of two variant alleles that meet the criteria.
See Biallelic classifier for more information and an example usage.
- Parameters:
a_predicate – predicate to test if the variants meet the criteria of the first group (named A by default).
b_predicate – predicate to test if the variants meet the criteria of the second group (named B by default) or None if the inverse of a_predicate should be used.
a_label – display name of the a_predicate (default "A").
b_label – display name of the b_predicate (default "B").
partitions – a sequence with partition identifiers (default (0, 1, 2)).
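The sketch below shows one way the three biallelic classes and the partitions option could interact. It is an illustration under stated assumptions, not gpsea's code. The mapping of A/A, A/B, and B/B to the identifiers 0, 1, and 2, the "OR" label format, and the function name `classify_biallelic` are all assumed here.

```python
from typing import Collection, Optional, Union

Partition = Union[int, Collection[int]]


def classify_biallelic(
    a_count: int,
    b_count: int,
    a_label: str = "A",
    b_label: str = "B",
    partitions: Collection[Partition] = (0, 1, 2),
) -> Optional[str]:
    # Assumed identifiers: 0 = A/A, 1 = A/B, 2 = B/B.
    genotype_ids = {(2, 0): 0, (1, 1): 1, (0, 2): 2}
    names = (
        f"{a_label}/{a_label}",
        f"{a_label}/{b_label}",
        f"{b_label}/{b_label}",
    )
    gt_id = genotype_ids.get((a_count, b_count))
    if gt_id is None:
        # Not exactly two alleles that meet the criteria.
        return None
    for partition in partitions:
        members = (partition,) if isinstance(partition, int) else tuple(sorted(partition))
        if gt_id in members:
            return " OR ".join(names[i] for i in members)
    # The genotype is valid but not part of any requested partition.
    return None
```

Under these assumptions, passing `partitions=(0, {1, 2})` would compare A/A individuals against the merged group of A/B and B/B individuals.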
- gpsea.analysis.clf.allele_count(counts: Collection[int | Collection[int]], target: VariantPredicate | None = None) GenotypeClassifier [source]
Allele count classifier assigns the individual into a group based on the allele count of the target variants.
The counts option takes an int collection or a collection of int collections. An int value represents a target allele count and several counts can be grouped in a partition. A standalone int is assumed to represent a partition. The outer collection includes all partitions. An allele count can be included only in one partition.
Examples
The following counts will partition the cohort into individuals with zero or one target allele:
>>> from gpsea.analysis.clf import allele_count
>>> zero_vs_one = allele_count(counts=(0, 1))
>>> zero_vs_one.summarize_classes()
'Allele count: 0, 1'
These counts will create three classes for individuals with zero, one or two alleles:
>>> zero_vs_one_vs_two = allele_count(counts=(0, 1, 2))
>>> zero_vs_one_vs_two.summarize_classes()
'Allele count: 0, 1, 2'
Last, the counts below will create two groups, one for the individuals with zero target variant type alleles, and one for the individuals with one or two alleles:
>>> zero_vs_one_or_two = allele_count(counts=(0, {1, 2}))
>>> zero_vs_one_or_two.summarize_classes()
'Allele count: 0, 1 OR 2'
Note that we wrap the last two allele counts in a set.
- Parameters:
counts – a sequence with allele count partitions.
target – a predicate for choosing the variants for testing or None if all variants in the individual should be used.
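The structure of the counts option (standalone ints as single-member partitions, inner collections as grouped counts, and no count shared between partitions) can be sketched with a small normalizer. `build_partitions` and `partition_labels` are hypothetical helpers, not part of gpsea.

```python
from typing import Collection, Dict, List, Tuple, Union


def build_partitions(
    counts: Collection[Union[int, Collection[int]]],
) -> Tuple[List[Tuple[int, ...]], Dict[int, int]]:
    """Normalize `counts` and map each allele count to its partition index."""
    partitions: List[Tuple[int, ...]] = []
    index: Dict[int, int] = {}
    for i, part in enumerate(counts):
        # A standalone int is treated as a partition with a single member.
        members = (part,) if isinstance(part, int) else tuple(sorted(part))
        for allele_count in members:
            if allele_count in index:
                raise ValueError(f"Allele count {allele_count} is in more than one partition")
            index[allele_count] = i
        partitions.append(members)
    return partitions, index


def partition_labels(partitions: List[Tuple[int, ...]]) -> List[str]:
    """Render each partition as a label, joining grouped counts with OR."""
    return [" OR ".join(str(m) for m in members) for members in partitions]
```

With the index in hand, an individual's allele count either maps to a partition or, if it is absent from the index, to no group at all.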
- class gpsea.analysis.clf.PhenotypeClassifier[source]
Bases: Generic[P], Classifier[PhenotypeCategorization[P]]
Phenotype classifier assigns an individual into a class P based on the phenotype.
The class P can be a TermId representing an HPO term or an OMIM/MONDO term.
Only one class can be investigated, and phenotype returns the investigated phenotype (e.g. Arachnodactyly HP:0001166).
As another hallmark of this classifier, one of the categorizations must correspond to the group of patients who exhibit the investigated phenotype. The categorization is provided via the present_phenotype_categorization property.
- abstract property phenotype: P
Get the phenotype entity of interest.
- abstract property present_phenotype_categorization: PhenotypeCategorization[P]
Get the categorization which represents the group of the patients who exhibit the investigated phenotype.
- property present_phenotype_category: PatientCategory
Get the patient category that corresponds to the group of the patients who exhibit the investigated phenotype.
- class gpsea.analysis.clf.PhenotypeCategorization(category: PatientCategory, phenotype: P)[source]
Bases: Generic[P], Categorization
On top of the attributes of the Categorization, PhenotypeCategorization keeps track of the target phenotype P.
- property phenotype: P
- class gpsea.analysis.clf.HpoClassifier(hpo: MinimalOntology, query: TermId, missing_implies_phenotype_excluded: bool = False)[source]
Bases: PhenotypeClassifier[TermId]
HpoClassifier tests if a patient is annotated with an HPO term.
Note, query must be a term of the provided hpo!
See HPO classifier section for an example usage.
- Parameters:
hpo – HPO ontology
query – the HPO term to test
missing_implies_phenotype_excluded – True if lack of an explicit annotation implies the term’s absence.
- property variable_name: str
Get a str with the name of the variable investigated by the partitioning.
For instance Sex, Allele groups, HP:0001250, OMIM:256000
- property present_phenotype_categorization: PhenotypeCategorization[TermId]
Get the categorization which represents the group of the patients who exhibit the investigated phenotype.
- get_categorizations() Sequence[PhenotypeCategorization[TermId]] [source]
Get a sequence of all categories which the classifier can produce.
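The effect of missing_implies_phenotype_excluded can be sketched on a toy hierarchy. Nothing below is gpsea's code: the `TOY:n` identifiers, the `PARENTS` table, and both functions are hypothetical. The sketch further assumes that an observed descendant implies the query term is present and that an excluded ancestor implies the query term is excluded.

```python
from typing import Dict, Iterable, Optional, Set

# Toy is-a hierarchy (child -> parents); the identifiers are made up.
PARENTS: Dict[str, Set[str]] = {
    "TOY:1": set(),
    "TOY:2": {"TOY:1"},
    "TOY:3": {"TOY:2"},
}


def ancestors(term: str) -> Set[str]:
    """All proper ancestors of `term` in the toy hierarchy."""
    result: Set[str] = set()
    stack = list(PARENTS.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(PARENTS.get(current, ()))
    return result


def classify_hpo(
    present: Iterable[str],
    excluded: Iterable[str],
    query: str,
    missing_implies_excluded: bool = False,
) -> Optional[str]:
    # Present: the query itself or any of its descendants was observed.
    if any(t == query or query in ancestors(t) for t in present):
        return "Yes"
    # Excluded: the query itself or any of its ancestors was ruled out.
    if any(t == query or t in ancestors(query) for t in excluded):
        return "No"
    # No information: either treat as excluded or leave the individual out.
    return "No" if missing_implies_excluded else None
```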
- class gpsea.analysis.clf.DiseasePresenceClassifier(disease_id_query: str | TermId)[source]
Bases: PhenotypeClassifier[TermId]
DiseasePresenceClassifier tests if an individual was diagnosed with a disease.
- Parameters:
disease_id_query – a disease identifier formatted either as a CURIE str (e.g. OMIM:256000) or as a TermId.
- property variable_name: str
Get a str with the name of the variable investigated by the partitioning.
For instance Sex, Allele groups, HP:0001250, OMIM:256000
- property present_phenotype_categorization: PhenotypeCategorization[TermId]
Get the categorization which represents the group of the patients who exhibit the investigated phenotype.
- get_categorizations() Sequence[PhenotypeCategorization[TermId]] [source]
Get a sequence of all categories which the classifier can produce.
- gpsea.analysis.clf.prepare_classifiers_for_terms_of_interest(cohort: Iterable[Patient], hpo: MinimalOntology, missing_implies_excluded: bool = False) Sequence[PhenotypeClassifier[TermId]] [source]
A convenience method for creating a suite of phenotype classifiers for testing all phenotypes of interest.
- Parameters:
cohort – a cohort of individuals to investigate.
hpo – an entity with an HPO graph (e.g. MinimalOntology).
missing_implies_excluded – True if absence of an annotation should be counted as its explicit exclusion.
- gpsea.analysis.clf.prepare_hpo_terms_of_interest(cohort: Iterable[Patient], hpo: MinimalOntology) Sequence[TermId] [source]
Prepare a collection of HPO terms to test.
This includes the direct HPO patient annotations as well as the ancestors of the present terms and the descendants of the excluded terms.
- Parameters:
cohort – a cohort of individuals to investigate.
hpo – HPO as MinimalOntology.
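The term collection described above (direct annotations, ancestors of present terms, descendants of excluded terms) can be sketched on a toy hierarchy. The `TOY:n` identifiers, the `PARENTS` table, and the functions are hypothetical, and each individual is modeled simply as a pair of present and excluded term sets.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Toy is-a hierarchy (child -> parents); the identifiers are made up.
PARENTS: Dict[str, Set[str]] = {
    "TOY:1": set(),
    "TOY:2": {"TOY:1"},
    "TOY:3": {"TOY:2"},
    "TOY:4": {"TOY:1"},
    "TOY:5": {"TOY:4"},
    "TOY:6": {"TOY:3"},
}


def ancestors(term: str) -> Set[str]:
    result: Set[str] = set()
    stack = list(PARENTS.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(PARENTS.get(current, ()))
    return result


def descendants(term: str) -> Set[str]:
    return {t for t in PARENTS if term in ancestors(t)}


def terms_of_interest(cohort: Iterable[Tuple[Set[str], Set[str]]]) -> List[str]:
    """Collect terms worth testing from (present, excluded) annotation pairs."""
    terms: Set[str] = set()
    for present, excluded in cohort:
        for t in present:
            # A present term implies that all of its ancestors are present.
            terms.add(t)
            terms |= ancestors(t)
        for t in excluded:
            # An excluded term implies that all of its descendants are excluded.
            terms.add(t)
            terms |= descendants(t)
    return sorted(terms)
```

In this toy cohort, a present `TOY:3` contributes `TOY:2` and `TOY:1`, an excluded `TOY:4` contributes `TOY:5`, and `TOY:6` is not collected because nothing in the annotations implies it.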