gpsea.analysis.clf package

class gpsea.analysis.clf.Classifier[source]

Bases: Generic[C], Partitioning

Classifier partitions a Patient into one of several discrete classes represented by a Categorization.

The classes must be exclusive (an individual can be binned into one and only one class) and exhaustive (the classes must cover all possible scenarios).

However, if the individual cannot be assigned into any meaningful class, None can be returned. As a rule of thumb, returning None will exclude the individual from the analysis.

abstract get_categorizations() → Sequence[C][source]: Get a sequence of all categories which the classifier can produce.

get_categories() → Iterator[PatientCategory][source]: Get an iterator with PatientCategory instances that the classifier can produce.

property class_labels: Collection[str]: Get a collection with names of the PatientCategory items that the classifier can produce.

summarize_classes() → str[source]

summarize(out: TextIO)[source]

Summarize the classifier into the out handle.

The summary includes the name, summary, and the groups the classifier can assign individuals into.

n_categorizations() → int[source]: Get the number of categorizations the classifier can produce.

get_category(cat_id: int) → PatientCategory[source]

Get the PatientCategory for a PatientCategory.cat_id.

Parameters:: cat_id – an int with the id.
Raises:: ValueError if no such category was defined.

get_category_name(cat_id: int) → str[source]

Get the category name for a PatientCategory.cat_id.

Parameters:: cat_id – an int with the id.
Raises:: ValueError if no such category was defined.

abstract test(individual: Patient) → C | None[source]

Assign an individual into a class.

Return None if the individual cannot be assigned into any meaningful class.
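The contract described above can be illustrated with a minimal, self-contained sketch. `ToyCategory` and `ToyAgeClassifier` are hypothetical stand-ins invented for this example; they do not subclass `Classifier` and are not part of gpsea. The sketch only shows the idea that `test` yields exactly one class or `None`, and that `None` results are dropped before analysis.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ToyCategory:
    # Hypothetical stand-in for PatientCategory (identifier + name only).
    cat_id: int
    name: str


class ToyAgeClassifier:
    # Hypothetical classifier; it mirrors the shape of the contract, not gpsea's code.
    CHILD = ToyCategory(0, "Child")
    ADULT = ToyCategory(1, "Adult")

    def get_categorizations(self) -> Sequence[ToyCategory]:
        # The classes are exclusive and together cover every known age.
        return (self.CHILD, self.ADULT)

    def test(self, age: Optional[int]) -> Optional[ToyCategory]:
        if age is None:
            # No meaningful class can be assigned.
            return None
        return self.CHILD if age < 18 else self.ADULT


clf = ToyAgeClassifier()
results = [clf.test(age) for age in (5, None, 42, 17)]

# Individuals with a None result are left out of the downstream analysis.
included = [r.name for r in results if r is not None]
print(included)
```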

class gpsea.analysis.clf.PatientCategory(cat_id: int, name: str, description: str | None = None)[source]

Bases: object

PatientCategory represents one of several exclusive discrete classes.

A patient category has cat_id, a unique numeric identifier of the class; name, a human-readable class name; and description, an optional verbose description.

property cat_id: int: Get an int with the unique numeric identifier of the class.

property name: str: Get a str with a human-readable name of the class.

property description: str | None: Get a str with an optional detailed class description.

class gpsea.analysis.clf.Categorization(category: PatientCategory)[source]

Bases: object

Categorization represents one of the discrete classes a Patient can be assigned into.

static from_raw_parts(cat_id: int, name: str, description: str | None = None)[source]: Create Categorization from the cat_id identifier, name, and an optional description.

property category: PatientCategory

class gpsea.analysis.clf.GenotypeClassifier[source]

Bases: Classifier[Categorization]

GenotypeClassifier is a base class for all types that assign an individual into a group based on the genotype.

class gpsea.analysis.clf.AlleleCounter(predicate: VariantPredicate)[source]

Bases: object

AlleleCounter counts the number of alleles of all variants that pass the selection with a given predicate.

Parameters:: predicate – a VariantPredicate for selecting the target variants.

get_question() → str[source]

Get the question tested by the predicate.

Returns:: the question tested by the predicate
Return type:: str

count(patient: Patient) → int[source]

Count the number of alleles of all variants that pass the predicate.

Parameters:: patient – the patient to test

Returns:: the count of the passing alleles
Return type:: int
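As a rough illustration of the counting idea, the sketch below sums allele counts over the variants that pass a predicate. `ToyVariant` and `count_alleles` are hypothetical names made up for this example, and the genotype encoding (1 for heterozygous, 2 for homozygous alternate) is an assumption; the actual `AlleleCounter` works on gpsea `Patient` and `VariantPredicate` objects.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ToyVariant:
    # Hypothetical stand-in for a variant observed in one individual.
    effect: str
    allele_count: int  # assumed encoding: 1 = heterozygous, 2 = homozygous alternate


def count_alleles(
    variants: Iterable[ToyVariant],
    predicate: Callable[[ToyVariant], bool],
) -> int:
    # Sum the alleles of the variants that pass the selection.
    return sum(v.allele_count for v in variants if predicate(v))


variants = [
    ToyVariant("missense", 1),
    ToyVariant("frameshift", 2),
    ToyVariant("missense", 1),
]

n_missense = count_alleles(variants, lambda v: v.effect == "missense")
n_frameshift = count_alleles(variants, lambda v: v.effect == "frameshift")
print(n_missense, n_frameshift)
```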

gpsea.analysis.clf.sex_classifier() → GenotypeClassifier[source]

Get a genotype predicate for categorizing patients by their Sex.

See the Group by sex section for an example.

gpsea.analysis.clf.diagnosis_classifier(diagnoses: Iterable[TermId | str], labels: Iterable[str] | None = None) → GenotypeClassifier[source]

The genotype classifier bins an individual based on the presence of a disease diagnosis, as listed in the diseases attribute.

If the individual is diagnosed with more than one disease from the provided diagnoses, the individual is assigned into no group (None).

See the Group by diagnosis section for an example.

Parameters:

diagnoses – an iterable with at least 2 disease IDs, either as a str or a TermId to determine the genotype group.
labels – an iterable with diagnosis names or None if disease IDs should be used instead. The number of labels must match the number of diagnoses.
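The assignment rule can be sketched in plain Python. `assign_diagnosis_group` is a hypothetical helper written for this illustration, not a gpsea function. Returning `None` for an individual with several of the diagnoses follows the description above; returning `None` for an individual with none of them is an assumption.

```python
from typing import Optional, Sequence, Set


def assign_diagnosis_group(
    individual_diseases: Set[str],
    diagnoses: Sequence[str],
) -> Optional[str]:
    # Collect the diagnoses of interest found in the individual.
    hits = [d for d in diagnoses if d in individual_diseases]
    # Exactly one hit gives a group. Several hits are ambiguous (None).
    # Assumption: no hit at all also leaves the individual unassigned.
    return hits[0] if len(hits) == 1 else None


diagnoses = ("OMIM:154700", "OMIM:129600")

one = assign_diagnosis_group({"OMIM:154700"}, diagnoses)
both = assign_diagnosis_group({"OMIM:154700", "OMIM:129600"}, diagnoses)
neither = assign_diagnosis_group({"OMIM:256000"}, diagnoses)
print(one, both, neither)
```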

gpsea.analysis.clf.monoallelic_classifier(a_predicate: VariantPredicate, b_predicate: VariantPredicate | None = None, a_label: str = 'A', b_label: str = 'B') → GenotypeClassifier[source]

Monoallelic classifier bins a patient into one of two groups, A and B, based on the presence of exactly one allele of a variant that meets the predicate criteria.

See Monoallelic classifier for more information and an example usage.

Parameters:

a_predicate – predicate to test if the variants meet the criteria of the first group (named A by default).
b_predicate – predicate to test if the variants meet the criteria of the second group or None if the inverse of a_predicate should be used (named B by default).
a_label – display name of the a_predicate (default "A").
b_label – display name of the b_predicate (default "B").
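A minimal sketch of the grouping logic, assuming the allele counts for the two predicates have already been computed. `monoallelic_group` is a hypothetical helper, and the rule that any total other than exactly one allele yields `None` is an interpretation of "exactly one allele" above rather than a statement about the library's internals.

```python
from typing import Optional


def monoallelic_group(
    a_count: int,
    b_count: int,
    a_label: str = "A",
    b_label: str = "B",
) -> Optional[str]:
    # The individual must carry exactly one allele across both predicates.
    if a_count + b_count != 1:
        return None
    return a_label if a_count == 1 else b_label


group_a = monoallelic_group(1, 0)
group_b = monoallelic_group(0, 1, a_label="Missense", b_label="Other")
unassigned = monoallelic_group(1, 1)
print(group_a, group_b, unassigned)
```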

gpsea.analysis.clf.biallelic_classifier(a_predicate: VariantPredicate, b_predicate: VariantPredicate | None = None, a_label: str = 'A', b_label: str = 'B', partitions: Collection[int | Collection[int]] = (0, 1, 2)) → GenotypeClassifier[source]

Biallelic classifier assigns an individual into one of the three classes, AA, AB, and BB, based on presence of two variant alleles that meet the criteria.

See Biallelic classifier for more information and an example usage.

Parameters:

a_predicate – predicate to test if the variants meet the criteria of the first group (named A by default).
b_predicate – predicate to test if the variants meet the criteria of the second group or None if an inverse of a_predicate should be used (named B by default).
a_label – display name of the a_predicate (default "A").
b_label – display name of the b_predicate (default "B").
partitions – a sequence with partition identifiers (default (0, 1, 2)).
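The sketch below shows one way the three classes and the `partitions` option could fit together. `biallelic_group` and `CLASS_NAMES` are hypothetical, the mapping of indices 0, 1, 2 to AA, AB, BB is an assumption, and the display names are made up for the example. Individuals whose two predicates do not add up to two alleles are left unassigned here.

```python
from typing import Collection, Optional, Union

# Assumed order of the classes addressed by the partition identifiers 0, 1, 2.
CLASS_NAMES = ("AA", "AB", "BB")


def biallelic_group(
    a_count: int,
    b_count: int,
    partitions: Collection[Union[int, Collection[int]]] = (0, 1, 2),
) -> Optional[str]:
    # Exactly two alleles must meet the criteria.
    if a_count + b_count != 2:
        return None
    idx = b_count  # 0 -> AA, 1 -> AB, 2 -> BB
    for part in partitions:
        # A standalone int is a partition of its own.
        members = (part,) if isinstance(part, int) else tuple(sorted(part))
        if idx in members:
            return " OR ".join(CLASS_NAMES[i] for i in members)
    # The class is not covered by any partition.
    return None


print(biallelic_group(2, 0))
print(biallelic_group(1, 1))
print(biallelic_group(0, 2, partitions=(0, {1, 2})))
```

Grouping AB with BB, as in the last call, turns the three-class comparison into a two-group comparison.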

gpsea.analysis.clf.allele_count(counts: Collection[int | Collection[int]], target: VariantPredicate | None = None) → GenotypeClassifier[source]

Allele count classifier assigns the individual into a group based on the allele count of the target variants.

The counts option takes an int collection or a collection of int collections. An int value represents a target allele count and several counts can be grouped in a partition. A standalone int is assumed to represent a partition. The outer collection includes all partitions. An allele count can be included only in one partition.

Examples

The following counts will partition the cohort into individuals with zero or one target allele:

>>> from gpsea.analysis.clf import allele_count
>>> zero_vs_one = allele_count(counts=(0, 1))
>>> zero_vs_one.summarize_classes()
'Allele count: 0, 1'

These counts will create three classes for individuals with zero, one or two alleles:

>>> zero_vs_one_vs_two = allele_count(counts=(0, 1, 2))
>>> zero_vs_one_vs_two.summarize_classes()
'Allele count: 0, 1, 2'

Last, the counts below will create two groups, one for the individuals with zero target variant type alleles, and one for the individuals with one or two alleles:

>>> zero_vs_one_or_two = allele_count(counts=(0, {1, 2}))
>>> zero_vs_one_or_two.summarize_classes()
'Allele count: 0, 1 OR 2'

Note that we wrap the last two allele counts in a set.

Parameters:

counts – a sequence with allele count partitions.
target – a predicate for choosing the variants for testing or None if all variants in the individual should be used.
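To make the partition rules concrete, here is a self-contained sketch that does not use gpsea. `normalize_partitions`, `summarize` and `assign` are hypothetical helpers; they show how a standalone int can be treated as its own partition, how an allele count shared between partitions can be rejected, and how a label in the style of the examples above can be built. Leaving an allele count outside all partitions unassigned is an assumption.

```python
from typing import Collection, List, Optional, Tuple, Union


def normalize_partitions(
    counts: Collection[Union[int, Collection[int]]],
) -> List[Tuple[int, ...]]:
    partitions: List[Tuple[int, ...]] = []
    seen = set()
    for item in counts:
        # A standalone int represents a partition with a single allele count.
        part = (item,) if isinstance(item, int) else tuple(sorted(item))
        for ac in part:
            if ac in seen:
                raise ValueError(f"Allele count {ac} is in more than one partition")
            seen.add(ac)
        partitions.append(part)
    return partitions


def summarize(partitions: List[Tuple[int, ...]]) -> str:
    groups = (" OR ".join(str(ac) for ac in part) for part in partitions)
    return "Allele count: " + ", ".join(groups)


def assign(partitions: List[Tuple[int, ...]], allele_count: int) -> Optional[int]:
    # Return the index of the partition that holds the allele count, if any.
    for i, part in enumerate(partitions):
        if allele_count in part:
            return i
    return None


parts = normalize_partitions((0, {1, 2}))
label = summarize(parts)
print(label)
print(assign(parts, 2), assign(parts, 3))
```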

class gpsea.analysis.clf.PhenotypeClassifier[source]

Bases: Generic[P], Classifier[PhenotypeCategorization[P]]

Phenotype classifier assigns an individual into a class P based on the phenotype.

The class P can be a TermId representing an HPO term or an OMIM/MONDO term.

Only one class can be investigated, and phenotype returns the investigated phenotype (e.g. Arachnodactyly HP:0001166).

As another hallmark of this classifier, one of the categorizations must correspond to the group of patients who exhibit the investigated phenotype. The categorization is provided via the present_phenotype_categorization property.

abstract property phenotype: P: Get the phenotype entity of interest.

abstract property present_phenotype_categorization: PhenotypeCategorization[P]: Get the categorization which represents the group of the patients who exhibit the investigated phenotype.

property present_phenotype_category: PatientCategory: Get the patient category that corresponds to the group of the patients who exhibit the investigated phenotype.

class gpsea.analysis.clf.PhenotypeCategorization(category: PatientCategory, phenotype: P)[source]

Bases: Generic[P], Categorization

On top of the attributes of the Categorization, PhenotypeCategorization keeps track of the target phenotype P.

property phenotype: P

class gpsea.analysis.clf.HpoClassifier(hpo: MinimalOntology, query: TermId, missing_implies_phenotype_excluded: bool = False)[source]

Bases: PhenotypeClassifier[TermId]

HpoClassifier tests if a patient is annotated with an HPO term.

Note, query must be a term of the provided hpo!

See HPO classifier section for an example usage.

Parameters:

hpo – HPO ontology
query – the HPO term to test
missing_implies_phenotype_excluded – True if lack of an explicit annotation implies the term’s absence.

property name: str: Get the name of the partitioning.

property description: str: Get a description of the partitioning.

property variable_name: str

Get a str with the name of the variable investigated by the partitioning.

For instance: Sex, Allele groups, HP:0001250, or OMIM:256000.

property phenotype: TermId: Get the phenotype entity of interest.

property present_phenotype_categorization: PhenotypeCategorization[TermId]: Get the categorization which represents the group of the patients who exhibit the investigated phenotype.

get_categorizations() → Sequence[PhenotypeCategorization[TermId]][source]: Get a sequence of all categories which the classifier can produce.

test(patient: Patient) → PhenotypeCategorization[TermId] | None[source]

Assign an individual into a class.

Return None if the individual cannot be assigned into any meaningful class.
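The effect of missing_implies_phenotype_excluded can be sketched with a toy hierarchy. Everything below is hypothetical: the `T:` identifiers are not real HPO terms, `classify_term` is not a gpsea function, and the propagation rules (a present descendant implies the query is present, an excluded ancestor implies the query is excluded) are an assumption about how ontology annotations are usually interpreted, not a description of the library's code.

```python
from typing import Dict, Optional, Set

# Toy is-a hierarchy (child -> parents) with hypothetical identifiers.
PARENTS: Dict[str, Set[str]] = {
    "T:1": set(),
    "T:2": {"T:1"},
    "T:3": {"T:2"},
}


def ancestors(term: str) -> Set[str]:
    found: Set[str] = set()
    stack = list(PARENTS.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(PARENTS.get(current, ()))
    return found


def classify_term(
    query: str,
    present: Set[str],
    excluded: Set[str],
    missing_implies_excluded: bool = False,
) -> Optional[bool]:
    # Present if the query or any of its descendants was observed.
    for term in present:
        if term == query or query in ancestors(term):
            return True
    # Excluded if the query or any of its ancestors was ruled out.
    for term in excluded:
        if term == query or term in ancestors(query):
            return False
    # No explicit annotation: excluded only if the flag says so.
    return False if missing_implies_excluded else None


print(classify_term("T:2", present={"T:3"}, excluded=set()))
print(classify_term("T:2", present=set(), excluded={"T:1"}))
print(classify_term("T:2", present=set(), excluded=set()))
print(classify_term("T:2", set(), set(), missing_implies_excluded=True))
```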

class gpsea.analysis.clf.DiseasePresenceClassifier(disease_id_query: str | TermId)[source]

Bases: PhenotypeClassifier[TermId]

DiseasePresenceClassifier tests if an individual was diagnosed with a disease.

Parameters:: disease_id_query – a disease identifier formatted either as a CURIE str (e.g. OMIM:256000) or as a TermId.

property name: str: Get the name of the partitioning.

property description: str: Get a description of the partitioning.

property variable_name: str

Get a str with the name of the variable investigated by the partitioning.

For instance Sex, Allele groups, HP:0001250, OMIM:256000

property phenotype: TermId: Get the phenotype entity of interest.

property present_phenotype_categorization: PhenotypeCategorization[TermId]: Get the categorization which represents the group of the patients who exhibit the investigated phenotype.

get_categorizations() → Sequence[PhenotypeCategorization[TermId]][source]: Get a sequence of all categories which the classifier can produce.

test(patient: Patient) → PhenotypeCategorization[TermId] | None[source]

Assign an individual into a class.

Return None if the individual cannot be assigned into any meaningful class.

gpsea.analysis.clf.prepare_classifiers_for_terms_of_interest(cohort: Iterable[Patient], hpo: MinimalOntology, missing_implies_excluded: bool = False) → Sequence[PhenotypeClassifier[TermId]][source]

A convenience method for creating a suite of phenotype classifiers for testing all phenotypes of interest.

Parameters:

cohort – a cohort of individuals to investigate.
hpo – an entity with an HPO graph (e.g. MinimalOntology).
missing_implies_excluded – True if absence of an annotation should be counted as its explicit exclusion.

gpsea.analysis.clf.prepare_hpo_terms_of_interest(cohort: Iterable[Patient], hpo: MinimalOntology) → Sequence[TermId][source]

Prepare a collection of HPO terms to test.

This includes the direct HPO patient annotations as well as the ancestors of the present terms and the descendants of the excluded terms.

Parameters:

cohort – a cohort of individuals to investigate.
hpo – HPO as MinimalOntology.
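The selection rule (direct annotations, ancestors of present terms, descendants of excluded terms) can be sketched on a toy hierarchy. The `T:` identifiers, `PARENTS`, `invert`, `reachable` and `terms_of_interest` are all hypothetical and exist only for this illustration; the real function works with a `MinimalOntology` and a cohort of `Patient` objects.

```python
from typing import Dict, Iterable, Set

# Toy is-a hierarchy (child -> parents) with hypothetical identifiers.
PARENTS: Dict[str, Set[str]] = {
    "T:1": set(),
    "T:2": {"T:1"},
    "T:3": {"T:2"},
    "T:4": {"T:1"},
    "T:5": {"T:4"},
    "T:6": {"T:1"},
}


def invert(edges: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    # Turn child -> parents into parent -> children.
    children: Dict[str, Set[str]] = {term: set() for term in edges}
    for child, parents in edges.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)
    return children


def reachable(term: str, edges: Dict[str, Set[str]]) -> Set[str]:
    found: Set[str] = set()
    stack = list(edges.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(edges.get(current, ()))
    return found


def terms_of_interest(present: Iterable[str], excluded: Iterable[str]) -> Set[str]:
    present, excluded = set(present), set(excluded)
    children = invert(PARENTS)
    terms = present | excluded  # the direct annotations
    for term in present:
        terms |= reachable(term, PARENTS)  # ancestors of present terms
    for term in excluded:
        terms |= reachable(term, children)  # descendants of excluded terms
    return terms


selected = sorted(terms_of_interest(present={"T:3"}, excluded={"T:4"}))
print(selected)
```

In this toy cohort, `T:6` is left out because no annotation implies anything about it.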