gpsea.analysis.predicate.genotype package

class gpsea.analysis.predicate.genotype.GenotypePolyPredicate

Bases: PolyPredicate[Categorization]

GenotypePolyPredicate is the base class for all PolyPredicate subclasses that assign an individual to a group based on the genotype.

gpsea.analysis.predicate.genotype.sex_predicate() → GenotypePolyPredicate

Get a genotype predicate for categorizing patients by their Sex.

See the Group by sex section for an example.

gpsea.analysis.predicate.genotype.diagnosis_predicate(diagnoses: Iterable[TermId | str], labels: Iterable[str] | None = None) → GenotypePolyPredicate

Create a genotype predicate that bins the patient based on the presence of a disease diagnosis, as listed in the diseases attribute.

If an individual is diagnosed with more than one disease from the provided diagnoses, the individual will be assigned to no group (None).

See the Group by diagnosis section for an example.

Parameters:
  • diagnoses – an iterable with at least 2 disease IDs, each either a str or a TermId, used to determine the genotype group.

  • labels – an iterable with diagnosis names or None if disease IDs should be used instead. The number of labels must match the number of diagnoses.
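
The grouping rule described above can be sketched in plain Python. This is only an illustration under stated assumptions, not GPSEA's implementation: the function name assign_diagnosis_group is hypothetical, disease IDs are plain strings, and an individual with none of the target diagnoses is assumed to be left without a group as well.

```python
from typing import Iterable, Optional, Sequence


def assign_diagnosis_group(
    patient_diseases: Iterable[str],
    diagnoses: Sequence[str],
) -> Optional[int]:
    """Return the index of the single matching diagnosis, or None."""
    present = set(patient_diseases)
    matches = [i for i, dx in enumerate(diagnoses) if dx in present]
    # More than one match means no group, as described above.
    # Assumption: zero matches also means no group.
    return matches[0] if len(matches) == 1 else None


# Illustrative disease IDs only.
targets = ("OMIM:154700", "OMIM:129600")
one_dx = assign_diagnosis_group(["OMIM:154700"], targets)
two_dx = assign_diagnosis_group(["OMIM:154700", "OMIM:129600"], targets)
no_dx = assign_diagnosis_group(["OMIM:256000"], targets)
```

Here one_dx is the index of the first group, while two_dx and no_dx are both None.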

gpsea.analysis.predicate.genotype.monoallelic_predicate(a_predicate: VariantPredicate, b_predicate: VariantPredicate | None = None, a_label: str = 'A', b_label: str = 'B') → GenotypePolyPredicate

The predicate bins the patient into one of two groups, A and B, based on the presence of exactly one allele of a variant that meets the predicate criteria.

See Monoallelic predicate for more information and an example usage.

Parameters:
  • a_predicate – predicate to test if the variants meet the criteria of the first group (named A by default).

  • b_predicate – predicate to test if the variants meet the criteria of the second group or None if the inverse of a_predicate should be used (named B by default).

  • a_label – display name of the a_predicate (default "A").

  • b_label – display name of the b_predicate (default "B").
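
As a rough sketch of the binning described above, the group can be derived from the allele counts of the variants selected by a_predicate and b_predicate. The function monoallelic_group is hypothetical, and the rule that any other combination of counts leaves the individual ungrouped is an assumption, not a statement about the library.

```python
from typing import Optional


def monoallelic_group(
    a_count: int,
    b_count: int,
    a_label: str = "A",
    b_label: str = "B",
) -> Optional[str]:
    """Assign a group label from the allele counts of the A and B variants."""
    if a_count == 1 and b_count == 0:
        return a_label
    if a_count == 0 and b_count == 1:
        return b_label
    # Assumption: every other combination leaves the individual ungrouped.
    return None
```

For example, monoallelic_group(1, 0) gives 'A', while monoallelic_group(1, 1) gives None.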

gpsea.analysis.predicate.genotype.biallelic_predicate(a_predicate: VariantPredicate, b_predicate: VariantPredicate | None = None, a_label: str = 'A', b_label: str = 'B', partitions: Collection[Collection[int]] = ((0,), (1,), (2,))) → GenotypePolyPredicate

The predicate bins the patient into one of three groups, AA, AB, and BB, based on the presence of two variant alleles that meet the predicate criteria.

See Biallelic predicate for more information and an example usage.

Parameters:
  • a_predicate – predicate to test if the variants meet the criteria of the first group (named A by default).

  • b_predicate – predicate to test if the variants meet the criteria of the second group or None if an inverse of a_predicate should be used (named B by default).

  • a_label – display name of the a_predicate (default "A").

  • b_label – display name of the b_predicate (default "B").

  • partitions – a sequence with partition identifiers (default ((0,), (1,), (2,))).
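
The role of partitions can be illustrated with a small stand-alone sketch. It assumes that the identifiers 0, 1 and 2 stand for the AA, AB and BB genotypes, respectively, and that each inner collection merges the listed genotypes into one group. Both points are assumptions made for the example, and biallelic_group is a hypothetical helper.

```python
from typing import Collection, Optional

# Assumed genotype identifiers: 0 = A/A, 1 = A/B, 2 = B/B.
GENOTYPE_INDEX = {(2, 0): 0, (1, 1): 1, (0, 2): 2}


def biallelic_group(
    a_count: int,
    b_count: int,
    partitions: Collection[Collection[int]] = ((0,), (1,), (2,)),
) -> Optional[int]:
    """Return the index of the partition that holds the genotype, or None."""
    genotype = GENOTYPE_INDEX.get((a_count, b_count))
    if genotype is None:
        # The individual does not carry exactly two matching alleles.
        return None
    for group_idx, partition in enumerate(partitions):
        if genotype in partition:
            return group_idx
    return None
```

With the default partitions, each genotype forms its own group. Passing partitions=((0, 1), (2,)) would place the AA and AB individuals into the first group and the BB individuals into the second.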

gpsea.analysis.predicate.genotype.allele_count(counts: Collection[Collection[int]], target: VariantPredicate | None = None) → GenotypePolyPredicate

Create a predicate to assign the patient into a group based on the allele count of the target variants.

The counts option takes a collection of int collections. Each inner collection corresponds to an allele count partition and the outer collection includes all partitions. The int represents the target allele count. A count can be included only in one partition.

Examples

The following counts will partition the cohort into individuals with zero or one target allele:

>>> from gpsea.analysis.predicate.genotype import allele_count
>>> zero_vs_one = allele_count(counts=({0,}, {1,}))
>>> zero_vs_one.summarize_groups()
'Allele count: 0, 1'

These counts will create three groups for individuals with zero, one or two alleles:

>>> zero_vs_one_vs_two = allele_count(counts=({0,}, {1,}, {2,}))
>>> zero_vs_one_vs_two.summarize_groups()
'Allele count: 0, 1, 2'

Parameters:
  • counts – a sequence with allele count partitions.

  • target – a predicate for choosing the variants for testing or None if all variants in the individual should be used.
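
The constraint that a count can be included in only one partition, and the mapping from an allele count to a group, can be sketched as follows. The helpers check_partitions and group_for_count are hypothetical and only mirror the description above. Raising ValueError on overlapping partitions is an assumption.

```python
from typing import Collection, Optional


def check_partitions(counts: Collection[Collection[int]]) -> None:
    """Raise ValueError if an allele count occurs in more than one partition."""
    seen = set()
    for partition in counts:
        for count in partition:
            if count in seen:
                raise ValueError(f"count {count} is in more than one partition")
            seen.add(count)


def group_for_count(
    allele_count: int,
    counts: Collection[Collection[int]],
) -> Optional[int]:
    """Return the index of the partition that includes the allele count."""
    check_partitions(counts)
    for idx, partition in enumerate(counts):
        if allele_count in partition:
            return idx
    # The allele count is not covered by any partition.
    return None
```

For instance, group_for_count(2, ({0}, {1, 2})) returns 1, because the second partition merges individuals with one or two target alleles.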

class gpsea.analysis.predicate.genotype.AlleleCounter(predicate: VariantPredicate)

Bases: object

AlleleCounter counts the number of alleles of all variants that pass the selection with a given predicate.

Parameters:

predicate – a VariantPredicate for selecting the target variants.

get_question() → str

Get the question tested by the predicate.

Returns:

the question tested by the predicate

Return type:

str

count(patient: Patient) → int

Count the number of alleles of all variants that pass the predicate.

Parameters:

patient – the patient to test.

Returns:

the count of the passing alleles

Return type:

int
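
The counting can be pictured with a simplified stand-in. The VariantCall tuple and the count_alleles function below are hypothetical and do not reflect the library's data model. They only show the idea of summing the alleles of the variants that pass a selection.

```python
from typing import Callable, Iterable, NamedTuple


class VariantCall(NamedTuple):
    """Hypothetical stand-in for a variant observed in one individual."""

    gene: str
    alt_alleles: int  # number of alternate alleles carried by the individual


def count_alleles(
    calls: Iterable[VariantCall],
    passes: Callable[[VariantCall], bool],
) -> int:
    """Sum the alleles of all variant calls that pass the selection."""
    return sum(call.alt_alleles for call in calls if passes(call))


calls = [VariantCall("SURF1", 1), VariantCall("SURF2", 2), VariantCall("SURF1", 1)]
surf1_alleles = count_alleles(calls, lambda call: call.gene == "SURF1")
```

In this toy cohort member, two alleles pass the SURF1 selection.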

class gpsea.analysis.predicate.genotype.VariantPredicate

Bases: Partitioning

VariantPredicate tests if a variant meets a certain criterion.

The subclasses MUST implement all abstract methods of this class plus __eq__ and __hash__, to support building the compound predicates.

We strongly recommend implementing __str__ and __repr__ as well.

get_question() → str

Prepare a str with the question the predicate can answer.

abstract test(variant: Variant) → bool

Test if the variant meets a criterion.

Parameters:

variant – an instance of Variant to test.

Returns:

True if the variant meets the criterion and False otherwise.

Return type:

bool
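
The requirement to implement __eq__ and __hash__ can be illustrated with a stand-alone sketch. The classes below are hypothetical stand-ins that do not inherit from the real VariantPredicate, and variants are represented by plain variant key strings. The point is only that value-based equality and hashing let equal predicates be compared and deduplicated.

```python
from abc import ABC, abstractmethod


class SketchPredicate(ABC):
    """Hypothetical stand-in for a predicate base class."""

    @abstractmethod
    def test(self, variant_key: str) -> bool:
        """Test if the variant meets the criterion."""


class KeyEquals(SketchPredicate):
    """Pass if the variant key equals the configured key."""

    def __init__(self, key: str):
        self._key = key

    def test(self, variant_key: str) -> bool:
        return variant_key == self._key

    def __eq__(self, other: object) -> bool:
        return isinstance(other, KeyEquals) and self._key == other._key

    def __hash__(self) -> int:
        return hash((type(self).__name__, self._key))

    def __repr__(self) -> str:
        return f"KeyEquals(key={self._key!r})"


# Two predicates configured with the same key collapse into one set member.
unique = {KeyEquals("X_12345_12345_C_G"), KeyEquals("X_12345_12345_C_G")}
```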

class gpsea.analysis.predicate.genotype.VariantPredicates

Bases: object

VariantPredicates is a static utility class to provide the variant predicates that are relatively simple to configure.

static true() → VariantPredicate

Prepare an absolutely inclusive VariantPredicate - a predicate that returns True for any variant whatsoever.

static all(predicates: Iterable[VariantPredicate]) → VariantPredicate

Prepare a VariantPredicate that returns True if ALL predicates evaluate to True.

This is useful for building compound predicates programmatically.

Example

Build a predicate to test if a variant has a functional annotation for both genes, SURF1 and SURF2:

>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> genes = ('SURF1', 'SURF2',)
>>> predicate = VariantPredicates.all(VariantPredicates.gene(g) for g in genes)
>>> predicate.description
'(affects SURF1 AND affects SURF2)'

Parameters:

predicates – an iterable of predicates to test

static any(predicates: Iterable[VariantPredicate]) → VariantPredicate

Prepare a VariantPredicate that returns True if ANY of the predicates evaluates to True.

This can be useful for building compound predicates programmatically.

Example

Build a predicate to test if a variant leads to a missense or nonsense change on a fictional transcript NM_123456.7:

>>> from gpsea.model import VariantEffect
>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> tx_id = 'NM_123456.7'
>>> effects = (VariantEffect.MISSENSE_VARIANT, VariantEffect.STOP_GAINED,)
>>> predicate = VariantPredicates.any(VariantPredicates.variant_effect(e, tx_id) for e in effects)
>>> predicate.description
'(MISSENSE_VARIANT on NM_123456.7 OR STOP_GAINED on NM_123456.7)'

Parameters:

predicates – an iterable of predicates to test
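
The ALL and ANY semantics can be shown with a sketch built on plain callables. The helpers all_of, any_of and has_effect are hypothetical and only mimic the behavior described for all() and any(). Variants are represented by plain dictionaries for the sake of the example.

```python
from typing import Callable, Dict, Iterable

Variant = Dict[str, str]  # hypothetical variant record
Check = Callable[[Variant], bool]


def all_of(checks: Iterable[Check]) -> Check:
    """Build a check that passes if ALL of the checks pass."""
    checks = tuple(checks)  # materialize, so that a generator can be reused
    return lambda variant: all(check(variant) for check in checks)


def any_of(checks: Iterable[Check]) -> Check:
    """Build a check that passes if ANY of the checks passes."""
    checks = tuple(checks)
    return lambda variant: any(check(variant) for check in checks)


def has_effect(effect: str) -> Check:
    """Build a check that compares the effect field of the variant record."""
    return lambda variant: variant["effect"] == effect


missense_or_stop = any_of(
    has_effect(e) for e in ("MISSENSE_VARIANT", "STOP_GAINED")
)
```

As in the examples above, the compound check is built from a generator expression, which is why the sketch stores the checks in a tuple before use.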

static variant_effect(effect: VariantEffect, tx_id: str) → VariantPredicate

Prepare a VariantPredicate to test if the functional annotation predicts the variant to lead to a certain variant effect.

Example

Make a predicate for testing if the variant leads to a missense change on transcript NM_123.4:

>>> from gpsea.model import VariantEffect
>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> predicate = VariantPredicates.variant_effect(VariantEffect.MISSENSE_VARIANT, tx_id='NM_123.4')
>>> predicate.description
'MISSENSE_VARIANT on NM_123.4'

Parameters:
  • effect – the target VariantEffect

  • tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)

static variant_key(key: str) → VariantPredicate

Prepare a VariantPredicate that tests if the variant matches the provided key.

Parameters:

key – a str with the variant key (e.g. X_12345_12345_C_G or 22_10001_20000_INV)

static gene(symbol: str) → VariantPredicate

Prepare a VariantPredicate that tests if the variant affects a given gene.

Parameters:

symbol – a str with the gene symbol (e.g. 'FBN1').

static transcript(tx_id: str) → VariantPredicate

Prepare a VariantPredicate that tests if the variant affects a transcript.

Parameters:

tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)

static exon(exon: int, tx_id: str) → VariantPredicate

Prepare a VariantPredicate that tests if the variant overlaps with an exon of a specific transcript.

Warning

We use 1-based numbering for the exons, not the 0-based numbering common in computer science. Therefore, the first exon of the transcript has exon==1, the second exon has exon==2, and so on.

Warning

We do not check if exon exceeds the number of exons of the transcript given by tx_id! Therefore, exon==10000 will effectively return False for all variants! 😱 Well, at least for the genome variants of the Homo sapiens sapiens taxon…

Parameters:
  • exon – a positive int with the index of the target exon (e.g. 1 for the 1st exon, 2 for the 2nd, …)

  • tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)
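
The two warnings can be made concrete with a short sketch. The overlaps_exon helper is hypothetical. It assumes that exons are given as 0-based, half-open (start, end) coordinate pairs sorted in transcript order, and that a non-positive exon number is rejected. Neither assumption comes from the library.

```python
from typing import Sequence, Tuple


def overlaps_exon(
    variant_start: int,
    variant_end: int,
    exons: Sequence[Tuple[int, int]],
    exon: int,
) -> bool:
    """Test if the variant region overlaps with the exon with a 1-based number."""
    if exon < 1:
        raise ValueError("exon numbers are 1-based")
    if exon > len(exons):
        # No such exon in the transcript, hence nothing can overlap.
        return False
    exon_start, exon_end = exons[exon - 1]  # convert to a 0-based index
    return variant_start < exon_end and exon_start < variant_end


exons = [(100, 200), (300, 400)]
in_first = overlaps_exon(150, 151, exons, exon=1)
beyond_last = overlaps_exon(150, 151, exons, exon=10000)
```

Here in_first is True, while beyond_last is False because the toy transcript has only two exons.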

static region(region: Tuple[int, int] | Region, tx_id: str) → VariantPredicate

Prepare a VariantPredicate that tests if the variant overlaps with a region on a protein of a specific transcript.

Example

Create a predicate to test if the variant overlaps with the 5th amino acid of the protein encoded by a fictional transcript NM_1234.5:

>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> overlaps_with_fifth_aa = VariantPredicates.region(region=(5, 5), tx_id="NM_1234.5")
>>> overlaps_with_fifth_aa.description
'overlaps with [5,5] region of the protein encoded by NM_1234.5'

Create a predicate to test if the variant overlaps with the first 20 amino acid residues of the protein encoded by the same transcript:

>>> overlaps_with_first_20 = VariantPredicates.region(region=(1, 20), tx_id="NM_1234.5")
>>> overlaps_with_first_20.description
'overlaps with [1,20] region of the protein encoded by NM_1234.5'

Parameters:
  • region – a Region that gives the start and end coordinates of the region of interest on the protein, or a tuple with 1-based coordinates.

  • tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)
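
The tuple form of region can be read as a closed interval of residue positions, as the [5,5] and [1,20] descriptions above suggest. A minimal sketch of such an overlap test follows. The overlaps_protein_region helper is hypothetical, and treating both ends as inclusive is an assumption based on those descriptions.

```python
from typing import Tuple


def overlaps_protein_region(
    aa_start: int,
    aa_end: int,
    region: Tuple[int, int],
) -> bool:
    """Test if residues aa_start..aa_end overlap with a 1-based, inclusive region."""
    region_start, region_end = region
    return aa_start <= region_end and region_start <= aa_end


hits_fifth = overlaps_protein_region(5, 5, region=(5, 5))
misses_fifth = overlaps_protein_region(6, 6, region=(5, 5))
hits_first_20 = overlaps_protein_region(20, 25, region=(1, 20))
```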

static is_large_imprecise_sv() → VariantPredicate

Prepare a VariantPredicate for testing if the variant is a large structural variant (SV) without exact breakpoint coordinates.

static is_structural_variant(threshold: int = 50) → VariantPredicate

Prepare a VariantPredicate for testing if the variant is a structural variant (SV).

SVs are usually defined as variants affecting more than a certain number of base pairs. The thresholds vary in the literature, but here we use 50 bp as the default.

Any variant that affects at least threshold base pairs is considered an SV. Large SVs with unknown breakpoint coordinates and translocations (TRANSLOCATION) are always considered SVs.

Parameters:

threshold – a non-negative int with the number of base pairs that must be affected
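
The decision rule above can be summarized in a few lines. The is_sv helper is hypothetical. Representing a large SV with unknown breakpoints as affected_bp=None and a translocation as a boolean flag are simplifications made for this sketch.

```python
from typing import Optional


def is_sv(
    affected_bp: Optional[int],
    is_translocation: bool = False,
    threshold: int = 50,
) -> bool:
    """Test if a variant counts as a structural variant (SV)."""
    if affected_bp is None or is_translocation:
        # Imprecise large SVs and translocations are always considered SVs.
        return True
    return affected_bp >= threshold
```

For example, is_sv(49) is False and is_sv(50) is True with the default threshold.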

static structural_type(curie: str | TermId) → VariantPredicate

Prepare a VariantPredicate for testing if the variant has a certain structural type.

We recommend using a descendant of structural_variant (SO:0001537) as the structural type.

Example

Make a predicate for testing if the variant is a chromosomal deletion (SO:1000029):

>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> predicate = VariantPredicates.structural_type('SO:1000029')
>>> predicate.description
'structural type is SO:1000029'

Parameters:

curie – compact uniform resource identifier (CURIE) with the structural type to test.

static variant_class(variant_class: VariantClass) → VariantPredicate

Prepare a VariantPredicate for testing if the variant is of a certain VariantClass.

Example

Make a predicate to test if the variant is a deletion:

>>> from gpsea.model import VariantClass
>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> predicate = VariantPredicates.variant_class(VariantClass.DEL)
>>> predicate.description
'variant class is DEL'

Parameters:

variant_class – the variant class to test.

static ref_length(operator: Literal['<', '<=', '==', '!=', '>=', '>'], length: int) → VariantPredicate

Prepare a VariantPredicate for testing if the length of the reference (REF) allele of a variant is above, below, or (not) equal to a certain length.

See also

See Length of the reference allele for more info.

Example

Prepare a predicate that tests that the REF allele includes more than 5 base pairs:

>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> predicate = VariantPredicates.ref_length('>', 5)
>>> predicate.description
'reference allele length > 5'

Parameters:
  • operator – a str with the desired test. Must be one of { '<', '<=', '==', '!=', '>=', '>' }.

  • length – a non-negative int with the length threshold.

static change_length(operator: Literal['<', '<=', '==', '!=', '>=', '>'], threshold: int) → VariantPredicate

Prepare a VariantPredicate for testing if the variant’s change length is above, below, or (not) equal to a certain threshold.

See also

See Change length of an allele for more info.

Example

Make a predicate for testing if the change length is less than or equal to -10, e.g. to test if a variant is a deletion leading to removal of at least 10 base pairs:

>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> predicate = VariantPredicates.change_length('<=', -10)
>>> predicate.description
'change length <= -10'

Parameters:
  • operator – a str with the desired test. Must be one of { '<', '<=', '==', '!=', '>=', '>' }.

  • threshold – an int with the threshold. Can be negative, zero, or positive.
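
The comparison can be sketched with the standard operator module. The sketch assumes that the change length is the length of the alternate allele minus the length of the reference allele, which is consistent with deletions having negative change lengths. The names OPERATORS, change_length_of and check_change_length are hypothetical.

```python
import operator

# Map the operator strings accepted above to comparison functions.
OPERATORS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    ">": operator.gt,
}


def change_length_of(ref: str, alt: str) -> int:
    """Assumed definition: ALT allele length minus REF allele length."""
    return len(alt) - len(ref)


def check_change_length(ref: str, alt: str, op: str, threshold: int) -> bool:
    """Compare the change length of the alleles with the threshold."""
    return OPERATORS[op](change_length_of(ref, alt), threshold)


# A deletion that removes 10 bases: REF has 11 bases, ALT has 1 base.
removes_ten = check_change_length("A" * 11, "A", "<=", -10)
```

A single nucleotide substitution has a change length of 0 under this definition, so it would not pass the '<=' test against -10.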

static is_structural_deletion(threshold: int = -50) → VariantPredicate

Prepare a VariantPredicate for testing if the variant is a chromosomal deletion or a structural variant deletion that leads to the removal of at least a given number of base pairs (50 bp by default).

Note

The predicate uses change_length() to determine if the change length of the variant is above or below the threshold.

IMPORTANT: the change lengths of deletions are negative, since the alternate allele is shorter than the reference allele. See Change length of an allele for more info.

Example

Prepare a predicate for testing if the variant is a chromosomal deletion that removes at least 20 base pairs:

>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> predicate = VariantPredicates.is_structural_deletion(-20)
>>> predicate.description
'(structural type is SO:1000029 OR (variant class is DEL AND change length <= -20))'

Parameters:

threshold – an int with the change length threshold to determine if a variant is “structural” (-50 bp by default).
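
The description printed in the example above suggests a compound of three simpler tests. A plain-Python sketch of that logic is shown below. The function name and the plain-value arguments are hypothetical. A change_length of None is used here to stand for a variant without exact allele lengths, which is an assumption of the sketch.

```python
from typing import Optional


def is_structural_deletion_sketch(
    structural_type: Optional[str],
    variant_class: str,
    change_length: Optional[int],
    threshold: int = -50,
) -> bool:
    """Mirror: structural type is SO:1000029 OR (class is DEL AND change length <= threshold)."""
    if structural_type == "SO:1000029":
        return True
    return (
        variant_class == "DEL"
        and change_length is not None
        and change_length <= threshold
    )
```

Note the sign convention: with threshold=-20, a deletion with a change length of -25 passes, while a deletion with a change length of -5 does not.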

static protein_feature_type(feature_type: FeatureType, protein_metadata: ProteinMetadata) → VariantPredicate

Prepare a VariantPredicate to test if the variant affects a protein feature of the given feature_type.

Parameters:
  • feature_type – the target protein FeatureType (e.g. DOMAIN).

  • protein_metadata – the information about the protein.

static protein_feature(feature_id: str, protein_metadata: ProteinMetadata) → VariantPredicate

Prepare a VariantPredicate to test if the variant affects a protein feature labeled with the provided feature_id.

Parameters:
  • feature_id – the id of the target protein feature (e.g. ANK 1)

  • protein_metadata – the information about the protein.