gpsea.analysis.predicate.genotype package

class gpsea.analysis.predicate.genotype.GenotypePolyPredicate[source]

Bases: PolyPredicate[Categorization]

GenotypePolyPredicate is a base class for all PolyPredicate that test the genotype axis.

gpsea.analysis.predicate.genotype.groups_predicate(predicates: Iterable[VariantPredicate], group_names: Iterable[str]) GenotypePolyPredicate[source]

Create a genotype predicate that bins the patient into one of n groups.

The genotype groups should not overlap. A patient who matches more than one group is assigned to no group (None).

See the Groups Predicate section for an example.

Parameters:
  • predicates – an iterable with at least 2 variant predicates to determine a genotype group.

  • group_names – an iterable with group names. The number of group names must match the number of predicates.
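The overlap rule described above can be sketched in plain Python. This is a minimal illustration, not GPSEA's implementation: `assign_group` is a hypothetical helper, variants are modelled as plain strings, and it is an assumption that a patient belongs to a group when at least one variant passes the group's predicate and that a patient matching no group also gets None.

```python
from typing import Callable, Iterable, Optional, Sequence


def assign_group(
    variants: Iterable[str],
    predicates: Sequence[Callable[[str], bool]],
    group_names: Sequence[str],
) -> Optional[str]:
    """Hypothetical helper: return the single matching group name or None."""
    if len(predicates) < 2:
        raise ValueError("At least 2 predicates are required")
    if len(predicates) != len(group_names):
        raise ValueError("The number of group names must match the number of predicates")
    variants = list(variants)
    # Assumption: a group matches if any of the patient's variants passes its predicate.
    hits = [
        name
        for predicate, name in zip(predicates, group_names)
        if any(predicate(v) for v in variants)
    ]
    # Exactly one matching group is required; an overlap (or no match) yields None.
    return hits[0] if len(hits) == 1 else None


def is_missense(variant: str) -> bool:
    return "missense" in variant


def is_frameshift(variant: str) -> bool:
    return "frameshift" in variant
```

With these toy predicates, a patient with only a missense variant lands in the first group, while a patient carrying both a missense and a frameshift variant is assigned to no group.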

gpsea.analysis.predicate.genotype.sex_predicate() GenotypePolyPredicate[source]

Get a genotype predicate for categorizing patients by their Sex.

See the Partition by the sex of the individual section for an example.

gpsea.analysis.predicate.genotype.diagnosis_predicate(diagnoses: Iterable[TermId | str], labels: Iterable[str] | None = None) GenotypePolyPredicate[source]

Create a genotype predicate that bins the patient based on the presence of a disease diagnosis, as listed in the patient's diseases attribute.

If an individual is diagnosed with more than one disease from the provided diagnoses, the individual will be assigned into no group (None).

See the Partition by a diagnosis section for an example.

Parameters:
  • diagnoses – an iterable with at least 2 diagnosis IDs, either as a str or a TermId, to determine the genotype group.

  • labels – an iterable with diagnosis names or None if CURIEs should be used instead. The number of labels must match the number of diagnoses.
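The binning rule can be illustrated with a small stand-alone sketch. `assign_diagnosis_group` is a hypothetical function, the disease IDs in the usage are made up, and treating an individual with none of the diagnoses as None is an assumption.

```python
from typing import Optional, Sequence, Set


def assign_diagnosis_group(
    patient_diseases: Set[str],
    diagnoses: Sequence[str],
    labels: Optional[Sequence[str]] = None,
) -> Optional[str]:
    """Hypothetical helper: return the label of the single matching diagnosis."""
    if len(diagnoses) < 2:
        raise ValueError("At least 2 diagnoses are required")
    # Fall back to the CURIEs when no labels are provided.
    names = list(diagnoses) if labels is None else list(labels)
    if len(names) != len(diagnoses):
        raise ValueError("The number of labels must match the number of diagnoses")
    matched = [name for d, name in zip(diagnoses, names) if d in patient_diseases]
    # More than one matching diagnosis (or, by assumption, none) yields no group.
    return matched[0] if len(matched) == 1 else None
```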

gpsea.analysis.predicate.genotype.autosomal_dominant(variant_predicate: VariantPredicate | None = None) GenotypePolyPredicate[source]

Create a predicate that assigns the patient either into homozygous reference or heterozygous group in line with the autosomal dominant mode of inheritance.

Parameters:

variant_predicate – a predicate for choosing the variants for testing or None if all variants should be used.

gpsea.analysis.predicate.genotype.autosomal_recessive(variant_predicate: VariantPredicate | None = None) GenotypePolyPredicate[source]

Create a predicate that assigns the patient either into homozygous reference, heterozygous, or biallelic alternative allele (homozygous alternative or compound heterozygous) group in line with the autosomal recessive mode of inheritance.

Parameters:

variant_predicate – a predicate for choosing the variants for testing or None if all variants should be used
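As a rough sketch of how the two modes of inheritance differ, the group can be looked up from the number of alternate alleles that pass the variant predicate. The group labels and the mapping from allele counts to groups are assumptions made for illustration; `moi_group` is a hypothetical helper.

```python
from typing import Dict, Optional

# Assumed mapping of allele counts to groups for each mode of inheritance.
AUTOSOMAL_DOMINANT: Dict[int, str] = {0: "HOM_REF", 1: "HET"}
AUTOSOMAL_RECESSIVE: Dict[int, str] = {0: "HOM_REF", 1: "HET", 2: "BIALLELIC_ALT"}


def moi_group(allele_count: int, table: Dict[int, str]) -> Optional[str]:
    """Hypothetical helper: return the group, or None if the count is incompatible."""
    return table.get(allele_count)
```

Under these assumptions, two alternate alleles fit the autosomal recessive scheme (biallelic alternative group) but have no group in the autosomal dominant scheme.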

gpsea.analysis.predicate.genotype.monoallelic_predicate(a_predicate: VariantPredicate, b_predicate: VariantPredicate, names: Tuple[str, str] = ('A', 'B')) GenotypePolyPredicate[source]

The predicate bins the patient into one of two groups, A and B, based on the presence of exactly one allele of a variant that meets the predicate criteria.

The number of alleles \(count_{A}\) and \(count_{B}\) is computed using a_predicate and b_predicate and the individual is assigned into a group based on the following table:

Group   \(count_{A}\)   \(count_{B}\)
A       1               0
B       0               1

Individuals with any other combination of allele counts (e.g. \(count_{A} = 0\) and \(count_{B} = 2\)) are assigned into the None group and are thus omitted from the analysis.

Parameters:
  • a_predicate – predicate to test if the variants meet the criteria of the first group (named A by default).

  • b_predicate – predicate to test if the variants meet the criteria of the second group (named B by default).

  • names – group names (default ('A', 'B')).
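The assignment table can be written as a lookup keyed on the pair of allele counts. This is a sketch of the rule only; `monoallelic_group` is a hypothetical helper and the counts are passed in directly rather than computed from a patient.

```python
from typing import Optional, Tuple


def monoallelic_group(
    count_a: int,
    count_b: int,
    names: Tuple[str, str] = ("A", "B"),
) -> Optional[str]:
    """Hypothetical helper: look up the group for a pair of allele counts."""
    table = {
        (1, 0): names[0],  # exactly one allele passing a_predicate, none passing b_predicate
        (0, 1): names[1],  # exactly one allele passing b_predicate, none passing a_predicate
    }
    # Any other combination has no group.
    return table.get((count_a, count_b))
```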

gpsea.analysis.predicate.genotype.biallelic_predicate(a_predicate: VariantPredicate, b_predicate: VariantPredicate, names: Tuple[str, str] = ('A', 'B')) GenotypePolyPredicate[source]

The predicate bins the patient into one of three groups, AA, AB, and BB, based on the presence of two variant alleles that meet the predicate criteria.

The number of alleles \(count_{A}\) and \(count_{B}\) is computed using a_predicate and b_predicate and the individual is assigned into a group based on the following table:

Group   \(count_{A}\)   \(count_{B}\)
AA      2               0
AB      1               1
BB      0               2

Individuals with any other combination of allele counts (e.g. \(count_{A} = 1\) and \(count_{B} = 2\)) are assigned into the None group and are thus omitted from the analysis.

Parameters:
  • a_predicate – predicate to test if the variants meet the criteria of the first group (named A by default).

  • b_predicate – predicate to test if the variants meet the criteria of the second group (named B by default).

  • names – group names (default ('A', 'B')).
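The three groups named in the description (AA, AB, and BB) can be expressed as a lookup on the allele counts. `biallelic_group` is a hypothetical helper, and building the group labels by concatenating the two names is an assumption made to mirror the AA/AB/BB notation.

```python
from typing import Optional, Tuple


def biallelic_group(
    count_a: int,
    count_b: int,
    names: Tuple[str, str] = ("A", "B"),
) -> Optional[str]:
    """Hypothetical helper: look up the biallelic group for a pair of allele counts."""
    a, b = names
    table = {
        (2, 0): a + a,  # both alleles pass a_predicate
        (1, 1): a + b,  # one allele passes each predicate
        (0, 2): b + b,  # both alleles pass b_predicate
    }
    # Any other combination has no group.
    return table.get((count_a, count_b))
```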

class gpsea.analysis.predicate.genotype.ModeOfInheritancePredicate(allele_counter: AlleleCounter, mode_of_inheritance_info: ModeOfInheritanceInfo)[source]

Bases: GenotypePolyPredicate

ModeOfInheritancePredicate assigns an individual into a group based on compatibility with the selected mode of inheritance.

static autosomal_dominant(variant_predicate: VariantPredicate | None = None) GenotypePolyPredicate[source]

Create a predicate that assigns the patient either into homozygous reference or heterozygous group in line with the autosomal dominant mode of inheritance.

Parameters:

variant_predicate – a predicate for choosing the variants for testing or None if all variants should be used.

static autosomal_recessive(variant_predicate: VariantPredicate | None = None) GenotypePolyPredicate[source]

Create a predicate that assigns the patient either into homozygous reference, heterozygous, or biallelic alternative allele (homozygous alternative or compound heterozygous) group in line with the autosomal recessive mode of inheritance.

Parameters:

variant_predicate – a predicate for choosing the variants for testing or None if all variants should be used.

get_categorizations() Sequence[Categorization][source]

Get a sequence of all categories which the PolyPredicate can produce.

get_question_base() str[source]

Prepare a str with the question the predicate can answer.

test(patient: Patient) Categorization | None[source]

Assign a patient into a categorization.

Return None if the patient cannot be assigned into any meaningful category.

class gpsea.analysis.predicate.genotype.AlleleCounter(predicate: VariantPredicate)[source]

Bases: object

AlleleCounter counts the number of alleles of all variants that pass the selection with a given predicate.

Parameters:

predicate – a VariantPredicate for selecting the target variants.

get_question() str[source]

Get the question tested by the predicate.

Returns:

the question tested by the predicate

Return type:

str

count(patient: Patient) int[source]

Count the number of alleles of all variants that pass the predicate.

Parameters:

patient – the patient to test

Returns:

the count of the passing alleles

Return type:

int
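The counting can be pictured with a toy model. `ToyVariant` and `count_alleles` are hypothetical stand-ins, not GPSEA classes; the sketch assumes each variant carries the number of alternate alleles observed in the patient (1 for heterozygous, 2 for homozygous alternate).

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ToyVariant:
    """Hypothetical stand-in for a variant observed in one patient."""
    gene: str
    alt_allele_count: int  # assumed: 1 = heterozygous, 2 = homozygous alternate


def count_alleles(
    variants: Iterable[ToyVariant],
    predicate: Callable[[ToyVariant], bool],
) -> int:
    """Sum the alternate allele counts of the variants that pass the predicate."""
    return sum(v.alt_allele_count for v in variants if predicate(v))


patient_variants = [
    ToyVariant("SURF1", 1),
    ToyVariant("SURF1", 1),
    ToyVariant("SURF2", 2),
]
surf1_count = count_alleles(patient_variants, lambda v: v.gene == "SURF1")
```

In this toy patient, two heterozygous SURF1 variants add up to two alleles, which is the situation a compound heterozygous genotype would produce.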

class gpsea.analysis.predicate.genotype.VariantPredicate[source]

Bases: object

VariantPredicate tests if a variant meets a certain criterion.

The subclasses are expected to implement all abstract methods of this class plus __eq__ and __hash__, to support building of compound predicates.

We strongly recommend implementing __str__ and __repr__ as well.

abstract get_question() str[source]

Prepare a str with the question the predicate can answer.

abstract test(variant: Variant) bool[source]

Test if the variant meets a criterion.

Parameters:

variant – an instance of Variant to test.

Returns:

True if the variant meets the criterion and False otherwise.

Return type:

bool
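The following sketch shows what a subclass with the recommended dunder methods might look like. To keep it self-contained it does not import GPSEA: `ToyVariantPredicate` is a hypothetical stand-in for the abstract base class, and variants are modelled as plain dictionaries.

```python
import abc
from typing import Any, Mapping


class ToyVariantPredicate(abc.ABC):
    """Hypothetical stand-in for the abstract base class."""

    @abc.abstractmethod
    def get_question(self) -> str: ...

    @abc.abstractmethod
    def test(self, variant: Mapping[str, Any]) -> bool: ...


class GeneIs(ToyVariantPredicate):
    """Toy predicate that passes if the variant is annotated with a gene symbol."""

    def __init__(self, symbol: str):
        self._symbol = symbol

    def get_question(self) -> str:
        return f"affects {self._symbol}"

    def test(self, variant: Mapping[str, Any]) -> bool:
        return self._symbol in variant.get("genes", ())

    # __eq__ and __hash__ let equal predicates be deduplicated in sets and dicts.
    def __eq__(self, other: object) -> bool:
        return isinstance(other, GeneIs) and self._symbol == other._symbol

    def __hash__(self) -> int:
        return hash(("GeneIs", self._symbol))

    def __str__(self) -> str:
        return self.get_question()

    def __repr__(self) -> str:
        return f"GeneIs(symbol={self._symbol!r})"
```

Two `GeneIs("FBN1")` instances compare equal and collapse into a single set element, which is the property compound predicates can rely on.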

class gpsea.analysis.predicate.genotype.VariantPredicates[source]

Bases: object

VariantPredicates is a static utility class to provide the variant predicates that are relatively simple to configure.

static true() VariantPredicate[source]

Prepare an absolutely inclusive VariantPredicate - a predicate that returns True for any variant whatsoever.

static all(predicates: Iterable[VariantPredicate]) VariantPredicate[source]

Prepare a VariantPredicate that returns True if ALL predicates evaluate to True.

This is useful for building compound predicates programmatically.

Example

Build a predicate to test if the variant has a functional annotation to both genes SURF1 and SURF2:

>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> genes = ('SURF1', 'SURF2',)
>>> predicate = VariantPredicates.all(VariantPredicates.gene(g) for g in genes)
>>> predicate.get_question()
'(impacts SURF1 AND impacts SURF2)'

Parameters:

predicates – an iterable of predicates to test

static any(predicates: Iterable[VariantPredicate]) VariantPredicate[source]

Prepare a VariantPredicate that returns True if ANY of the predicates evaluates to True.

This can be useful for building compound predicates programmatically.

Example

Build a predicate to test if the variant leads to a missense or nonsense change on a fictional transcript NM_123456.7:

>>> from gpsea.model import VariantEffect
>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> tx_id = 'NM_123456.7'
>>> effects = (VariantEffect.MISSENSE_VARIANT, VariantEffect.STOP_GAINED,)
>>> predicate = VariantPredicates.any(VariantPredicates.variant_effect(e, tx_id) for e in effects)
>>> predicate.get_question()
'(MISSENSE_VARIANT on NM_123456.7 OR STOP_GAINED on NM_123456.7)'

Parameters:

predicates – an iterable of predicates to test

static variant_effect(effect: VariantEffect, tx_id: str) VariantPredicate[source]

Prepare a VariantPredicate to test if the functional annotation predicts the variant to lead to a certain variant effect.

Example

Make a predicate for testing if the variant leads to a missense change on transcript NM_123.4:

>>> from gpsea.model import VariantEffect
>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> predicate = VariantPredicates.variant_effect(VariantEffect.MISSENSE_VARIANT, tx_id='NM_123.4')
>>> predicate.get_question()
'MISSENSE_VARIANT on NM_123.4'

Parameters:
  • effect – the target VariantEffect

  • tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)

static variant_key(key: str) VariantPredicate[source]

Prepare a VariantPredicate that tests if the variant matches the provided key.

Parameters:

key – a str with the variant key (e.g. X_12345_12345_C_G or 22_10001_20000_INV)

static gene(symbol: str) VariantPredicate[source]

Prepare a VariantPredicate that tests if the variant affects a given gene.

Parameters:

symbol – a str with the gene symbol (e.g. 'FBN1').

static transcript(tx_id: str) VariantPredicate[source]

Prepare a VariantPredicate that tests if the variant affects a transcript.

Parameters:

tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)

static exon(exon: int, tx_id: str) VariantPredicate[source]

Prepare a VariantPredicate that tests if the variant overlaps with an exon of a specific transcript.

Warning

We use 1-based numbering for the exons, not the 0-based numbering common in computer science. Therefore, the first exon of the transcript has exon==1, the second exon has exon==2, and so on …

Warning

We do not check if exon exceeds the number of exons of the transcript given by tx_id! Therefore, exon==10,000 will effectively return False for all variants! 😱 Well, at least for the genome variants of the Homo sapiens sapiens taxon…

Parameters:
  • exon – a positive int with the 1-based index of the target exon (e.g. 1 for the 1st exon, 2 for the 2nd, …)

  • tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)
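The two warnings above can be made concrete with a short sketch. `overlaps_exon` is a hypothetical helper, not GPSEA code; it assumes exons and the variant are given as 0-based, half-open (start, end) coordinates on the same strand, and that the exon number is 1-based as the first warning states.

```python
from typing import Sequence, Tuple


def overlaps_exon(
    variant_start: int,
    variant_end: int,
    exons: Sequence[Tuple[int, int]],
    exon_number: int,
) -> bool:
    """Hypothetical helper: test overlap with the exon at a 1-based position."""
    if exon_number < 1:
        raise ValueError("Exon numbers are 1-based and must be positive")
    # No error for an exon number beyond the transcript: nothing can overlap it.
    if exon_number > len(exons):
        return False
    exon_start, exon_end = exons[exon_number - 1]  # convert to a 0-based list index
    # Half-open intervals overlap if each starts before the other ends.
    return variant_start < exon_end and exon_start < variant_end


toy_exons = [(100, 200), (300, 400)]
```

A variant at (150, 151) overlaps exon 1 of `toy_exons`, does not overlap exon 2, and is simply reported as non-overlapping for exon 10,000.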

static region(region: Region, tx_id: str) VariantPredicate[source]

Prepare a VariantPredicate that tests if the variant overlaps with a region on a protein of a specific transcript.

Parameters:
  • region – a Region that gives the start and end coordinate of the region of interest on the protein sequence.

  • tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)

static is_large_imprecise_sv() VariantPredicate[source]

Prepare a VariantPredicate for testing if the variant is a large structural variant (SV) without exact breakpoint coordinates.

static is_structural_variant(threshold: int = 50) VariantPredicate[source]

Prepare a VariantPredicate for testing if the variant is a structural variant (SV).

SVs are usually defined as variants affecting more than a certain number of base pairs. The thresholds vary in the literature, but here we use 50 bp as a default.

Any variant that affects at least threshold base pairs is considered an SV. Large SVs with unknown breakpoint coordinates and translocations (VariantClass.BND) are always considered SVs.

Parameters:

threshold – a non-negative int with the number of base pairs that must be affected
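The decision rule above can be sketched as a small function. `looks_structural` is a hypothetical helper; it assumes the number of affected base pairs is already known and uses None to stand for a large SV without exact breakpoint coordinates.

```python
from typing import Optional


def looks_structural(
    affected_bp: Optional[int],
    is_translocation: bool = False,
    threshold: int = 50,
) -> bool:
    """Hypothetical helper mirroring the SV rule described above."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    # Imprecise large SVs (modelled as None) and translocations always count as SVs.
    if affected_bp is None or is_translocation:
        return True
    # Otherwise the variant must affect at least `threshold` base pairs.
    return affected_bp >= threshold
```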

static structural_type(curie: str | TermId) VariantPredicate[source]

Prepare a VariantPredicate for testing if the variant has a certain structural type.

We recommend using a descendant of structural_variant (SO:0001537) as the structural type.

Example

Make a predicate for testing if the variant is a chromosomal deletion (SO:1000029):

>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> predicate = VariantPredicates.structural_type('SO:1000029')
>>> predicate.get_question()
'structural type is SO:1000029'

Parameters:

curie – compact uniform resource identifier (CURIE) with the structural type to test.

static variant_class(variant_class: VariantClass) VariantPredicate[source]

Prepare a VariantPredicate for testing if the variant is of a certain VariantClass.

Example

Make a predicate to test if the variant is a deletion:

>>> from gpsea.model import VariantClass
>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> predicate = VariantPredicates.variant_class(VariantClass.DEL)
>>> predicate.get_question()
'variant class is DEL'

Parameters:

variant_class – the variant class to test.

static ref_length(operator: Literal['<', '<=', '==', '!=', '>=', '>'], length: int) VariantPredicate[source]

Prepare a VariantPredicate for testing if the length of the reference (REF) allele of the variant is above, below, or (not) equal to a certain length.

See also

See Length of the reference allele for more info.

Example

Prepare a predicate that tests that the REF allele includes more than 5 base pairs:

>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> predicate = VariantPredicates.ref_length('>', 5)
>>> predicate.get_question()
'ref allele length > 5'

Parameters:
  • operator – a str with the desired test. Must be one of { '<', '<=', '==', '!=', '>=', '>' }.

  • length – a non-negative int with the length threshold.
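One way to picture how an operator string can drive the comparison is a lookup into the standard library's `operator` module. This is an illustrative sketch, not the library's implementation; `ref_length_test` is a hypothetical helper that works on a plain REF allele string.

```python
import operator
from typing import Callable, Dict

# The six supported comparison symbols mapped to standard library functions.
OPERATORS: Dict[str, Callable[[int, int], bool]] = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    ">": operator.gt,
}


def ref_length_test(op: str, length: int) -> Callable[[str], bool]:
    """Hypothetical helper: build a test on the length of a REF allele string."""
    if op not in OPERATORS:
        raise ValueError(f"Unsupported operator {op!r}")
    if length < 0:
        raise ValueError("length must be non-negative")
    compare = OPERATORS[op]
    return lambda ref: compare(len(ref), length)


longer_than_5 = ref_length_test(">", 5)
```

Here `longer_than_5("ACGTACG")` passes because the allele has 7 bases, while `longer_than_5("ACGTA")` does not.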

static change_length(operator: Literal['<', '<=', '==', '!=', '>=', '>'], threshold: int) VariantPredicate[source]

Prepare a VariantPredicate for testing if the variant’s change length is above, below, or (not) equal to a certain threshold.

See also

See Change length of an allele for more info.

Example

Make a predicate for testing if the change length is less than or equal to -10, e.g. to test if a variant is a deletion leading to removal of at least 10 base pairs:

>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> predicate = VariantPredicates.change_length('<=', -10)
>>> predicate.get_question()
'change length <= -10'

Parameters:
  • operator – a str with the desired test. Must be one of { '<', '<=', '==', '!=', '>=', '>' }.

  • threshold – an int with the threshold. Can be negative, zero, or positive.
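The sign convention is easiest to see on concrete alleles. The sketch below assumes the change length of a sequence variant is the length of the alternate allele minus the length of the reference allele, which gives negative values for deletions; `change_length` here is a hypothetical stand-alone function operating on allele strings.

```python
def change_length(ref: str, alt: str) -> int:
    """Assumed definition: ALT length minus REF length (negative for deletions)."""
    return len(alt) - len(ref)


# A deletion that removes 11 bases: REF has 12 bases, ALT keeps only the first one.
deletion = change_length("ACGTACGTACGT", "A")
# A single nucleotide variant does not change the length.
snv = change_length("C", "G")
# An insertion of 2 bases.
insertion = change_length("A", "ACG")

# The deletion passes a '<= -10' test; the SNV and the insertion do not.
passes = [length <= -10 for length in (deletion, snv, insertion)]
```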

static is_structural_deletion(threshold: int = -50) VariantPredicate[source]

Prepare a VariantPredicate for testing if the variant is a chromosomal deletion or a structural variant deletion that leads to removal of at least a given number of base pairs (50 bp by default).

Note

The predicate uses change_length() to determine if the length of the variant is above or below threshold.

IMPORTANT: the change lengths of deletions are negative, since the alternate allele is shorter than the reference allele. See Change length of an allele for more info.

Example

Prepare a predicate for testing if the variant is a chromosomal deletion that removes at least 20 base pairs:

>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> predicate = VariantPredicates.is_structural_deletion(-20)
>>> predicate.get_question()
'(structural type is SO:1000029 OR (variant class is DEL AND change length <= -20))'

Parameters:

threshold – an int with the change length threshold to determine if a variant is “structural” (-50 bp by default).
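The question string shown in the example suggests a compound rule: a variant passes if its structural type is SO:1000029, or if it is a deletion whose change length is at or below the threshold. The sketch below spells this out on plain values; `passes_structural_deletion` is a hypothetical helper, and using None for a missing structural type or change length is an assumption.

```python
from typing import Optional

CHROMOSOMAL_DELETION = "SO:1000029"


def passes_structural_deletion(
    structural_type: Optional[str],
    variant_class: str,
    change_length: Optional[int],
    threshold: int = -50,
) -> bool:
    """Hypothetical helper mirroring the compound rule from the question string."""
    # Branch 1: the variant is annotated as a chromosomal deletion.
    if structural_type == CHROMOSOMAL_DELETION:
        return True
    # Branch 2: a deletion whose (negative) change length is at or below the threshold.
    return (
        variant_class == "DEL"
        and change_length is not None
        and change_length <= threshold
    )
```

Note the direction of the comparison: with the default threshold of -50, a deletion with change length -60 passes and one with change length -10 does not.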

class gpsea.analysis.predicate.genotype.ProteinPredicates(protein_metadata_service: ProteinMetadataService)[source]

Bases: object

ProteinPredicates prepares variant predicates that need to consult ProteinMetadataService to categorize a Variant.

protein_feature_type(feature_type: FeatureType, tx_id: str) VariantPredicate[source]

Prepare a VariantPredicate that tests if the variant affects a protein feature type.

Parameters:
  • feature_type – the target protein FeatureType (e.g. FeatureType.DOMAIN)

  • tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)

protein_feature(feature_id: str, tx_id: str) VariantPredicate[source]

Prepare a VariantPredicate that tests if the variant affects a protein feature with the given ID.

Parameters:
  • feature_id – the id of the target protein feature (e.g. ANK 1)

  • tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)