gpsea.analysis.predicate package

class gpsea.analysis.predicate.VariantPredicate

Bases: Partitioning

VariantPredicate tests if a variant meets a certain criterion.

Subclasses MUST implement all abstract methods of this class, plus __eq__ and __hash__, to support building compound predicates.

We strongly recommend implementing __str__ and __repr__ as well.
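Below is a minimal sketch of this subclassing contract. To keep it runnable without gpsea installed, it uses a hypothetical `ToyPredicate` base class and `ToyVariant` type rather than gpsea's own `VariantPredicate` and `Variant`. It shows why value-based `__eq__` and `__hash__` matter: two predicates built from the same arguments compare equal and collapse into one entry in a set.

```python
import abc
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyVariant:
    """Hypothetical stand-in for gpsea's Variant: only the genes it affects."""
    genes: frozenset


class ToyPredicate(abc.ABC):
    """Hypothetical mirror of the VariantPredicate contract (not gpsea code)."""

    @abc.abstractmethod
    def get_question(self) -> str: ...

    @abc.abstractmethod
    def test(self, variant: ToyVariant) -> bool: ...


class AffectsGene(ToyPredicate):
    def __init__(self, symbol: str):
        self._symbol = symbol

    def get_question(self) -> str:
        return f"affects {self._symbol}"

    def test(self, variant: ToyVariant) -> bool:
        return self._symbol in variant.genes

    # Value-based equality and hashing: predicates with the same symbol are
    # interchangeable, so they can be compared and de-duplicated.
    def __eq__(self, other) -> bool:
        return isinstance(other, AffectsGene) and self._symbol == other._symbol

    def __hash__(self) -> int:
        return hash((AffectsGene, self._symbol))

    # The recommended human-readable and debugging representations.
    def __str__(self) -> str:
        return self.get_question()

    def __repr__(self) -> str:
        return f"AffectsGene(symbol={self._symbol!r})"
```

If `__hash__` were omitted, defining `__eq__` alone would make instances unhashable in Python. Such predicates could not be stored in sets or used as dict keys.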

get_question() -> str

Prepare a str with the question the predicate can answer.

abstract test(variant: Variant) -> bool

Test if the variant meets a criterion.

Parameters:

variant – an instance of Variant to test.

Returns:

True if the variant meets the criterion and False otherwise.

Return type:

bool

gpsea.analysis.predicate.true() -> VariantPredicate

The most inclusive variant predicate: returns True for any variant whatsoever.

gpsea.analysis.predicate.allof(predicates: Iterable[VariantPredicate]) -> VariantPredicate

Prepare a VariantPredicate that returns True if ALL predicates evaluate to True.

This is useful for building compound predicates programmatically.

Example

Build a predicate to test if a variant has a functional annotation to both genes SURF1 and SURF2:

>>> from gpsea.analysis.predicate import allof, gene
>>> genes = ('SURF1', 'SURF2',)
>>> predicate = allof(gene(g) for g in genes)
>>> predicate.description
'(affects SURF1 AND affects SURF2)'

Parameters:

predicates – an iterable of predicates to test

gpsea.analysis.predicate.anyof(predicates: Iterable[VariantPredicate]) -> VariantPredicate

Prepare a VariantPredicate that returns True if ANY of the predicates evaluates to True.

This can be useful for building compound predicates programmatically.

Example

Build a predicate to test if a variant leads to a missense or nonsense change on a fictional transcript NM_123456.7:

>>> from gpsea.model import VariantEffect
>>> from gpsea.analysis.predicate import anyof, variant_effect
>>> tx_id = 'NM_123456.7'
>>> effects = (VariantEffect.MISSENSE_VARIANT, VariantEffect.STOP_GAINED,)
>>> predicate = anyof(variant_effect(e, tx_id) for e in effects)
>>> predicate.description
'(MISSENSE_VARIANT on NM_123456.7 OR STOP_GAINED on NM_123456.7)'

Parameters:

predicates – an iterable of predicates to test
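The allof/anyof semantics can be sketched with Python's built-in all() and any(). This is a hypothetical simplification: here a "predicate" is any callable that takes a variant (a plain dict) and returns a bool, whereas gpsea uses VariantPredicate objects. The names toy_allof, toy_anyof, and toy_gene are illustrative only.

```python
from typing import Callable, Iterable

# Hypothetical simplification: a predicate is a callable from a variant to bool.
Predicate = Callable[[dict], bool]


def toy_allof(predicates: Iterable[Predicate]) -> Predicate:
    # Materialize the iterable once. A generator expression, as in the
    # examples above, can only be consumed a single time.
    preds = tuple(predicates)
    return lambda variant: all(p(variant) for p in preds)


def toy_anyof(predicates: Iterable[Predicate]) -> Predicate:
    preds = tuple(predicates)
    return lambda variant: any(p(variant) for p in preds)


def toy_gene(symbol: str) -> Predicate:
    # True if the variant's annotation lists the gene symbol.
    return lambda variant: symbol in variant["genes"]


both = toy_allof(toy_gene(g) for g in ("SURF1", "SURF2"))
either = toy_anyof(toy_gene(g) for g in ("SURF1", "SURF2"))
```

The tuple() call is the design point worth noting. Without it, the generator would be exhausted after the first variant is tested, and every later test would see an empty predicate list.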

gpsea.analysis.predicate.variant_effect(effect: VariantEffect, tx_id: str) -> VariantPredicate

Prepare a VariantPredicate to test if the functional annotation predicts that the variant leads to a certain variant effect on the given transcript.

Example

Make a predicate for testing if the variant leads to a missense change on transcript NM_123.4:

>>> from gpsea.model import VariantEffect
>>> from gpsea.analysis.predicate import variant_effect
>>> predicate = variant_effect(VariantEffect.MISSENSE_VARIANT, tx_id='NM_123.4')
>>> predicate.description
'MISSENSE_VARIANT on NM_123.4'

Parameters:
  • effect – the target VariantEffect

  • tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)

gpsea.analysis.predicate.variant_key(key: str) -> VariantPredicate

Prepare a VariantPredicate that tests if the variant matches the provided key.

Parameters:

key – a str with the variant key (e.g. X_12345_12345_C_G or 22_10001_20000_INV)

gpsea.analysis.predicate.gene(symbol: str) -> VariantPredicate

Prepare a VariantPredicate that tests if the variant affects a given gene.

Parameters:

symbol – a str with the gene symbol (e.g. 'FBN1').

gpsea.analysis.predicate.transcript(tx_id: str) -> VariantPredicate

Prepare a VariantPredicate that tests if the variant affects a given transcript.

Parameters:

tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)

gpsea.analysis.predicate.exon(exon: int, tx_id: str) -> VariantPredicate

Prepare a VariantPredicate that tests if the variant overlaps with an exon of a specific transcript.

Warning

We use 1-based numbering for the exons, not the 0-based numbering common in computer science. Therefore, the first exon of the transcript has exon==1, the second has exon==2, and so on.

Warning

We do not check whether exon exceeds the number of exons of the transcript tx_id! Therefore, exon==10000 will effectively return False for all variants 😱 (at least for variants of the Homo sapiens sapiens genome, where no transcript has that many exons).

Parameters:
  • exon – a positive int with the index of the target exon (e.g. 1 for the 1st exon, 2 for the 2nd, …)

  • tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)
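Here is a minimal sketch of the two warnings above: converting a 1-based exon number to a list index, and getting a silent False for an out-of-range exon. It assumes that exon and variant coordinates are 0-based, half-open intervals on the same sequence. The overlaps_exon function and the coordinate convention are hypothetical and are not gpsea's implementation.

```python
from typing import Sequence, Tuple

# Hypothetical exon coordinates (start, end): 0-based, half-open, in transcript order.
Exon = Tuple[int, int]


def overlaps_exon(variant_start: int, variant_end: int,
                  exons: Sequence[Exon], exon: int) -> bool:
    """True if the variant [variant_start, variant_end) overlaps exon number `exon` (1-based)."""
    if exon < 1:
        raise ValueError("exon numbers are 1-based; use 1 for the first exon")
    idx = exon - 1  # convert the 1-based exon number to a 0-based list index
    if idx >= len(exons):
        # Mirrors the warning: an out-of-range exon number matches nothing.
        return False
    start, end = exons[idx]
    # Half-open intervals overlap iff each starts before the other ends.
    return variant_start < end and start < variant_end
```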

gpsea.analysis.predicate.protein_region(region: Tuple[int, int] | Region, tx_id: str) -> VariantPredicate

Prepare a VariantPredicate that tests if the variant overlaps with a region on a protein of a specific transcript.

Example

Create a predicate to test if the variant overlaps with the 5th amino acid of the protein encoded by a fictional transcript NM_1234.5:

>>> from gpsea.analysis.predicate import protein_region
>>> overlaps_with_fifth_aa = protein_region(region=(5, 5), tx_id="NM_1234.5")
>>> overlaps_with_fifth_aa.description
'overlaps with [5,5] region of the protein encoded by NM_1234.5'

Create a predicate to test if the variant overlaps with the first 20 amino acid residues of the protein encoded by the same transcript:

>>> overlaps_with_first_20 = protein_region(region=(1, 20), tx_id="NM_1234.5")
>>> overlaps_with_first_20.description
'overlaps with [1,20] region of the protein encoded by NM_1234.5'

Parameters:
  • region – a Region with the start and end coordinates of the region of interest on the protein, or a tuple of 1-based (start, end) coordinates.

  • tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)
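In the examples, (5, 5) selects only the 5th residue and (1, 20) selects the first 20, so the tuple form appears to describe a closed, 1-based interval. Under that assumption, the overlap test reduces to the sketch below. The function is hypothetical, and the variant's affected residues are assumed to be given as a closed, 1-based range as well.

```python
def overlaps_protein_region(aa_start: int, aa_end: int,
                            region_start: int, region_end: int) -> bool:
    """Closed, 1-based intervals: [aa_start, aa_end] vs [region_start, region_end]."""
    # Two closed intervals overlap iff each starts no later than the other ends.
    return aa_start <= region_end and region_start <= aa_end
```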

gpsea.analysis.predicate.is_large_imprecise_sv() -> VariantPredicate

Prepare a VariantPredicate for testing if the variant is a large structural variant (SV) without exact breakpoint coordinates.

gpsea.analysis.predicate.is_structural_variant(threshold: int = 50) -> VariantPredicate

Prepare a VariantPredicate for testing if the variant is a structural variant (SV).

SVs are usually defined as variants affecting more than a certain number of base pairs. The thresholds vary in the literature, but here we use 50 bp as the default.

Any variant that affects at least threshold base pairs is considered an SV. Large SVs with unknown breakpoint coordinates and translocations (TRANSLOCATION) are always considered SVs.

Parameters:

threshold – a non-negative int with the number of base pairs that must be affected
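The rule described above can be sketched as follows. ToySv and its fields (affected_bp, is_translocation) are hypothetical stand-ins for the information gpsea reads from its own variant model. The sketch also assumes that "unknown breakpoints" means the affected length is unavailable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToySv:
    # Hypothetical fields; gpsea's Variant model is richer than this.
    affected_bp: Optional[int]  # None when breakpoint coordinates are unknown
    is_translocation: bool = False


def toy_is_structural_variant(v: ToySv, threshold: int = 50) -> bool:
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    # Imprecise SVs (unknown breakpoints) and translocations always count as SVs.
    if v.affected_bp is None or v.is_translocation:
        return True
    # Otherwise, the variant must affect at least `threshold` base pairs.
    return v.affected_bp >= threshold
```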

gpsea.analysis.predicate.structural_type(curie: str | TermId) -> VariantPredicate

Prepare a VariantPredicate for testing if the variant has a certain structural type.

We recommend using a descendant of structural_variant (SO:0001537) as the structural type.

Example

Make a predicate for testing if the variant is a chromosomal deletion (SO:1000029):

>>> from gpsea.analysis.predicate import structural_type
>>> predicate = structural_type('SO:1000029')
>>> predicate.description
'structural type is SO:1000029'

Parameters:

curie – a str or TermId with the compact URI (CURIE) of the structural type to test (e.g. SO:1000029).

gpsea.analysis.predicate.variant_class(variant_class: VariantClass) -> VariantPredicate

Prepare a VariantPredicate for testing if the variant is of a certain VariantClass.

Example

Make a predicate to test if the variant is a deletion:

>>> from gpsea.model import VariantClass
>>> from gpsea.analysis.predicate import variant_class
>>> predicate = variant_class(VariantClass.DEL)
>>> predicate.description
'variant class is DEL'

Parameters:

variant_class – the variant class to test.

gpsea.analysis.predicate.ref_length(operator: Literal['<', '<=', '==', '!=', '>=', '>'], length: int) -> VariantPredicate

Prepare a VariantPredicate for testing if the length of the reference (REF) allele of the variant is above, below, or (not) equal to a certain length.

See also

See Length of the reference allele for more info.

Example

Prepare a predicate that tests that the REF allele includes more than 5 base pairs:

>>> from gpsea.analysis.predicate import ref_length
>>> predicate = ref_length('>', 5)
>>> predicate.description
'reference allele length > 5'

Parameters:
  • operator – a str with the desired test. Must be one of { '<', '<=', '==', '!=', '>=', '>' }.

  • length – a non-negative int with the length threshold.
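One natural way to support the six operator strings is to dispatch through Python's standard operator module, as in the sketch below. It assumes a sequence-resolved REF allele given as a plain string. The toy_ref_length function is hypothetical and does not reproduce gpsea's implementation, which also has to handle SVs whose REF length comes from coordinates.

```python
import operator
from typing import Callable, Dict

# Map each accepted operator string to the matching comparison function.
OPERATORS: Dict[str, Callable[[int, int], bool]] = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    ">": operator.gt,
}


def toy_ref_length(op: str, length: int) -> Callable[[str], bool]:
    """Return a test on a REF allele string, e.g. toy_ref_length('>', 5)."""
    if op not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    if length < 0:
        raise ValueError("length must be non-negative")
    compare = OPERATORS[op]
    return lambda ref: compare(len(ref), length)
```

Validating the operator once, when the predicate is built, means a typo such as '=>' fails immediately instead of on the first variant tested.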

gpsea.analysis.predicate.change_length(operator: Literal['<', '<=', '==', '!=', '>=', '>'], threshold: int) -> VariantPredicate

Prepare a VariantPredicate for testing if the variant’s change length is above, below, or (not) equal to a certain threshold.

See also

See Change length of an allele for more info.

Example

Make a predicate for testing if the change length is less than or equal to -10, e.g. to test if a variant is a deletion leading to removal of at least 10 base pairs:

>>> from gpsea.analysis.predicate import change_length
>>> predicate = change_length('<=', -10)
>>> predicate.description
'change length <= -10'

Parameters:
  • operator – a str with the desired test. Must be one of { '<', '<=', '==', '!=', '>=', '>' }.

  • threshold – an int with the threshold. Can be negative, zero, or positive.
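For sequence-resolved alleles, the sign convention used here (deletions are negative because ALT is shorter than REF) is consistent with computing the change length as ALT length minus REF length. The sketch below assumes that definition. The toy_change_length helper is hypothetical, and SVs with symbolic alleles would need coordinates instead.

```python
import operator


def toy_change_length(ref: str, alt: str) -> int:
    # Assumed definition: positive for insertions, negative for deletions,
    # zero for substitutions such as SNVs.
    return len(alt) - len(ref)


# A 10-bp deletion written VCF-style: REF holds a padding base plus the 10 deleted bases.
ref, alt = "A" + "C" * 10, "A"

# Equivalent of change_length('<=', -10) applied to this one variant.
removes_at_least_10 = operator.le(toy_change_length(ref, alt), -10)
```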

gpsea.analysis.predicate.is_structural_deletion(threshold: int = -50) -> VariantPredicate

Prepare a VariantPredicate for testing if the variant is a chromosomal deletion or a structural variant deletion that removes at least |threshold| base pairs (50 bp by default).

Note

The predicate uses change_length() to test whether the change length of the variant is less than or equal to threshold.

IMPORTANT: the change lengths of deletions are negative, since the alternate allele is shorter than the reference allele. See Change length of an allele for more info.

Example

Prepare a predicate for testing if the variant is a chromosomal deletion that removes at least 20 base pairs:

>>> from gpsea.analysis.predicate import is_structural_deletion
>>> predicate = is_structural_deletion(-20)
>>> predicate.description
'(structural type is SO:1000029 OR (variant class is DEL AND change length <= -20))'

Parameters:

threshold – an int with the change length threshold to determine if a variant is “structural” (-50 bp by default).
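The description string in the example spells out the compound logic: (structural type is SO:1000029 OR (variant class is DEL AND change length <= threshold)). The sketch below mirrors that logic with a hypothetical ToyAnnotatedVariant record. It does not use gpsea's model classes.

```python
from dataclasses import dataclass
from typing import Optional

CHROMOSOMAL_DELETION = "SO:1000029"


@dataclass
class ToyAnnotatedVariant:
    # Hypothetical fields standing in for gpsea's variant annotations.
    structural_type: Optional[str]  # Sequence Ontology CURIE, if any
    variant_class: str              # e.g. "DEL", "INS", "SNV"
    change_length: int              # ALT length minus REF length


def toy_is_structural_deletion(v: ToyAnnotatedVariant, threshold: int = -50) -> bool:
    # (structural type is SO:1000029 OR (variant class is DEL AND change length <= threshold))
    return v.structural_type == CHROMOSOMAL_DELETION or (
        v.variant_class == "DEL" and v.change_length <= threshold
    )
```

A chromosomal deletion qualifies regardless of its change length. A plain DEL qualifies only if it removes enough bases.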

gpsea.analysis.predicate.protein_feature_type(feature_type: FeatureType | str, protein_metadata: ProteinMetadata) -> VariantPredicate

Prepare a VariantPredicate to test if the variant affects a protein feature of the given feature_type.

Parameters:
  • feature_type – the target protein FeatureType (e.g. DOMAIN).

  • protein_metadata – the information about the protein.

gpsea.analysis.predicate.protein_feature(feature_id: str, protein_metadata: ProteinMetadata) -> VariantPredicate

Prepare a VariantPredicate to test if the variant affects a protein feature labeled with the provided feature_id.

Parameters:
  • feature_id – the id of the target protein feature (e.g. ANK 1)

  • protein_metadata – the information about the protein.
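protein_feature_type and protein_feature differ only in how they select the target features: by type or by ID. The sketch below shows that shared structure. It assumes that the variant's effect has already been mapped to a closed, 1-based range of affected residues, and that features are closed intervals as well. ToyFeature and the helper functions are hypothetical and are not gpsea's ProteinMetadata API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ToyFeature:
    # Hypothetical stand-in for one feature listed in ProteinMetadata.
    feature_id: str    # e.g. "ANK 1"
    feature_type: str  # e.g. "DOMAIN"
    start: int         # 1-based, inclusive
    end: int           # 1-based, inclusive


def _hits(aa_start: int, aa_end: int, features: Iterable[ToyFeature],
          select: Callable[[ToyFeature], bool]) -> bool:
    # True if the affected residues [aa_start, aa_end] overlap any selected feature.
    return any(select(f) and aa_start <= f.end and f.start <= aa_end for f in features)


def hits_feature_type(aa_start: int, aa_end: int,
                      features: Iterable[ToyFeature], feature_type: str) -> bool:
    return _hits(aa_start, aa_end, features, lambda f: f.feature_type == feature_type)


def hits_feature_id(aa_start: int, aa_end: int,
                    features: Iterable[ToyFeature], feature_id: str) -> bool:
    return _hits(aa_start, aa_end, features, lambda f: f.feature_id == feature_id)
```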