gpsea.analysis.predicate.genotype package
- class gpsea.analysis.predicate.genotype.GenotypePolyPredicate
Bases: PolyPredicate[Categorization]
GenotypePolyPredicate is a base class for all PolyPredicate implementations that test the genotype axis.
- gpsea.analysis.predicate.genotype.groups_predicate(predicates: Iterable[VariantPredicate], group_names: Iterable[str]) → GenotypePolyPredicate
Create a genotype predicate that bins the patient into one of n groups.
The genotype groups should not overlap. In case of an overlap, the patient is assigned to no group (None).
See the Groups Predicate section for an example.
- Parameters:
predicates – an iterable with at least 2 variant predicates to determine a genotype group.
group_names – an iterable with group names. The number of group names must match the number of predicates.
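The binning rule can be illustrated with a small, library-free sketch. It assumes a patient belongs to a group when at least one of their variants passes that group's predicate, and it also returns None when no predicate matches; both are assumptions for illustration, not gpsea's implementation. `assign_group` and the string "variants" are hypothetical.

```python
from typing import Callable, Optional, Sequence

# Hypothetical stand-ins: a "variant" is a plain string and a predicate is any bool-returning callable.
Predicate = Callable[[str], bool]


def assign_group(
    variants: Sequence[str],
    predicates: Sequence[Predicate],
    group_names: Sequence[str],
) -> Optional[str]:
    """Return the single group whose predicate matches a patient's variant, else None."""
    if len(predicates) < 2:
        raise ValueError("at least 2 predicates are required")
    if len(predicates) != len(group_names):
        raise ValueError("the number of group names must match the number of predicates")
    matching = [
        name
        for predicate, name in zip(predicates, group_names)
        if any(predicate(v) for v in variants)
    ]
    # Exactly one match selects the group; an overlap (2+ matches) or no match yields None.
    return matching[0] if len(matching) == 1 else None
```

Requiring exactly one match is what keeps overlapping groups from silently double-counting a patient.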
- gpsea.analysis.predicate.genotype.sex_predicate() → GenotypePolyPredicate
Get a genotype predicate for categorizing patients by their Sex.
See the Partition by the sex of the individual section for an example.
- gpsea.analysis.predicate.genotype.diagnosis_predicate(diagnoses: Iterable[TermId | str], labels: Iterable[str] | None = None) → GenotypePolyPredicate
Create a genotype predicate that bins the patient based on the presence of a disease diagnosis, as listed in the diseases attribute.
If an individual is diagnosed with more than one disease from the provided diagnoses, the individual is assigned to no group (None).
See the Partition by a diagnosis section for an example.
- Parameters:
diagnoses – an iterable with at least 2 diagnosis IDs, either as a str or a TermId, to determine the genotype group.
labels – an iterable with diagnosis names or None if CURIEs should be used instead. The number of labels must match the number of diagnoses.
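A plain-Python sketch of the labeling and assignment rules described above: CURIEs double as labels when `labels` is None, and a patient with more than one of the listed diagnoses gets no group. Treating a patient with none of the diagnoses as None is an assumption, and `resolve_labels`, `assign_by_diagnosis`, and the `DISEASE:n` IDs are hypothetical placeholders.

```python
from typing import Iterable, List, Optional, Sequence


def resolve_labels(diagnoses: Sequence[str], labels: Optional[Sequence[str]] = None) -> List[str]:
    """Use the CURIEs as labels when no labels are given; otherwise validate the label count."""
    if len(diagnoses) < 2:
        raise ValueError("at least 2 diagnoses are required")
    if labels is None:
        return list(diagnoses)
    if len(labels) != len(diagnoses):
        raise ValueError("the number of labels must match the number of diagnoses")
    return list(labels)


def assign_by_diagnosis(
    patient_diseases: Iterable[str],
    diagnoses: Sequence[str],
    labels: Optional[Sequence[str]] = None,
) -> Optional[str]:
    """Return the label of the single matching diagnosis, or None for zero or several matches."""
    names = resolve_labels(diagnoses, labels)
    present = set(patient_diseases)
    hits = [name for dx, name in zip(diagnoses, names) if dx in present]
    return hits[0] if len(hits) == 1 else None
```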
- gpsea.analysis.predicate.genotype.autosomal_dominant(variant_predicate: VariantPredicate | None = None) → GenotypePolyPredicate
Create a predicate that assigns the patient to either the homozygous reference or the heterozygous group, in line with the autosomal dominant mode of inheritance.
- Parameters:
variant_predicate – a predicate for choosing the variants for testing or None if all variants should be used.
- gpsea.analysis.predicate.genotype.autosomal_recessive(variant_predicate: VariantPredicate | None = None) → GenotypePolyPredicate
Create a predicate that assigns the patient to the homozygous reference, heterozygous, or biallelic alternative allele (homozygous alternative or compound heterozygous) group, in line with the autosomal recessive mode of inheritance.
- Parameters:
variant_predicate – a predicate for choosing the variants for testing or None if all variants should be used.
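One way to picture the two modes of inheritance is as a lookup from the number of ALT alleles (summed over variants that pass `variant_predicate`) to a group. This is a sketch under assumptions: the string labels are hypothetical stand-ins for gpsea's Categorization objects, and placing an autosomal dominant patient with 2 alleles in no group is assumed, not taken from the source.

```python
from typing import Optional


def autosomal_dominant_group(allele_count: int) -> Optional[str]:
    """0 alleles -> homozygous reference, 1 allele -> heterozygous, anything else -> no group (assumed)."""
    return {0: "HOM_REF", 1: "HET"}.get(allele_count)


def autosomal_recessive_group(allele_count: int) -> Optional[str]:
    """A count of 2 covers both homozygous alternative (1 variant x 2 alleles)
    and compound heterozygous (2 variants x 1 allele) genotypes."""
    return {0: "HOM_REF", 1: "HET", 2: "BIALLELIC_ALT"}.get(allele_count)
```

Counting alleles rather than variants is what lets homozygous alternative and compound heterozygous patients land in the same biallelic group.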
- gpsea.analysis.predicate.genotype.monoallelic_predicate(a_predicate: VariantPredicate, b_predicate: VariantPredicate, names: Tuple[str, str] = ('A', 'B')) → GenotypePolyPredicate
The predicate bins the patient into one of two groups, A and B, based on the presence of exactly one allele of a variant that meets the predicate criteria.
The number of alleles \(count_{A}\) and \(count_{B}\) is computed using a_predicate and b_predicate, and the individual is assigned to a group based on the following table:
| Group | \(count_{A}\) | \(count_{B}\) |
|-------|---------------|---------------|
| A     | 1             | 0             |
| B     | 0             | 1             |
Individuals with any other allele counts (e.g. \(count_{A} = 0\) and \(count_{B} = 2\)) are assigned to the None group and are, thus, omitted from the analysis.
- Parameters:
a_predicate – predicate to test if the variants meet the criteria of the first group (named A by default).
b_predicate – predicate to test if the variants meet the criteria of the second group (named B by default).
names – group names (default ('A', 'B')).
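The table above is effectively a lookup keyed on the pair of allele counts. A minimal sketch, assuming the counts have already been computed from `a_predicate` and `b_predicate`; `monoallelic_group` is a hypothetical helper, not part of gpsea.

```python
from typing import Optional, Tuple


def monoallelic_group(count_a: int, count_b: int, names: Tuple[str, str] = ("A", "B")) -> Optional[str]:
    """(1, 0) -> first group, (0, 1) -> second group, any other combination -> None."""
    table = {(1, 0): names[0], (0, 1): names[1]}
    return table.get((count_a, count_b))
```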
- gpsea.analysis.predicate.genotype.biallelic_predicate(a_predicate: VariantPredicate, b_predicate: VariantPredicate, names: Tuple[str, str] = ('A', 'B')) → GenotypePolyPredicate
The predicate bins the patient into one of three groups, AA, AB, and BB, based on the presence of one or two variant alleles that meet the predicate criteria.
The number of alleles \(count_{A}\) and \(count_{B}\) is computed using a_predicate and b_predicate, and the individual is assigned to a group based on the following table:
| Group | \(count_{A}\) | \(count_{B}\) |
|-------|---------------|---------------|
| AA    | 2             | 0             |
| AB    | 1             | 1             |
| BB    | 0             | 2             |
Individuals with any other allele counts (e.g. \(count_{A} = 1\) and \(count_{B} = 2\)) are assigned to the None group and are, thus, omitted from the analysis.
- Parameters:
a_predicate – predicate to test if the variants meet the criteria of the first group (named A by default).
b_predicate – predicate to test if the variants meet the criteria of the second group (named B by default).
names – group names (default ('A', 'B')).
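The biallelic table can be sketched the same way. How gpsea turns custom `names` into the AA/AB/BB group labels is not stated here, so this hypothetical `biallelic_group` simply concatenates the two names, which reproduces the table for the defaults.

```python
from typing import Optional, Tuple


def biallelic_group(count_a: int, count_b: int, names: Tuple[str, str] = ("A", "B")) -> Optional[str]:
    """(2, 0) -> AA, (1, 1) -> AB, (0, 2) -> BB, any other combination -> None."""
    a, b = names
    # Label construction by concatenation is an assumption made for this sketch.
    table = {(2, 0): a + a, (1, 1): a + b, (0, 2): b + b}
    return table.get((count_a, count_b))
```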
- class gpsea.analysis.predicate.genotype.ModeOfInheritancePredicate(allele_counter: AlleleCounter, mode_of_inheritance_info: ModeOfInheritanceInfo)
Bases: GenotypePolyPredicate
ModeOfInheritancePredicate assigns an individual to a group based on compatibility with the selected mode of inheritance.
- static autosomal_dominant(variant_predicate: VariantPredicate | None = None) → GenotypePolyPredicate
Create a predicate that assigns the patient to either the homozygous reference or the heterozygous group, in line with the autosomal dominant mode of inheritance.
- Parameters:
variant_predicate – a predicate for choosing the variants for testing.
- static autosomal_recessive(variant_predicate: VariantPredicate | None = None) → GenotypePolyPredicate
Create a predicate that assigns the patient to the homozygous reference, heterozygous, or biallelic alternative allele (homozygous alternative or compound heterozygous) group, in line with the autosomal recessive mode of inheritance.
- Parameters:
variant_predicate – a predicate for choosing the variants for testing.
- get_categorizations() → Sequence[Categorization]
Get a sequence of all categories which the PolyPredicate can produce.
- test(patient: Patient) → Categorization | None
Assign a patient to a categorization.
Return None if the patient cannot be assigned to any meaningful category.
- class gpsea.analysis.predicate.genotype.AlleleCounter(predicate: VariantPredicate)
Bases: object
AlleleCounter counts the number of alleles of all variants that pass the selection with a given predicate.
- Parameters:
predicate – a VariantPredicate for selecting the target variants.
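To show what "counting alleles" means, here is a toy model rather than gpsea's API. Each hypothetical `Call` records a variant key and how many ALT alleles the patient carries, and `count_alleles` sums those alleles over the calls that pass a predicate. A compound heterozygous patient (two heterozygous variants) and a homozygous alternative patient both reach a count of 2.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Call:
    """Hypothetical genotype call: a variant key plus the number of ALT alleles the patient carries."""
    variant_key: str
    alt_alleles: int  # 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternative


def count_alleles(calls: Sequence[Call], predicate: Callable[[Call], bool]) -> int:
    """Sum the ALT alleles of all calls whose variant passes the predicate."""
    return sum(call.alt_alleles for call in calls if predicate(call))
```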
- class gpsea.analysis.predicate.genotype.VariantPredicate
Bases: object
VariantPredicate tests if a variant meets a certain criterion.
The subclasses are expected to implement all abstract methods of this class plus __eq__ and __hash__, to support building of compound predicates.
We strongly recommend implementing __str__ and __repr__ as well.
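The following self-contained sketch shows why value-based __eq__ and __hash__ matter. Two independently built predicates for the same gene compare equal and collapse in a set, which lets compound-predicate code deduplicate them. `ToyVariantPredicate` and `GenePredicate` are hypothetical. `get_question` mirrors the method used in the examples of this page, while `test(variant)` and the dict-based variant are assumptions.

```python
import abc


class ToyVariantPredicate(metaclass=abc.ABCMeta):
    """Hypothetical stand-in for VariantPredicate; a variant is a plain dict here."""

    @abc.abstractmethod
    def get_question(self) -> str: ...

    @abc.abstractmethod
    def test(self, variant: dict) -> bool: ...


class GenePredicate(ToyVariantPredicate):
    def __init__(self, symbol: str):
        self._symbol = symbol

    def get_question(self) -> str:
        return f"impacts {self._symbol}"

    def test(self, variant: dict) -> bool:
        return self._symbol in variant.get("genes", ())

    # Value-based equality and hashing: predicates with the same configuration are interchangeable.
    def __eq__(self, other: object) -> bool:
        return isinstance(other, GenePredicate) and self._symbol == other._symbol

    def __hash__(self) -> int:
        return hash((GenePredicate, self._symbol))

    def __str__(self) -> str:
        return self.get_question()

    def __repr__(self) -> str:
        return f"GenePredicate(symbol={self._symbol!r})"
```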
- class gpsea.analysis.predicate.genotype.VariantPredicates
Bases: object
VariantPredicates is a static utility class to provide the variant predicates that are relatively simple to configure.
- static true() → VariantPredicate
Prepare an absolutely inclusive VariantPredicate: a predicate that returns True for any variant whatsoever.
- static all(predicates: Iterable[VariantPredicate]) → VariantPredicate
Prepare a VariantPredicate that returns True if ALL predicates evaluate to True.
This is useful for building compound predicates programmatically.
Example
Build a predicate to test if a variant has a functional annotation to genes SURF1 and SURF2:
>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> genes = ('SURF1', 'SURF2',)
>>> predicate = VariantPredicates.all(VariantPredicates.gene(g) for g in genes)
>>> predicate.get_question()
'(impacts SURF1 AND impacts SURF2)'
- Parameters:
predicates – an iterable of predicates to test
- static any(predicates: Iterable[VariantPredicate]) → VariantPredicate
Prepare a VariantPredicate that returns True if ANY of the predicates evaluates to True.
This can be useful for building compound predicates programmatically.
Example
Build a predicate to test if a variant leads to a missense or nonsense change on a fictional transcript NM_123456.7:
>>> from gpsea.model import VariantEffect
>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> tx_id = 'NM_123456.7'
>>> effects = (VariantEffect.MISSENSE_VARIANT, VariantEffect.STOP_GAINED,)
>>> predicate = VariantPredicates.any(VariantPredicates.variant_effect(e, tx_id) for e in effects)
>>> predicate.get_question()
'(MISSENSE_VARIANT on NM_123456.7 OR STOP_GAINED on NM_123456.7)'
- Parameters:
predicates – an iterable of predicates to test
- static variant_effect(effect: VariantEffect, tx_id: str) → VariantPredicate
Prepare a VariantPredicate to test if the functional annotation predicts the variant to lead to a certain variant effect.
Example
Make a predicate for testing if the variant leads to a missense change on transcript NM_123.4:
>>> from gpsea.model import VariantEffect
>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> predicate = VariantPredicates.variant_effect(VariantEffect.MISSENSE_VARIANT, tx_id='NM_123.4')
>>> predicate.get_question()
'MISSENSE_VARIANT on NM_123.4'
- Parameters:
effect – the target VariantEffect
tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)
- static variant_key(key: str) → VariantPredicate
Prepare a VariantPredicate that tests if the variant matches the provided key.
- Parameters:
key – a str with the variant key (e.g. X_12345_12345_C_G or 22_10001_20000_INV)
- static gene(symbol: str) → VariantPredicate
Prepare a VariantPredicate that tests if the variant affects a given gene.
- Parameters:
symbol – a str with the gene symbol (e.g. 'FBN1').
- static transcript(tx_id: str) → VariantPredicate
Prepare a VariantPredicate that tests if the variant affects a transcript.
- Parameters:
tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)
- static exon(exon: int, tx_id: str) → VariantPredicate
Prepare a VariantPredicate that tests if the variant overlaps with an exon of a specific transcript.
Warning
We use 1-based numbering to number the exons, not the 0-based numbering common in computer science. Therefore, the first exon of the transcript is exon 1, the second exon is 2, and so on …
Warning
We do not check if exon exceeds the number of exons of the transcript tx_id! Therefore, a predicate built with exon=10000 will effectively return False for all variants!!! 😱 Well, at least for the genome variants of the Homo sapiens sapiens taxon…
- Parameters:
exon – a positive int with the number of the target exon (e.g. 1 for the 1st exon, 2 for the 2nd, …)
tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)
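The 1-based convention means the exon number must be shifted by one before it can index a 0-based list. The sketch below demonstrates that shift and the "out of range means no overlap" behavior on a toy transcript. Its half-open `(start, end)` exon intervals, `overlaps_exon`, and the choice to reject numbers below 1 are assumptions for illustration, not gpsea's implementation.

```python
from typing import List, Tuple

# Hypothetical toy transcript: exons as half-open (start, end) intervals in transcript order.
Exon = Tuple[int, int]


def overlaps_exon(var_start: int, var_end: int, exons: List[Exon], exon_number: int) -> bool:
    """Test if the half-open variant interval [var_start, var_end) overlaps the given 1-based exon."""
    if exon_number < 1:
        raise ValueError("exon numbers are 1-based")
    if exon_number > len(exons):
        return False  # no such exon (e.g. exon_number=10000), so nothing can overlap it
    start, end = exons[exon_number - 1]  # 1-based exon number -> 0-based list index
    return var_start < end and start < var_end
```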
- static region(region: Region, tx_id: str) → VariantPredicate
Prepare a VariantPredicate that tests if the variant overlaps with a region on a protein of a specific transcript.
- Parameters:
region – a Region that gives the start and end coordinates of the region of interest on the protein.
tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)
- static is_large_imprecise_sv() → VariantPredicate
Prepare a VariantPredicate for testing if the variant is a large structural variant (SV) without exact breakpoint coordinates.
- static is_structural_variant(threshold: int = 50) → VariantPredicate
Prepare a VariantPredicate for testing if the variant is a structural variant (SV).
SVs are usually defined as variants affecting more than a certain number of base pairs. The thresholds vary in the literature, but here we use 50 bp as a default.
Any variant that affects at least threshold base pairs is considered an SV. Large SVs with unknown breakpoint coordinates and translocations (VariantClass.BND) are always considered SVs.
- Parameters:
threshold – a non-negative int with the number of base pairs that must be affected
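The decision rule reads naturally as code. This sketch uses a hypothetical `ToyVariant` whose `affected_bp` is None when the breakpoints are imprecise, and string class codes in place of `VariantClass`. It illustrates the rule stated above and does not reproduce gpsea's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyVariant:
    variant_class: str          # hypothetical string codes, e.g. "SNV", "DEL", "BND"
    affected_bp: Optional[int]  # None when the breakpoint coordinates are unknown


def is_structural_variant(v: ToyVariant, threshold: int = 50) -> bool:
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    if v.variant_class == "BND" or v.affected_bp is None:
        return True  # translocations and large imprecise SVs always count as SVs
    return v.affected_bp >= threshold  # "at least threshold base pairs"
```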
- static structural_type(curie: str | TermId) → VariantPredicate
Prepare a VariantPredicate for testing if the variant has a certain structural type.
We recommend using a descendant of structural_variant (SO:0001537) as the structural type.
Example
Make a predicate for testing if the variant is a chromosomal deletion (SO:1000029):
>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> predicate = VariantPredicates.structural_type('SO:1000029')
>>> predicate.get_question()
'structural type is SO:1000029'
- Parameters:
curie – compact uniform resource identifier (CURIE) with the structural type to test.
- static variant_class(variant_class: VariantClass) → VariantPredicate
Prepare a VariantPredicate for testing if the variant is of a certain VariantClass.
Example
Make a predicate to test if the variant is a deletion:
>>> from gpsea.model import VariantClass
>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> predicate = VariantPredicates.variant_class(VariantClass.DEL)
>>> predicate.get_question()
'variant class is DEL'
- Parameters:
variant_class – the variant class to test.
- static ref_length(operator: Literal['<', '<=', '==', '!=', '>=', '>'], length: int) → VariantPredicate
Prepare a VariantPredicate for testing if the reference (REF) allele of the variant is above, below, or (not) equal to a certain length.
See also
See Length of the reference allele for more info.
Example
Prepare a predicate that tests that the REF allele includes more than 5 base pairs:
>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> predicate = VariantPredicates.ref_length('>', 5)
>>> predicate.get_question()
'ref allele length > 5'
- Parameters:
operator – a str with the desired test. Must be one of { '<', '<=', '==', '!=', '>=', '>' }.
length – a non-negative int with the length threshold.
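The operator string can be dispatched to Python's standard `operator` module. The sketch below applies it to the length of a raw REF allele string. `ref_length_test` and the `_OPERATORS` table are hypothetical, and gpsea's internal handling is not shown here.

```python
import operator
from typing import Callable, Dict

# Map each supported operator string to the matching comparison function from the stdlib.
_OPERATORS: Dict[str, Callable[[int, int], bool]] = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    ">": operator.gt,
}


def ref_length_test(ref_allele: str, op: str, length: int) -> bool:
    """Compare the length of the REF allele against a non-negative length threshold."""
    if op not in _OPERATORS:
        raise ValueError(f"unsupported operator {op!r}")
    if length < 0:
        raise ValueError("length must be non-negative")
    return _OPERATORS[op](len(ref_allele), length)
```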
- static change_length(operator: Literal['<', '<=', '==', '!=', '>=', '>'], threshold: int) → VariantPredicate
Prepare a VariantPredicate for testing if the variant’s change length is above, below, or (not) equal to a certain threshold.
See also
See Change length of an allele for more info.
Example
Make a predicate for testing if the change length is less than or equal to -10, e.g. to test if a variant is a deletion that removes at least 10 base pairs:
>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> predicate = VariantPredicates.change_length('<=', -10)
>>> predicate.get_question()
'change length <= -10'
- Parameters:
operator – a str with the desired test. Must be one of { '<', '<=', '==', '!=', '>=', '>' }.
threshold – an int with the threshold. Can be negative, zero, or positive.
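For a simple sequence variant, one plausible way to compute the change length is `len(ALT) - len(REF)`, which matches the sign convention above: deletions come out negative. That formula is an assumption for this sketch, and `change_length` here is a hypothetical helper, not the gpsea method.

```python
import operator


def change_length(ref: str, alt: str) -> int:
    """Change length of a simple sequence variant, assumed here to be len(ALT) - len(REF)."""
    # Deletions have an ALT allele shorter than REF, so their change length is negative.
    return len(alt) - len(ref)


# A 10 bp deletion: REF is one anchor base plus the 10 deleted bases, ALT is only the anchor base.
ref, alt = "A" + "C" * 10, "A"
print(change_length(ref, alt))                    # -10
print(operator.le(change_length(ref, alt), -10))  # True
```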
- static is_structural_deletion(threshold: int = -50) → VariantPredicate
Prepare a VariantPredicate for testing if the variant is a chromosomal deletion or a structural variant deletion that removes at least as many base pairs as the absolute value of threshold (50 bp by default).
Note
The predicate uses change_length() to determine if the length of the variant is above or below threshold.
IMPORTANT: the change lengths of deletions are negative, since the alternate allele is shorter than the reference allele. See Change length of an allele for more info.
Example
Prepare a predicate for testing if the variant is a chromosomal deletion that removes at least 20 base pairs:
>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> predicate = VariantPredicates.is_structural_deletion(-20)
>>> predicate.get_question()
'(structural type is SO:1000029 OR (variant class is DEL AND change length <= -20))'
- Parameters:
threshold – an int with the change length threshold to determine if a variant is “structural” (-50 bp by default).
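The question string in the example spells out the boolean logic. The plain-Python sketch below mirrors it, using hypothetical string inputs in place of gpsea's variant model. It also shows why the threshold must be negative.

```python
from typing import Optional

CHROMOSOMAL_DELETION = "SO:1000029"


def is_structural_deletion(
    structural_type: Optional[str],
    variant_class: str,
    change_length: int,
    threshold: int = -50,
) -> bool:
    """(structural type is SO:1000029) OR (variant class is DEL AND change length <= threshold)."""
    return structural_type == CHROMOSOMAL_DELETION or (
        variant_class == "DEL" and change_length <= threshold
    )
```

With a positive threshold, every deletion would satisfy `change_length <= threshold`, so the negative default is what limits the test to large deletions.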
- class gpsea.analysis.predicate.genotype.ProteinPredicates(protein_metadata_service: ProteinMetadataService)
Bases: object
ProteinPredicates prepares variant predicates that need to consult ProteinMetadataService to categorize a Variant.
- protein_feature_type(feature_type: FeatureType, tx_id: str) → VariantPredicate
Prepare a VariantPredicate that tests if the variant affects a protein feature type.
- Parameters:
feature_type – the target protein FeatureType (e.g. FeatureType.DOMAIN)
tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)
- protein_feature(feature_id: str, tx_id: str) → VariantPredicate
Prepare a VariantPredicate that tests if the variant affects a given protein feature.
- Parameters:
feature_id – the ID of the target protein feature (e.g. ANK 1)
tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)