.. _variant-category:

Group by variant category

Sometimes we want to compare individuals who have the same total allele count (AC) across two variant categories \(A\) and \(B\), but whose alleles are distributed differently between the two categories. For example, in the context of an autosomal dominant disease, we may want to compare individuals with \(AC_{A}=1\) (where \(A\) is, e.g., a predicted loss-of-function mutation) with those harboring \(AC_{B}=1\) (where \(B\) is, e.g., a missense mutation). Similarly, in an autosomal recessive disease, we may be interested in comparing individuals with \(AC_{A} \ge 1\) with those with \(AC_{A} = 0\). In both analyses, we compare two variant categories \(A\) and \(B\), each described by a VariantPredicate (see the Variant Predicates section), while requiring that the allele counts of both categories sum to \(k\):

\(k = \sum_{i \in \{A, B\}} AC_{i}\)
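As a minimal, library-independent sketch of these definitions, assume that an individual's genotype is a list of allele labels and that each variant category is a plain Python callable standing in for a VariantPredicate. The helpers allele_counts and allele_count_sum are hypothetical and not part of GPSEA:

```python
from typing import Callable, Sequence, Tuple

# Hypothetical stand-ins, NOT GPSEA types: an allele is represented by a
# string label, and a variant category is any callable that tests an allele.
Allele = str
Category = Callable[[Allele], bool]


def allele_counts(alleles: Sequence[Allele], a: Category, b: Category) -> Tuple[int, int]:
    """Return (AC_A, AC_B): how many alleles match category A and category B."""
    ac_a = sum(1 for allele in alleles if a(allele))
    ac_b = sum(1 for allele in alleles if b(allele))
    return ac_a, ac_b


def allele_count_sum(alleles: Sequence[Allele], a: Category, b: Category) -> int:
    """Return k = AC_A + AC_B for one individual."""
    ac_a, ac_b = allele_counts(alleles, a, b)
    return ac_a + ac_b


def is_missense_allele(allele: Allele) -> bool:
    return allele == "missense"


def is_frameshift_allele(allele: Allele) -> bool:
    return allele == "frameshift"


print(allele_counts(["missense", "frameshift"], is_missense_allele, is_frameshift_allele))  # (1, 1)
print(allele_count_sum(["missense", "synonymous"], is_missense_allele, is_frameshift_allele))  # 1
```

An allele matching both categories would be counted twice here; the sketch does not guard against overlapping categories.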

GPSEA provides two predicates:

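+-------+-----------------------+-------------------------+
| \(k\) | Name                  | Function                |
+=======+=======================+=========================+
| 1     | Monoallelic predicate | monoallelic_predicate() |
+-------+-----------------------+-------------------------+
| 2     | Biallelic predicate   | biallelic_predicate()   |
+-------+-----------------------+-------------------------+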

Monoallelic predicate

The monoallelic predicate compares individuals who have a single allele of the variants of interest. The predicate uses two variant predicates, A and B, to compute the allele counts \(AC_{A}\) and \(AC_{B}\), and then assigns each individual to one of the following genotype groups:

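Monoallelic predicate genotype groups

+----------------+------------+------------+
| Genotype group | \(AC_{A}\) | \(AC_{B}\) |
+================+============+============+
| A              | 1          | 0          |
+----------------+------------+------------+
| B              | 0          | 1          |
+----------------+------------+------------+
| None           | other      | other      |
+----------------+------------+------------+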

The individuals with \(\sum_{i \in \{A, B\}} AC_{i} \neq 1\) are omitted from the analysis.
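The assignment rule in the table can be sketched as a small, hypothetical function that works directly on the two allele counts. It is not GPSEA's implementation, and the default labels "A" and "B" are arbitrary choices of this sketch:

```python
from typing import Optional


def monoallelic_group(ac_a: int, ac_b: int, a_label: str = "A", b_label: str = "B") -> Optional[str]:
    """Assign an individual to a monoallelic genotype group (hypothetical sketch).

    Returns a_label for (AC_A, AC_B) == (1, 0), b_label for (0, 1), and None
    otherwise. For non-negative counts, None means AC_A + AC_B != 1.
    """
    if (ac_a, ac_b) == (1, 0):
        return a_label
    if (ac_a, ac_b) == (0, 1):
        return b_label
    return None  # omitted from the analysis


cohort = [(1, 0), (0, 1), (1, 1), (0, 0), (1, 0)]
print([monoallelic_group(a, b, "Missense", "Frameshift") for a, b in cohort])
# ['Missense', 'Frameshift', None, None, 'Missense']
```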

Example

Let’s create a predicate that assigns individuals either to those having one missense allele or to those having one frameshift allele, with respect to the fictional transcript NM_1234.5.

>>> tx_id = "NM_1234.5"

We start by creating the variant predicates for missense (A) and frameshift (B) variants:

>>> from gpsea.model import VariantEffect
>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> is_missense = VariantPredicates.variant_effect(effect=VariantEffect.MISSENSE_VARIANT, tx_id=tx_id)
>>> is_frameshift = VariantPredicates.variant_effect(effect=VariantEffect.FRAMESHIFT_VARIANT, tx_id=tx_id)

The monoallelic predicate lets us customize the category names. Let’s use Missense and Frameshift instead of the defaults:

>>> a_label = "Missense"
>>> b_label = "Frameshift"

Now we have all we need to create the predicate:

>>> from gpsea.analysis.predicate.genotype import monoallelic_predicate
>>> gt_predicate = monoallelic_predicate(
...     a_predicate=is_missense,
...     b_predicate=is_frameshift,
...     a_label=a_label, b_label=b_label,
... )
>>> gt_predicate.group_labels
('Missense', 'Frameshift')

Biallelic predicate

The biallelic predicate compares individuals with two alleles of the variants of interest. Its functionality is very similar to that of the monoallelic predicate, with two differences: the set of genotype categories, and the support for partitions.

Categories

A biallelic locus can be present in one of three genotypes, so an individual is assigned to one of three genotype groups:

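Biallelic predicate genotype groups

+-------+----------------+------------+------------+
| Index | Genotype group | \(AC_{A}\) | \(AC_{B}\) |
+=======+================+============+============+
| 0     | A/A            | 2          | 0          |
+-------+----------------+------------+------------+
| 1     | A/B            | 1          | 1          |
+-------+----------------+------------+------------+
| 2     | B/B            | 0          | 2          |
+-------+----------------+------------+------------+
|       | None           | other      | other      |
+-------+----------------+------------+------------+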

Note that \(\sum_{i \in \{A, B\}} AC_{i} = 2\) in all three genotype groups; individuals with a different allele count sum are omitted from the analysis.
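Analogously, here is a hypothetical sketch that maps the two allele counts to the genotype group index from the table. It only illustrates the rule and is not GPSEA's implementation:

```python
from typing import Dict, Optional, Tuple

# Genotype group index keyed by (AC_A, AC_B), following the table above.
BIALLELIC_GROUPS: Dict[Tuple[int, int], int] = {
    (2, 0): 0,  # A/A
    (1, 1): 1,  # A/B
    (0, 2): 2,  # B/B
}


def biallelic_group_index(ac_a: int, ac_b: int) -> Optional[int]:
    """Return the group index 0, 1, or 2, or None when AC_A + AC_B != 2."""
    return BIALLELIC_GROUPS.get((ac_a, ac_b))


print([biallelic_group_index(a, b) for a, b in [(2, 0), (1, 1), (0, 2), (1, 0), (2, 1)]])
# [0, 1, 2, None, None]
```

For non-negative counts, the three keys are exactly the combinations that sum to 2, so a dictionary lookup is enough.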

Example

Let A and B correspond to MISSENSE and FRAMESHIFT variants. To compare missense and frameshift variants in the context of an autosomal recessive disease, we reuse the variant predicates is_missense and is_frameshift from the previous section:

>>> from gpsea.analysis.predicate.genotype import biallelic_predicate
>>> gt_predicate = biallelic_predicate(
...     a_predicate=is_missense,
...     b_predicate=is_frameshift,
...     a_label="Missense", b_label="Frameshift",
... )
>>> gt_predicate.group_labels
('Missense/Missense', 'Missense/Frameshift', 'Frameshift/Frameshift')

The predicate assigns each individual to one of three genotype groups:

  • Missense/Missense - two missense alleles

  • Missense/Frameshift - one missense allele and one frameshift allele

  • Frameshift/Frameshift - two frameshift alleles
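The three labels follow a simple "A/B" naming pattern. A hypothetical helper that reproduces the group_labels shown above might look like this; it illustrates the pattern and is not GPSEA's code:

```python
from typing import Tuple


def biallelic_group_labels(a_label: str, b_label: str) -> Tuple[str, str, str]:
    """Build labels for the A/A, A/B, and B/B groups, in index order 0, 1, 2."""
    return (
        f"{a_label}/{a_label}",
        f"{a_label}/{b_label}",
        f"{b_label}/{b_label}",
    )


print(biallelic_group_labels("Missense", "Frameshift"))
# ('Missense/Missense', 'Missense/Frameshift', 'Frameshift/Frameshift')
```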

Partitions

Sometimes we want to lump several genotype groups into a partition and then compare the partitions. For instance, in the context of an autosomal recessive disease, we may want to compare individuals with two “mild” mutations with individuals with at least one “severe” mutation. This comparison can be implemented using the partitions option.

We define a partition as a set of one or more genotype group indices (see the Biallelic predicate genotype groups table), and we must provide at least two such partitions to the partitions option.
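The requirements on partitions can be expressed as a validation function. This is a sketch with a hypothetical name; the checks that every index is 0, 1, or 2 and that no index appears in two partitions are assumptions of the sketch rather than rules stated in this section:

```python
from typing import Collection, Sequence


def validate_partitions(partitions: Sequence[Collection[int]], n_groups: int = 3) -> None:
    """Raise ValueError if the partitions are not acceptable (hypothetical sketch).

    Rules from the text: at least two partitions, each a non-empty set of
    genotype group indices. The index range and disjointness checks are
    extra assumptions of this sketch.
    """
    if len(partitions) < 2:
        raise ValueError("at least two partitions are required")
    seen = set()
    for partition in partitions:
        indices = set(partition)
        if not indices:
            raise ValueError("each partition needs at least one group index")
        if not indices <= set(range(n_groups)):
            raise ValueError(f"group indices must be in 0..{n_groups - 1}: {sorted(indices)}")
        if indices & seen:
            raise ValueError(f"group index used in more than one partition: {sorted(indices & seen)}")
        seen |= indices


validate_partitions(({0}, {1, 2}))  # passes silently
try:
    validate_partitions(({0, 1, 2},))
except ValueError as e:
    print(e)  # at least two partitions are required
```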

Example

Let A and B correspond to MISSENSE and FRAMESHIFT variants. Here we compare individuals with two missense alleles with individuals who have either one missense and one frameshift allele, or two frameshift alleles.

The partition of individuals with two missense alleles includes genotype group 0, and the partition of individuals with one or more frameshift alleles corresponds to genotype groups {1, 2} (see the Biallelic predicate genotype groups table). The complete partitions are defined as:

>>> partitions = ({0,}, {1, 2})

We provide partitions to the biallelic_predicate() function:

>>> gt_predicate = biallelic_predicate(
...     a_predicate=is_missense,
...     b_predicate=is_frameshift,
...     a_label="Missense", b_label="Frameshift",
...     partitions=partitions,
... )
>>> gt_predicate.group_labels
('Missense/Missense', 'Missense/Frameshift OR Frameshift/Frameshift')

Now gt_predicate assigns each individual to one of two categories:

  • Missense/Missense

  • Missense/Frameshift OR Frameshift/Frameshift
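The relabeling can be sketched without GPSEA. The helpers partition_labels and partition_index are hypothetical; joining labels with " OR " in ascending index order is an assumption chosen to reproduce the group_labels output shown above, and genotype groups not covered by any partition are mapped to None:

```python
from typing import Optional, Sequence, Set, Tuple

# Labels of genotype groups 0, 1, 2 from the biallelic example.
GROUP_LABELS = ("Missense/Missense", "Missense/Frameshift", "Frameshift/Frameshift")


def partition_labels(group_labels: Sequence[str], partitions: Sequence[Set[int]]) -> Tuple[str, ...]:
    """Join the labels of the groups in each partition with ' OR ' (ascending index order)."""
    return tuple(" OR ".join(group_labels[i] for i in sorted(p)) for p in partitions)


def partition_index(group_index: Optional[int], partitions: Sequence[Set[int]]) -> Optional[int]:
    """Return the index of the partition containing the genotype group, or None."""
    if group_index is None:  # individual was omitted (allele count sum != 2)
        return None
    for i, partition in enumerate(partitions):
        if group_index in partition:
            return i
    return None  # group not covered by any partition


partitions = ({0}, {1, 2})
print(partition_labels(GROUP_LABELS, partitions))
# ('Missense/Missense', 'Missense/Frameshift OR Frameshift/Frameshift')
print([partition_index(g, partitions) for g in (0, 1, 2, None)])
# [0, 1, 1, None]
```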