Group by allele count
Sometimes, we may want to compare individuals with different allele counts of a single variant category. For instance, we may want to compare the survival of individuals harboring a mutation in EGFR (\(AC = 1\)) with those with no such mutation (\(AC = 0\)). Alternatively, in some genes, heterozygous mutations (\(AC = 1\)) and biallelic mutations (\(AC = 2\)) can lead to different diseases.
The allele count analysis differs from the variant category analysis. The allele count analysis partitions the individuals based on different allele counts of one variant category, whereas the variant category analysis partitions the individuals based on a fixed allele count of different variant categories.
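The distinction can be illustrated with a small, library-independent sketch. Everything in it is hypothetical and is not part of GPSEA: the toy `cohort`, the category names, and the helpers `by_allele_count` and `by_variant_category`. It also assumes that an individual who fits no group, or more than one, is left unassigned.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical toy data: allele counts per variant category for each individual.
cohort = {
    "A": {"missense": 1, "frameshift": 0},
    "B": {"missense": 0, "frameshift": 1},
    "C": {"missense": 2, "frameshift": 0},
    "D": {"missense": 0, "frameshift": 0},
}


def by_allele_count(
    counts: Mapping[str, int],
    category: str,
    groups: Sequence[int] = (0, 1, 2),
) -> Optional[int]:
    """Allele count analysis: ONE category, DIFFERENT allele counts."""
    ac = counts.get(category, 0)
    return ac if ac in groups else None


def by_variant_category(
    counts: Mapping[str, int],
    categories: Sequence[str],
    fixed_count: int = 1,
) -> Optional[str]:
    """Variant category analysis: DIFFERENT categories, ONE fixed allele count."""
    hits = [c for c in categories if counts.get(c, 0) == fixed_count]
    # Assumption: an individual must match exactly one category to be assigned.
    return hits[0] if len(hits) == 1 else None


allele_count_groups = {
    name: by_allele_count(counts, "missense") for name, counts in cohort.items()
}
variant_category_groups = {
    name: by_variant_category(counts, ("missense", "frameshift"))
    for name, counts in cohort.items()
}
```

In the first grouping, every individual is placed by the number of missense alleles they carry. In the second, individuals are placed by which category they carry exactly one allele of, so `C` and `D` remain unassigned.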
The allele count analysis can be done with the allele_count() predicate.
Examples
Compare the individuals with EGFR mutation
First, let’s create a VariantPredicate to include any EGFR variant:
>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> affects_egfr = VariantPredicates.gene(symbol="EGFR")
>>> affects_egfr.description
'affects EGFR'
Next, we create an allele count predicate to partition the individuals based on the presence of zero or one EGFR mutation allele:
>>> from gpsea.analysis.predicate.genotype import allele_count
>>> gt_predicate = allele_count(
...     counts=({0,}, {1,}),
...     target=affects_egfr,
... )
>>> gt_predicate.group_labels
('0', '1')
We create the predicate with two arguments.
The counts argument takes a tuple of two sets, to partition the individuals based on zero ({0,}) or one ({1,}) target variant allele.
The target argument takes a VariantPredicate that defines the target variants.
We can use gt_predicate to partition a cohort along the genotype axis, e.g. to compare patient survival in a survival analysis.
Compare the individuals with monoallelic and biallelic mutations
Let’s prepare a predicate for grouping individuals based on one or two alleles of a target mutation.
For this example, the target mutation is any mutation that affects LMNA:
>>> from gpsea.analysis.predicate.genotype import VariantPredicates
>>> affects_lmna = VariantPredicates.gene(symbol="LMNA")
>>> affects_lmna.description
'affects LMNA'
and we will compare the individuals with one allele with those with two alleles:
>>> gt_predicate = allele_count(
...     counts=({1,}, {2,}),
...     target=affects_lmna,
... )
>>> gt_predicate.group_labels
('1', '2')
The predicate will partition the individuals into two groups: those with one LMNA variant allele and those with two LMNA variant alleles. Individuals with other allele counts (e.g. 0 or 3) will be excluded from the analysis.
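The grouping and exclusion behavior described above can be sketched in plain Python. This is only an illustration of the idea, not GPSEA's implementation: the helper `assign_group` is hypothetical and simply returns the index of the first set in `counts` that contains an individual's allele count, or `None` when no set matches.

```python
from typing import Optional, Sequence, Set


def assign_group(allele_count: int, counts: Sequence[Set[int]]) -> Optional[int]:
    """Return the index of the group whose set contains `allele_count`.

    Hypothetical helper for illustration; `None` marks an individual
    who falls outside all groups and is excluded from the analysis.
    """
    for idx, allowed in enumerate(counts):
        if allele_count in allowed:
            return idx
    return None


# Same partitioning as in the LMNA example: one allele vs. two alleles.
counts = ({1}, {2})

# Map each possible allele count (0 to 3) to its group index.
assignments = {ac: assign_group(ac, counts) for ac in range(4)}
```

Here allele counts 1 and 2 map to the first and second group, while 0 and 3 map to `None`, mirroring the exclusion of individuals with other allele counts.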