gpsea.model package
The gpsea.model package defines data model classes used in GPSEA.
We start with the top-level elements, such as Cohort and Patient, and we follow with data classes for phenotype, genotype, transcript, and protein info.
- class gpsea.model.Cohort(members: Iterable[Patient], excluded_member_count: int)[source]
Bases:
Sized, Iterable[Patient]
- static from_patients(members: Iterable[Patient], include_patients_with_no_HPO: bool = False, include_patients_with_no_variants: bool = False)[source]
Create a cohort from a sequence of patients.
- property all_patients: Collection[Patient]
Get a collection of all patients in the cohort.
- all_phenotypes() Set[Phenotype] [source]
Get a set of all phenotypes (observed or excluded) in the cohort members.
- all_diseases() Set[Disease] [source]
Get a set of all diseases (observed or excluded) in the cohort members.
- property all_transcript_ids: Set[str]
Get a set of all transcript IDs affected by the cohort variants.
- property total_patient_count
Get the total number of cohort members.
- list_present_phenotypes(top: int | None = None) Sequence[Tuple[str, int]] [source]
Get a sequence with counts of HPO terms used as direct annotations of the cohort members.
- list_all_variants(top=None) Sequence[Tuple[str, int]] [source]
- Parameters:
top (typing.Optional[int]) – If not given, lists all variants. Otherwise, lists only the top variants with the highest counts.
- Returns:
A sequence of tuples, formatted (variant key, number of patients with that variant)
- Return type:
Sequence[Tuple[str, int]]
- list_all_proteins(top=None) Sequence[Tuple[str, int]] [source]
- Parameters:
top (typing.Optional[int]) – If not given, lists all proteins. Otherwise, lists only the top proteins with the highest counts.
- Returns:
A list of tuples, formatted (protein ID string, the count of variants that affect the protein)
- Return type:
Sequence[Tuple[str, int]]
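The effect of the top parameter in the listing methods above can be pictured with a plain collections.Counter. This is a minimal sketch and not the GPSEA implementation; the function list_top_counts and the sample keys are hypothetical and only reuse the key format shown elsewhere on this page.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def list_top_counts(
    keys: Iterable[str],
    top: Optional[int] = None,
) -> List[Tuple[str, int]]:
    """Count the keys and return (key, count) pairs, most frequent first.

    If `top` is None, all keys are listed (assumed to mirror `top=None`).
    """
    return Counter(keys).most_common(top)


# Hypothetical variant keys, one entry per patient carrying the variant.
observed = [
    "X_12345_12345_C_G",
    "22_10001_20000_INV",
    "X_12345_12345_C_G",
]
print(list_top_counts(observed))
print(list_top_counts(observed, top=1))
```

Passing top=None to Counter.most_common returns every entry, which matches the documented "lists all" behavior in spirit.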
- variant_effect_count_by_tx(tx_id: str | None = None) Mapping[str, Mapping[str, int]] [source]
Count variant effects for all transcripts or for a transcript tx_id of choice.
- Parameters:
tx_id – a str with transcript accession (e.g. NM_123456.5) or None if all transcripts should be listed.
- Returns:
A mapping where each transcript ID references a Counter() that uses the variant effect as the key and the count of variants with that effect on the transcript as the value.
- Return type:
mapping
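The shape of the returned mapping can be illustrated with nested counters. The sketch below is an assumption about the structure only, not the library code; count_effects_by_tx, the input pairs, and the accession NM_000001.1 are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping, Optional, Tuple


def count_effects_by_tx(
    annotations: Iterable[Tuple[str, str]],
    tx_id: Optional[str] = None,
) -> Mapping[str, Mapping[str, int]]:
    """Count (transcript ID, variant effect) pairs per transcript.

    If `tx_id` is given, only that transcript is reported.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for tx, effect in annotations:
        if tx_id is None or tx == tx_id:
            counts[tx][effect] += 1
    return dict(counts)


# Hypothetical annotations, one pair per variant-transcript combination.
pairs = [
    ("NM_123456.5", "missense_variant"),
    ("NM_123456.5", "missense_variant"),
    ("NM_123456.5", "stop_gained"),
    ("NM_000001.1", "intron_variant"),
]
by_tx = count_effects_by_tx(pairs)
print(by_tx["NM_123456.5"]["missense_variant"])
```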
- class gpsea.model.Patient(labels: SampleLabels, sex: Sex, phenotypes: Iterable[Phenotype], diseases: Iterable[Disease], variants: Iterable[Variant])[source]
Bases:
object
Patient represents a single investigated individual.
Note
We strongly recommend using the from_raw_parts() static constructor instead of __init__.
- static from_raw_parts(labels: SampleLabels, sex: Sex | None, phenotypes: Iterable[Phenotype], diseases: Iterable[Disease], variants: Iterable[Variant]) Patient [source]
Create Patient from the primary data.
- property labels: SampleLabels
Get the sample identifiers.
- present_phenotypes() Iterator[Phenotype] [source]
Get an iterator over the present phenotypes of the patient.
- excluded_phenotypes() Iterator[Phenotype] [source]
Get an iterator over the excluded phenotypes of the patient.
- class gpsea.model.SampleLabels(label: str, meta_label: str | None = None)[source]
Bases:
object
A data model for subject identifiers.
The subject has a mandatory label and an optional meta_label.
The identifiers support natural ordering, equality tests, and are hashable.
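To show what ordering, equality, and hashability mean for an identifier with an optional second part, here is a small stand-alone sketch. The class Labels is hypothetical and is not SampleLabels; in particular, sorting a missing meta_label before any present one is an assumption made for this example.

```python
from functools import total_ordering
from typing import Optional, Tuple


@total_ordering
class Labels:
    """A hypothetical identifier with a mandatory label and optional meta label."""

    def __init__(self, label: str, meta_label: Optional[str] = None):
        self.label = label
        self.meta_label = meta_label

    def _key(self) -> Tuple[str, bool, str]:
        # A missing meta label sorts before any present meta label (assumption).
        return (self.label, self.meta_label is not None, self.meta_label or "")

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Labels):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other: object) -> bool:
        if not isinstance(other, Labels):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self) -> int:
        return hash(self._key())


ordered = sorted([Labels("B"), Labels("A", "x"), Labels("A")])
print([(item.label, item.meta_label) for item in ordered])
```

Deriving equality, ordering, and the hash from one key tuple keeps the three behaviors consistent with each other.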
- class gpsea.model.Sex(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
Enum
Sex represents typical “phenotypic sex”, as would be determined by a midwife or physician at birth.
The definition is aligned with the Phenopacket Schema.
- UNKNOWN_SEX = 0
Not assessed or not available. Maps to NCIT:C17998.
- FEMALE = 1
Female sex. Maps to NCIT:C46113.
- MALE = 2
Male sex. Maps to NCIT:C46112.
- class gpsea.model.Phenotype(term_id: TermId, is_observed: bool)[source]
Bases:
Identified, ObservableFeature
Phenotype represents a clinical sign or symptom represented as an HPO term.
The phenotype can be either present in the patient or excluded.
- static from_term(term: MinimalTerm, is_observed: bool)[source]
- static from_raw_parts(term_id: str | TermId, is_observed: bool) Phenotype [source]
Create Phenotype from a term ID and observation state.
- Parameters:
term_id – a str with CURIE (e.g. HP:0001250) or a TermId.
is_observed – True if the term ID was observed in the patient or False if it was explicitly excluded.
- property is_present: bool
Return True if the phenotype feature was observed in the subject or False if the feature’s presence was explicitly excluded.
- class gpsea.model.Disease(term_id: TermId, name: str, is_observed: bool)[source]
Bases:
Identified, ObservableFeature, Named
Representation of a disease diagnosed (or excluded) in an investigated individual.
- property name
Get the disease label (e.g. LEIGH SYNDROME, NUCLEAR; NULS).
- class gpsea.model.Variant(variant_info: VariantInfo, tx_annotations: Iterable[TranscriptAnnotation], genotypes: Genotypes)[source]
Bases:
VariantInfoAware, FunctionalAnnotationAware, Genotyped
Variant includes three lines of information:
the variant data with coordinates or other info available for large imprecise SVs,
results of the functional annotation with respect to relevant transcripts, and
the genotypes for the known samples.
- static create_variant_from_scratch(variant_coordinates: VariantCoordinates, gene_name: str, trans_id: str, hgvs_cdna: str, is_preferred: bool, consequences: Iterable[VariantEffect], exons_effected: Sequence[int], protein_id: str | None, hgvsp: str | None, protein_effect_start: int | None, protein_effect_end: int | None, genotypes: Genotypes)[source]
- property variant_info: VariantInfo
Get the representation of the variant data for sequence and symbolic variants, as well as for large imprecise SVs.
- property tx_annotations: Sequence[TranscriptAnnotation]
A collection of TranscriptAnnotations that each represent the results of the functional annotation of a variant with respect to a single transcript of a gene.
- Returns:
A sequence of TranscriptAnnotation objects
- Return type:
Sequence[TranscriptAnnotation]
- class gpsea.model.VariantClass(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
Enum
VariantClass represents a high-level variant category which mostly corresponds to the structural variant categories of the Variant Call Format specification, but also includes types for single nucleotide variants (SNV) and multi-nucleotide variants (MNV).
- DEL = 0
A deletion - a variant with a net loss of sequence from the alternative allele regardless of its size.
Both a deletion of 1 base pair and a deletion of 1000 base pairs are acceptable.
- DUP = 1
Duplication (tandem or interspersed).
- INS = 2
Insertion of a novel sequence.
- INV = 3
Inversion of a chromosome segment.
- MNV = 4
Multi-nucleotide variant (e.g. AG>CT) that is not a duplication, deletion, or insertion. May be called INDEL.
- SNV = 5
Single nucleotide variant.
- BND = 6
A breakend.
- class gpsea.model.VariantCoordinates(region: GenomicRegion, ref: str, alt: str, change_length: int)[source]
Bases:
object
A representation of coordinates of sequence and symbolic variants.
Note, the breakend variants are not currently supported.
- static from_vcf_literal(contig: Contig, pos: int, ref: str, alt: str)[source]
Create VariantCoordinates from a variant in VCF literal notation.
Note, this function must not be used to create a VCF symbolic variant (e.g. <DEL> or translocation). Use from_vcf_symbolic() instead.
Example
Create a variant from a VCF line:
#CHROM POS ID REF ALT ...
chr1 1001 . C T
We first must decide on the genome build. Most of the time, we should use GRCh38:
>>> from gpsea.model.genome import GRCh38
>>> build = GRCh38
Then, we access the contig for 'chr1':
>>> chr1 = build.contig_by_name('chr1')
Last, we create the variant coordinates:
>>> from gpsea.model import VariantCoordinates
>>> vc = VariantCoordinates.from_vcf_literal(
...     contig=chr1, pos=1001, ref='C', alt='T',
... )
Now we can test the properties:
>>> vc.start, vc.end, vc.ref, vc.alt, len(vc), vc.change_length
(1000, 1001, 'C', 'T', 1, 0)
- Parameters:
contig – a Contig for the chromosome
pos – a 1-based coordinate of the first base of the reference allele, as described in the VCF standard
ref – a str with the REF allele. Should meet the requirements of the VCF standard.
alt – a str with the ALT allele. Should meet the requirements of the VCF standard.
- static from_vcf_symbolic(contig: Contig, pos: int, end: int, ref: str, alt: str, svlen: int)[source]
Create VariantCoordinates from a variant in VCF symbolic notation.
Note, this function must not be used to create a VCF sequence/literal variant. Use from_vcf_literal() instead.
Example
Let’s create a symbolic variant from the line:
#CHROM POS ID REF ALT QUAL FILTER INFO
2 321682 . T <DEL> 6 PASS SVTYPE=DEL;END=321887;SVLEN=-205
We first must decide on the genome build. Most of the time, we should use GRCh38:
>>> from gpsea.model.genome import GRCh38
>>> contig = GRCh38.contig_by_name('2')
Now, we create the coordinates as:
>>> vc = VariantCoordinates.from_vcf_symbolic(
...     contig=contig, pos=321682, end=321887,
...     ref='T', alt='<DEL>', svlen=-205,
... )
Now we can test the properties:
>>> vc.start, vc.end, vc.ref, vc.alt, len(vc), vc.change_length
(321681, 321887, 'T', '<DEL>', 206, -205)
- Parameters:
contig – a Contig for the chromosome
pos – a 1-based coordinate of the first base of the affected reference allele region
end – a 1-based coordinate of the last base of the affected reference allele region
ref – a str with the REF allele. Most of the time, it is one of {‘N’, ‘A’, ‘C’, ‘G’, ‘T’}
alt – a str with the ALT allele, e.g. one of {‘<DEL>’, ‘<DUP>’, ‘<INS>’, ‘<INV>’}
svlen – an int with change length (the difference between ref and alt allele lengths)
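The coordinate arithmetic behind the two examples above can be reproduced without GPSEA. The following is a sketch under stated assumptions: the helper names literal_span and symbolic_span are hypothetical, and computing the change length of a literal variant as len(alt) - len(ref) is inferred from the example outputs rather than taken from the library source.

```python
from typing import Tuple


def literal_span(pos: int, ref: str, alt: str) -> Tuple[int, int, int]:
    """Return (start, end, change_length) for a VCF literal variant.

    `pos` is the 1-based coordinate of the first REF base. The returned
    span is 0-based with the start excluded and the end included.
    """
    start = pos - 1
    end = start + len(ref)
    return start, end, len(alt) - len(ref)


def symbolic_span(pos: int, end: int, svlen: int) -> Tuple[int, int, int]:
    """Return (start, end, change_length) for a VCF symbolic variant.

    `pos` and `end` are the 1-based coordinates of the first and last
    affected reference bases; `svlen` is used as the change length.
    """
    return pos - 1, end, svlen


# The two VCF lines used in the examples above.
print(literal_span(1001, "C", "T"))
print(symbolic_span(321682, 321887, -205))
```

In both cases the length of the affected reference region is end - start, which gives 1 for the SNV and 206 for the deletion.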
- property start: int
Get the 0-based start coordinate (excluded) of the first base of the ref allele.
- property region: GenomicRegion
Get the genomic region spanned by the ref allele.
- property ref: str
Get the reference allele (e.g. “A”, “CCT”, “N”). The allele may be an empty string.
- property alt: str
Get the alternate allele (e.g. “A”, “GG”, “<DEL>”).
The allele may be an empty string for sequence variants. Symbolic alternate alleles follow the VCF notation and use the < and > characters (e.g. “<DEL>”, “<INS:ME:SINE>”).
- property change_length: int
Get the change of length between the ref and alt alleles due to the variant presence.
See Change length of an allele for more info.
- property variant_key: str
Get a readable representation of the variant’s coordinates.
For instance, X_12345_12345_C_G for a sequence variant or 22_10001_20000_INV for a symbolic variant.
If the key is longer than 50 characters, the ‘ref’ and/or ‘alt’ (if over 10 bps) are changed to show only the number of bps.
Example: X_1000001_1000027_TAAAAAAAAAAAAAAAAAAAAAAAAAA_T -> X_1000001_1000027_--27bp--_T
Note
Both start and end coordinates use 1-based (included) coordinate system.
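The abbreviation rule for long keys can be sketched as follows. This is an illustration for sequence variants only and not the GPSEA code; make_variant_key and its parameters are hypothetical, and the thresholds of 50 and 10 are taken from the description above.

```python
def make_variant_key(
    chrom: str,
    start: int,
    end: int,
    ref: str,
    alt: str,
    max_key_len: int = 50,
    max_allele_len: int = 10,
) -> str:
    """Build a `chrom_start_end_ref_alt` key from 1-based inclusive coordinates.

    If the full key is too long, alleles longer than `max_allele_len`
    are replaced by a `--<length>bp--` placeholder.
    """
    key = f"{chrom}_{start}_{end}_{ref}_{alt}"
    if len(key) <= max_key_len:
        return key

    def abbreviate(allele: str) -> str:
        if len(allele) > max_allele_len:
            return f"--{len(allele)}bp--"
        return allele

    return f"{chrom}_{start}_{end}_{abbreviate(ref)}_{abbreviate(alt)}"


# A short key is kept verbatim.
print(make_variant_key("X", 12345, 12345, "C", "G"))
# A hypothetical 41 bp deletion produces a key above the length limit.
long_ref = "T" + "A" * 40
print(make_variant_key("X", 1000001, 1000041, long_ref, "T"))
```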
- property variant_class: VariantClass
Get a VariantClass category.
- is_structural() bool [source]
Checks if the variant coordinates use structural variant notation as described by Variant Call Format (VCF).
An example of structural variant notation:
chr5 101 . N <DEL> . . SVTYPE=DEL;END=120;SVLEN=-10
as opposed to the sequence (literal) notation:
chr5 101 . NACGTACGTAC N
- Returns:
True if the variant coordinates use structural variant notation.
- class gpsea.model.ImpreciseSvInfo(structural_type: TermId, variant_class: VariantClass, gene_id: str, gene_symbol: str)[source]
Bases:
object
Data regarding a structural variant (SV) with imprecise breakpoint coordinates.
- property structural_type: TermId
Get the term ID of the structural type (e.g. SO:1000029 for chromosomal deletion).
- property variant_class: VariantClass
Get a VariantClass category.
- property gene_id: str
Get a str with gene identifier CURIE (e.g. HGNC:3603) or None if the identifier is not available.
- class gpsea.model.VariantInfo(variant_coordinates: VariantCoordinates | None = None, sv_info: ImpreciseSvInfo | None = None)[source]
Bases:
object
VariantInfo consists of either variant coordinates or imprecise SV data.
The class is conceptually similar to a Rust enum: only one of the fields can be set at any point in time.
- property variant_coordinates: VariantCoordinates | None
Get variant coordinates if available.
- property sv_info: ImpreciseSvInfo | None
Get information about large imprecise SV.
- has_sv_info() bool [source]
Returns True if the variant is a large imprecise SV and the exact coordinates are thus unavailable.
- property variant_key: str
Get a readable representation of the variant’s coordinates or the large SV info.
- property variant_class: VariantClass
Get a VariantClass category.
- is_structural() bool [source]
Test if the variant is a structural variant.
This can be either because the variant coordinates use the structural variant notation (see VariantCoordinates.is_structural()) or because the variant is a large imprecise SV.
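The "either coordinates or SV info" design of VariantInfo can be pictured with a small container that rejects any other combination. This is a sketch, not the library class: EitherInfo is hypothetical, and requiring exactly one field to be set is an assumption based on the description above.

```python
from typing import Generic, Optional, TypeVar

C = TypeVar("C")  # type of the precise coordinates
S = TypeVar("S")  # type of the imprecise SV info


class EitherInfo(Generic[C, S]):
    """Hold either coordinates or SV info, never both and never neither."""

    def __init__(
        self,
        coordinates: Optional[C] = None,
        sv_info: Optional[S] = None,
    ):
        # The check fails if both fields are set or if both are missing.
        if (coordinates is None) == (sv_info is None):
            raise ValueError("Exactly one of coordinates and sv_info must be set")
        self.coordinates = coordinates
        self.sv_info = sv_info

    def has_sv_info(self) -> bool:
        return self.sv_info is not None


precise = EitherInfo(coordinates="X_12345_12345_C_G")
imprecise = EitherInfo(sv_info="SO:1000029")
print(precise.has_sv_info(), imprecise.has_sv_info())
```

Validating the invariant once in the constructor lets every accessor rely on it afterwards.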
- class gpsea.model.VariantInfoAware[source]
Bases:
object
An entity where VariantInfo is available.
- abstract property variant_info: VariantInfo
Get the variant data with coordinates or other info available for large imprecise SVs.
- class gpsea.model.Genotype(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
Enum
Genotype represents the state of a variable locus in a diploid genome.
- NO_CALL = ('.',)
- HOMOZYGOUS_REFERENCE = ('0/0',)
- HETEROZYGOUS = ('0/1',)
- HOMOZYGOUS_ALTERNATE = ('1/1',)
- HEMIZYGOUS = ('1',)
- class gpsea.model.Genotypes(samples: Iterable[SampleLabels], genotypes: Iterable[Genotype])[source]
Genotypes is a container for mapping between sample ID and its genotype.
Let’s consider a pair of samples:
>>> a = SampleLabels('A')
>>> b = SampleLabels('B')
We can use one of the static methods to create an instance. Either a single genotype:
>>> gt = Genotypes.single(a, Genotype.HETEROZYGOUS)
or genotypes of several samples:
>>> gts = Genotypes.from_mapping({a: Genotype.HETEROZYGOUS, b: Genotype.HOMOZYGOUS_ALTERNATE})
There are 2 genotypes in the container:
>>> len(gts)
2
You can get a genotype for a sample ID:
>>> g = gts.for_sample(a)
>>> g.code
'0/1'
You will get None if the sample is not present:
>>> gts.for_sample(SampleLabels('UNKNOWN'))
You can iterate over sample-genotype pairs:
>>> for sample_id, genotype in gts:
...     print(sample_id, genotype)
A 0/1
B 1/1
- static single(sample_id: SampleLabels, genotype: Genotype)[source]
A shortcut for creating Genotypes for a single sample:
>>> a = SampleLabels('A')
>>> gts = Genotypes.single(a, Genotype.HOMOZYGOUS_ALTERNATE)
>>> assert len(gts) == 1
>>> assert gts.for_sample(a) == Genotype.HOMOZYGOUS_ALTERNATE
- static from_mapping(mapping: Mapping[SampleLabels, Genotype])[source]
Create Genotypes from mapping between sample IDs and genotypes.
>>> a = SampleLabels('A')
>>> b = SampleLabels('B')
>>> gts = Genotypes.from_mapping({a: Genotype.HETEROZYGOUS, b: Genotype.HOMOZYGOUS_ALTERNATE})
>>> assert len(gts) == 2
- for_sample(sample_id: SampleLabels) Genotype | None [source]
Get a genotype for a sample or None if the genotype is not present.
- Parameters:
sample_id – a SampleLabels with the sample’s identifier.
- class gpsea.model.Genotyped[source]
Bases:
object
Genotyped entities
- genotype_for_sample(sample_id: SampleLabels) Genotype | None [source]
Get a genotype for a sample or None if the genotype is not present.
- Parameters:
sample_id – a SampleLabels with the sample’s identifier.
- class gpsea.model.TranscriptAnnotation(gene_id: str, tx_id: str, hgvs_cdna: str | None, is_preferred: bool, variant_effects: Iterable[VariantEffect], affected_exons: Iterable[int] | None, protein_id: str | None, hgvsp: str | None, protein_effect_coordinates: Region | None)[source]
Bases:
TranscriptInfoAware
TranscriptAnnotation represents the result of the functional annotation of a variant with respect to a single transcript of a gene.
- property is_preferred: bool
Return True if the transcript is the preferred transcript of a gene, such as the MANE transcript or the canonical Ensembl transcript.
- property hgvs_cdna: str | None
Get the HGVS description of the sequence variant (e.g. NM_123456.7:c.9876G>T) or None if not available.
- property variant_effects: Sequence[VariantEffect]
Get a sequence of the predicted functional variant effects.
- property overlapping_exons: Sequence[int] | None
Get a sequence of 1-based exon indices (the index of the 1st exon is 1) that overlap with the variant.
- property protein_id: str | None
Get the ID of the protein encoded by the transcript_id or None if the transcript is not protein-coding.
- class gpsea.model.VariantEffect(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
Enum
VariantEffect represents consequences of a variant on transcript that are supported by GPSEA.
>>> from gpsea.model import VariantEffect
>>> missense = VariantEffect.MISSENSE_VARIANT
>>> print(missense)
missense_variant
The VariantEffect has a curie attribute that represents the ontology class from Sequence Ontology.
>>> missense.curie
'SO:0001583'
- TRANSCRIPT_ABLATION = 'SO:0001893'
- SPLICE_ACCEPTOR_VARIANT = 'SO:0001574'
- SPLICE_DONOR_VARIANT = 'SO:0001575'
- STOP_GAINED = 'SO:0001587'
- FRAMESHIFT_VARIANT = 'SO:0001589'
- STOP_LOST = 'SO:0001578'
- START_LOST = 'SO:0002012'
- TRANSCRIPT_AMPLIFICATION = 'SO:0001889'
- INFRAME_INSERTION = 'SO:0001821'
- INFRAME_DELETION = 'SO:0001822'
- MISSENSE_VARIANT = 'SO:0001583'
- PROTEIN_ALTERING_VARIANT = 'SO:0001818'
- SPLICE_REGION_VARIANT = 'SO:0001630'
- SPLICE_DONOR_5TH_BASE_VARIANT = 'SO:0001787'
- SPLICE_DONOR_REGION_VARIANT = 'SO:0002170'
- SPLICE_POLYPYRIMIDINE_TRACT_VARIANT = 'SO:0002169'
- INCOMPLETE_TERMINAL_CODON_VARIANT = 'SO:0001626'
- START_RETAINED_VARIANT = 'SO:0002019'
- STOP_RETAINED_VARIANT = 'SO:0001567'
- SYNONYMOUS_VARIANT = 'SO:0001819'
- CODING_SEQUENCE_VARIANT = 'SO:0001580'
- MATURE_MIRNA_VARIANT = 'SO:0001620'
- FIVE_PRIME_UTR_VARIANT = 'SO:0001623'
- THREE_PRIME_UTR_VARIANT = 'SO:0001624'
- NON_CODING_TRANSCRIPT_EXON_VARIANT = 'SO:0001792'
- INTRON_VARIANT = 'SO:0001627'
- NMD_TRANSCRIPT_VARIANT = 'SO:0001621'
- NON_CODING_TRANSCRIPT_VARIANT = 'SO:0001619'
- UPSTREAM_GENE_VARIANT = 'SO:0001631'
- DOWNSTREAM_GENE_VARIANT = 'SO:0001632'
- TFBS_ABLATION = 'SO:0001895'
- TFBS_AMPLIFICATION = 'SO:0001892'
- TF_BINDING_SITE_VARIANT = 'SO:0001782'
- REGULATORY_REGION_ABLATION = 'SO:0001894'
- REGULATORY_REGION_AMPLIFICATION = 'SO:0001891'
- FEATURE_ELONGATION = 'SO:0001907'
- REGULATORY_REGION_VARIANT = 'SO:0001566'
- FEATURE_TRUNCATION = 'SO:0001906'
- INTERGENIC_VARIANT = 'SO:0001628'
- SEQUENCE_VARIANT = 'SO:0001060'
- to_display() str [source]
Get a concise name of the variant effect that is suitable for showing to humans.
Example
>>> from gpsea.model import VariantEffect
>>> VariantEffect.MISSENSE_VARIANT.to_display()
'missense'
>>> VariantEffect.SPLICE_DONOR_5TH_BASE_VARIANT.to_display()
'splice donor 5th base'
- Returns:
a str with the name or ‘n/a’ if the variant effect was not assigned a concise name.
- static structural_so_id_to_display(so_term: TermId | str) str [source]
Get a str with a concise name for representing a Sequence Ontology (SO) term identifier.
Example
>>> from gpsea.model import VariantEffect
>>> VariantEffect.structural_so_id_to_display('SO:1000029')
'chromosomal deletion'
- Parameters:
so_term – a CURIE str or a TermId with the query SO term.
- Returns:
a str with the concise name for the SO term or ‘n/a’ if a name has not been assigned yet.
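The pattern of an enum whose value is an ontology CURIE, with a separate table of concise display names and an 'n/a' fallback, can be sketched with the standard library. The class Effect below is a hypothetical two-member stand-in, not VariantEffect; the display table deliberately omits one member to show the fallback, which says nothing about which names GPSEA actually assigns.

```python
from enum import Enum


class Effect(Enum):
    """A hypothetical stand-in that stores the Sequence Ontology CURIE as value."""

    MISSENSE_VARIANT = "SO:0001583"
    STOP_GAINED = "SO:0001587"

    @property
    def curie(self) -> str:
        return self.value

    def to_display(self) -> str:
        # Fall back to 'n/a' when no concise name was assigned.
        return _DISPLAY_NAMES.get(self, "n/a")

    def __str__(self) -> str:
        return self.name.lower()


# Incomplete on purpose, to demonstrate the fallback (assumption of this sketch).
_DISPLAY_NAMES = {Effect.MISSENSE_VARIANT: "missense"}

print(Effect.MISSENSE_VARIANT, Effect.MISSENSE_VARIANT.curie)
print(Effect.MISSENSE_VARIANT.to_display(), Effect.STOP_GAINED.to_display())
```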
- class gpsea.model.TranscriptInfoAware[source]
Bases:
object
The implementors know about basic gene/transcript identifiers.
- class gpsea.model.TranscriptCoordinates(identifier: str, region: GenomicRegion, exons: Iterable[GenomicRegion], cds_start: int | None, cds_end: int | None, is_preferred: bool | None = None)[source]
Bases:
object
TranscriptCoordinates knows about genomic region of the transcript, exonic/intronic regions, as well as the coding and non-coding regions.
If both CDS coordinates are None, then the transcript coordinates are assumed to represent a non-coding transcript.
- property region: GenomicRegion
Get the genomic region spanned by the transcript, corresponding to 5’UTR, exonic, intronic, and 3’UTR regions.
- property exons: Sequence[GenomicRegion]
Get the exon regions.
- property cds_start: int | None
Get the 0-based (excluded) start coordinate of the first base of the start codon of the transcript or None if the transcript is not coding.
- property cds_end: int | None
Get the 0-based (included) end coordinate of the last base of the termination codon of the transcript or None if the transcript is not coding.
- get_coding_base_count() int | None [source]
Calculate the number of coding bases present in the transcript. Note, the count does NOT include the termination codon since it does not code for an amino acid.
Returns: an int with the coding base count or None if the transcript is non-coding.
- get_codon_count() int | None [source]
Calculate the count of codons present in the transcript. Note, the count does NOT include the termination codon!
Returns: the number of codons of the transcript or None if the transcript is non-coding.
- get_five_prime_utrs() Sequence[GenomicRegion] [source]
Get a sequence of genomic regions that correspond to 5’ untranslated regions of the transcript.
Returns: a sequence of genomic regions, an empty sequence if the transcript is non-coding.
- get_three_prime_utrs() Sequence[GenomicRegion] [source]
Get a sequence of genomic regions that correspond to 3’ untranslated regions of the transcript.
Note, the termination codon is NOT included in the regions!
Returns: a sequence of genomic regions, an empty sequence if the transcript is non-coding.
- get_cds_regions() Sequence[GenomicRegion] [source]
Get a sequence of genomic regions that correspond to coding regions of the transcript, including BOTH the initiation and termination codons.
Returns: a sequence of genomic regions, an empty sequence if the transcript is non-coding.
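The relationship between the CDS regions, the coding base count, and the codon count described above can be worked through on plain intervals. This is a sketch under stated assumptions, not the GPSEA implementation: the helper names and the Span alias are hypothetical, the transcript is assumed to be on the forward strand, and the exon coordinates are made up.

```python
from typing import List, Optional, Sequence, Tuple

Span = Tuple[int, int]  # 0-based start (excluded), end (included)


def cds_regions(
    exons: Sequence[Span],
    cds_start: Optional[int],
    cds_end: Optional[int],
) -> List[Span]:
    """Clip the exons to the CDS; an empty list means a non-coding transcript."""
    if cds_start is None or cds_end is None:
        return []
    regions = []
    for start, end in exons:
        clipped_start, clipped_end = max(start, cds_start), min(end, cds_end)
        if clipped_start < clipped_end:
            regions.append((clipped_start, clipped_end))
    return regions


def coding_base_count(
    exons: Sequence[Span],
    cds_start: Optional[int],
    cds_end: Optional[int],
) -> Optional[int]:
    """Sum the CDS region lengths and leave out the 3 termination codon bases."""
    regions = cds_regions(exons, cds_start, cds_end)
    if not regions:
        return None
    return sum(end - start for start, end in regions) - 3


def codon_count(
    exons: Sequence[Span],
    cds_start: Optional[int],
    cds_end: Optional[int],
) -> Optional[int]:
    """Divide the coding base count by the codon length."""
    bases = coding_base_count(exons, cds_start, cds_end)
    return None if bases is None else bases // 3


exons = [(100, 200), (300, 400)]  # hypothetical exons
print(cds_regions(exons, 150, 370))
print(coding_base_count(exons, 150, 370), codon_count(exons, 150, 370))
```

Here the CDS covers 50 + 70 = 120 bases including the termination codon, so 117 coding bases and 39 codons remain.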
- class gpsea.model.ProteinMetadata(protein_id: str, label: str, protein_features: Sequence[ProteinFeature], protein_length: int = 0)[source]
Bases:
object
Information regarding a protein sequence, including an ID, a label, and the location of protein features, such as motifs, domains, or other regions.
- property protein_features: Sequence[ProteinFeature]
Returns: a sequence of ProteinFeature objects.
- domains() Iterable[ProteinFeature] [source]
- Returns:
A subgroup of protein_features, where the ProteinFeature object has a FeatureType equal to “DOMAIN”
- Return type:
Iterable[ProteinFeature]
- repeats() Iterable[ProteinFeature] [source]
- Returns:
A subgroup of protein_features, where the ProteinFeature object has a FeatureType equal to “REPEAT”
- Return type:
Iterable[ProteinFeature]
- regions() Iterable[ProteinFeature] [source]
- Returns:
A subgroup of protein_features, where the ProteinFeature object has a FeatureType equal to “REGION”
- Return type:
Iterable[ProteinFeature]
- motifs() Iterable[ProteinFeature] [source]
- Returns:
A subgroup of protein_features, where the ProteinFeature object has a FeatureType equal to “MOTIF”
- Return type:
Iterable[ProteinFeature]
- get_features_variant_overlaps(region: Region) Collection[ProteinFeature] [source]
Get a collection of protein features that overlap with the region.
- Parameters:
region – the query region.
- Returns:
a collection of overlapping protein features.
- Return type:
Collection[ProteinFeature]
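An overlap query of this kind reduces to an interval test. The sketch below is not the GPSEA implementation: Feature and overlapping_features are hypothetical, the feature coordinates are made up, and the 0-based coordinates with an excluded start and included end are an assumption chosen to match the convention used for other regions on this page.

```python
from typing import Iterable, List, NamedTuple


class Feature(NamedTuple):
    """A hypothetical protein feature on 0-based protein coordinates."""

    name: str
    start: int  # excluded
    end: int  # included


def overlapping_features(
    features: Iterable[Feature],
    start: int,
    end: int,
) -> List[Feature]:
    """Return the features that share at least one residue with the query."""
    return [f for f in features if f.start < end and start < f.end]


features = [Feature("ANK 1", 10, 40), Feature("ANK 2", 50, 80)]
print(overlapping_features(features, 35, 45))
print(overlapping_features(features, 40, 50))
```

With this convention, a query that merely touches the boundary of a feature, such as (40, 50) here, does not count as an overlap.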
- class gpsea.model.ProteinFeature[source]
Bases:
object
- static create(info: FeatureInfo, feature_type: FeatureType)[source]
- abstract property info: FeatureInfo
- abstract property feature_type: FeatureType
- class gpsea.model.FeatureInfo(name: str, region: Region)[source]
Bases:
object
FeatureInfo represents a protein feature (e.g. a repeated sequence given the name “ANK 1” in protein “Ankyrin repeat domain-containing protein 11”).
- class gpsea.model.FeatureType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
Enum
An enum representing the protein feature types supported in GPSEA.
- REPEAT = 1
A repeated sequence motif or repeated domain within the protein.
- MOTIF = 2
A short (usually not more than 20 amino acids) conserved sequence motif of biological significance.
- DOMAIN = 3
A specific combination of secondary structures organized into a characteristic three-dimensional structure or fold.
- REGION = 4
A region of interest that cannot be described in other subsections.