GPSEA

The concept of phenotype denotes the observable attributes of an individual, but in medical contexts, the word “phenotype” is used to refer to some deviation from normal morphology, physiology, or behavior (cf. Deep phenotyping for precision medicine). A key question in biology and human genetics concerns the relationships between phenotypic abnormalities and genotype. In Mendelian genetics, the focus is generally on whether specific disease-causing alleles are associated with specific phenotypic manifestations of the disease.

GPSEA (Genotypes and Phenotypes - Statistical Evaluation of Associations) is a Python package for finding genotype-phenotype associations. The input to GPSEA is a collection of Global Alliance for Genomics and Health (GA4GH) Phenopackets. GPSEA ingests the phenopackets and analyzes the genotype-phenotype associations. The genotype can be described by variant type (e.g., missense vs. premature termination codon) or by variant location in protein motifs or other features. The phenotype can be represented by Human Phenotype Ontology (HPO) terms, but other phenotype representations are possible. Statistical analysis is performed using, e.g., Fisher's exact test.
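To make the statistical step concrete, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table of genotype group versus presence of a phenotype. It is not GPSEA's API: the function name `fisher_exact_two_sided`, the group labels, and all counts are hypothetical and chosen only for illustration, and the test is implemented with the standard library so the arithmetic is visible.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are genotype groups, columns are phenotype present / absent.
    Hypothetical helper for illustration, not part of GPSEA.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denominator = comb(total, col1)

    def table_probability(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables that are at most as likely as the
    # observed one; the small relative tolerance guards against rounding.
    p_value = sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts: individuals with / without a given HPO term,
# split by variant type.
missense_with, missense_without = 8, 2
ptc_with, ptc_without = 1, 9

p_value = fisher_exact_two_sided(
    missense_with, missense_without, ptc_with, ptc_without
)
print(f"Two-sided p-value: {p_value:.4g}")
```

In a real analysis, one such test would typically be run per phenotype term, so the resulting p-values would need a multiple-testing correction before interpretation.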

GPSEA integrates with modern interactive computing environments, such as Jupyter notebooks. We therefore recommend installing GPSEA into a Jupyter kernel and performing the analyses in a Jupyter notebook. The documentation is structured such that each section can be executed in a notebook, and we encourage running the examples while reading the documentation.

The documentation includes Setup instructions, a Tutorial with an end-to-end example, and a comprehensive User guide. Technical details are available in the API reference.

Literature

We provide recommended reading for background on the study of genotype-phenotype correlations.

Feedback

The best place to leave feedback, ask questions, and report bugs is the GPSEA Issue Tracker.