CDA Mutation
We extract information about variants from the CDA `mutation` table.
We first summarize the table and then outline our ETL strategy.
| Column | Example | Explanation |
|---|---|---|
| project_short_name | TCGA-CESC | |
| case_barcode | TCGA-C5-A1MI | |
| cda_subject_id | TCGA.TCGA-C5-A1MI | |
| primary_site | Cervix uteri | |
| Hugo_Symbol | IGSF9B | |
| Entrez_Gene_Id | 22997 | |
| Center | BI | |
| NCBI_Build | GRCh38 | |
| Chromosome | chr11 | |
| Start_Position | 133921225 | |
| End_Position | 133921225 | |
| Strand | + | |
| Variant_Classification | Missense_Mutation | |
| Variant_Type | SNP | |
| Reference_Allele | C | |
| Tumor_Seq_Allele1 | C | |
| Tumor_Seq_Allele2 | T | |
| dbSNP_RS | rs771150072 | |
| dbSNP_Val_Status | ||
| Tumor_Aliquot_Barcode | TCGA-C5-A1MI-01A-11D-A14W-08 | |
| Matched_Norm_Aliquot_Barcode | TCGA-C5-A1MI-10A-01D-A14W-08 | |
| Match_Norm_Seq_Allele1 | ||
| Match_Norm_Seq_Allele2 | ||
| Tumor_Validation_Allele1 | ||
| Tumor_Validation_Allele2 | ||
| Match_Norm_Validation_Allele1 | ||
| Match_Norm_Validation_Allele2 | ||
| Verification_Status | ||
| Validation_Status | ||
| Mutation_Status | Somatic | |
| Sequencing_Phase | ||
| Sequence_Source | ||
| Validation_Method | ||
| Score | ||
| BAM_File | ||
| Sequencer | ||
| Tumor_Aliquot_UUID | 497c20f0-8a42-4d20-abdc-0415982ebb9f | |
| Matched_Norm_Aliquot_UUID | a3d0503b-baac-4f83-9182-be7b4154c61d | |
| HGVSc | c.2500G>A | |
| HGVSp | p.Val834Met | |
| HGVSp_Short | p.V834M | |
| Transcript_ID | ENST00000321016 | |
| Exon_Number | 18/19 | |
| t_depth | 64 | |
| t_ref_count | 43 | |
| t_alt_count | 20 | |
| n_depth | 88 | |
| n_ref_count | ||
| n_alt_count | ||
| all_effects | IGSF9B,missense_variant,p.V834M,ENST00000533871,NM_001277285.4,c.2500G>A,MODERATE,YES,deleterious(0),probably_damaging(1),-1;IGSF9B,missense_variant,p.V834M,ENST00000321016,,c.2500G>A,MODERATE,,deleterious(0.01),probably_damaging(0.988),-1;IGSF9B,downstream_gene_variant,,ENST00000527648,,,MODIFIER,,,,-1 | |
| Allele | T | |
| Gene | ENSG00000080854 | |
| Feature | ENST00000321016 | |
| Feature_type | Transcript | |
| One_Consequence | missense_variant | |
| Consequence | missense_variant | |
| cDNA_position | 2500/4050 | |
| Protein_position | 834/1349 | |
| Amino_acids | V/M | |
| Codons | Gtg/Atg | |
| Existing_variation | rs771150072;COSV58068494 | |
| DISTANCE | ||
| TRANSCRIPT_STRAND | -1 | |
| SYMBOL | IGSF9B | |
| SYMBOL_SOURCE | HGNC | |
| HGNC_ID | HGNC:32326 | |
| BIOTYPE | protein_coding | |
| CANONICAL | ||
| CCDS | ||
| ENSP | ENSP00000317980 | |
| SWISSPROT | Q9UPX0.150 | |
| TREMBL | ||
| UNIPARC | UPI0001545E3E | |
| UNIPROT_ISOFORM | Q9UPX0-1 | |
| RefSeq | ||
| MANE | ||
| APPRIS | ||
| FLAGS | ||
| SIFT | deleterious(0.01) | |
| PolyPhen | probably_damaging(0.988) | |
| EXON | 18/19 | |
| INTRON | ||
| DOMAINS | PANTHER:PTHR12231;PANTHER:PTHR12231:SF240;Low_complexity_(Seg):seg | |
| ThousG_AF | ||
| ThousG_AFR_AF | ||
| ThousG_AMR_AF | ||
| ThousG_EAS_AF | ||
| ThousG_EUR_AF | ||
| ThousG_SAS_AF | ||
| ESP_AA_AF | ||
| ESP_EA_AF | ||
| gnomAD_AF | 1.216e-05 | |
| gnomAD_AFR_AF | ||
| gnomAD_AMR_AF | ||
| gnomAD_ASJ_AF | ||
| gnomAD_EAS_AF | ||
| gnomAD_FIN_AF | ||
| gnomAD_NFE_AF | ||
| gnomAD_OTH_AF | ||
| gnomAD_SAS_AF | ||
| MAX_AF | 9.968e-05 | |
| MAX_AF_POPS | 3.278e-05 | |
| gnomAD_non_cancer_AF | ||
| gnomAD_non_cancer_AFR_AF | ||
| gnomAD_non_cancer_AMI_AF | ||
| gnomAD_non_cancer_AMR_AF | ||
| gnomAD_non_cancer_ASJ_AF | ||
| gnomAD_non_cancer_EAS_AF | ||
| gnomAD_non_cancer_FIN_AF | ||
| gnomAD_non_cancer_MID_AF | ||
| gnomAD_non_cancer_NFE_AF | ||
| gnomAD_non_cancer_OTH_AF | ||
| gnomAD_non_cancer_SAS_AF | ||
| gnomAD_non_cancer_MAX_AF_adj | ||
| gnomAD_non_cancer_MAX_AF_POPS_adj | ||
| CLIN_SIG | ||
| SOMATIC | 0;1 | |
| PUBMED | ||
| TRANSCRIPTION_FACTORS | ||
| MOTIF_NAME | ||
| MOTIF_POS | ||
| HIGH_INF_POS | ||
| MOTIF_SCORE_CHANGE | ||
| miRNA | ||
| IMPACT | MODERATE | |
| PICK | ||
| VARIANT_CLASS | SNV | |
| TSL | 5 | |
| HGVS_OFFSET | ||
| PHENO | 0;1 | |
| GENE_PHENO | ||
| CONTEXT | GGCCACGCTGT | |
| tumor_submitter_uuid | 8c3559db-155f-42d3-9a73-38d5610f74b5 | |
| normal_submitter_uuid | 59778b5f-335a-471e-abb2-6dde0b5d7fe7 | |
| case_id | 941f75a1-fea4-4539-ba69-60bb11608f6d | |
| GDC_FILTER | ||
| COSMIC | COSM376595;COSM376596 | |
| hotspot | False | |
| RNA_Support | Unknown | |
| RNA_depth | ||
| RNA_ref_count | ||
| RNA_alt_count | ||
| callers | muse;mutect2;varscan2 | |
| file_gdc_id | 3fd5afe7-9e69-4ea8-ab01-80e41783d795 | |
| muse | Yes | |
| mutect2 | Yes | |
| pindel | No | |
| varscan2 | Yes | |
| sample_barcode_tumor | TCGA-C5-A1MI-01A | |
| sample_barcode_normal | TCGA-C5-A1MI-10A | |
| aliquot_barcode_tumor | TCGA-C5-A1MI-01A-11D-A14W-08 | |
| aliquot_barcode_normal | TCGA-C5-A1MI-10A-01D-A14W-08 | |
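
To make the table more concrete, here is a minimal sketch of reading a handful of columns from one row, using the example values shown above. The names `VariantSummary` and `summarize_row` are hypothetical and not part of any existing library or of our ETL code. The sketch also assumes that `Tumor_Seq_Allele2` holds the alternate allele and that `t_alt_count / t_depth` is an acceptable estimate of the tumor variant allele fraction. The choice of columns is illustrative only.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VariantSummary:
    """Hypothetical container for a few columns of one mutation row."""

    gene_symbol: str
    chromosome: str
    position: int
    ref: str
    alt: str
    hgvs_c: str
    tumor_vaf: Optional[float]

    def key(self) -> str:
        """Return a simple chrom-pos-ref-alt string identifying the variant."""
        return f"{self.chromosome}-{self.position}-{self.ref}-{self.alt}"


def summarize_row(row: Mapping[str, str]) -> VariantSummary:
    """Build a VariantSummary from a row given as column name -> string value."""
    depth = row.get("t_depth", "")
    alt_count = row.get("t_alt_count", "")
    # Many columns are empty in practice, so compute the allele fraction
    # only when both counts are present and the depth is positive.
    vaf = None
    if depth and alt_count and int(depth) > 0:
        vaf = int(alt_count) / int(depth)
    return VariantSummary(
        gene_symbol=row["Hugo_Symbol"],
        chromosome=row["Chromosome"],
        position=int(row["Start_Position"]),
        ref=row["Reference_Allele"],
        # Assumption: Tumor_Seq_Allele2 carries the alternate allele.
        alt=row["Tumor_Seq_Allele2"],
        hgvs_c=row["HGVSc"],
        tumor_vaf=vaf,
    )


# Values copied from the example column of the table above.
example_row = {
    "Hugo_Symbol": "IGSF9B",
    "Chromosome": "chr11",
    "Start_Position": "133921225",
    "Reference_Allele": "C",
    "Tumor_Seq_Allele2": "T",
    "HGVSc": "c.2500G>A",
    "t_depth": "64",
    "t_alt_count": "20",
}

summary = summarize_row(example_row)
print(summary.key())        # chr11-133921225-C-T
print(summary.tumor_vaf)    # 0.3125
```

Values are kept as strings on input because many columns in the table are empty. Conversion to `int` happens only for the fields that are actually read.
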
Only a few of these fields are used to extract data for the phenopacket. These fields are explained below.