
CDA Mutation

We extract information about variants from the CDA mutation table. Below, we first summarize the table and then outline our ETL strategy.

Column Example Explanation
project_short_name TCGA-CESC
case_barcode TCGA-C5-A1MI
cda_subject_id TCGA.TCGA-C5-A1MI
primary_site Cervix uteri
Hugo_Symbol IGSF9B
Entrez_Gene_Id 22997
Center BI
NCBI_Build GRCh38
Chromosome chr11
Start_Position 133921225
End_Position 133921225
Strand +
Variant_Classification Missense_Mutation
Variant_Type SNP
Reference_Allele C
Tumor_Seq_Allele1 C
Tumor_Seq_Allele2 T
dbSNP_RS rs771150072
dbSNP_Val_Status
Tumor_Aliquot_Barcode TCGA-C5-A1MI-01A-11D-A14W-08
Matched_Norm_Aliquot_Barcode TCGA-C5-A1MI-10A-01D-A14W-08
Match_Norm_Seq_Allele1
Match_Norm_Seq_Allele2
Tumor_Validation_Allele1
Tumor_Validation_Allele2
Match_Norm_Validation_Allele1
Match_Norm_Validation_Allele2
Verification_Status
Validation_Status
Mutation_Status Somatic
Sequencing_Phase
Sequence_Source
Validation_Method
Score
BAM_File
Sequencer
Tumor_Aliquot_UUID 497c20f0-8a42-4d20-abdc-0415982ebb9f
Matched_Norm_Aliquot_UUID a3d0503b-baac-4f83-9182-be7b4154c61d
HGVSc c.2500G>A
HGVSp p.Val834Met
HGVSp_Short p.V834M
Transcript_ID ENST00000321016
Exon_Number 18/19
t_depth 64
t_ref_count 43
t_alt_count 20
n_depth 88
n_ref_count
n_alt_count
all_effects IGSF9B,missense_variant,p.V834M,ENST00000533871,NM_001277285.4,c.2500G>A,MODERATE,YES,deleterious(0),probably_damaging(1),-1;IGSF9B,missense_variant,p.V834M,ENST00000321016,,c.2500G>A,MODERATE,,deleterious(0.01),probably_damaging(0.988),-1;IGSF9B,downstream_gene_variant,,ENST00000527648,,,MODIFIER,,,,-1
Allele T
Gene ENSG00000080854
Feature ENST00000321016
Feature_type Transcript
One_Consequence missense_variant
Consequence missense_variant
cDNA_position 2500/4050
Protein_position 834/1349
Amino_acids V/M
Codons Gtg/Atg
Existing_variation rs771150072;COSV58068494
DISTANCE
TRANSCRIPT_STRAND -1
SYMBOL IGSF9B
SYMBOL_SOURCE HGNC
HGNC_ID HGNC:32326
BIOTYPE protein_coding
CANONICAL
CCDS
ENSP ENSP00000317980
SWISSPROT Q9UPX0.150
TREMBL
UNIPARC UPI0001545E3E
UNIPROT_ISOFORM Q9UPX0-1
RefSeq
MANE
APPRIS
FLAGS
SIFT deleterious(0.01)
PolyPhen probably_damaging(0.988)
EXON 18/19
INTRON
DOMAINS PANTHER:PTHR12231;PANTHER:PTHR12231:SF240;Low_complexity_(Seg):seg
ThousG_AF
ThousG_AFR_AF
ThousG_AMR_AF
ThousG_EAS_AF
ThousG_EUR_AF
ThousG_SAS_AF
ESP_AA_AF
ESP_EA_AF
gnomAD_AF 1.216e-05
gnomAD_AFR_AF
gnomAD_AMR_AF
gnomAD_ASJ_AF
gnomAD_EAS_AF
gnomAD_FIN_AF
gnomAD_NFE_AF
gnomAD_OTH_AF
gnomAD_SAS_AF
MAX_AF 9.968e-05
MAX_AF_POPS 3.278e-05
gnomAD_non_cancer_AF
gnomAD_non_cancer_AFR_AF
gnomAD_non_cancer_AMI_AF
gnomAD_non_cancer_AMR_AF
gnomAD_non_cancer_ASJ_AF
gnomAD_non_cancer_EAS_AF
gnomAD_non_cancer_FIN_AF
gnomAD_non_cancer_MID_AF
gnomAD_non_cancer_NFE_AF
gnomAD_non_cancer_OTH_AF
gnomAD_non_cancer_SAS_AF
gnomAD_non_cancer_MAX_AF_adj
gnomAD_non_cancer_MAX_AF_POPS_adj
CLIN_SIG
SOMATIC 0;1
PUBMED
TRANSCRIPTION_FACTORS
MOTIF_NAME
MOTIF_POS
HIGH_INF_POS
MOTIF_SCORE_CHANGE
miRNA
IMPACT MODERATE
PICK
VARIANT_CLASS SNV
TSL 5
HGVS_OFFSET
PHENO 0;1
GENE_PHENO
CONTEXT GGCCACGCTGT
tumor_submitter_uuid 8c3559db-155f-42d3-9a73-38d5610f74b5
normal_submitter_uuid 59778b5f-335a-471e-abb2-6dde0b5d7fe7
case_id 941f75a1-fea4-4539-ba69-60bb11608f6d
GDC_FILTER
COSMIC COSM376595;COSM376596
hotspot False
RNA_Support Unknown
RNA_depth
RNA_ref_count
RNA_alt_count
callers muse;mutect2;varscan2
file_gdc_id 3fd5afe7-9e69-4ea8-ab01-80e41783d795
muse Yes
mutect2 Yes
pindel No
varscan2 Yes
sample_barcode_tumor TCGA-C5-A1MI-01A
sample_barcode_normal TCGA-C5-A1MI-10A
aliquot_barcode_tumor TCGA-C5-A1MI-01A-11D-A14W-08
aliquot_barcode_normal TCGA-C5-A1MI-10A-01D-A14W-08
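The all_effects column packs every transcript-level annotation into a single string. Entries are separated by semicolons, and the fields inside an entry are separated by commas. The minimal sketch below splits that string. It assumes the per-entry field order given in the GDC MAF format description (Symbol, Consequence, HGVSp_Short, Transcript_ID, RefSeq, HGVSc, Impact, Canonical, SIFT, PolyPhen, Strand), which matches the 11 fields in the example value above. It also assumes that no field contains a comma. The names `EFFECT_FIELDS`, `Effect`, `parse_all_effects` and `canonical_effect` are hypothetical helpers, not part of the CDA or of this project.

```python
from collections import namedtuple
from typing import List, Optional

# Assumed order of the comma-separated fields inside each all_effects entry,
# following the GDC MAF format description:
# Symbol, Consequence, HGVSp_Short, Transcript_ID, RefSeq, HGVSc,
# Impact, Canonical, SIFT, PolyPhen, Strand.
EFFECT_FIELDS = (
    "symbol", "consequence", "hgvsp_short", "transcript_id", "refseq",
    "hgvsc", "impact", "canonical", "sift", "polyphen", "strand",
)
Effect = namedtuple("Effect", EFFECT_FIELDS)


def parse_all_effects(value: str) -> List[Effect]:
    """Split an all_effects string into one Effect per annotated transcript."""
    effects = []
    for entry in value.split(";"):  # entries are separated by semicolons
        if not entry:
            continue
        parts = entry.split(",")  # fields are separated by commas (assumes none contain commas)
        if len(parts) != len(EFFECT_FIELDS):
            raise ValueError(
                f"expected {len(EFFECT_FIELDS)} fields, got {len(parts)}: {entry!r}"
            )
        effects.append(Effect(*parts))
    return effects


def canonical_effect(effects: List[Effect]) -> Optional[Effect]:
    """Return the first effect flagged as canonical (Canonical == 'YES'), if any."""
    for effect in effects:
        if effect.canonical == "YES":
            return effect
    return None


# The all_effects value of the example row in the table above.
example = (
    "IGSF9B,missense_variant,p.V834M,ENST00000533871,NM_001277285.4,c.2500G>A,"
    "MODERATE,YES,deleterious(0),probably_damaging(1),-1;"
    "IGSF9B,missense_variant,p.V834M,ENST00000321016,,c.2500G>A,"
    "MODERATE,,deleterious(0.01),probably_damaging(0.988),-1;"
    "IGSF9B,downstream_gene_variant,,ENST00000527648,,,MODIFIER,,,,-1"
)

effects = parse_all_effects(example)
for effect in effects:
    print(effect.transcript_id, effect.consequence, effect.impact)
print("canonical:", canonical_effect(effects).transcript_id)
```

In this example, the transcript flagged as canonical in all_effects (ENST00000533871) is not the one reported in the Transcript_ID column (ENST00000321016), whose CANONICAL field is empty. A parser should therefore not assume that the two coincide.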

Only a few of these fields are used to extract data for the phenopacket. They are explained below.
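To illustrate what reducing a mutation row to a few fields can look like, here is a minimal sketch. It assumes each row is available as a dict of strings, for example from `csv.DictReader` over an exported copy of the table. The column subset in `SELECTED_FIELDS` is hypothetical and chosen only for illustration, as are the helper names `tumor_vaf` and `extract_variant`. The tumor variant allele fraction is computed as t_alt_count / t_depth.

```python
from typing import Dict, Optional

# Hypothetical subset of columns, chosen only for illustration; it is not
# necessarily the set of fields used for the phenopacket.
SELECTED_FIELDS = (
    "Hugo_Symbol", "Chromosome", "Start_Position", "End_Position",
    "Reference_Allele", "Tumor_Seq_Allele2", "Transcript_ID",
    "HGVSc", "HGVSp_Short",
)


def tumor_vaf(row: Dict[str, str]) -> Optional[float]:
    """Tumor variant allele fraction t_alt_count / t_depth, or None if unavailable."""
    depth = (row.get("t_depth") or "").strip()
    alt = (row.get("t_alt_count") or "").strip()
    if not depth or not alt or int(depth) == 0:
        return None
    return int(alt) / int(depth)


def extract_variant(row: Dict[str, str]) -> Dict[str, object]:
    """Keep only the selected columns (missing ones become '') and add the VAF."""
    record: Dict[str, object] = {name: row.get(name, "") for name in SELECTED_FIELDS}
    record["vaf"] = tumor_vaf(row)
    return record


# Some values from the example row in the table above, as strings.
example_row = {
    "Hugo_Symbol": "IGSF9B", "Chromosome": "chr11",
    "Start_Position": "133921225", "End_Position": "133921225",
    "Reference_Allele": "C", "Tumor_Seq_Allele2": "T",
    "Transcript_ID": "ENST00000321016", "HGVSc": "c.2500G>A",
    "HGVSp_Short": "p.V834M", "t_depth": "64", "t_alt_count": "20",
}

print(extract_variant(example_row))
```

For the example row, the allele fraction is 20 / 64 = 0.3125. Values are kept as strings so that empty cells, which are common in this table, pass through unchanged rather than failing a numeric conversion.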