alliance-genotype Report
{ % include 'docs/nodes_report.md' %}
Alliance Genotype Ingest Pipeline
The Alliance of Genome Resources contains a subset of model organism data from member databases that is harmonized to the same model. Over time, as the alliance adds additional data types, individual MOD ingests can be replaced by collective Alliance ingest. The Alliance has bulk data downloads, ingest data formats, and an API. The preference should be bulk downloads first, followed by ingest formats, finally by API calls. In some cases it may continue to be more practical to load from individual MODs when data is not yet fully harmonized in the Alliance.
This pipeline converts Alliance AGM (Affected Genomic Model) files from MGI, ZFIN & RGD to KGX TSV following the Biolink Model. Additionally, it processes Alliance allele data and creates relationships between genotypes, alleles, and genes.
Genotypes
Genotypes from Alliance member databases (currently MGI, ZFIN, and RGD) are loaded from AGM JSON files.
Biolink captured
- biolink:Genotype
- id (row['primaryID'])
- name (row['name'])
- type (row['subtype'] if available)
- in_taxon (row['taxonId'])
- in_taxon_label (mapped from taxon ID)
Example transform:
{
'primaryID': 'ZFIN:ZDB-FISH-150901-9455',
'name': 'acvr1l<sup>sk42/sk42</sup>; f2Tg',
'affectedGenomicModelComponents': [
{'alleleID': 'ZFIN:ZDB-ALT-060821-6', 'zygosity': 'GENO:0000137'},
{'alleleID': 'ZFIN:ZDB-ALT-100701-1', 'zygosity': 'GENO:0000136'}],
'crossReference': { 'id': 'ZFIN:ZDB-FISH-150901-9455', 'pages': ['Fish']},
'taxonId': 'NCBITaxon:7955'
}
Is transformed into
category: biolink:Genotype
id: ZFIN:ZDB-FISH-150901-9455
name: acvr1l<sup>sk42/sk42</sup>; f2Tg
in_taxon: NCBITaxon:7955
in_taxon_label: Danio rerio
Alleles
Alleles (sequence variants) are loaded from Alliance VARIANT-ALLELE TSV files.
Biolink captured
- biolink:SequenceVariant
- id (row['AlleleId'])
- name (row['AlleleSymbol'])
- synonym (parsed from row['AlleleSynonyms'])
- in_taxon (row['Taxon'])
- in_taxon_label (row['SpeciesName'])
Example transform:
# Source data
{
"AlleleId": "ZFIN:ZDB-ALT-123456-7",
"AlleleSymbol": "tyr<b1>",
"AlleleSynonyms": "tyrosinase b1,tyr-b1",
"Taxon": "NCBITaxon:7955",
"SpeciesName": "Danio rerio",
"AlleleAssociatedGeneId": "ZFIN:ZDB-GENE-000508-1",
"AlleleAssociatedGeneSymbol": "tyr",
"VariantsTypeId": "SO:0001059"
}
Is transformed into:
category: biolink:SequenceVariant
id: ZFIN:ZDB-ALT-123456-7
name: tyr<b1>
in_taxon: NCBITaxon:7955
in_taxon_label: Danio rerio
synonym: ["tyrosinase b1", "tyr-b1"]
Genotype to Variant Association
Associations between genotypes and their component alleles (variants) are created from the AGM JSON files.
Biolink captured
- biolink:GenotypeToVariantAssociation
- id (random uuid)
- subject (genotype.id from row['primaryID'])
- predicate (biolink:has_sequence_variant)
- object (allele['alleleID'] from row['affectedGenomicModelComponents'])
- qualifier (allele['zygosity'] if available)
- primary_knowledge_source (mapped from source prefix in the primaryID)
- aggregator_knowledge_source (["infores:monarchinitiative", "infores:agrkb"])
- knowledge_level (knowledge_assertion)
- agent_type (manual_agent)
Example transform:
# Source data (partial)
{
'primaryID': 'ZFIN:ZDB-FISH-150901-9455',
'affectedGenomicModelComponents': [
{'alleleID': 'ZFIN:ZDB-ALT-060821-6', 'zygosity': 'GENO:0000137'},
{'alleleID': 'ZFIN:ZDB-ALT-100701-1', 'zygosity': 'GENO:0000136'}
],
'taxonId': 'NCBITaxon:7955'
}
Is transformed into:
category: biolink:GenotypeToVariantAssociation
id: fa8b4567-e9c6-4ebd-a6f1-fcd82f3f0a83 # random UUID
subject: ZFIN:ZDB-FISH-150901-9455
predicate: biolink:has_sequence_variant
object: ZFIN:ZDB-ALT-060821-6
qualifier: GENO:0000137
primary_knowledge_source: infores:zfin
aggregator_knowledge_source: ["infores:monarchinitiative", "infores:agrkb"]
knowledge_level: knowledge_assertion
agent_type: manual_agent
And a second association:
category: biolink:GenotypeToVariantAssociation
id: 8a7d6c2f-3b58-4e91-9f12-a7b8e6d234e0 # random UUID
subject: ZFIN:ZDB-FISH-150901-9455
predicate: biolink:has_sequence_variant
object: ZFIN:ZDB-ALT-100701-1
qualifier: GENO:0000136
primary_knowledge_source: infores:zfin
aggregator_knowledge_source: ["infores:monarchinitiative", "infores:agrkb"]
knowledge_level: knowledge_assertion
agent_type: manual_agent
Genotype to Gene Association
Associations between genotypes and genes are created by linking through alleles. The allele to gene mapping is precomputed from the VARIANT-ALLELE files.
Biolink captured
- biolink:GenotypeToGeneAssociation
- id (random uuid)
- subject (genotype.id from row['primaryID'])
- predicate (biolink:has_part)
- object (gene ID from allele_to_gene_lookup)
- primary_knowledge_source (mapped from source prefix in the primaryID)
- aggregator_knowledge_source (["infores:monarchinitiative", "infores:agrkb"])
- knowledge_level (knowledge_assertion)
- agent_type (manual_agent)
Example transform:
# Source data (partial)
{
'primaryID': 'ZFIN:ZDB-FISH-150901-9455',
'affectedGenomicModelComponents': [
{'alleleID': 'ZFIN:ZDB-ALT-060821-6', 'zygosity': 'GENO:0000137'}
],
'taxonId': 'NCBITaxon:7955'
}
# With allele_to_gene_lookup containing:
# 'ZFIN:ZDB-ALT-060821-6': {'AlleleAssociatedGeneId': 'ZFIN:ZDB-GENE-030616-554'}
Is transformed into:
category: biolink:GenotypeToGeneAssociation
id: 5d9c3a17-85f2-4bd2-9e3f-c8a7b45e1234 # random UUID
subject: ZFIN:ZDB-FISH-150901-9455
predicate: biolink:has_part
object: ZFIN:ZDB-GENE-030616-554
primary_knowledge_source: infores:zfin
aggregator_knowledge_source: ["infores:monarchinitiative", "infores:agrkb"]
knowledge_level: knowledge_assertion
agent_type: manual_agent
Variant to Gene Association
Associations between alleles (variants) and genes are created from the VARIANT-ALLELE files.
Biolink captured
- biolink:VariantToGeneAssociation
- id (random uuid)
- subject (row['AlleleId'])
- predicate (biolink:is_sequence_variant_of)
- original_predicate (row['VariantsTypeId'])
- object (row['AlleleAssociatedGeneId'])
- primary_knowledge_source (mapped from the source prefix in the AlleleId)
- aggregator_knowledge_source (["infores:monarchinitiative", "infores:agrkb"])
- knowledge_level (knowledge_assertion)
- agent_type (manual_agent)
Example transform:
# Source data
{
"AlleleId": "ZFIN:ZDB-ALT-123456-7",
"AlleleSymbol": "tyr<b1>",
"Taxon": "NCBITaxon:7955",
"SpeciesName": "Danio rerio",
"AlleleAssociatedGeneId": "ZFIN:ZDB-GENE-000508-1",
"AlleleAssociatedGeneSymbol": "tyr",
"VariantsTypeId": "SO:0001059" # sequence_alteration
}
Is transformed into:
category: biolink:VariantToGeneAssociation
id: 2e4c6a8d-1b3f-4e7a-9d5c-8a6b4c2e3d5f # random UUID
subject: ZFIN:ZDB-ALT-123456-7
predicate: biolink:is_sequence_variant_of
original_predicate: SO:0001059
object: ZFIN:ZDB-GENE-000508-1
primary_knowledge_source: infores:zfin
aggregator_knowledge_source: ["infores:monarchinitiative", "infores:agrkb"]
knowledge_level: knowledge_assertion
agent_type: manual_agent
Citation
Harmonizing model organism data in the Alliance of Genome Resources. 2022. Alliance of Genome Resources Consortium. Genetics, Volume 220, Issue 4, April 2022. Published Online: 25 February 2022. doi: doi.org/10.1093/genetics/iyac022. PMID: 35380658; PMCID: PMC8982023.