Skip to content

alliance-phenotype-association-ingest Report

Edges Report

category subject_prefix predicate object_prefix count_star()
biolink:GeneToPhenotypicFeatureAssociation MGI biolink:has_phenotype MP 287772
biolink:GeneToPhenotypicFeatureAssociation RGD biolink:has_phenotype MP 3456
biolink:GeneToPhenotypicFeatureAssociation WB biolink:has_phenotype WBPhenotype 26906
biolink:GenotypeToPhenotypicFeatureAssociation MGI biolink:has_phenotype MP 391456
biolink:GenotypeToPhenotypicFeatureAssociation RGD biolink:has_phenotype MP 1937
biolink:VariantToPhenotypicFeatureAssociation MGI biolink:has_phenotype MP 314400
biolink:VariantToPhenotypicFeatureAssociation RGD biolink:has_phenotype MP 1753
biolink:VariantToPhenotypicFeatureAssociation WB biolink:has_phenotype WBPhenotype 27254

Alliance Phenotype Association Ingest Pipeline

This pipeline transforms the Alliance of Genome Resources phenotype association files to kgx tsv following the Biolink Model.

The association files used are part of the process for importing MOD data into the Alliance KG, and in this case are an initial limited format. This format doesn't supply any category/type information for the subject of the phenotype associations - which makes it a challenge whether to produce a gene to phenotype association vs a genotype to phenotype association.

This ingest solves that problem by download gene, allele and genotype files and using a post-procesing step to create lookup lists of all genes, alleles and genotypes so that a category can be assigned in the ingest process. This runs in the Makefile under the target post-download. Within the koza transformation, the lists of genes, alleles and genotypes are loaded as lookup maps.

Filtering

This pipeline only captures phenotype associations which are expressed as a phenotype term, not handling post-composed phenotype annotations. ZFIN phenotype associations are ingested elsewhere, directly from ZFIN and mapped to ZP. Support for FB & SGD phenotype associations is in progress, via the https://github.com/monarch-initiative/uphenotizer project which aims to add uPheno terms necessary for post-composed FB & SGD annotations.

Example transform

Given an entry in PHENOTYPE_MGI.json:

{
  "objectId": "MGI:87853",
  "phenotypeTermIdentifiers": [
    {
      "termId": "MP:0002075",
      "termOrder": 1
    }
  ],
  "phenotypeStatement": "abnormal coat/hair pigmentation",
  "evidence": {
    "publicationId": "PMID:1473152",
    "crossReference": {
      "id": "MGI:52036",
      "pages": [
        "reference"
      ]
    }
  },
  "primaryGeneticEntityIDs": [
    "MGI:3714610"
  ],
  "dateAssigned": "2010-03-15T00:00:00-04:00"
}

The resulting biolink class will be:

    category: biolink:GeneToPhenotypicFeatureAssociation
    subject: MGI:87853
    predicate: biolink:has_phenotype
    object: MP:0002075
    publications: PMID:1473152
    primary_knowledge_source: infores:mgi
    aggregator_knowledge_source: infores:monarchinitiative|infores:agrkb
    knowledge_level: knowledge_assertion
    agent_type: manual_agent