alliance-phenotype-association-ingest Report

Edges Report

category	subject_prefix	predicate	object_prefix	count_star()
biolink:GeneToPhenotypicFeatureAssociation	MGI	biolink:has_phenotype	MP	287772
biolink:GeneToPhenotypicFeatureAssociation	RGD	biolink:has_phenotype	MP	3456
biolink:GeneToPhenotypicFeatureAssociation	WB	biolink:has_phenotype	WBPhenotype	26906
biolink:GenotypeToPhenotypicFeatureAssociation	MGI	biolink:has_phenotype	MP	391456
biolink:GenotypeToPhenotypicFeatureAssociation	RGD	biolink:has_phenotype	MP	1937
biolink:VariantToPhenotypicFeatureAssociation	MGI	biolink:has_phenotype	MP	314400
biolink:VariantToPhenotypicFeatureAssociation	RGD	biolink:has_phenotype	MP	1753
biolink:VariantToPhenotypicFeatureAssociation	WB	biolink:has_phenotype	WBPhenotype	27254

Alliance Phenotype Association Ingest Pipeline

This pipeline transforms the Alliance of Genome Resources phenotype association files into KGX TSV following the Biolink Model.

The association files used are part of the process for importing MOD data into the Alliance KG, and in this case are in an initial, limited format. This format doesn't supply any category/type information for the subject of the phenotype associations, which makes it hard to decide whether to produce a gene to phenotype association or a genotype to phenotype association.

This ingest solves that problem by downloading gene, allele and genotype files and using a post-processing step to create lookup lists of all genes, alleles and genotypes, so that a category can be assigned during the ingest process. This step runs in the Makefile under the target post-download. Within the koza transformation, the lists of genes, alleles and genotypes are loaded as lookup maps.
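The category lookup described above could be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function name, the in-memory sets, and the placeholder allele and genotype IDs are hypothetical, and mapping alleles to VariantToPhenotypicFeatureAssociation is an assumption based on the categories listed in the Edges Report.

```python
from typing import Optional, Set

# Hypothetical lookup sets. In the pipeline these would be built from the
# downloaded gene, allele and genotype files by the post-download step.
GENE_IDS: Set[str] = {"MGI:87853"}
ALLELE_IDS: Set[str] = {"EXAMPLE:allele1"}      # placeholder ID, not real data
GENOTYPE_IDS: Set[str] = {"EXAMPLE:genotype1"}  # placeholder ID, not real data


def association_category(subject_id: str) -> Optional[str]:
    """Return the association category for a subject ID, or None if unknown."""
    if subject_id in GENE_IDS:
        return "biolink:GeneToPhenotypicFeatureAssociation"
    if subject_id in GENOTYPE_IDS:
        return "biolink:GenotypeToPhenotypicFeatureAssociation"
    if subject_id in ALLELE_IDS:
        # Assumption: alleles are emitted as variant associations.
        return "biolink:VariantToPhenotypicFeatureAssociation"
    return None  # subject is in none of the lookup lists


print(association_category("MGI:87853"))
```

A subject that appears in none of the lists returns None here, which leaves it to the caller to decide whether to skip or log the record.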

Filtering

This pipeline only captures phenotype associations which are expressed as a single phenotype term; it does not handle post-composed phenotype annotations. ZFIN phenotype associations are ingested elsewhere, directly from ZFIN, and mapped to ZP. Support for FB & SGD phenotype associations is in progress, via the https://github.com/monarch-initiative/uphenotizer project, which aims to add the uPheno terms necessary for post-composed FB & SGD annotations.
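One way such a filter might look is sketched below. It rests on an assumption that is not stated in this document: that a post-composed annotation shows up as a record whose phenotypeTermIdentifiers list holds more than one term. The function name is hypothetical.

```python
from typing import Optional


def single_phenotype_term(record: dict) -> Optional[str]:
    """Return the phenotype term ID if the record has exactly one, else None."""
    terms = record.get("phenotypeTermIdentifiers", [])
    # Assumption: records with zero or several terms are skipped, the latter
    # being treated as post-composed annotations.
    if len(terms) != 1:
        return None
    return terms[0]["termId"]


simple = {"phenotypeTermIdentifiers": [{"termId": "MP:0002075", "termOrder": 1}]}
print(single_phenotype_term(simple))
```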

Example transform

Given an entry in PHENOTYPE_MGI.json:

{
  "objectId": "MGI:87853",
  "phenotypeTermIdentifiers": [
    {
      "termId": "MP:0002075",
      "termOrder": 1
    }
  ],
  "phenotypeStatement": "abnormal coat/hair pigmentation",
  "evidence": {
    "publicationId": "PMID:1473152",
    "crossReference": {
      "id": "MGI:52036",
      "pages": [
        "reference"
      ]
    }
  },
  "primaryGeneticEntityIDs": [
    "MGI:3714610"
  ],
  "dateAssigned": "2010-03-15T00:00:00-04:00"
}

The resulting Biolink association will be:

    category: biolink:GeneToPhenotypicFeatureAssociation
    subject: MGI:87853
    predicate: biolink:has_phenotype
    object: MP:0002075
    publications: PMID:1473152
    primary_knowledge_source: infores:mgi
    aggregator_knowledge_source: infores:monarchinitiative|infores:agrkb
    knowledge_level: knowledge_assertion
    agent_type: manual_agent
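The mapping from the JSON entry to the association fields can be sketched in plain Python. This is an illustrative sketch only: the function name and its parameters are hypothetical, the record is trimmed to the fields the mapping reads, and the real transform runs inside koza rather than as a standalone function.

```python
def to_association(record: dict, category: str, primary_source: str) -> dict:
    """Build a flat association dict from one phenotype record."""
    return {
        "category": category,
        "subject": record["objectId"],
        "predicate": "biolink:has_phenotype",
        # Only the single phenotype term is used.
        "object": record["phenotypeTermIdentifiers"][0]["termId"],
        "publications": [record["evidence"]["publicationId"]],
        "primary_knowledge_source": primary_source,
        "aggregator_knowledge_source": [
            "infores:monarchinitiative",
            "infores:agrkb",
        ],
        "knowledge_level": "knowledge_assertion",
        "agent_type": "manual_agent",
    }


# The example entry from PHENOTYPE_MGI.json, reduced to the fields used here.
record = {
    "objectId": "MGI:87853",
    "phenotypeTermIdentifiers": [{"termId": "MP:0002075", "termOrder": 1}],
    "evidence": {"publicationId": "PMID:1473152"},
}

association = to_association(
    record,
    category="biolink:GeneToPhenotypicFeatureAssociation",
    primary_source="infores:mgi",
)
for key, value in association.items():
    print(f"{key}: {value}")
```

Passing the category and primary knowledge source as arguments keeps the sketch independent of how the pipeline actually derives them.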