zfin-orthology-ingest Report
This transform pulls in Human, Fly and Mouse orthology files from ZFIN and produces a Biolink KGX representation of the orthology relationships, with the zebrafish gene always occupying the subject position. Due to genome duplication, Zebrafish orthology can be especially complex to establish, and this makes the manual curation of orthology relationships provdided by ZFIN especially valuable.
There are two awkard aspects to this transform with respect to Koza's limitations. The human_orthos.txt file has one more column than fly_orthos.txt and mouse_orthos.txt, additionally, these files have a single valued pulication field, but the Biolink model has a multivalued field. These two isses are addressed by a preprocessing SQL step that uses duckdb to normalize the columns and aggregate up to the publication. This step also adds gene prefixes for both the ZFIN gene and the ortholog, because the alternative (at least on the ortholog side) would have been to have bare integer IDs.
The transform results in the associations listed below
Edges Report
| category | subject_prefix | predicate | object_prefix | count_star() |
|---|---|---|---|---|
| biolink:GeneToGeneHomologyAssociation | ZFIN | biolink:orthologous_to | FB | 70 |
| biolink:GeneToGeneHomologyAssociation | ZFIN | biolink:orthologous_to | HGNC | 34195 |
| biolink:GeneToGeneHomologyAssociation | ZFIN | biolink:orthologous_to | MGI | 28207 |