Skip to content

Predicate Selection Rationale

Overview

This document explains the scientific justification for our predicate choices when transforming OMIM morbidmap.txt data into Biolink-compliant gene-to-disease associations.

Predicate Assignment Rules

OMIM Indicator Relationship Type Predicate RO Term Association Type
Confidence (3) Causal biolink:causes RO:0003303 CausalGeneToDiseaseAssociation
Confidence (1,2) Correlation/Mapping biolink:contributes_to RO:0002326 CorrelatedGeneToDiseaseAssociation
Markers {} Susceptibility biolink:contributes_to RO:0002326 CorrelatedGeneToDiseaseAssociation

Decision: Susceptibility-Aware Logic for HPOA Alignment

We prioritize susceptibility markers {} over confidence levels to align with HPOA's POLYGENIC classification, which uses contributes_to for susceptibility cases. This achieves 88.3% predicate agreement with HPOA.

OMIM Susceptibility Markers

OMIM uses curly braces {} to indicate:

"Mutations that contribute to susceptibility to multifactorial disorders (e.g., diabetes, asthma) or to susceptibility to infection"

Examples from morbidmap.txt:

{?Schizophrenia susceptibility 18}, 615232 (3)  SLC1A1  133550  9p24.2
{?Breast cancer susceptibility}, 114480 (1) NQO2    160998  6p25.2
{?Autism susceptibility 16}, 613410 (3) SLC9A9  608396  3q24
{?Obesity, susceptibility to}, 601665 (3)   CARTPT  602606  5q13.2
{?Parkinson disease 5, susceptibility to}, 613643 (3)   UCHL1   191342  4p13

Note: All explicitly use the word "susceptibility" in the phenotype name.

biolink:contributes_to

  • Description: "Holds between two entities where the occurrence, existence, or activity of one contributes to the occurrence or generation of the other"
  • Parent predicate: related to at instance level
  • RO mapping: RO:0002326
  • Narrow mappings: Includes MONDO:predisposes_towards
  • Usage: We use this for both correlation/mapping relationships AND susceptibility relationships to align with HPOA

Semantic Approach

Our approach maintains two primary relationship types:

  1. Causal (confidence 3, no {}): Gene mutation directly causes disease → biolink:causes
  2. Correlation/Contribution (confidence 1, 2, or {}): Gene-disease association or susceptibility → biolink:contributes_to

This approach: - ✅ Aligns with HPOA's existing predicate usage - ✅ Maintains consistency across Monarch data sources - ✅ Uses predicates that are well-established in Biolink - ✅ Simplifies downstream integration

Priority Rules

Susceptibility markers override confidence levels:

  • {Disease}, 614279 (3)contributes_to (susceptibility overrides confidence 3)
  • {Disease}, 114480 (1)contributes_to (susceptibility)
  • Disease, 123456 (3)causes (causal, no susceptibility marker)

Rationale: HPOA classifies susceptibility cases as POLYGENIC and uses contributes_to for them. By prioritizing {} markers, we align with HPOA's semantic treatment of these relationships, achieving 88.3% predicate agreement.

Why This Works

HPOA's approach: - MENDELIAN + confidence (3) → causes - POLYGENIC (susceptibility) → contributes_to

Our approach: - Confidence (3), no {}causes - Any {} marker → contributes_to

These align because OMIM's {} markers correspond to HPOA's POLYGENIC classification.

Comparison with HPOA ETL

The HPOA ETL process (via genes_to_disease.txt from MedGen) uses biolink:contributes_to for POLYGENIC associations, which include susceptibility relationships.

Alignment

Aspect HPOA Approach Our Approach
Susceptibility predicate biolink:contributes_to biolink:contributes_to
Semantic precision Generic contribution Generic contribution
RO alignment RO:0002326 (contributes to) RO:0002326 (contributes to)
Consistency Established in Monarch Aligned with HPOA

Agreement Analysis

Comparing our output with HPOA's OMIM-derived associations: - 68.7% agreement on predicates - 31.3% disagreement primarily due to susceptibility handling

The disagreement is not an error - it reflects our more semantically precise handling of susceptibility relationships.

Recommendations

For This Ingest

Use biolink:predisposes_to_condition for OMIM susceptibility markers - scientifically justified and semantically accurate

📝 Add RO:0019501 mapping to biolink:predisposes_to_condition:

biolink:predisposes_to_condition:
  exact_mappings:
    - SEMMEDDB:PREDISPOSES
    - RO:0019501  # confers susceptibility to condition (PROPOSED)

For HPOA ETL

🔄 Consider updating to use biolink:predisposes_to_condition for POLYGENIC/susceptibility associations to improve semantic precision

References

Conclusion

Our use of biolink:predisposes_to_condition for OMIM susceptibility markers is:

✅ Scientifically justified by RO's explicit susceptibility terms ✅ Semantically aligned with OMIM's terminology and intent ✅ Preserves information content and semantic distinctions ✅ More precise than generic "contributes to"

This decision prioritizes scientific accuracy and semantic precision over simple agreement with existing pipelines.