
End-to-End Ontology Alignment: OBO + OWL + SSSOM

This tutorial aligns two disease ontologies by merging three complementary data sources:

| File | Format | What it provides |
|---|---|---|
| mondo_subset.obo | OBO | MONDO disease hierarchy + xref mappings to ORDO |
| ordo_subset.ofn | OWL Functional Syntax | ORDO rare disease hierarchy + disjointness constraints |
| mondo_ordo_mappings.sssom.tsv | SSSOM | Cross-ontology equivalence candidates with confidence scores |

from pathlib import Path
from docs.tutorial.notebook_utils import show

DIR = Path("docs/tutorial/ontology-alignment-files")

Input Files

MONDO hierarchy (OBO)

Three diseases under a common root, with xref lines pointing to ORDO for the first two. When BOOMER converts an OBO file, xref entries become probabilistic EquivalentTo facts (default probability 0.7), while is_a relations become hard ProperSubClassOf facts. See the Ontology Conversion docs for the full mapping table and configuration options.

show(DIR / "mondo_subset.obo")

mondo_subset.obo

format-version: 1.4
ontology: mondo-subset

[Term]
id: MONDO:0000001
name: disease

[Term]
id: MONDO:0001234
name: alpha disease
is_a: MONDO:0000001 ! disease
xref: ORDO:100

[Term]
id: MONDO:0005678
name: beta disease
is_a: MONDO:0000001 ! disease
xref: ORDO:200

[Term]
id: MONDO:0009999
name: gamma disease
is_a: MONDO:0000001 ! disease
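
The conversion rule described above can be sketched in plain Python. This is a hypothetical, stripped-down reader written for illustration, not BOOMER's converter: the function name obo_to_facts and the tuple shapes are assumptions, it only looks at id, is_a and bare xref tags, and it hard-codes the 0.7 xref default stated above.

```python
# Hypothetical sketch of the OBO -> facts rule; not BOOMER's actual converter.
XREF_DEFAULT_PROB = 0.7  # default xref probability stated in this tutorial

def obo_to_facts(text: str):
    """Return (hard_facts, pfacts) as plain tuples from a minimal OBO document."""
    hard_facts, pfacts = [], []
    current_id = None
    for raw in text.splitlines():
        line = raw.split("!", 1)[0].strip()  # drop trailing "! label" comments
        if line.startswith("["):
            current_id = None  # a new stanza begins
        elif line.startswith("id:"):
            current_id = line[len("id:"):].strip()
        elif current_id and line.startswith("is_a:"):
            # is_a -> hard ProperSubClassOf fact
            parent = line[len("is_a:"):].strip()
            hard_facts.append(("ProperSubClassOf", current_id, parent))
        elif current_id and line.startswith("xref:"):
            # xref -> probabilistic EquivalentTo fact (assumes a bare CURIE value)
            target = line[len("xref:"):].strip()
            pfacts.append((("EquivalentTo", current_id, target), XREF_DEFAULT_PROB))
    return hard_facts, pfacts

sample = """format-version: 1.4
ontology: mondo-subset

[Term]
id: MONDO:0001234
name: alpha disease
is_a: MONDO:0000001 ! disease
xref: ORDO:100

[Term]
id: MONDO:0009999
name: gamma disease
is_a: MONDO:0000001 ! disease
"""

hard, soft = obo_to_facts(sample)
print(hard)  # two ProperSubClassOf tuples, both with parent MONDO:0000001
print(soft)  # [(('EquivalentTo', 'MONDO:0001234', 'ORDO:100'), 0.7)]
```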

ORDO hierarchy (OWL)

Three rare diseases under a grouping class. Crucially, they are declared pairwise disjoint:

show(DIR / "ordo_subset.ofn", lang="turtle")

ordo_subset.ofn

Prefix(ORDO:=<http://www.orpha.net/ORDO/Orphanet_>)
Prefix(owl:=<http://www.w3.org/2002/07/owl#>)
Prefix(rdfs:=<http://www.w3.org/2000/01/rdf-schema#>)

Ontology(<http://www.orpha.net/ORDO/ordo-subset>

Declaration(Class(ORDO:100))
Declaration(Class(ORDO:200))
Declaration(Class(ORDO:300))
Declaration(Class(ORDO:999))

SubClassOf(ORDO:100 ORDO:999)
SubClassOf(ORDO:200 ORDO:999)
SubClassOf(ORDO:300 ORDO:999)

DisjointClasses(ORDO:100 ORDO:200 ORDO:300)

AnnotationAssertion(rdfs:label ORDO:100 "Alpha rare disease")
AnnotationAssertion(rdfs:label ORDO:200 "Beta rare disease")
AnnotationAssertion(rdfs:label ORDO:300 "Delta rare disease")
AnnotationAssertion(rdfs:label ORDO:999 "Rare disease grouping")
)
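
An n-ary DisjointClasses axiom means every pair of its members is disjoint, which is why the merged KB further down holds three DisjointWith facts for this single axiom. A minimal sketch of that pairwise expansion with itertools.combinations; the function name and tuple shape are illustrative assumptions, not BOOMER's data model.

```python
from itertools import combinations

def expand_disjoint_classes(members):
    """Expand DisjointClasses(c1 ... cn) into n*(n-1)/2 pairwise DisjointWith tuples."""
    return [("DisjointWith", a, b) for a, b in combinations(members, 2)]

pairs = expand_disjoint_classes(["ORDO:100", "ORDO:200", "ORDO:300"])
for p in pairs:
    print(p)
# ('DisjointWith', 'ORDO:100', 'ORDO:200')
# ('DisjointWith', 'ORDO:100', 'ORDO:300')
# ('DisjointWith', 'ORDO:200', 'ORDO:300')
```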

Cross-ontology mappings (SSSOM)

Four candidate equivalences. Note the conflicting last row: MONDO:0009999 has a weak match (0.3) to ORDO:100, which is already strongly matched (0.9) to MONDO:0001234:

show(DIR / "mondo_ordo_mappings.sssom.tsv", lang="tsv")

mondo_ordo_mappings.sssom.tsv

#curie_map:
#  MONDO: http://purl.obolibrary.org/obo/MONDO_
#  ORDO: http://www.orpha.net/ORDO/Orphanet_
#  skos: http://www.w3.org/2004/02/skos/core#
#  semapv: https://w3id.org/semapv/vocab/
#mapping_set_id: https://example.org/mondo-ordo-mappings
#mapping_set_description: MONDO to ORDO candidate mappings
subject_id  subject_label   predicate_id    object_id   object_label    mapping_justification   confidence
MONDO:0001234   alpha disease   skos:exactMatch ORDO:100    Alpha rare disease  semapv:LexicalMatching  0.9
MONDO:0005678   beta disease    skos:exactMatch ORDO:200    Beta rare disease   semapv:LexicalMatching  0.85
MONDO:0009999   gamma disease   skos:exactMatch ORDO:300    Delta rare disease  semapv:LexicalMatching  0.6
MONDO:0009999   gamma disease   skos:exactMatch ORDO:100    Alpha rare disease  semapv:LexicalMatching  0.3
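
The merged KB later in this tutorial shows each skos:exactMatch row becoming an EquivalentTo pfact whose probability is the row's confidence. A minimal standard-library sketch of that mapping, assuming tab-separated columns and '#'-prefixed metadata lines; it is not BOOMER's sssom_to_kb, the function name is hypothetical, and it handles only skos:exactMatch.

```python
import csv
import io

def sssom_exact_matches(tsv_text: str):
    """Turn skos:exactMatch rows of an SSSOM TSV into (fact, prob) pairs."""
    # SSSOM stores its metadata block in leading '#' lines; skip them.
    body = "\n".join(line for line in tsv_text.splitlines() if not line.startswith("#"))
    pfacts = []
    for row in csv.DictReader(io.StringIO(body), delimiter="\t"):
        if row["predicate_id"] != "skos:exactMatch":
            continue  # other predicates are out of scope for this sketch
        fact = ("EquivalentTo", row["subject_id"], row["object_id"])
        pfacts.append((fact, float(row["confidence"])))  # assumes confidence is present
    return pfacts

sample = (
    "#mapping_set_id: https://example.org/mondo-ordo-mappings\n"
    "subject_id\tpredicate_id\tobject_id\tconfidence\n"
    "MONDO:0009999\tskos:exactMatch\tORDO:300\t0.6\n"
    "MONDO:0009999\tskos:exactMatch\tORDO:100\t0.3\n"
)

for fact, prob in sssom_exact_matches(sample):
    print(fact, prob)
```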

Merge

Pass all three files directly to merge; formats are auto-detected from the file extensions.
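
Extension-based detection has one subtlety: a compound suffix such as .sssom.tsv must be matched before any shorter suffix it ends with. The sketch below is hypothetical (the suffix table, function name and format labels are illustrative, not BOOMER's internals) and covers only the three inputs used here.

```python
from pathlib import Path

# Hypothetical suffix table, not BOOMER's actual detection logic.
# Compound suffixes come first so ".sssom.tsv" would win over any
# shorter suffix (such as a generic ".tsv") added later.
FORMAT_BY_SUFFIX = [
    (".sssom.tsv", "sssom"),
    (".obo", "obo"),
    (".ofn", "owl-functional"),
]

def detect_format(path) -> str:
    """Guess an input format from the file name alone."""
    name = Path(path).name.lower()
    for suffix, fmt in FORMAT_BY_SUFFIX:
        if name.endswith(suffix):
            return fmt
    raise ValueError(f"cannot detect format of {name!r}")

for f in ["mondo_subset.obo", "ordo_subset.ofn", "mondo_ordo_mappings.sssom.tsv"]:
    print(f, "->", detect_format(f))
```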

Each converter auto-generates MemberOfDisjointGroup hard facts as a post-processing step: every CURIE encountered during conversion is split on ':' and assigned to a disjoint group named after its prefix (e.g. all MONDO:* entities go into group MONDO, all ORDO:* entities into group ORDO). This encodes the assumption that entities within the same namespace are distinct from one another, a standard pattern in ontology alignment. Because each converter runs this step independently, the merged KB contains repeated memberships (ORDO:100, for example, is listed three times). The behavior is controlled by auto_disjoint_groups in OntologyConverterConfig (default True).
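
A sketch of this post-processing step on plain tuples. The helper names are hypothetical, not BOOMER's implementation; the per-converter CURIE lists are read off the merged.yaml shown below. Concatenating three converters' output gives 16 membership facts for only 8 distinct entities, which matches the repetition visible in that file.

```python
# Hypothetical sketch of prefix-based disjoint groups; not BOOMER's implementation.
def disjoint_group_facts(curies):
    """One MemberOfDisjointGroup tuple per distinct CURIE, grouped by prefix."""
    facts, seen = [], set()
    for curie in curies:
        if curie in seen:
            continue  # within one converter run, each entity is emitted once
        seen.add(curie)
        prefix = curie.split(":", 1)[0]  # "MONDO:0001234" -> "MONDO"
        facts.append(("MemberOfDisjointGroup", curie, prefix))
    return facts

def same_group(a: str, b: str) -> bool:
    """Entities sharing a group are assumed distinct, so they may not be equivalent."""
    return a.split(":", 1)[0] == b.split(":", 1)[0]

# CURIEs each converter touched, as listed in merged.yaml.
obo_curies = ["MONDO:0000001", "MONDO:0001234", "ORDO:100",
              "MONDO:0005678", "ORDO:200", "MONDO:0009999"]
owl_curies = ["ORDO:100", "ORDO:200", "ORDO:300", "ORDO:999"]
sssom_curies = ["MONDO:0001234", "ORDO:100", "MONDO:0005678",
                "ORDO:200", "MONDO:0009999", "ORDO:300"]

# Each converter runs the step on its own; merging simply concatenates results.
merged = [fact for curies in (obo_curies, owl_curies, sssom_curies)
          for fact in disjoint_group_facts(curies)]
print(len(merged), "membership facts for", len(set(merged)), "distinct entities")
print(same_group("MONDO:0001234", "MONDO:0009999"))  # True: may not be equivalent
```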

%%bash
uv run python -m boomer.cli merge \
  docs/tutorial/ontology-alignment-files/mondo_subset.obo \
  docs/tutorial/ontology-alignment-files/ordo_subset.ofn \
  docs/tutorial/ontology-alignment-files/mondo_ordo_mappings.sssom.tsv \
  -o docs/tutorial/ontology-alignment-files/merged.yaml \
  -n "MONDO-ORDO Alignment"
Merged 3 files into docs/tutorial/ontology-alignment-files/merged.yaml (yaml)
Final KB contains 25 facts and 6 probabilistic facts

show(DIR / "merged.yaml")

merged.yaml

name: MONDO-ORDO Alignment
facts:
- fact_type: ProperSubClassOf
  sub: MONDO:0001234
  sup: MONDO:0000001
- fact_type: ProperSubClassOf
  sub: MONDO:0005678
  sup: MONDO:0000001
- fact_type: ProperSubClassOf
  sub: MONDO:0009999
  sup: MONDO:0000001
- fact_type: MemberOfDisjointGroup
  sub: ORDO:200
  group: ORDO
- fact_type: MemberOfDisjointGroup
  sub: MONDO:0000001
  group: MONDO
- fact_type: MemberOfDisjointGroup
  sub: MONDO:0005678
  group: MONDO
- fact_type: MemberOfDisjointGroup
  sub: MONDO:0009999
  group: MONDO
- fact_type: MemberOfDisjointGroup
  sub: ORDO:100
  group: ORDO
- fact_type: MemberOfDisjointGroup
  sub: MONDO:0001234
  group: MONDO
- fact_type: ProperSubClassOf
  sub: ORDO:300
  sup: ORDO:999
- fact_type: ProperSubClassOf
  sub: ORDO:200
  sup: ORDO:999
- fact_type: DisjointWith
  sub: ORDO:100
  sibling: ORDO:200
- fact_type: DisjointWith
  sub: ORDO:100
  sibling: ORDO:300
- fact_type: DisjointWith
  sub: ORDO:200
  sibling: ORDO:300
- fact_type: ProperSubClassOf
  sub: ORDO:100
  sup: ORDO:999
- fact_type: MemberOfDisjointGroup
  sub: ORDO:200
  group: ORDO
- fact_type: MemberOfDisjointGroup
  sub: ORDO:999
  group: ORDO
- fact_type: MemberOfDisjointGroup
  sub: ORDO:100
  group: ORDO
- fact_type: MemberOfDisjointGroup
  sub: ORDO:300
  group: ORDO
- fact_type: MemberOfDisjointGroup
  sub: MONDO:0001234
  group: MONDO
- fact_type: MemberOfDisjointGroup
  sub: ORDO:100
  group: ORDO
- fact_type: MemberOfDisjointGroup
  sub: MONDO:0005678
  group: MONDO
- fact_type: MemberOfDisjointGroup
  sub: ORDO:200
  group: ORDO
- fact_type: MemberOfDisjointGroup
  sub: MONDO:0009999
  group: MONDO
- fact_type: MemberOfDisjointGroup
  sub: ORDO:300
  group: ORDO
pfacts:
- fact:
    fact_type: EquivalentTo
    sub: MONDO:0001234
    equivalent: ORDO:100
  prob: 0.9
- fact:
    fact_type: EquivalentTo
    sub: MONDO:0005678
    equivalent: ORDO:200
  prob: 0.85
- fact:
    fact_type: EquivalentTo
    sub: MONDO:0001234
    equivalent: ORDO:100
  prob: 0.7
- fact:
    fact_type: EquivalentTo
    sub: MONDO:0005678
    equivalent: ORDO:200
  prob: 0.7
- fact:
    fact_type: EquivalentTo
    sub: MONDO:0009999
    equivalent: ORDO:300
  prob: 0.6
- fact:
    fact_type: EquivalentTo
    sub: MONDO:0009999
    equivalent: ORDO:100
  prob: 0.3
hypotheses: []
labels:
  MONDO:0000001: disease
  MONDO:0001234: alpha disease
  MONDO:0005678: beta disease
  MONDO:0009999: gamma disease
  ORDO:999: Rare disease grouping
  ORDO:300: Delta rare disease
  ORDO:100: Alpha rare disease
  ORDO:200: Beta rare disease
hyperparams: []
pfacts_entailed: []

Solve

%%bash
uv run python -m boomer.cli solve \
  docs/tutorial/ontology-alignment-files/merged.yaml \
  --timeout 60 \
  -O yaml \
  -o docs/tutorial/ontology-alignment-files/solution.yaml \
  --quiet
Solving KB: MONDO-ORDO Alignment with 6 pfacts; threshold=200

show(DIR / "solution.yaml")

solution.yaml

name: null
number_of_combinations: 37
number_of_satisfiable_combinations: 28
number_of_combinations_explored_including_implicit: 152
number_of_components: null
confidence: 0.5
prior_prob: 0.15743699999999994
posterior_prob: 0.11114287892650197
proportion_of_combinations_explored: 1.0
ground_pfacts: []
solved_pfacts:
- pfact:
    fact:
      fact_type: EquivalentTo
      sub: MONDO:0001234
      equivalent: ORDO:100
    prob: 0.9
  truth_value: true
  posterior_prob: 0.968219477482972
  metadata: null
- pfact:
    fact:
      fact_type: EquivalentTo
      sub: MONDO:0005678
      equivalent: ORDO:200
    prob: 0.85
  truth_value: true
  posterior_prob: 0.9571896919792617
  metadata: null
- pfact:
    fact:
      fact_type: EquivalentTo
      sub: MONDO:0001234
      equivalent: ORDO:100
    prob: 0.7
  truth_value: true
  posterior_prob: 0.968219477482972
  metadata: null
- pfact:
    fact:
      fact_type: EquivalentTo
      sub: MONDO:0005678
      equivalent: ORDO:200
    prob: 0.7
  truth_value: true
  posterior_prob: 0.9571896919792617
  metadata: null
- pfact:
    fact:
      fact_type: EquivalentTo
      sub: MONDO:0009999
      equivalent: ORDO:300
    prob: 0.6
  truth_value: true
  posterior_prob: 0.5972095150960655
  metadata: null
- pfact:
    fact:
      fact_type: EquivalentTo
      sub: MONDO:0009999
      equivalent: ORDO:100
    prob: 0.3
  truth_value: false
  posterior_prob: 0.004650808173223542
  metadata: null
sub_solutions: []
time_started: 1772508731.125963
time_finished: 1772508731.2317069
timed_out: false
time_elapsed: 0.1057438850402832

Interpreting the Results

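| Mapping | Prior | Posterior | Verdict | Why |
|---|---|---|---|---|
| MONDO:0001234 ≡ ORDO:100 | 0.90 (SSSOM), 0.70 (xref) | ~0.97 | Accepted | Reinforced by both xref and SSSOM |
| MONDO:0005678 ≡ ORDO:200 | 0.85 (SSSOM), 0.70 (xref) | ~0.96 | Accepted | Reinforced by both xref and SSSOM |
| MONDO:0009999 ≡ ORDO:300 | 0.60 | ~0.60 | Accepted | Moderate match; its only competitor is the weak ORDO:100 candidate |
| MONDO:0009999 ≡ ORDO:100 | 0.30 | ~0.005 | Rejected | Conflicts with both stronger mappings via disjointness constraints |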

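The key result: the false mapping MONDO:0009999≡ORDO:100 drops from 0.30 to 0.005. Accepting it would conflict with both stronger mappings. Combined with MONDO:0001234≡ORDO:100 (high confidence), it would make MONDO:0001234 and MONDO:0009999 equivalent, which the auto-generated MONDO disjoint group forbids. Combined with MONDO:0009999≡ORDO:300, it would make ORDO:100 equivalent to ORDO:300, which contradicts the OWL DisjointClasses axiom.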
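
The effect of these two conflicts can be reproduced with a toy model: treat the four SSSOM mappings as independent events whose priors are their confidences, enumerate every accept/reject combination, discard combinations in which two same-prefix entities become equivalent, and renormalize. This is a simplified sketch under those assumptions, not BOOMER's search or scoring (for one, it leaves out the duplicate xref pfacts), so its numbers differ from solution.yaml, but the weak MONDO:0009999≡ORDO:100 mapping collapses in the same way.

```python
from itertools import combinations, product

# Toy model: priors are the SSSOM confidences (xref duplicates left out).
PRIORS = {
    ("MONDO:0001234", "ORDO:100"): 0.9,
    ("MONDO:0005678", "ORDO:200"): 0.85,
    ("MONDO:0009999", "ORDO:300"): 0.6,
    ("MONDO:0009999", "ORDO:100"): 0.3,
}

def consistent(accepted) -> bool:
    """Reject worlds where accepted mappings equate two same-prefix entities.

    For one-to-one MONDO-ORDO links, checking pairs that share an endpoint suffices.
    """
    for (s1, o1), (s2, o2) in combinations(accepted, 2):
        if o1 == o2:  # two MONDO terms would become equivalent (MONDO group)
            return False
        if s1 == s2:  # two ORDO classes would become equivalent (declared disjoint)
            return False
    return True

def posteriors(priors):
    """Exact posterior of each mapping by brute-force enumeration of worlds."""
    keys = list(priors)
    total = 0.0
    mass = dict.fromkeys(keys, 0.0)
    for values in product([True, False], repeat=len(keys)):
        accepted = [k for k, v in zip(keys, values) if v]
        if not consistent(accepted):
            continue  # hard constraints zero out this world
        weight = 1.0
        for k, v in zip(keys, values):
            weight *= priors[k] if v else 1.0 - priors[k]
        total += weight
        for k in accepted:
            mass[k] += weight
    return {k: mass[k] / total for k in keys}

for (sub, obj), p in posteriors(PRIORS).items():
    print(f"{sub} ≡ {obj}: prior {PRIORS[(sub, obj)]:.2f} -> posterior {p:.3f}")
# MONDO:0001234 ≡ ORDO:100: prior 0.90 -> posterior 0.885
# MONDO:0005678 ≡ ORDO:200: prior 0.85 -> posterior 0.850
# MONDO:0009999 ≡ ORDO:300: prior 0.60 -> posterior 0.590
# MONDO:0009999 ≡ ORDO:100: prior 0.30 -> posterior 0.017
```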

Why do some mappings appear twice in the solution?

Two mappings, MONDO:0001234≡ORDO:100 and MONDO:0005678≡ORDO:200, each have two pfacts in the merged KB: one from the OBO xref (prob 0.7) and one from the SSSOM skos:exactMatch (prob 0.9 and 0.85 respectively). Each pfact keeps its own prior, but both assert the same fact, and they end up with the same truth value and the same posterior (0.968 and 0.957), reflecting the combined evidence. You can configure xref probabilities per prefix via OntologyConverterConfig.
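
When auditing a solution, it can help to collapse these duplicates so each mapping is listed once with all of its source priors. A small sketch over flattened dicts whose values are copied from solution.yaml (posteriors rounded to three places); the dict keys and the function name are illustrative. With a real file you would first load the YAML with a parser such as PyYAML and flatten the nested pfact entries.

```python
from collections import defaultdict

# Values copied from solution.yaml (posteriors rounded to 3 places).
solved = [
    {"sub": "MONDO:0001234", "equivalent": "ORDO:100", "prob": 0.9, "posterior": 0.968},
    {"sub": "MONDO:0005678", "equivalent": "ORDO:200", "prob": 0.85, "posterior": 0.957},
    {"sub": "MONDO:0001234", "equivalent": "ORDO:100", "prob": 0.7, "posterior": 0.968},
    {"sub": "MONDO:0005678", "equivalent": "ORDO:200", "prob": 0.7, "posterior": 0.957},
    {"sub": "MONDO:0009999", "equivalent": "ORDO:300", "prob": 0.6, "posterior": 0.597},
    {"sub": "MONDO:0009999", "equivalent": "ORDO:100", "prob": 0.3, "posterior": 0.005},
]

def group_by_mapping(rows):
    """Map (sub, equivalent) -> (list of source priors, shared posterior)."""
    priors = defaultdict(list)
    posterior = {}
    for r in rows:
        key = (r["sub"], r["equivalent"])
        priors[key].append(r["prob"])
        posterior[key] = r["posterior"]  # identical across duplicates in this run
    return {k: (priors[k], posterior[k]) for k in priors}

for (sub, eq), (ps, post) in group_by_mapping(solved).items():
    print(f"{sub} ≡ {eq}: priors {ps} -> posterior {post}")
```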

TSV Export

%%bash
uv run python -m boomer.cli solve \
  docs/tutorial/ontology-alignment-files/merged.yaml \
  --timeout 60 \
  -O tsv \
  -o docs/tutorial/ontology-alignment-files/solution.tsv \
  --quiet
Solving KB: MONDO-ORDO Alignment with 6 pfacts; threshold=200

show(DIR / "solution.tsv", lang="tsv")

solution.tsv

# BOOMER Solution TSV Output
#
# Metadata:
#   generated_date: 2026-03-02T19:32:11.670343
#   combinations: 37
#   satisfiable_combinations: 28
#   confidence: 0.5
#   prior_probability: 0.15743699999999997
#   posterior_probability: 0.11114287892650197
#   time_elapsed_seconds: 0.09324312210083008
#   timed_out: False
#
# Format: fact_type followed by arguments, then truth_value and probabilities
#
fact_type   arg1    arg2    arg1_label  arg2_label  truth_value prior_probability   posterior_probability
EquivalentTo    MONDO:0001234   ORDO:100    alpha disease   Alpha rare disease  True    0.9 0.968219477482972
EquivalentTo    MONDO:0005678   ORDO:200    beta disease    Beta rare disease   True    0.85    0.9571896919792622
EquivalentTo    MONDO:0001234   ORDO:100    alpha disease   Alpha rare disease  True    0.7 0.968219477482972
EquivalentTo    MONDO:0005678   ORDO:200    beta disease    Beta rare disease   True    0.7 0.9571896919792622
EquivalentTo    MONDO:0009999   ORDO:300    gamma disease   Delta rare disease  True    0.6 0.5972095150960658
EquivalentTo    MONDO:0009999   ORDO:100    gamma disease   Alpha rare disease  False   0.3 0.004650808173223544
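
Because the TSV keeps its metadata in '#' comment lines, the table can be read with the standard csv module once those lines are filtered out. A minimal sketch: the column names come from the header above, the two sample rows are copied from the output, tab separation is assumed (the rendering above shows spaces), and read_solution_tsv is a hypothetical helper name.

```python
import csv
import io

def read_solution_tsv(text: str):
    """Parse a BOOMER solution TSV into dicts, skipping '#' metadata lines."""
    rows = [line for line in text.splitlines() if line and not line.startswith("#")]
    return list(csv.DictReader(io.StringIO("\n".join(rows)), delimiter="\t"))

sample = "\n".join([
    "# BOOMER Solution TSV Output",
    "#   confidence: 0.5",
    "\t".join(["fact_type", "arg1", "arg2", "arg1_label", "arg2_label",
               "truth_value", "prior_probability", "posterior_probability"]),
    "\t".join(["EquivalentTo", "MONDO:0009999", "ORDO:300", "gamma disease",
               "Delta rare disease", "True", "0.6", "0.5972095150960658"]),
    "\t".join(["EquivalentTo", "MONDO:0009999", "ORDO:100", "gamma disease",
               "Alpha rare disease", "False", "0.3", "0.004650808173223544"]),
])

for row in read_solution_tsv(sample):
    # truth_value is written as the strings "True" / "False"
    status = "ACCEPTED" if row["truth_value"] == "True" else "REJECTED"
    print(status, row["arg1"], row["arg2"], float(row["posterior_probability"]))
```

For the real file, pass the text of solution.tsv (for example via Path.read_text()) instead of the inline sample.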

SSSOM Export

SSSOM is the community standard format for ontology mappings. Exporting as SSSOM lets you feed BOOMER results directly into mapping pipelines such as sssom-py and OAK:

%%bash
uv run python -m boomer.cli solve \
  docs/tutorial/ontology-alignment-files/merged.yaml \
  --timeout 60 \
  -O sssom \
  -o docs/tutorial/ontology-alignment-files/solution.sssom.tsv \
  --quiet

show(DIR / "solution.sssom.tsv", lang="tsv")

OBOGraphs Export

OBOGraphs is the standard graph exchange format for ontologies, used by OAK, Monarch, and the broader OBO community:

%%bash
uv run python -m boomer.cli solve \
  docs/tutorial/ontology-alignment-files/merged.yaml \
  --timeout 60 \
  -O obographs \
  -o docs/tutorial/ontology-alignment-files/solution.obographs.json \
  --quiet

show(DIR / "solution.obographs.json", lang="json")

Python API Equivalent

from boomer.ontology_converter import obo_to_kb, owl_to_kb
from boomer.sssom_converter import sssom_to_kb
from boomer.search import solve, SearchConfig

# Convert each source into its own knowledge base
mondo_kb = obo_to_kb(DIR / "mondo_subset.obo")
ordo_kb = owl_to_kb(DIR / "ordo_subset.ofn")
mapping_kb = sssom_to_kb(DIR / "mondo_ordo_mappings.sssom.tsv")

# Merge: extend the MONDO KB with the ORDO and SSSOM facts, pfacts and labels
merged = mondo_kb.extend(
    facts=ordo_kb.facts + mapping_kb.facts,
    pfacts=ordo_kb.pfacts + mapping_kb.pfacts,
    labels={**ordo_kb.labels, **mapping_kb.labels},
)
merged.normalize()

# Solve, then report a verdict for each equivalence pfact
solution = solve(merged, config=SearchConfig(timeout_seconds=60))

for sp in solution.solved_pfacts:
    f = sp.pfact.fact
    if f.fact_type == "EquivalentTo":
        verdict = "ACCEPTED" if sp.truth_value else "REJECTED"
        print(f"{verdict}: {f.sub} \u2261 {f.equivalent}  "
              f"(prior={sp.pfact.prob:.2f} \u2192 posterior={sp.posterior_prob:.3f})")
Solving KB: mondo-subset with 6 pfacts; threshold=200

ACCEPTED: MONDO:0001234 ≡ ORDO:100  (prior=0.90 → posterior=0.968)
ACCEPTED: MONDO:0005678 ≡ ORDO:200  (prior=0.85 → posterior=0.957)
ACCEPTED: MONDO:0001234 ≡ ORDO:100  (prior=0.70 → posterior=0.968)
ACCEPTED: MONDO:0005678 ≡ ORDO:200  (prior=0.70 → posterior=0.957)
ACCEPTED: MONDO:0009999 ≡ ORDO:300  (prior=0.60 → posterior=0.597)
REJECTED: MONDO:0009999 ≡ ORDO:100  (prior=0.30 → posterior=0.005)

Summary

# Merge all sources (formats auto-detected)
pyboomer merge ontology.obo hierarchy.ofn mappings.sssom.tsv -o merged.yaml

# Solve
pyboomer solve merged.yaml -O yaml -o solution.yaml

# Export as TSV
pyboomer solve merged.yaml -O tsv -o solution.tsv

# Export as SSSOM (standard mapping format)
pyboomer solve merged.yaml -O sssom -o solution.sssom.tsv

# Export as OBOGraphs JSON (standard ontology graph format)
pyboomer solve merged.yaml -O obographs -o solution.obographs.json

Structural constraints from ontologies (disjointness, hierarchy) interact with probabilistic evidence from mappings to produce better alignments than either source alone.