Probability Tables

Overview

Probability Tables (PTables) provide a compact way to specify probabilistic relationships between pairs of entities in a Knowledge Base (KB). They are particularly useful for ontology alignment, where terms from multiple sources need to be integrated.

File Format

A Probability Table is a Tab-Separated Values (TSV) file with six columns, in this order:

Subject ID: The identifier for the subject entity
Object ID: The identifier for the object entity
P(SubClassOf): Probability that Subject is a subclass of Object (range: 0.0-1.0)
P(SuperClassOf): Probability that Object is a subclass of Subject (range: 0.0-1.0)
P(EquivalentTo): Probability that Subject and Object are equivalent (range: 0.0-1.0)
P(DisjointWith): Probability that Subject and Object are not in a subsumption relationship (range: 0.0-1.0)

Example row (the fields are separated by tabs):

ORDO:464724 MONDO:0000023   0.033333333333333326    0.033333333333333326    0.9 0.033333333333333326

This specifies:

Subject: ORDO:464724
Object: MONDO:0000023
P(ORDO:464724 SubClassOf MONDO:0000023) = 0.033...
P(MONDO:0000023 SubClassOf ORDO:464724) = 0.033...
P(ORDO:464724 EquivalentTo MONDO:0000023) = 0.9
P(ORDO:464724 DisjointWith MONDO:0000023) = 0.033...
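To make the column layout concrete, here is a minimal sketch of reading PTable text with the Python standard library. It is not BOOMER's own parser: the names PTableRow and parse_ptable are hypothetical, and the sketch assumes the file has no header row.

```python
import csv
import io
from typing import List, NamedTuple


class PTableRow(NamedTuple):
    """One PTable row; field names are illustrative, not BOOMER's."""
    subject_id: str
    object_id: str
    p_subclass_of: float
    p_superclass_of: float
    p_equivalent_to: float
    p_disjoint_with: float


def parse_ptable(text: str) -> List[PTableRow]:
    """Parse tab-separated PTable text (assumed headerless) into typed rows."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for line_no, fields in enumerate(reader, start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) != 6:
            raise ValueError(f"line {line_no}: expected 6 columns, got {len(fields)}")
        probs = [float(value) for value in fields[2:]]
        for p in probs:
            if not 0.0 <= p <= 1.0:
                raise ValueError(f"line {line_no}: probability {p} is outside 0.0-1.0")
        rows.append(PTableRow(fields[0], fields[1], *probs))
    return rows


# The example row from this section, with explicit tab separators.
example = (
    "ORDO:464724\tMONDO:0000023\t0.033333333333333326\t"
    "0.033333333333333326\t0.9\t0.033333333333333326\n"
)
rows = parse_ptable(example)
print(rows[0].subject_id, rows[0].object_id, rows[0].p_equivalent_to)
```

Checking the column count and the 0.0-1.0 range while reading catches malformed rows before they reach the solver.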

Interpretation

Each row in a PTable generates four probabilistic facts, one for each possible relationship between the subject and object:

ProperSubClassOf(Subject, Object)
ProperSubClassOf(Object, Subject)
EquivalentTo(Subject, Object)
NotInSubsumptionWith(Subject, Object)

Additionally, each unique entity ID is assigned to a disjoint group based on its prefix (e.g., "MONDO", "ORDO", "ICD10CM"). This helps model the constraint that terms from different ontologies should not be confused.
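The expansion described above can be sketched in plain Python. This is only an illustration using tuples: the function names row_to_pfacts and disjoint_group are hypothetical, and BOOMER's internal data model is not shown here. The sketch also assumes that the prefix is everything before the first colon in the ID.

```python
from typing import List, Tuple

# (predicate, first argument, second argument); a stand-in for a real fact type.
Fact = Tuple[str, str, str]


def row_to_pfacts(
    subject: str,
    obj: str,
    p_sub: float,
    p_sup: float,
    p_equiv: float,
    p_none: float,
) -> List[Tuple[Fact, float]]:
    """Expand one PTable row into four (fact, probability) pairs."""
    return [
        (("ProperSubClassOf", subject, obj), p_sub),
        (("ProperSubClassOf", obj, subject), p_sup),
        (("EquivalentTo", subject, obj), p_equiv),
        (("NotInSubsumptionWith", subject, obj), p_none),
    ]


def disjoint_group(entity_id: str) -> str:
    """Return the ID prefix, e.g. 'MONDO' for 'MONDO:0000023'."""
    return entity_id.split(":", 1)[0]


pfacts = row_to_pfacts(
    "ORDO:464724", "MONDO:0000023",
    0.033333333333333326, 0.033333333333333326, 0.9, 0.033333333333333326,
)
for fact, prob in pfacts:
    print(fact, prob)

# Group assignment for each unique entity ID in the row.
groups = {eid: disjoint_group(eid) for eid in ("ORDO:464724", "MONDO:0000023")}
print(groups)
```

Note that the second fact reuses the ProperSubClassOf predicate with the arguments swapped, which is how the P(SuperClassOf) column is represented in the list above.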

Usage in BOOMER

To use a Probability Table with BOOMER:

from boomer.io import ptable_to_kb
from boomer.search import solve

# Create a KB directly from a PTable file
kb = ptable_to_kb(
    "path/to/file.ptable.tsv",
    name="My KB",  # Optional, defaults to filename
    description="Description of the KB"
)

# Solve the KB
solution = solve(kb)

# Analyze the results
for spf in solution.solved_pfacts:
    if spf.truth_value and spf.posterior_prob > 0.8:
        print(f"High confidence: {spf.pfact.fact} (posterior: {spf.posterior_prob})")

Common Use Cases

PTables are particularly useful for:

Ontology Alignment: Mapping terms between different ontologies (e.g., MONDO to ICD10)
Disease Classification: Relating disease terms across different medical coding systems
Entity Resolution: Determining when entities from different sources refer to the same concept

Generating PTables

PTables can be created:

Manually, for small test cases
Using machine learning to generate relationship probabilities
From existing ontology mappings with confidence values
By converting other probabilistic relationship formats
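As an illustration of the third option, the sketch below turns mappings with a single confidence value into PTable rows. The conversion rule is an assumption made for this example, not a BOOMER convention: the confidence is used as P(EquivalentTo) and the remaining probability mass is split evenly across the other three columns. The helper names mapping_to_row and write_ptable are hypothetical.

```python
import csv
import io
from typing import Iterable, List, TextIO, Tuple


def mapping_to_row(subject_id: str, object_id: str, confidence: float) -> List[object]:
    """Build one PTable row from an equivalence mapping and its confidence.

    Assumption: the leftover mass (1 - confidence) is shared equally by
    P(SubClassOf), P(SuperClassOf) and P(DisjointWith).
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence {confidence} is outside 0.0-1.0")
    remainder = (1.0 - confidence) / 3
    return [subject_id, object_id, remainder, remainder, confidence, remainder]


def write_ptable(mappings: Iterable[Tuple[str, str, float]], out: TextIO) -> None:
    """Write the mappings as tab-separated rows with no header."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for subject_id, object_id, confidence in mappings:
        writer.writerow(mapping_to_row(subject_id, object_id, confidence))


buffer = io.StringIO()
write_ptable([("ORDO:464724", "MONDO:0000023", 0.9)], buffer)
print(buffer.getvalue(), end="")
```

To write to disk instead of an in-memory buffer, pass a file opened with open(path, "w", newline="") as the second argument. Other ways of spreading the leftover probability are equally possible and depend on what is known about the mapping source.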

Limitations

Each row only models the direct relationship between two entities
More complex relationships involving more than two entities require multiple rows
The sum of probabilities in a row does not need to equal 1.0, as the relationships are not mutually exclusive