IO API Reference
This page documents the input/output functions in BOOMER for loading and saving data.
Probability Tables
ptable_to_kb
def ptable_to_kb(ptable: str, name: str = None, description: str = None, comments: str = None) -> KB:
    """
    Convert a probability table to a KB.

    The probability table is a TSV file with the following columns:

    1. subject ID
    2. object ID
    3. probability of subject SubClassOf object
    4. probability of object SubClassOf subject
    5. probability of subject EquivalentTo object
    6. probability of subject NotInSubsumptionWith object

    Each row generates four probabilistic facts, one for each relationship type.
    Additionally, each unique ID is assigned to a disjoint group based on its prefix.

    Args:
        ptable: Path to the probability table file
        name: Optional name for the KB
        description: Optional description for the KB
        comments: Optional comments for the KB

    Returns:
        A KB containing the facts from the probability table
    """
ptable_to_pfacts
def ptable_to_pfacts(ptable: str) -> Iterator[PFact]:
    """
    Convert a probability table to an iterator of PFacts.

    DEPRECATED: Use ptable_to_kb instead.
    This function is maintained for backward compatibility.

    Args:
        ptable: Path to the probability table file

    Yields:
        PFact objects derived from the probability table
    """
JSON/YAML Serialization
JSON Functions
kb_to_json
def kb_to_json(kb: KB, indent: int = 2) -> str:
    """
    Serialize a KB to JSON string using Pydantic's built-in serialization.

    Args:
        kb: The knowledge base to serialize
        indent: JSON indentation level (default: 2)

    Returns:
        JSON string representation of the KB
    """
kb_from_json
def kb_from_json(json_str: str) -> KB:
    """
    Deserialize a KB from JSON string using Pydantic's built-in deserialization.

    Args:
        json_str: JSON string representation of a KB

    Returns:
        KB instance

    Raises:
        ValueError: If the JSON is invalid or doesn't match KB schema
    """
YAML Functions
kb_to_yaml
def kb_to_yaml(kb: KB) -> str:
    """
    Serialize a KB to YAML string.

    Args:
        kb: The knowledge base to serialize

    Returns:
        YAML string representation of the KB

    Raises:
        ImportError: If PyYAML is not installed
    """
kb_from_yaml
def kb_from_yaml(yaml_str: str) -> KB:
    """
    Deserialize a KB from YAML string.

    Args:
        yaml_str: YAML string representation of a KB

    Returns:
        KB instance

    Raises:
        ImportError: If PyYAML is not installed
        ValueError: If the YAML is invalid or doesn't match KB schema
    """
File I/O Functions
save_kb
def save_kb(kb: KB, file_path: Union[str, Path], format: str = "auto") -> None:
    """
    Save a KB to a file in JSON or YAML format.

    Args:
        kb: The knowledge base to save
        file_path: Path to the output file
        format: Format to use ("json", "yaml", or "auto" to detect from extension)

    Raises:
        ValueError: If format is unsupported
        ImportError: If PyYAML is required but not installed
    """
load_kb
def load_kb(file_path: Union[str, Path], format: str = "auto") -> KB:
    """
    Load a KB from a JSON or YAML file.

    Args:
        file_path: Path to the input file
        format: Format to use ("json", "yaml", or "auto" to detect from extension)

    Returns:
        KB instance

    Raises:
        FileNotFoundError: If the file doesn't exist
        ValueError: If format is unsupported or file content is invalid
        ImportError: If PyYAML is required but not installed
    """
Utility Functions
id_prefix
def id_prefix(id: str) -> str:
    """
    Return the ID prefix of a given ID.

    Args:
        id: An identifier string (e.g., "MONDO:0000023")

    Returns:
        The prefix of the ID (e.g., "MONDO")

    Raises:
        ValueError: If the ID does not contain a prefix
    """
Usage Examples
Loading a KB from a Probability Table
from boomer.io import ptable_to_kb
from boomer.search import solve
# Load a knowledge base from a probability table
kb = ptable_to_kb(
    "path/to/mapping.ptable.tsv",
    name="Disease Mappings",
    description="Mappings between MONDO and ICD10 terms",
)
# Print KB info
print(f"Loaded KB: {kb.name}")
print(f"Description: {kb.description}")
print(f"Facts: {len(kb.facts)}")
print(f"Probabilistic facts: {len(kb.pfacts)}")
# Solve the KB
solution = solve(kb)
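The ptable_to_kb docstring says that each row yields four probabilistic facts and that every unique ID is placed in a disjoint group named after its prefix. The sketch below illustrates that expansion using only the standard library. The helper name expand_ptable, the tuple layout, the relation labels, and the sample IDs are all hypothetical; this is not BOOMER's implementation and it does not build real PFact or KB objects.

```python
import csv
import io

# Illustrative relation labels, in the documented column order (columns 3-6).
RELATIONS = ("SubClassOf", "SuperClassOf", "EquivalentTo", "NotInSubsumptionWith")


def expand_ptable(text):
    """Hypothetical helper: expand ptable rows into plain tuples.

    Returns (pfacts, groups), where pfacts is a list of
    (relation, subject, object, probability) tuples and groups maps
    each unique ID to its prefix (the text before the first colon).
    """
    pfacts = []
    groups = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        subject, obj = row[0], row[1]
        probs = [float(p) for p in row[2:6]]
        # One tuple per relationship type, so four per row.
        for relation, prob in zip(RELATIONS, probs):
            pfacts.append((relation, subject, obj, prob))
        # Assumed grouping rule: the group name is the ID prefix.
        for entity in (subject, obj):
            groups[entity] = entity.split(":", 1)[0]
    return pfacts, groups


# Made-up sample row; the values are not from a real mapping file.
sample = "EX:1\tOTHER:2\t0.05\t0.05\t0.8\t0.1\n"
pfacts, groups = expand_ptable(sample)
print(len(pfacts))
print(groups)
```

In real use, ptable_to_kb performs this work and returns a KB, so a helper like this is only useful for understanding the file layout.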
Using the Legacy ptable_to_pfacts Function
from boomer.io import ptable_to_pfacts
from boomer.model import KB
from boomer.search import solve
# Load probabilistic facts from a probability table
pfacts = list(ptable_to_pfacts("path/to/mapping.ptable.tsv"))
# Create a KB from the facts
kb = KB(pfacts=pfacts)
# Solve the KB
solution = solve(kb)
Extracting ID Prefixes
from boomer.io import id_prefix
# Extract prefixes from IDs
mondo_prefix = id_prefix("MONDO:0000023") # Returns "MONDO"
icd_prefix = id_prefix("ICD10:K72.0") # Returns "ICD10"
# Use for disjoint group assignment
from boomer.model import MemberOfDisjointGroup
entity_id = "MONDO:0000023"
group = id_prefix(entity_id)
disjoint_fact = MemberOfDisjointGroup(sub=entity_id, group=group)
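For readers who want to see the documented behavior spelled out, here is a minimal stand-in for id_prefix. It assumes the prefix is everything before the first colon and that a missing or empty prefix raises ValueError; the function name id_prefix_sketch is hypothetical and the real function may differ in edge cases.

```python
def id_prefix_sketch(identifier):
    """Hypothetical stand-in for id_prefix: return the text before the first colon."""
    prefix, separator, _local_id = identifier.partition(":")
    # partition() returns an empty separator when no colon is present.
    if not separator or not prefix:
        raise ValueError(f"ID has no prefix: {identifier!r}")
    return prefix


print(id_prefix_sketch("MONDO:0000023"))
print(id_prefix_sketch("ICD10:K72.0"))
```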
JSON/YAML Serialization Examples
Saving and Loading KBs as JSON
from boomer.io import save_kb, load_kb, kb_to_json, kb_from_json
from boomer.datasets.animals import kb
# Save to JSON file
save_kb(kb, "animals.json")
# Load from JSON file
loaded_kb = load_kb("animals.json")
# Or use string serialization
json_str = kb_to_json(kb, indent=2)
print(json_str)
# Deserialize from string
kb_restored = kb_from_json(json_str)
Working with YAML
from boomer.io import save_kb, load_kb, kb_to_yaml, kb_from_yaml
from boomer.datasets.animals import kb
# Save to YAML file (auto-detected from extension)
save_kb(kb, "animals.yaml")
# Load from YAML file
loaded_kb = load_kb("animals.yaml")
# Or use string serialization
yaml_str = kb_to_yaml(kb)
print(yaml_str)
# Deserialize from string
kb_restored = kb_from_yaml(yaml_str)
Format Auto-Detection
from boomer.io import save_kb, load_kb
from boomer.datasets.animals import kb
# Format is automatically detected from file extension
save_kb(kb, "my_kb.json") # Saves as JSON
save_kb(kb, "my_kb.yaml") # Saves as YAML
save_kb(kb, "my_kb.yml") # Also saves as YAML
# Loading also auto-detects format
kb_json = load_kb("my_kb.json")
kb_yaml = load_kb("my_kb.yaml")
# You can also specify format explicitly
save_kb(kb, "my_file.txt", format="json") # Forces JSON format
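The auto-detection rule shown above can be approximated with a small lookup on the file extension. This is a sketch under stated assumptions: the helper name resolve_format is hypothetical, the case-insensitive matching is a guess, and raising ValueError for an unrecognized extension is assumed rather than confirmed for BOOMER.

```python
from pathlib import Path

# Assumed mapping, based on the extensions used in the examples on this page.
EXTENSION_FORMATS = {".json": "json", ".yaml": "yaml", ".yml": "yaml"}


def resolve_format(file_path, format="auto"):
    """Hypothetical helper: decide between "json" and "yaml" for a path."""
    if format == "auto":
        suffix = Path(file_path).suffix.lower()
        if suffix not in EXTENSION_FORMATS:
            raise ValueError(f"Cannot detect format from extension: {suffix!r}")
        return EXTENSION_FORMATS[suffix]
    # An explicit format overrides the extension entirely.
    if format not in ("json", "yaml"):
        raise ValueError(f"Unsupported format: {format!r}")
    return format


print(resolve_format("my_kb.json"))
print(resolve_format("my_kb.yml"))
print(resolve_format("my_file.txt", format="json"))
```

Passing an explicit format is the safer choice whenever a file name does not carry one of the recognized extensions.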
Error Handling
from boomer.io import kb_from_json, kb_from_yaml
try:
    # This will raise ValueError due to invalid JSON
    kb = kb_from_json("invalid json")
except ValueError as e:
    print(f"JSON parsing failed: {e}")
try:
    # This will raise ImportError if PyYAML is not installed
    kb = kb_from_yaml("name: test")
except ImportError as e:
    print(f"YAML support not available: {e}")
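An ImportError for a missing optional dependency, as documented for the YAML functions, is often produced by deferring the import until the feature is used. The following sketch shows that general pattern with importlib. The helper require_module and its message text are hypothetical and are not taken from BOOMER's source.

```python
import importlib


def require_module(module_name, package_hint):
    """Hypothetical helper: import a module or raise a descriptive ImportError."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this feature; "
            f"install the {package_hint!r} package"
        ) from exc


# The standard-library json module is always importable.
json_module = require_module("json", "json")

# A deliberately nonexistent module name exercises the error path.
try:
    require_module("module_that_does_not_exist_xyz", "example-package")
except ImportError as exc:
    message = str(exc)
    print(message)
```

Deferring the import this way keeps JSON-only workflows usable when the optional package is absent.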
File Format Specifications
Probability Table (PTable) Format
The PTable format is a tab-separated values (TSV) file with 6 columns:
- Subject ID: The identifier for the subject entity
- Object ID: The identifier for the object entity
- P(SubClassOf): Probability that Subject is a subclass of Object (range: 0.0-1.0)
- P(SuperClassOf): Probability that Object is a subclass of Subject (range: 0.0-1.0)
- P(EquivalentTo): Probability that Subject and Object are equivalent (range: 0.0-1.0)
- P(NotInSubsumptionWith): Probability that Subject and Object are not in a subsumption relationship (range: 0.0-1.0)
An example row with subject ORDO:464724 and object MONDO:0000023 specifies:
- P(ORDO:464724 SubClassOf MONDO:0000023) = 0.033...
- P(MONDO:0000023 SubClassOf ORDO:464724) = 0.033...
- P(ORDO:464724 EquivalentTo MONDO:0000023) = 0.9
- P(ORDO:464724 NotInSubsumptionWith MONDO:0000023) = 0.033...
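To make the column constraints above concrete, here is a minimal validation sketch for a single PTable line. It checks for exactly six tab-separated fields and for probabilities within 0.0-1.0. The helper validate_ptable_row is hypothetical, and whether BOOMER itself enforces these checks on load is not stated in this page.

```python
def validate_ptable_row(line):
    """Hypothetical helper: check one PTable line and return its parsed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"Expected 6 tab-separated columns, got {len(fields)}")
    subject, obj = fields[0], fields[1]
    # float() itself raises ValueError for non-numeric text.
    probs = [float(value) for value in fields[2:]]
    for prob in probs:
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"Probability out of range: {prob}")
    return subject, obj, probs


# Illustrative values only; not taken from a real mapping file.
print(validate_ptable_row("EX:1\tOTHER:2\t0.05\t0.05\t0.8\t0.1"))
```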
JSON Format
The JSON format provides a complete serialization of KB objects using Pydantic's built-in serialization. Each fact type includes a fact_type discriminator field for proper deserialization.
Example JSON structure:
{
  "name": "Animals",
  "description": "An ontology alignment example",
  "comments": null,
  "facts": [
    {
      "fact_type": "ProperSubClassOf",
      "sub": "Felix",
      "sup": "Mammalia"
    },
    {
      "fact_type": "MemberOfDisjointGroup",
      "sub": "cat",
      "group": "Common"
    }
  ],
  "pfacts": [
    {
      "fact": {
        "fact_type": "EquivalentTo",
        "sub": "cat",
        "equivalent": "Felix"
      },
      "prob": 0.9
    }
  ],
  "hypotheses": [],
  "labels": {}
}
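The role of the fact_type discriminator can be illustrated without Pydantic. The sketch below swaps in plain dataclasses and a manual registry keyed by fact_type, so that each dictionary is rebuilt as the matching class. The two dataclasses are simplified stand-ins that reuse names from the example JSON, and fact_from_dict is a hypothetical helper; BOOMER's actual models rely on Pydantic's own deserialization.

```python
import json
from dataclasses import dataclass


@dataclass
class ProperSubClassOf:
    sub: str
    sup: str


@dataclass
class MemberOfDisjointGroup:
    sub: str
    group: str


# Registry keyed by the value expected in the fact_type field.
FACT_CLASSES = {cls.__name__: cls for cls in (ProperSubClassOf, MemberOfDisjointGroup)}


def fact_from_dict(data):
    """Hypothetical helper: build a fact object from a dict with a fact_type key."""
    fields = dict(data)  # copy so the caller's dict is not mutated
    fact_type = fields.pop("fact_type", None)
    if fact_type not in FACT_CLASSES:
        raise ValueError(f"Unknown fact_type: {fact_type!r}")
    return FACT_CLASSES[fact_type](**fields)


doc = json.loads(
    '{"facts": ['
    '{"fact_type": "ProperSubClassOf", "sub": "Felix", "sup": "Mammalia"},'
    '{"fact_type": "MemberOfDisjointGroup", "sub": "cat", "group": "Common"}'
    "]}"
)
facts = [fact_from_dict(item) for item in doc["facts"]]
print(facts)
```

Without the discriminator, the two dictionaries could only be told apart by guessing from their keys, which becomes ambiguous as soon as two fact types share the same field names.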
YAML Format
The YAML format provides the same information as JSON but in a more human-readable format:
name: Animals
description: An ontology alignment example
comments: null
facts:
- fact_type: ProperSubClassOf
  sub: Felix
  sup: Mammalia
- fact_type: MemberOfDisjointGroup
  sub: cat
  group: Common
pfacts:
- fact:
    fact_type: EquivalentTo
    sub: cat
    equivalent: Felix
  prob: 0.9
hypotheses: []
labels: {}
Note: YAML support requires the pyyaml package to be installed (for example, with pip install pyyaml).