Data Formats
BOOMER supports multiple data formats for knowledge bases, making it easy to work with different data sources and integrate with various workflows.
Supported Formats
Input Formats
- PTable (`.ptable.tsv`, `.tsv`) - Tab-separated probability tables
- JSON (`.json`) - JSON serialization of Pydantic models
- YAML (`.yaml`, `.yml`) - YAML serialization of Pydantic models
- Python (`.py`) - Python modules with a `kb` attribute
- OBO (`.obo`) - OBO format ontologies
- OWL (`.owl`, `.owx`, `.ofn`) - OWL ontologies (via py-horned-owl)
- SSSOM (`.sssom.tsv`) - SSSOM mapping set TSV files
Output Formats
- Markdown - Human-readable reports (default)
- TSV - Tab-separated values for downstream processing
- JSON - Machine-readable JSON
- YAML - Machine-readable YAML
Format Details
PTable Format
Probability tables are tab-separated files that compactly represent probabilistic facts:
```text
# Example: disease_mappings.ptable.tsv
EquivalentTo	MONDO:0000023	ICD10:K72.0	0.8
EquivalentTo	MONDO:0000023	ICD10:K72.1	0.3
ProperSubClassOf	MONDO:0000023	Disease	1.0
DisjointWith	ICD10:K72.0	ICD10:K72.1	1.0
```
Format specification:
- Column 1: Fact type (EquivalentTo, ProperSubClassOf, DisjointWith, etc.)
- Middle columns (column 2 up to the second-to-last): Arguments for the fact
- Last column: Probability (0.0 to 1.0)
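This sketch shows the column layout described above by splitting PTable rows into a fact type, its arguments and a probability. `parse_ptable_rows` is a hypothetical helper, not part of BOOMER's API. It assumes that lines starting with `#` are comments, as in the example above.

```python
import csv
import io

def parse_ptable_rows(text):
    """Hypothetical PTable reader: returns a list of (fact_type, args, prob)."""
    rows = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        # Skip blank lines and '#' comment lines (an assumption of this sketch)
        if not row or row[0].startswith("#"):
            continue
        # First column is the fact type, last is the probability, the rest are arguments
        fact_type, *args, prob = row
        p = float(prob)
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        rows.append((fact_type, tuple(args), p))
    return rows

example = (
    "EquivalentTo\tMONDO:0000023\tICD10:K72.0\t0.8\n"
    "DisjointWith\tICD10:K72.0\tICD10:K72.1\t1.0\n"
)
for fact_type, args, prob in parse_ptable_rows(example):
    print(fact_type, args, prob)
```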
YAML Format
YAML provides a human-readable structured format for knowledge bases:
```yaml
# Example: animals.yaml
name: Animal Taxonomy
description: Mapping common animal names to scientific names

# Deterministic facts (probability = 1.0)
facts:
  - fact_type: ProperSubClassOf
    sub: Felix
    sup: Mammalia
  - fact_type: ProperSubClassOf
    sub: Canus
    sup: Mammalia
  - fact_type: MemberOfDisjointGroup
    sub: cat
    group: Common
  - fact_type: MemberOfDisjointGroup
    sub: dog
    group: Common
  - fact_type: MemberOfDisjointGroup
    sub: Felix
    group: Formal
  - fact_type: MemberOfDisjointGroup
    sub: Canus
    group: Formal

# Probabilistic facts
pfacts:
  - fact:
      fact_type: EquivalentTo
      sub: cat
      equivalent: Felix
    prob: 0.9
  - fact:
      fact_type: EquivalentTo
      sub: dog
      equivalent: Canus
    prob: 0.9
  - fact:
      fact_type: EquivalentTo
      sub: cat
      equivalent: Canus
    prob: 0.1  # Low probability - cats are not dogs!
```
JSON Format
JSON provides a machine-readable format with the same structure as YAML:
```json
{
  "name": "Family Relationships",
  "description": "Modeling family relationship types",
  "facts": [
    {
      "fact_type": "ProperSubClassOf",
      "sub": "Child",
      "sup": "Person"
    },
    {
      "fact_type": "ProperSubClassOf",
      "sub": "Parent",
      "sup": "Person"
    },
    {
      "fact_type": "DisjointWith",
      "sub": "Mother",
      "disjoint_with": "Father"
    }
  ],
  "pfacts": [
    {
      "fact": {
        "fact_type": "EquivalentTo",
        "sub": "Mom",
        "equivalent": "Mother"
      },
      "prob": 0.95
    },
    {
      "fact": {
        "fact_type": "EquivalentTo",
        "sub": "Dad",
        "equivalent": "Father"
      },
      "prob": 0.95
    }
  ]
}
```
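A JSON knowledge base is plain data, so you can inspect it with the standard library before passing it to BOOMER. This minimal sketch relies only on the `facts`/`pfacts` layout shown above. Once parsed, the YAML form has the same layout. The data is an abridged copy of the example.

```python
import json

kb_json = """
{
  "name": "Family Relationships",
  "facts": [
    {"fact_type": "ProperSubClassOf", "sub": "Child", "sup": "Person"}
  ],
  "pfacts": [
    {"fact": {"fact_type": "EquivalentTo", "sub": "Mom", "equivalent": "Mother"}, "prob": 0.95},
    {"fact": {"fact_type": "EquivalentTo", "sub": "Dad", "equivalent": "Father"}, "prob": 0.95}
  ]
}
"""

kb = json.loads(kb_json)

# Hard facts are bare fact objects; probabilistic facts wrap a fact together with a prob
hard_types = [f["fact_type"] for f in kb.get("facts", [])]
uncertain = [
    (p["fact"]["sub"], p["fact"]["equivalent"], p["prob"])
    for p in kb.get("pfacts", [])
    if p["fact"]["fact_type"] == "EquivalentTo"
]

# Sanity check: every probability must lie in [0, 1]
if not all(0.0 <= p["prob"] <= 1.0 for p in kb.get("pfacts", [])):
    raise ValueError("pfact probability outside [0, 1]")

print(hard_types)
print(uncertain)
```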
Python Module Format
Python modules can define knowledge bases programmatically:
```python
# my_kb.py
from boomer.model import KB, PFact, EquivalentTo, ProperSubClassOf

kb = KB(
    name="My Knowledge Base",
    description="Custom KB defined in Python",
    # Hard facts (probability 1.0); field names match the YAML/JSON keys
    facts=[
        ProperSubClassOf(sub="A", sup="B"),
        ProperSubClassOf(sub="B", sup="C"),
    ],
    # Probabilistic facts
    pfacts=[
        PFact(fact=EquivalentTo(sub="X", equivalent="Y"), prob=0.8),
        PFact(fact=EquivalentTo(sub="X", equivalent="Z"), prob=0.3),
    ],
)

# Can be loaded with:
#   boomer-cli solve my_kb.py
#   boomer-cli solve my_kb::kb
```
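The `my_kb::kb` form appears to name a module and the attribute that holds the KB. This sketch shows one way to resolve such a reference with `importlib`. `resolve_kb_ref` is hypothetical. The default attribute name `kb` is an assumption based on the example above; this is not BOOMER's actual loader.

```python
import importlib

def resolve_kb_ref(ref, default_attr="kb"):
    """Resolve 'module::attr' (or just 'module') to a Python object.

    Hypothetical helper mirroring the `my_kb::kb` syntax; not BOOMER's loader.
    """
    module_name, sep, attr = ref.partition("::")
    if module_name.endswith(".py"):
        # Treat 'my_kb.py' as the importable module 'my_kb' (assumes it is on sys.path)
        module_name = module_name[:-3]
    module = importlib.import_module(module_name)
    return getattr(module, attr if sep else default_attr)

# Demonstrated with a stdlib module, since my_kb.py is not available here
print(resolve_kb_ref("math::pi"))
```

Dotted paths such as `boomer.datasets.animals` work as well, because `importlib.import_module` accepts them.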
Output Formats
TSV Output
The TSV output format is designed for easy processing by downstream tools:
```text
# SSSOM-style metadata header
# name: Solution for Animal Taxonomy
# confidence: 0.9234
# combinations_explored: 1024
# satisfiable_combinations: 512
# time_elapsed: 0.234
# Tab-separated data
fact_type	arg1	arg2	truth_value	prior_prob	posterior_prob
EquivalentTo	cat	Felix	True	0.9	0.95
EquivalentTo	dog	Canus	True	0.9	0.94
EquivalentTo	cat	Canus	False	0.1	0.05
```
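A downstream script will usually separate the `#` metadata header from the data rows. This minimal sketch uses the standard library. It assumes that header lines have the form `# key: value` and that the columns are named as shown above. `read_solution_tsv` is a hypothetical helper.

```python
import csv
import io

def read_solution_tsv(text):
    """Split a solution TSV into (metadata dict, list of row dicts)."""
    metadata, data_lines = {}, []
    for line in text.splitlines():
        if line.startswith("#"):
            key, sep, value = line[1:].partition(":")
            if sep:  # '# key: value' lines become metadata; other comments are ignored
                metadata[key.strip()] = value.strip()
        elif line.strip():
            data_lines.append(line)
    rows = list(csv.DictReader(io.StringIO("\n".join(data_lines)), delimiter="\t"))
    return metadata, rows

sample = "\n".join([
    "# name: Solution for Animal Taxonomy",
    "# confidence: 0.9234",
    "fact_type\targ1\targ2\ttruth_value\tprior_prob\tposterior_prob",
    "EquivalentTo\tcat\tFelix\tTrue\t0.9\t0.95",
    "EquivalentTo\tcat\tCanus\tFalse\t0.1\t0.05",
])
meta, rows = read_solution_tsv(sample)
# Values are read as strings; convert with float() where numbers are needed
accepted = [(r["arg1"], r["arg2"]) for r in rows if r["truth_value"] == "True"]
print(meta["confidence"], accepted)
```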
JSON Solution Output
```json
{
  "confidence": 0.9234,
  "prior_prob": 0.81,
  "posterior_prob": 0.95,
  "number_of_combinations": 1024,
  "number_of_satisfiable_combinations": 512,
  "time_elapsed": 0.234,
  "solved_pfacts": [
    {
      "pfact": {
        "fact": {
          "fact_type": "EquivalentTo",
          "sub": "cat",
          "equivalent": "Felix"
        },
        "prob": 0.9
      },
      "truth_value": true,
      "posterior_prob": 0.95
    }
  ]
}
```
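Each entry in `solved_pfacts` carries the prior (`pfact.prob`), the chosen `truth_value` and the `posterior_prob`. That makes it easy to see how the solver shifted each probability. This minimal sketch uses the standard library. The second entry (`cat`/`Canus`) is borrowed from the TSV example for illustration, and only the field names shown above are assumed.

```python
import json

solution = json.loads("""
{
  "confidence": 0.9234,
  "solved_pfacts": [
    {"pfact": {"fact": {"fact_type": "EquivalentTo", "sub": "cat", "equivalent": "Felix"}, "prob": 0.9},
     "truth_value": true, "posterior_prob": 0.95},
    {"pfact": {"fact": {"fact_type": "EquivalentTo", "sub": "cat", "equivalent": "Canus"}, "prob": 0.1},
     "truth_value": false, "posterior_prob": 0.05}
  ]
}
""")

accepted, rejected = [], []
for sp in solution["solved_pfacts"]:
    fact = sp["pfact"]["fact"]
    prior, posterior = sp["pfact"]["prob"], sp["posterior_prob"]
    pair = (fact["sub"], fact["equivalent"])
    # Sort each fact by the truth value the solver assigned it
    (accepted if sp["truth_value"] else rejected).append(pair)
    print(f"{pair}: prior {prior} -> posterior {posterior} ({posterior - prior:+.2f})")
```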
Format Conversion
The `convert` command converts knowledge bases between formats:
```bash
# Convert PTable to YAML
boomer-cli convert input.ptable.tsv -o output.yaml

# Convert JSON to YAML
boomer-cli convert kb.json -o kb.yaml

# Convert Python module to JSON
boomer-cli convert boomer.datasets.animals -o animals.json

# Add metadata during conversion
boomer-cli convert input.tsv -o output.yaml \
  --name "My KB" \
  --description "Converted knowledge base"
```
Python API for Conversion
```python
from boomer.io import load_kb, save_kb

# Load from any format (auto-detected)
kb = load_kb('input.ptable.tsv')
kb = load_kb('data.json')
kb = load_kb('data.yaml')

# Save to any format
save_kb(kb, 'output.json', format='json')
save_kb(kb, 'output.yaml', format='yaml')

# Note: PTable output not yet supported
# save_kb(kb, 'output.tsv', format='ptable')  # Not implemented
```
Grid Search Configuration
Grid search uses YAML or JSON to specify parameter combinations:
```yaml
# grid_config.yaml
configurations:
  - {}  # Default configuration
configuration_matrix:
  max_pfacts_per_clique: [100, 150, 200]
  max_candidate_solutions: [100, 200]
  timeout_seconds: [2, 10]
  pr_filter: [0.2, 0.4, 0.6, 0.8]
```
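Suppose `configuration_matrix` is expanded as a Cartesian product, so every value of each parameter is combined with every value of the others. Then the matrix above yields 3 × 2 × 2 × 4 = 48 combinations, plus the explicit default in `configurations`. This sketch illustrates that expansion with `itertools.product`. How BOOMER actually expands and merges the two lists is an assumption here.

```python
import itertools

configuration_matrix = {
    "max_pfacts_per_clique": [100, 150, 200],
    "max_candidate_solutions": [100, 200],
    "timeout_seconds": [2, 10],
    "pr_filter": [0.2, 0.4, 0.6, 0.8],
}
configurations = [{}]  # explicit configurations; here just the default

def expand_matrix(matrix):
    """Return one dict per combination of parameter values (Cartesian product)."""
    keys = list(matrix)
    return [dict(zip(keys, values)) for values in itertools.product(*matrix.values())]

all_configs = configurations + expand_matrix(configuration_matrix)
print(len(all_configs))
```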
Usage:
Format Selection
BOOMER automatically detects formats based on file extensions, but you can explicitly specify formats when needed:
```bash
# Auto-detection (recommended)
boomer-cli solve data.json
boomer-cli solve data.yaml
boomer-cli solve data.ptable.tsv

# Explicit format specification
boomer-cli solve data -f json
boomer-cli solve data -f yaml
boomer-cli solve data.tsv -f ptable

# Output format specification
boomer-cli solve input.json -O yaml -o solution.yaml
boomer-cli solve input.yaml -O tsv -o solution.tsv
```
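Extension-based detection must prefer the longest matching suffix, because `.ptable.tsv` and `.sssom.tsv` both end in `.tsv`. This minimal sketch applies that rule to the extensions listed under Input Formats. `detect_format` is hypothetical. The names `ptable`, `json`, `yaml` and `sssom` appear with `-f` in this page. The names `python`, `obo` and `owl` are assumptions.

```python
# Suffix -> format name (python/obo/owl names are assumptions of this sketch)
SUFFIX_FORMATS = {
    ".ptable.tsv": "ptable",
    ".sssom.tsv": "sssom",
    ".tsv": "ptable",
    ".json": "json",
    ".yaml": "yaml",
    ".yml": "yaml",
    ".py": "python",
    ".obo": "obo",
    ".owl": "owl",
    ".owx": "owl",
    ".ofn": "owl",
}

def detect_format(path, explicit=None):
    """Return the explicit format if given (like -f), else match the longest suffix."""
    if explicit:
        return explicit
    name = path.lower()
    for suffix in sorted(SUFFIX_FORMATS, key=len, reverse=True):
        if name.endswith(suffix):
            return SUFFIX_FORMATS[suffix]
    raise ValueError(f"cannot detect format of {path!r}; pass an explicit format")

print(detect_format("mappings.sssom.tsv"))
print(detect_format("data", explicit="json"))
```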
Best Practices
- Use YAML/JSON for complex KBs - When you have metadata, multiple fact types, or need version control
- Use PTable for simple mappings - When you have straightforward probabilistic mappings
- Use Python modules for dynamic KBs - When you need to generate facts programmatically
- Use TSV output for analysis - Easy to import into spreadsheets or data analysis tools
- Use JSON/YAML output for integration - Machine-readable formats for downstream processing
Ontology Formats (OBO and OWL)
BOOMER can directly import OBO and OWL ontology files, extracting structural axioms as hard facts and cross-references/SKOS mappings as probabilistic facts.
What Gets Extracted
| Ontology Axiom | KB Fact Type | Probability |
|---|---|---|
| is_a / SubClassOf | ProperSubClassOf | 1.0 (hard fact) |
| equivalent_to / EquivalentClasses | EquivalentTo | 1.0 (hard fact) |
| disjoint_from / DisjointClasses | DisjointWith | 1.0 (hard fact) |
| xref / oboInOwl:hasDbXref | EquivalentTo | configurable (default 0.7) |
| skos:exactMatch | EquivalentTo | configurable (default 0.9) |
| skos:closeMatch | EquivalentTo | configurable (default 0.7) |
| skos:broadMatch | ProperSubClassOf (reversed) | configurable (default 0.7) |
| skos:narrowMatch | ProperSubClassOf | configurable (default 0.7) |
Additionally, MemberOfDisjointGroup facts are auto-generated with one group per ID prefix. Entities from the same namespace share a disjoint group, and entities from different namespaces fall into different groups.
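In concrete terms, entities that share a prefix (for example two `ICD10:` codes) land in the same group, which mirrors the `group` field in the YAML example. The helper below is a hypothetical sketch of that rule, not the converter's code. Using the prefix as the group name and the handling of unprefixed IDs are both assumptions.

```python
from collections import defaultdict

def disjoint_group_facts(curies):
    """Hypothetical: one MemberOfDisjointGroup fact per entity, grouped by CURIE prefix."""
    facts = []
    for curie in sorted(set(curies)):
        prefix, sep, _ = curie.partition(":")
        group = prefix if sep else "_no_prefix"  # assumption: unprefixed IDs share one group
        facts.append({"fact_type": "MemberOfDisjointGroup", "sub": curie, "group": group})
    return facts

facts = disjoint_group_facts(["MONDO:0000023", "ICD10:K72.0", "ICD10:K72.1"])

# Collect the members of each group to show the partition by namespace
members = defaultdict(list)
for f in facts:
    members[f["group"]].append(f["sub"])
print(dict(members))
```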
CLI Usage
```bash
# Convert OBO ontology to YAML KB
pyboomer convert my_ontology.obo -o kb.yaml

# Convert OWL ontology to JSON KB
pyboomer convert my_ontology.owl -o kb.json

# Solve directly from an ontology
pyboomer solve my_ontology.obo -O markdown

# Extract a cluster around a seed entity
pyboomer extract my_ontology.obo --id MONDO:0001234 -o cluster.yaml
```
Python API
```python
from boomer.ontology_converter import obo_to_kb, owl_to_kb, ontology_to_kb

# Parse OBO file
kb = obo_to_kb("my_ontology.obo")

# Parse OWL file (functional syntax, OWL/XML, etc.)
kb = owl_to_kb("my_ontology.ofn")

# Auto-dispatch by extension
kb = ontology_to_kb("my_ontology.obo")  # detects OBO
kb = ontology_to_kb("my_ontology.owl")  # detects OWL
```
Configuration
You can customize conversion behavior with OntologyConverterConfig:
```python
from boomer.ontology_converter import OntologyConverterConfig, obo_to_kb

config = OntologyConverterConfig(
    xref_default_probability=0.5,
    xref_prefix_probabilities={"OMIM": 0.9, "ICD10": 0.6},
    skos_exact_match_prob=0.95,
    skip_obsolete=True,
    include_xrefs=True,
    include_skos=True,
    auto_disjoint_groups=True,
)
kb = obo_to_kb("my_ontology.obo", config=config)
```
Or load config from a YAML file:
```yaml
# ontology_config.yaml
xref_default_probability: 0.5
xref_prefix_probabilities:
  OMIM: 0.9
  ICD10: 0.6
skos_exact_match_prob: 0.95
skip_obsolete: true
```
```python
from boomer.ontology_converter import load_ontology_config, obo_to_kb

config = load_ontology_config("ontology_config.yaml")
kb = obo_to_kb("my_ontology.obo", config=config)
```
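In both configuration styles above, `xref_prefix_probabilities` presumably overrides `xref_default_probability` for xrefs whose target has one of the listed prefixes. This sketch shows that fallback rule. `XrefSettings` and `xref_probability` are hypothetical names, and keying on the xref target's prefix is an assumption. The class default of 0.7 comes from the table above.

```python
from dataclasses import dataclass, field

@dataclass
class XrefSettings:
    """Hypothetical stand-in for the xref fields of OntologyConverterConfig."""
    xref_default_probability: float = 0.7
    xref_prefix_probabilities: dict = field(default_factory=dict)

def xref_probability(xref_curie, settings):
    """Per-prefix probability for an xref target, falling back to the default."""
    prefix = xref_curie.partition(":")[0]
    return settings.xref_prefix_probabilities.get(prefix, settings.xref_default_probability)

settings = XrefSettings(
    xref_default_probability=0.5,
    xref_prefix_probabilities={"OMIM": 0.9, "ICD10": 0.6},
)
for target in ["OMIM:104300", "ICD10:K72.0", "DOID:4"]:
    print(target, xref_probability(target, settings))
```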
Supported OWL Serializations
The OWL backend uses py-horned-owl and supports:
- OWL Functional Syntax (`.ofn`)
- OWL/XML (`.owx`)
- RDF/OWL (`.owl`)
SSSOM Format
SSSOM (Simple Standard for Sharing Ontological Mappings) TSV files can be imported as boomer KBs. Each mapping row becomes a probabilistic fact, with the SKOS predicate determining the fact type.
CLI Usage
```bash
# Convert SSSOM to YAML KB
pyboomer convert mappings.sssom.tsv -o kb.yaml

# Solve directly from SSSOM
pyboomer solve mappings.sssom.tsv -f sssom -O markdown
```
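Reading an SSSOM file mostly means skipping the `#`-prefixed metadata block and walking the `subject_id`, `predicate_id` and `object_id` columns. This minimal sketch assumes that the default probabilities from the ontology table apply. It handles only `skos:exactMatch` and `skos:closeMatch`. `sssom_to_pfacts` is hypothetical and is not BOOMER's SSSOM converter.

```python
import csv
import io

# Defaults assumed from the ontology table; only equivalence predicates are handled here
PREDICATE_DEFAULTS = {
    "skos:exactMatch": ("EquivalentTo", 0.9),
    "skos:closeMatch": ("EquivalentTo", 0.7),
}

def sssom_to_pfacts(text):
    """Hypothetical SSSOM reader: skip '#' metadata, map each row to a pfact dict."""
    body = "\n".join(line for line in text.splitlines() if not line.startswith("#"))
    pfacts = []
    for row in csv.DictReader(io.StringIO(body), delimiter="\t"):
        mapping = PREDICATE_DEFAULTS.get(row["predicate_id"])
        if mapping is None:
            continue  # other predicates (broadMatch, narrowMatch, ...) are skipped in this sketch
        fact_type, prob = mapping
        pfacts.append({
            "fact": {"fact_type": fact_type, "sub": row["subject_id"], "equivalent": row["object_id"]},
            "prob": prob,
        })
    return pfacts

sample = "\n".join([
    "# curie_map:",
    "#   MONDO: http://purl.obolibrary.org/obo/MONDO_",
    "subject_id\tpredicate_id\tobject_id\tmapping_justification",
    "MONDO:0000023\tskos:exactMatch\tICD10:K72.0\tsemapv:ManualMappingCuration",
    "MONDO:0000023\tskos:broadMatch\tICD10:K72\tsemapv:ManualMappingCuration",
])
print(sssom_to_pfacts(sample))
```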
Python API
See the SSSOM converter documentation for configuration options including per-prefix probabilities and predicate mapping customization.