Data Formats
BOOMER supports multiple data formats for knowledge bases, making it easy to work with different data sources and integrate with various workflows.
Supported Formats
Input Formats
- PTable (
.ptable.tsv,.tsv) - Tab-separated probability tables - JSON (
.json) - JSON serialization of Pydantic models - YAML (
.yaml,.yml) - YAML serialization of Pydantic models - Python (
py) - Python modules with akbattribute
Output Formats
- Markdown - Human-readable reports (default)
- TSV - Tab-separated values for downstream processing
- JSON - Machine-readable JSON
- YAML - Machine-readable YAML
Future Support
- OWL - Web Ontology Language support is planned for future releases
Format Details
PTable Format
Probability tables are tab-separated files that compactly represent probabilistic facts:
# Example: disease_mappings.ptable.tsv
EquivalentTo MONDO:0000023 ICD10:K72.0 0.8
EquivalentTo MONDO:0000023 ICD10:K72.1 0.3
ProperSubClassOf MONDO:0000023 Disease 1.0
DisjointWith ICD10:K72.0 ICD10:K72.1 1.0
Format specification: - Column 1: Fact type (EquivalentTo, ProperSubClassOf, DisjointWith, etc.) - Column 2+: Arguments for the fact - Last column: Probability (0.0 to 1.0)
YAML Format
YAML provides a human-readable structured format for knowledge bases:
# Example: animals.yaml
name: Animal Taxonomy
description: Mapping common animal names to scientific names
# Deterministic facts (probability = 1.0)
facts:
- fact_type: ProperSubClassOf
sub: Felix
sup: Mammalia
- fact_type: ProperSubClassOf
sub: Canus
sup: Mammalia
- fact_type: MemberOfDisjointGroup
sub: cat
group: Common
- fact_type: MemberOfDisjointGroup
sub: dog
group: Common
- fact_type: MemberOfDisjointGroup
sub: Felix
group: Formal
- fact_type: MemberOfDisjointGroup
sub: Canus
group: Formal
# Probabilistic facts
pfacts:
- fact:
fact_type: EquivalentTo
sub: cat
equivalent: Felix
prob: 0.9
- fact:
fact_type: EquivalentTo
sub: dog
equivalent: Canus
prob: 0.9
- fact:
fact_type: EquivalentTo
sub: cat
equivalent: Canus
prob: 0.1 # Low probability - cats are not dogs!
JSON Format
JSON provides machine-readable structured format, identical structure to YAML:
{
"name": "Family Relationships",
"description": "Modeling family relationship types",
"facts": [
{
"fact_type": "ProperSubClassOf",
"sub": "Child",
"sup": "Person"
},
{
"fact_type": "ProperSubClassOf",
"sub": "Parent",
"sup": "Person"
},
{
"fact_type": "DisjointWith",
"sub": "Mother",
"disjoint_with": "Father"
}
],
"pfacts": [
{
"fact": {
"fact_type": "EquivalentTo",
"sub": "Mom",
"equivalent": "Mother"
},
"prob": 0.95
},
{
"fact": {
"fact_type": "EquivalentTo",
"sub": "Dad",
"equivalent": "Father"
},
"prob": 0.95
}
]
}
Python Module Format
Python modules can define knowledge bases programmatically:
# my_kb.py
from boomer.model import KB, PFact, EquivalentTo, ProperSubClassOf
kb = KB(
name="My Knowledge Base",
description="Custom KB defined in Python",
facts=[
ProperSubClassOf("A", "B"),
ProperSubClassOf("B", "C"),
],
pfacts=[
PFact(fact=EquivalentTo("X", "Y"), prob=0.8),
PFact(fact=EquivalentTo("X", "Z"), prob=0.3),
]
)
# Can be loaded with:
# boomer-cli solve my_kb.py
# boomer-cli solve my_kb::kb
Output Formats
TSV Output
The TSV output format is designed for easy processing by downstream tools:
# SSSOM-style metadata header
# name: Solution for Animal Taxonomy
# confidence: 0.9234
# combinations_explored: 1024
# satisfiable_combinations: 512
# time_elapsed: 0.234
# Tab-separated data
fact_type arg1 arg2 truth_value prior_prob posterior_prob
EquivalentTo cat Felix True 0.9 0.95
EquivalentTo dog Canus True 0.9 0.94
EquivalentTo cat Canus False 0.1 0.05
JSON Solution Output
{
"confidence": 0.9234,
"prior_prob": 0.81,
"posterior_prob": 0.95,
"number_of_combinations": 1024,
"number_of_satisfiable_combinations": 512,
"time_elapsed": 0.234,
"solved_pfacts": [
{
"pfact": {
"fact": {
"fact_type": "EquivalentTo",
"sub": "cat",
"equivalent": "Felix"
},
"prob": 0.9
},
"truth_value": true,
"posterior_prob": 0.95
}
]
}
Format Conversion
BOOMER provides easy conversion between formats:
# Convert PTable to YAML
boomer-cli convert input.ptable.tsv -o output.yaml
# Convert JSON to YAML
boomer-cli convert kb.json -o kb.yaml
# Convert Python module to JSON
boomer-cli convert boomer.datasets.animals -o animals.json
# Add metadata during conversion
boomer-cli convert input.tsv -o output.yaml \
--name "My KB" \
--description "Converted knowledge base"
Python API for Conversion
from boomer.io import load_kb, save_kb
# Load from any format (auto-detected)
kb = load_kb('input.ptable.tsv')
kb = load_kb('data.json')
kb = load_kb('data.yaml')
# Save to any format
save_kb(kb, 'output.json', format='json')
save_kb(kb, 'output.yaml', format='yaml')
# Note: PTable output not yet supported
# save_kb(kb, 'output.tsv', format='ptable') # Not implemented
Grid Search Configuration
Grid search uses YAML or JSON to specify parameter combinations:
# grid_config.yaml
configurations:
- {} # Default configuration
configuration_matrix:
max_pfacts_per_clique: [100, 150, 200]
max_candidate_solutions: [100, 200]
timeout_seconds: [2, 10]
pr_filter: [0.2, 0.4, 0.6, 0.8]
Usage:
Format Selection
BOOMER automatically detects formats based on file extensions, but you can explicitly specify formats when needed:
# Auto-detection (recommended)
boomer-cli solve data.json
boomer-cli solve data.yaml
boomer-cli solve data.ptable.tsv
# Explicit format specification
boomer-cli solve data -f json
boomer-cli solve data -f yaml
boomer-cli solve data.tsv -f ptable
# Output format specification
boomer-cli solve input.json -O yaml -o solution.yaml
boomer-cli solve input.yaml -O tsv -o solution.tsv
Best Practices
- Use YAML/JSON for complex KBs - When you have metadata, multiple fact types, or need version control
- Use PTable for simple mappings - When you have straightforward probabilistic mappings
- Use Python modules for dynamic KBs - When you need to generate facts programmatically
- Use TSV output for analysis - Easy to import into spreadsheets or data analysis tools
- Use JSON/YAML output for integration - Machine-readable formats for downstream processing
Future: OWL Support
Future releases will support OWL (Web Ontology Language) for: - Importing existing OWL ontologies - Exporting BOOMER solutions to OWL format - Integration with standard ontology tools - Support for OWL axioms and annotations
This will enable BOOMER to work seamlessly with existing semantic web infrastructure and ontology management systems.