koza
Details
GitHub | monarch-initiative/koza |
Language | Python |
Description | Data transformation framework for LinkML data models |
Dependencies
External Dependencies
Package | Version |
---|---|
python | ^3.9 |
duckdb | * |
linkml | >=1.7.8 |
loguru | * |
ordered-set | >=4.1.0 |
pydantic | ^2.4 |
pyyaml | >=5.0.0 |
requests | ^2.24.0 |
sssom | >=0.4 |
typer | >=0.12.3 |
Documentation
Koza - a data transformation framework
Disclaimer: Koza is in beta - we are looking for testers!
Overview
- Transform csv, json, yaml, jsonl, and xml files and convert them to a target csv, json, or jsonl format based on your dataclass model.
- Koza can also output data in the KGX format
- Write data transforms in semi-declarative Python
- Configure source files, expected columns/json properties and path filters, field filters, and metadata in yaml
- Create or import mapping files to be used in ingests (e.g. id mappings, type mappings)
- Create and use translation tables to map between source and target vocabularies
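The idea behind a translation table can be sketched in a few lines of plain Python. This is only an illustration of the concept, not Koza's API: the `translate` function, the table contents, and the `EX:` identifiers are all hypothetical.

```python
def translate(term, table, default=None):
    """Map a source-vocabulary term to its target term.

    Returns `default` when the term has no entry in the table.
    """
    return table.get(term, default)


# Hypothetical table: source labels on the left, target identifiers on the right.
example_table = {
    "interacts with": "EX:0001",
    "located in": "EX:0002",
}

print(translate("interacts with", example_table))
print(translate("unknown label", example_table, default="EX:unmapped"))
```

Returning a default instead of raising keeps a long ingest running when a source term is unmapped; whether that is the right behaviour depends on how strict the ingest needs to be.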
Installation
Koza is available on PyPI and can be installed via pip/pipx:
[pip|pipx] install koza
Usage
NOTE: As of version 0.2.0, there is a new method for getting your ingest's KozaApp
instance. Please see the updated documentation for details.
See the Koza documentation for usage information.
Try the Examples
Validate
Give Koza a local or remote csv file to get some basic information about it (headers, number of rows):
koza validate \
--file https://raw.githubusercontent.com/monarch-initiative/koza/main/examples/data/string.tsv \
--delimiter ' '
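To make "headers, number of rows" concrete, here is a minimal Python sketch of that kind of summary for a delimited file. It is not Koza's implementation: the function name `summarize_delimited` and the sample rows are made up for illustration, and only the standard library is used.

```python
import csv
import io


def summarize_delimited(text, delimiter=","):
    """Return (headers, data_row_count) for delimited text with a header row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    headers = next(reader, [])  # an empty input yields no headers
    row_count = sum(1 for row in reader if row)  # blank lines are not counted
    return headers, row_count


# Hypothetical space-delimited sample, mirroring the --delimiter ' ' option above.
sample = "source target score\nA B 900\nC D 150\n"
headers, row_count = summarize_delimited(sample, delimiter=" ")
print(headers, row_count)
```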
Passing a json or jsonl formatted file confirms whether the file is valid json or jsonl:
koza validate \
--file ./examples/data/ZFIN_PHENOTYPE_0.jsonl.gz \
--format jsonl
koza validate \
--file ./examples/data/ddpheno.json.gz \
--format json
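The difference between the two formats is that a json file must parse as a single document, while a jsonl file must parse line by line. The sketch below shows one way such a check could be written with the standard library; it is an assumption about the general approach, not Koza's code, and the helper names (`is_valid_json`, `first_invalid_jsonl_line`, `read_lines`) are hypothetical.

```python
import gzip
import json


def is_valid_json(text):
    """Return True if `text` parses as a single JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def first_invalid_jsonl_line(lines):
    """Return the 1-based number of the first non-blank line that is not
    valid JSON, or None if every non-blank line parses."""
    for number, line in enumerate(lines, start=1):
        if line.strip() and not is_valid_json(line):
            return number
    return None


def read_lines(path):
    """Read text lines from a plain or gzip-compressed (.gz) file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return handle.readlines()
```

For a jsonl file, `first_invalid_jsonl_line(read_lines(path))` returns `None` when the file is valid; for a json file, `is_valid_json("".join(read_lines(path)))` gives the equivalent answer.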
Transform
Run the example ingest, "string/protein-links-detailed":
koza transform \
--source examples/string/protein-links-detailed.yaml \
--global-table examples/translation_table.yaml
koza transform \
--source examples/string-declarative/protein-links-detailed.yaml \
--global-table examples/translation_table.yaml
Note:
Koza expects the directory structure used in the example above,
with the source config file and transform code in the same directory:
.
├── ...
│ ├── your_source
│ │ ├── your_ingest.yaml
│ │ └── your_ingest.py
│ └── some_translation_table.yaml
└── ...
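Before running a transform, the layout can be checked with a short script. The sketch below assumes, as the tree suggests, that the transform code shares the config file's base name with a `.py` extension; `find_transform_module` is a hypothetical helper, not part of Koza.

```python
from pathlib import Path


def find_transform_module(config_path):
    """Return the sibling .py file for a source config, or None if it is missing.

    Assumes the transform code sits next to the config and shares its base
    name, e.g. your_ingest.yaml -> your_ingest.py.
    """
    candidate = Path(config_path).with_suffix(".py")
    return candidate if candidate.is_file() else None


# Example usage with the placeholder names from the tree above.
module = find_transform_module("your_source/your_ingest.yaml")
print(module if module else "No transform code found next to the config")
```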