koza
Details
| GitHub | monarch-initiative/koza |
| Language | Python |
| Description | Data transformation framework for LinkML data models |
Dependencies
External Dependencies
| Package | Version |
|---|---|
| coverage | >=7.13.0 |
| duckdb | |
| loguru | |
| biolink-model | >=4.3.6 |
| mergedeep | ==1.3.4 |
| ordered-set | >=4.1.0 |
| pydantic | >=2.12.5 |
| pyyaml | >=6.0.3 |
| requests | >=2.32.5 |
| sssom | >=0.4 |
| tqdm | >=4.67.1 |
| typer | >=0.20.0 |
| LinkML | >=1.9.0 |
Documentation
Koza - Knowledge Graph Transformation and Operations Toolkit
Disclaimer: Koza is in beta - we are looking for testers!
Overview
Koza is a Python library and CLI tool for transforming biomedical data and performing graph operations on Knowledge Graph Exchange (KGX) files. It provides two main capabilities:
Graph Operations (New!)
Powerful DuckDB-based operations for KGX knowledge graphs:
- Join multiple KGX files with schema harmonization
- Split files by field values with format conversion
- Prune dangling edges and handle singleton nodes
- Append new data to existing databases with schema evolution
- Multi-format support for TSV, JSONL, and Parquet files
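To make the pruning step concrete, here is a minimal framework-free sketch (plain Python, not Koza's actual DuckDB implementation) of dropping dangling edges — edges whose subject or object has no matching node — with an optional pass that also drops singleton nodes:

```python
def prune(nodes, edges, keep_singletons=True):
    """Drop edges whose subject or object is not in the node set.

    nodes: list of dicts with an 'id' key
    edges: list of dicts with 'subject' and 'object' keys
    """
    node_ids = {n["id"] for n in nodes}
    kept_edges = [
        e for e in edges
        if e["subject"] in node_ids and e["object"] in node_ids
    ]
    if not keep_singletons:
        # Drop nodes that no surviving edge references
        used = {e["subject"] for e in kept_edges} | {e["object"] for e in kept_edges}
        nodes = [n for n in nodes if n["id"] in used]
    return nodes, kept_edges


nodes = [{"id": "HGNC:1"}, {"id": "HGNC:2"}, {"id": "HGNC:3"}]
edges = [
    {"subject": "HGNC:1", "object": "HGNC:2"},
    {"subject": "HGNC:1", "object": "HGNC:99"},  # dangling: HGNC:99 has no node
]
kept_nodes, kept_edges = prune(nodes, edges, keep_singletons=False)
```

Here the dangling edge is removed, and `HGNC:3` (now a singleton) is dropped because `keep_singletons` is off.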
Data Transformation (Core)
Transform biomedical data sources into KGX format:
- Transform csv, json, yaml, jsonl, and xml to target formats
- Output in KGX format
- Write data transforms in semi-declarative Python
- Configure source files, columns/properties, and metadata in YAML
- Create mapping files and translation tables between vocabularies
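As a hedged illustration of what such a transform produces (the `protein1`/`protein2` column names are hypothetical, and the real wiring to source files is configured in YAML), one row-to-KGX mapping could look like:

```python
def transform_row(row):
    """Turn one source row into KGX-style node and edge dicts.
    Column names here are illustrative, not Koza's API."""
    nodes = [
        {"id": row["protein1"], "category": "biolink:Protein"},
        {"id": row["protein2"], "category": "biolink:Protein"},
    ]
    edge = {
        "subject": row["protein1"],
        "predicate": "biolink:interacts_with",
        "object": row["protein2"],
    }
    return nodes, edge


row = {"protein1": "ENSP00000000233", "protein2": "ENSP00000272298", "score": "490"}
nodes, edge = transform_row(row)
```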
Installation
Koza is available on PyPI and can be installed via pip or pipx:
pip install koza
# or
pipx install koza
Usage
Quick Start with Graph Operations
Koza's graph operations work seamlessly across multiple KGX formats (TSV, JSONL, Parquet):
# Join multiple KGX files into a unified database
koza join --nodes genes.tsv pathways.jsonl --edges interactions.parquet --output merged_graph.duckdb
# Prune dangling edges and handle singleton nodes
koza prune --database merged_graph.duckdb --keep-singletons
# Append new data to existing database with schema evolution
koza append --database merged_graph.duckdb --nodes new_genes.tsv --edges new_interactions.jsonl
# Split database by source with format conversion
koza split --database merged_graph.duckdb --split-on provided_by --output-format parquet
NOTE: As of version 0.2.0, there is a new method for getting your ingest's KozaApp instance. Please see the updated documentation for details.
See the Koza documentation for complete usage information
Examples
Validate
Give Koza a local or remote CSV file to get basic information about it (headers, number of rows):
koza validate \
--file https://raw.githubusercontent.com/monarch-initiative/koza/main/examples/data/string.tsv \
--delimiter ' '
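Conceptually, this inspection step boils down to reading the header row and counting the remaining rows — a stdlib sketch (not Koza's implementation):

```python
import csv
import io


def basic_info(text, delimiter="\t"):
    """Report headers and row count for delimited text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    headers = next(reader)
    n_rows = sum(1 for _ in reader)
    return headers, n_rows


# Space-delimited sample mirroring the STRING example above
sample = "protein1 protein2 combined_score\nA B 100\nC D 200\n"
headers, n_rows = basic_info(sample, delimiter=" ")
```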
Passing a JSON or JSONL file will confirm whether it is valid JSON or JSONL:
koza validate \
--file ./examples/data/ZFIN_PHENOTYPE_0.jsonl.gz \
--format jsonl
koza validate \
--file ./examples/data/ddpheno.json.gz \
--format json
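JSONL validity amounts to checking that every non-empty line parses as a standalone JSON document; a minimal sketch:

```python
import json


def is_valid_jsonl(text):
    """Return True if every non-empty line parses as JSON."""
    try:
        for line in text.splitlines():
            if line.strip():
                json.loads(line)
    except json.JSONDecodeError:
        return False
    return True


good = '{"id": "ZFIN:1"}\n{"id": "ZFIN:2"}\n'
bad = '{"id": "ZFIN:1"}\nnot json\n'
```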
Transform
Run the example ingest, "string/protein-links-detailed"
koza transform \
--source examples/string/protein-links-detailed.yaml \
--global-table examples/translation_table.yaml
koza transform \
--source examples/string-declarative/protein-links-detailed.yaml \
--global-table examples/translation_table.yaml
Note: Koza expects a directory structure as in the example above, with the source config file and transform code in the same directory:
.
├── ...
│   ├── your_source
│   │   ├── your_ingest.yaml
│   │   └── your_ingest.py
│   └── some_translation_table.yaml
└── ...
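Inside `your_ingest.py`, the transform typically pulls rows from a KozaApp instance and writes entities back to it. The stand-in class below mimics that loop in plain Python (illustrative only — the real KozaApp is obtained from Koza itself, and the import path depends on your Koza version, as the 0.2.0 note above mentions):

```python
class FakeKozaApp:
    """Stand-in for a KozaApp: feeds rows in, collects entities out.
    (Illustrative only; not Koza's actual class.)"""

    def __init__(self, rows):
        self._rows = iter(rows)
        self.written = []

    def get_row(self):
        return next(self._rows)

    def write(self, *entities):
        self.written.extend(entities)


app = FakeKozaApp([{"protein1": "P1", "protein2": "P2"}])
row = app.get_row()
app.write(
    {"id": row["protein1"], "category": "biolink:Protein"},
    {"id": row["protein2"], "category": "biolink:Protein"},
)
```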
Graph Operations
Create and manipulate knowledge graphs from existing KGX files:
# Join heterogeneous KGX files with automatic schema harmonization
koza join \
--nodes genes.tsv proteins.jsonl pathways.parquet \
--edges gene_protein.tsv protein_pathway.jsonl \
--output unified_graph.duckdb \
--schema-report
# Clean up graph integrity issues
koza prune \
--database unified_graph.duckdb \
--keep-singletons \
--dry-run # Preview changes before applying
# Incrementally add new data with schema evolution
koza append \
--database unified_graph.duckdb \
--nodes new_genes.tsv updated_pathways.jsonl \
--deduplicate \
--show-progress
# Export subsets with format conversion
koza split \
--database unified_graph.duckdb \
--split-on provided_by \
--output-format parquet \
--output-dir ./split_graphs
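The split operation is, conceptually, a group-by over a record field; a stdlib sketch of what `--split-on provided_by` does (minus the DuckDB backing and format conversion):

```python
from collections import defaultdict


def split_edges(edges, split_on="provided_by"):
    """Group edge records by the value of a field."""
    groups = defaultdict(list)
    for edge in edges:
        groups[edge.get(split_on, "unknown")].append(edge)
    return dict(groups)


edges = [
    {"subject": "A", "object": "B", "provided_by": "infores:string"},
    {"subject": "C", "object": "D", "provided_by": "infores:zfin"},
    {"subject": "E", "object": "F", "provided_by": "infores:string"},
]
groups = split_edges(edges)
```

Each group would then be written out in the requested output format (TSV, JSONL, or Parquet).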
Key Features
Multi-Format Support
- Native support for TSV, JSONL, and Parquet KGX files
- Automatic format detection and conversion
- Mixed-format operations in single commands
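A hedged sketch of extension-based format detection (the extension map here is hypothetical; Koza's real detection logic may differ), including gzipped inputs like the `.jsonl.gz` example above:

```python
from pathlib import Path

# Hypothetical extension map for illustration
KGX_FORMATS = {".tsv": "tsv", ".jsonl": "jsonl", ".parquet": "parquet"}


def detect_format(path):
    """Guess a KGX file format from its extension, ignoring .gz."""
    suffixes = [s for s in Path(path).suffixes if s != ".gz"]
    return KGX_FORMATS.get(suffixes[-1]) if suffixes else None
```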
Schema Flexibility
- Automatic schema harmonization across heterogeneous files
- Schema evolution with backward compatibility
- Comprehensive schema reporting and validation
High Performance
- DuckDB-powered operations for fast bulk processing
- Memory-efficient handling of large knowledge graphs
- Parallel processing and streaming where possible
Rich CLI Experience
- Progress indicators for long-running operations
- Detailed statistics and operation summaries
- Dry-run modes for safe operation preview
Data Integrity
- Dangling edge detection and preservation
- Duplicate detection and removal strategies
- Non-destructive operations with data archiving
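One simple deduplication strategy is keep-first-by-id, sketched below (Koza's `--deduplicate` may instead merge properties or apply other strategies; this is only an illustration):

```python
def deduplicate_nodes(nodes):
    """Keep the first occurrence of each node id."""
    seen = set()
    unique = []
    for node in nodes:
        if node["id"] not in seen:
            seen.add(node["id"])
            unique.append(node)
    return unique


deduped = deduplicate_nodes([{"id": "A"}, {"id": "B"}, {"id": "A"}])
```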