Variant utilities

This page documents utilities used to work with variant-level data in PhEval workflows. These commands are primarily used to construct or manipulate VCF inputs for variant-based evaluation experiments.

They do not execute tools directly and do not perform benchmarking.

Purpose

Variant utilities are used to:

Generate VCFs containing known ("spiked") variants
Support controlled variant-based evaluation experiments

These utilities are typically used in conjunction with phenopacket data and plugin-provided runners.

Creating spiked VCFs

The create-spiked-vcfs command is used to generate VCF files containing known causal variants derived from phenopackets.

This is particularly useful when:

Evaluating variant-based prioritisation methods
Simulating realistic diagnostic scenarios
Benchmarking tools that require both phenotypes and variants

The command supports both single phenopackets and directories of phenopackets.

Example: create spiked VCFs from a phenopacket directory (hg38)

pheval-utils create-spiked-vcfs \
  --phenopacket-dir phenopackets/ \
  --hg38-template-vcf hg38_template.vcf \
  --output-dir spiked_vcfs/

Example: create a spiked VCF from a single phenopacket (hg19)

pheval-utils create-spiked-vcfs \
  --phenopacket-path case_001.json \
  --hg19-template-vcf hg19_template.vcf \
  --output-dir spiked_vcf/

Example: use a directory of VCF templates

Instead of a single template file, a directory of VCF templates can be provided:

pheval-utils create-spiked-vcfs \
  --phenopacket-dir phenopackets/ \
  --hg38-vcf-dir hg38_vcf_templates/ \
  --output-dir spiked_vcfs/

Notes and constraints:

Exactly one of --phenopacket-path or --phenopacket-dir must be provided.

For each genome build, either a template VCF file or a directory of template VCFs must be supplied.

The generated VCFs are written to the specified output directory.

Spiked VCFs are typically consumed by runners that support variant-based analysis.

How variant utilities fit into a workflow

A typical variant-based evaluation workflow might look like:

Prepare and normalise phenopackets
Generate spiked VCFs using variant utilities
Run tools via plugin-provided runners using pheval run
Benchmark and analyse variant-level results