Phenotype scrambling utilities

This page documents utilities used to introduce noise or perturbations into phenotype data. These commands are typically used to assess the robustness and sensitivity of phenotype-driven prioritisation methods.

They operate on existing phenotype data and do not execute tools or perform benchmarking directly.

Purpose

Phenotype scrambling utilities are used to:

Simulate noisy or incomplete phenotypic observations
Evaluate how sensitive methods are to phenotype quality
Test robustness under controlled perturbations

Such perturbation experiments are useful when comparing tools, parameterisations, or ontology versions.

Scrambling phenopackets

The scramble-phenopackets command generates perturbed versions of existing phenopackets.

The scrambled phenopackets can then be used as inputs to runners for execution and benchmarking.
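Before passing scrambled phenopackets to a runner, it can help to confirm that every original case has a scrambled counterpart, so that later results can be compared case by case. The bash sketch below assumes two things: phenopackets are stored as `.json` files, and a scrambled file keeps the name of the phenopacket it was derived from. The second is an assumption, not documented behaviour of `scramble-phenopackets`. The helper name `missing_scrambled` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print the names of original phenopackets with no scrambled counterpart.
# Assumptions (not PhEval behaviour): phenopackets are *.json files, and a
# scrambled file keeps the file name of its original. missing_scrambled is
# a hypothetical helper.
set -euo pipefail

missing_scrambled() {
  local original_dir="$1" scrambled_dir="$2" path
  for path in "$original_dir"/*.json; do
    [ -e "$path" ] || continue   # skip the literal pattern if the glob matched nothing
    if [ ! -e "$scrambled_dir/$(basename "$path")" ]; then
      basename "$path"           # report the unmatched original
    fi
  done
}
```

For example, `missing_scrambled phenopackets/ scrambled_phenopackets/` prints nothing when every case has a scrambled counterpart.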

Example: scramble a phenopacket corpus

Generate scrambled phenopackets from an existing corpus:

pheval-utils scramble-phenopackets \
  --phenopacket-dir phenopackets/ \
  --output-dir scrambled_phenopackets/ \
  --scramble-factor 0.7 \
  --local-ontology-cache ./hp.obo

Notes:

The original phenopackets are not modified.

Scrambled outputs are written to a separate directory.

The resulting phenopackets can be used directly with plugin-provided runners.
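Robustness experiments often compare several noise levels rather than a single one. The bash sketch below writes one scrambled corpus per scramble factor and uses only the options shown in the example command. It assumes, as the option name suggests, that the factor controls how strongly each phenotypic profile is perturbed. The function name `scramble_sweep`, the `scrambled_<factor>/` layout, the `PHEVAL_UTILS` override and the factor values in the usage line are illustrative assumptions, not part of PhEval.

```bash
#!/usr/bin/env bash
# Sketch: scramble one corpus at several scramble factors.
# Hypothetical: scramble_sweep, the scrambled_<factor>/ layout and the
# PHEVAL_UTILS override (e.g. PHEVAL_UTILS=echo prints commands instead of running them).
set -euo pipefail

PHEVAL_UTILS="${PHEVAL_UTILS:-pheval-utils}"

# Usage: scramble_sweep <phenopacket-dir> <ontology-file> <factor>...
scramble_sweep() {
  local phenopacket_dir="$1" ontology="$2"
  shift 2
  local factor
  for factor in "$@"; do
    # One output directory per factor keeps each noise level separate
    # and never writes into the original corpus.
    "$PHEVAL_UTILS" scramble-phenopackets \
      --phenopacket-dir "$phenopacket_dir" \
      --output-dir "scrambled_${factor}/" \
      --scramble-factor "$factor" \
      --local-ontology-cache "$ontology"
  done
}

# Example usage (factor values are illustrative):
# scramble_sweep phenopackets/ ./hp.obo 0.3 0.5 0.7
```

Because each factor gets its own directory, every scrambled corpus can be passed to the same runner independently.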

How phenotype scrambling fits into a workflow

A typical robustness experiment using phenotype scrambling proceeds as follows:

Prepare a clean phenopacket corpus
Generate scrambled phenopackets
Run tools on both the original and the scrambled corpora via plugin-provided runners using pheval run
Benchmark the scrambled runs and compare their performance against the results on the original corpus
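The first two steps can be scripted so that the clean corpus is checksummed before and after scrambling. This protects the final comparison from an accidentally modified baseline. The bash sketch below assumes GNU coreutils (`sha256sum`). The helper names `snapshot_corpus` and `scramble_with_integrity_check` and the `PHEVAL_UTILS` override are hypothetical. The `pheval run` and benchmarking steps are deliberately omitted because their options depend on the plugin runner in use.

```bash
#!/usr/bin/env bash
# Sketch of steps 1-2: scramble a clean corpus and verify it was left unchanged.
# Hypothetical: snapshot_corpus, scramble_with_integrity_check, PHEVAL_UTILS override.
# Assumes GNU coreutils sha256sum is on PATH.
set -euo pipefail

PHEVAL_UTILS="${PHEVAL_UTILS:-pheval-utils}"

# Print a sorted list of checksums for every file in the corpus.
snapshot_corpus() {
  local corpus_dir="$1"
  (cd "$corpus_dir" && find . -type f -exec sha256sum {} + | sort)
}

# Scramble the corpus, then fail if any original file changed.
scramble_with_integrity_check() {
  local corpus_dir="$1" output_dir="$2" factor="$3" ontology="$4"
  local before after
  before="$(snapshot_corpus "$corpus_dir")"
  "$PHEVAL_UTILS" scramble-phenopackets \
    --phenopacket-dir "$corpus_dir" \
    --output-dir "$output_dir" \
    --scramble-factor "$factor" \
    --local-ontology-cache "$ontology"
  after="$(snapshot_corpus "$corpus_dir")"
  if [ "$before" != "$after" ]; then
    echo "error: files in $corpus_dir changed during scrambling" >&2
    return 1
  fi
}
```

`scramble_with_integrity_check phenopackets/ scrambled_phenopackets/ 0.7 ./hp.obo` mirrors the example command. It stops with a non-zero status if scrambling fails or if any original file changed. Both directories can then be handed to the chosen runner.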

Scrambling utilities are optional and primarily used in experimental or methodological evaluations.