Benchmarking and analysis
This section describes how PhEval is used to benchmark and compare phenotype-driven prioritisation methods once tool execution has completed.
Benchmarking in PhEval is designed to support controlled, reproducible evaluation across:
- Tools and tool versions
- Cohorts and simulation strategies
- Ontology and knowledge-base updates
This section focuses on analysis, not execution. Tools are executed via runners provided by plugins.
What benchmarking means in PhEval
In PhEval, benchmarking refers to the process of:
- Consuming PhEval-standardised results produced by runners
- Computing rank-based and binary classification metrics
- Comparing performance across multiple runs
- Generating plots and summary statistics for interpretation
Benchmarking operates over one or more completed runs and assumes that tool execution has already taken place.
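To make the rank-based metrics concrete, here is a minimal, self-contained sketch of two common examples, top-k hit rate and mean reciprocal rank, computed over ranked results. It does not use PhEval's own API; the RankedResult shape and its field names are assumptions for illustration only.

```python
# Illustrative sketch of rank-based metrics of the kind PhEval computes.
# RankedResult and its fields are assumptions, not PhEval's data model.
from dataclasses import dataclass


@dataclass
class RankedResult:
    case_id: str        # identifier of the case (e.g. a phenopacket)
    entity_id: str      # gene, variant, or disease identifier
    rank: int           # rank assigned by the tool (1 = best)
    is_causative: bool  # whether this entity is the known answer


def top_k_rate(results: list[RankedResult], k: int) -> float:
    """Fraction of cases whose causative entity is ranked within the top k."""
    cases = {r.case_id for r in results}
    hits = {r.case_id for r in results if r.is_causative and r.rank <= k}
    return len(hits) / len(cases) if cases else 0.0


def mean_reciprocal_rank(results: list[RankedResult]) -> float:
    """Mean of 1/rank of the causative entity per case (0 if never found)."""
    best: dict[str, float] = {}
    for r in results:
        best.setdefault(r.case_id, 0.0)
        if r.is_causative:
            best[r.case_id] = max(best[r.case_id], 1.0 / r.rank)
    return sum(best.values()) / len(best) if best else 0.0


results = [
    RankedResult("case-1", "GENE:A", 1, True),
    RankedResult("case-2", "GENE:B", 4, True),
    RankedResult("case-2", "GENE:C", 1, False),
]
print(top_k_rate(results, k=3))       # 0.5: only case-1 is in the top 3
print(mean_reciprocal_rank(results))  # (1.0 + 0.25) / 2 = 0.625
```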
High-level benchmarking workflow
A typical benchmarking workflow consists of:
1. Execute one or more runners
   Runners produce PhEval-standardised outputs for gene, variant, and/or disease prioritisation.
2. Configure benchmarking parameters
   A YAML configuration file specifies which runs to include and how benchmarking should be performed.
3. Run benchmarking and analysis
   PhEval utilities compute metrics, comparisons, and plots across the specified runs.
Each of these steps is described in more detail in the following pages.
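As a rough illustration of the configuration step, the sketch below builds a benchmarking configuration in Python and writes it out as YAML. The keys used here (benchmark_name, runs, run_identifier, results_dir, and the per-analysis flags) are assumptions made for this example, not PhEval's documented schema; consult the configuration pages for the authoritative format.

```python
# A hedged sketch of what a benchmarking configuration might contain.
# All keys below are illustrative assumptions, not PhEval's exact schema.
import yaml

benchmark_config = {
    "benchmark_name": "gene_prioritisation_comparison",  # assumed: labels the outputs
    "runs": [  # assumed: one entry per completed run to include
        {
            "run_identifier": "tool-a-v1",         # assumed: unique label for the run
            "results_dir": "/path/to/tool-a/run",  # assumed: standardised results location
            "gene_analysis": True,                 # assumed: which analyses to benchmark
            "variant_analysis": False,
            "disease_analysis": False,
        },
        {
            "run_identifier": "tool-b-v2",
            "results_dir": "/path/to/tool-b/run",
            "gene_analysis": True,
            "variant_analysis": False,
            "disease_analysis": False,
        },
    ],
}

# Write the configuration to the YAML file handed to the benchmarking step.
with open("benchmark_config.yaml", "w") as handle:
    yaml.safe_dump(benchmark_config, handle, sort_keys=False)
```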
What benchmarking produces
Benchmarking generates:
- Ranking-based statistics
- Binary classification statistics
- Comparative summaries between runs
- Plots for visual comparison
- A single DuckDB database containing all computed metrics and comparisons
These outputs support both exploratory analysis and formal evaluation.
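Because every computed metric and comparison lands in one DuckDB database, the outputs can be explored with plain SQL once benchmarking has finished. A minimal sketch using the duckdb Python package follows; the database filename and the gene_summary table name are assumptions for illustration, so start with SHOW TABLES to discover the actual schema.

```python
# A minimal sketch of exploring the benchmarking database produced by PhEval.
# The filename and table name are guesses for illustration purposes.
import duckdb

con = duckdb.connect("benchmark.duckdb", read_only=True)

# List the tables the benchmarking step actually wrote.
print(con.execute("SHOW TABLES").fetchall())

# Hypothetical query: compare summary statistics across runs, assuming a
# table with one row per run ('gene_summary' is a guessed name).
for row in con.execute("SELECT * FROM gene_summary ORDER BY 1").fetchall():
    print(row)

con.close()
```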