Developing a PhEval plugin

This guide explains how to develop a PhEval plugin that exposes a runner and produces PhEval standardised results that can be benchmarked consistently.

Video walkthrough

If you prefer a guided walkthrough, start here:

Write your own PhEval runner

Key takeaways

A runner must implement all PhEvalRunner methods (prepare, run, post_process).

Your runner must write standardised result files with the required columns for the benchmark type.

Result filenames must match phenopacket filenames (file stem matching) so PhEval can align outputs to cases.

Standardised result schemas (required)

PhEval benchmarking operates on standardised result files. Each result file must conform exactly to the required schema for the type of prioritisation being produced.

Schemas are validated during post-processing.
Missing or incorrectly named columns will cause validation to fail.

Gene prioritisation results

Each gene result must contain the following columns (grouping_id is optional):

Column name	Type	Description
`gene_symbol`	`pl.String`	Gene symbol
`gene_identifier`	`pl.String`	Gene identifier
`score`	`pl.Float64`	Tool-specific score
`grouping_id`	`pl.Utf8`	Optional grouping identifier

Variant prioritisation results

Each variant result must contain the following columns (grouping_id is optional):

Column name	Type	Description
`chrom`	`pl.String`	Chromosome
`start`	`pl.Int64`	Start position
`end`	`pl.Int64`	End position
`ref`	`pl.String`	Reference allele
`alt`	`pl.String`	Alternate allele
`score`	`pl.Float64`	Tool-specific score
`grouping_id`	`pl.Utf8`	Optional grouping identifier

Disease prioritisation results

Each disease result must contain the following columns:

Column name	Type	Description
`disease_identifier`	`pl.String`	Disease identifier
`score`	`pl.Float64`	Tool-specific score
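
Before building the DataFrame, it can help to confirm that the fields you extracted cover the required columns. The following is a minimal sketch in plain Python that mirrors the tables above. `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names for illustration and are not part of the PhEval API. `grouping_id` is left out because it is optional.

```python
# Required column names per result type, copied from the schema tables.
# grouping_id is optional, so it is not listed here.
REQUIRED_COLUMNS = {
    "gene": ["gene_symbol", "gene_identifier", "score"],
    "variant": ["chrom", "start", "end", "ref", "alt", "score"],
    "disease": ["disease_identifier", "score"],
}


def missing_columns(result_type, columns):
    """Return the required columns absent from `columns`, in schema order."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS[result_type] if name not in present]
```

A non-empty return value points at columns that are absent or misnamed before PhEval's own validation runs.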

The `grouping_id` column (optional but important)

grouping_id is optional and enables joint ranking of entities that should be treated as a single unit without penalty.

Typical examples include:

Compound heterozygous variants (multiple variants contributing together)
Grouped variant representations within the same gene
Polygenic or grouped signals where multiple items should be evaluated jointly

How to use it

Variants in the same group share the same grouping_id
Variants not in any group should each have a unique grouping_id

This preserves ranking semantics when benchmarking.
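
The two rules above can be sketched as follows. This is an illustration only: the `group` key on each input row and the `assign_grouping_ids` name are assumptions, standing in for however your tool marks variants that belong together.

```python
from itertools import count


def assign_grouping_ids(variants, group_key="group"):
    """Return copies of `variants` with `grouping_id` set and `group_key` removed.

    Rows sharing a non-None `group_key` value get the same grouping_id;
    every other row gets an id of its own.
    """
    next_id = count(1)
    ids_by_group = {}
    rows = []
    for variant in variants:
        group = variant.get(group_key)
        if group is None:
            # Ungrouped variant: always allocate a fresh id.
            grouping_id = next(next_id)
        else:
            # Grouped variant: reuse the id already given to this group, if any.
            if group not in ids_by_group:
                ids_by_group[group] = next(next_id)
            grouping_id = ids_by_group[group]
        row = {k: v for k, v in variant.items() if k != group_key}
        row["grouping_id"] = str(grouping_id)  # the schema expects a string
        rows.append(row)
    return rows
```

For example, two compound heterozygous variants tagged with the same `group` value end up with one shared `grouping_id`, while a third, untagged variant gets a different one.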

Result file naming (required)

PhEval aligns result files to cases using filename stem matching.

Rule:

The result filename stem must exactly match the phenopacket filename stem.

Example:

Phenopacket: patient_001.json
Raw tool output: patient_001-exomiser.json (stem does not match the phenopacket)
Processed result filename passed to PhEval: patient_001.json (tool suffix stripped, stem matches)

If the stems do not match, PhEval cannot reliably associate results with phenopackets, and benchmarking may be incomplete or incorrect.

Recommendation:

Always derive result filenames programmatically from the phenopacket stem.
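
One way to do this is to strip the tool-specific suffix from the raw output name. The sketch below assumes the raw file is named like `patient_001-exomiser.json`; `result_path_for` and its `tool_suffix` parameter are hypothetical names, and `str.removesuffix` needs Python 3.9 or later. It only computes a path and does not rename anything on disk.

```python
from pathlib import Path


def result_path_for(raw_output: Path, tool_suffix: str) -> Path:
    """Return `raw_output` with a trailing `tool_suffix` dropped from its stem."""
    # removesuffix leaves the stem unchanged when the suffix is absent or empty.
    stem = raw_output.stem.removesuffix(tool_suffix)
    return raw_output.with_name(stem + raw_output.suffix)
```

With the example above, `result_path_for(Path("patient_001-exomiser.json"), "-exomiser")` has the stem `patient_001`, which matches the phenopacket.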

Step-by-step plugin development

PhEval plugins are typically derived from the runner template and standardised tooling. The recommended approach uses the PhEval runner template, MkDocs, tox, and uv.

The template is available at https://github.com/monarch-initiative/pheval-runner-template

1. Scaffold a new plugin

Install cruft (used to create projects from the template and keep them up to date):

pip install cruft

Create a project using the template:

cruft create https://github.com/monarch-initiative/pheval-runner-template

2. Environment and dependencies

Install uv (if you do not already use it):

pip install uv

Install dependencies and activate the environment:

uv sync
source .venv/bin/activate

Run the test suite to confirm the setup:

uv run tox

Note

The template uses uv by default, but this is not required. You may use any packaging/dependency manager. PhEval only requires a valid pheval.plugins entry point.

3. Implement your custom runner

In the generated template, implement your runner in runner.py (under src/).

At minimum, implement prepare, run, and post_process:

"""Runner."""

from dataclasses import dataclass
from pathlib import Path

from pheval.runners.runner import PhEvalRunner


@dataclass
class CustomRunner(PhEvalRunner):
    """Runner class implementation."""

    input_dir: Path
    testdata_dir: Path
    tmp_dir: Path
    output_dir: Path
    config_file: Path
    version: str

    def prepare(self):
        """Prepare inputs."""
        print("preparing")

    def run(self):
        """Execute the tool."""
        print("running")

    def post_process(self):
        """Convert raw outputs to PhEval standardised results."""
        print("post processing")

4. Register the runner entry point

The template populates your pyproject.toml entry points. If you rename the runner class or move files, update this accordingly:

[project.entry-points."pheval.plugins"]
customrunner = "pheval_plugin_example.runner:CustomRunner"

Tip

The module path and class name are case-sensitive.

Tool-specific configuration (config.yaml)

For pheval run to execute, the input directory must contain a config.yaml:

tool:
tool_version:
variant_analysis:
gene_analysis:
disease_analysis:
tool_specific_configuration_options:

variant_analysis, gene_analysis, and disease_analysis must be booleans (true / false).
tool_specific_configuration_options is optional and may include plugin-specific configuration.
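
A quoted value such as "true" is loaded from YAML as a string rather than a boolean, which is an easy mistake to make. The sketch below checks the three flags on a configuration that has already been loaded into a dictionary; `ANALYSIS_FLAGS` and `non_boolean_flags` are hypothetical names, not part of PhEval.

```python
ANALYSIS_FLAGS = ("variant_analysis", "gene_analysis", "disease_analysis")


def non_boolean_flags(config: dict) -> list[str]:
    """Return the analysis flags that are missing or not real booleans."""
    return [flag for flag in ANALYSIS_FLAGS if not isinstance(config.get(flag), bool)]
```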

Parsing tool-specific configuration (recommended)

Using pydantic can simplify parsing:

from pydantic import BaseModel, Field

class CustomisedConfigurations(BaseModel):
    environment: str = Field(...)

Then parse in your runner:

# parse_obj is the pydantic v1 API; pydantic v2 deprecates it in favour of model_validate
config = CustomisedConfigurations.parse_obj(
    self.input_dir_config.tool_specific_configuration_options
)
environment = config.environment

Post-processing: generating standardised results

PhEval can handle ranking and writing result files in the correct locations. Your runner’s post-processing must:

Read tool-specific raw outputs
Extract the required fields
Construct a Polars DataFrame with the required schema
Call the appropriate PhEval helper method to write standardised results
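
The first three steps can be sketched as follows for gene results. The raw output format used here (a JSON array of objects with `symbol`, `id`, and `combined_score` keys) is invented for illustration, as is the `extract_gene_rows` name; substitute whatever your tool actually emits.

```python
import json


def extract_gene_rows(raw_json: str) -> list[dict]:
    """Map a hypothetical raw JSON tool output onto the gene result columns."""
    rows = []
    for hit in json.loads(raw_json):
        rows.append(
            {
                "gene_symbol": str(hit["symbol"]),
                "gene_identifier": str(hit["id"]),
                # Cast explicitly so the column ends up as a float.
                "score": float(hit["combined_score"]),
            }
        )
    return rows
```

The resulting list of dictionaries can then be turned into a Polars DataFrame (for example with `pl.DataFrame(rows)`) and handed to the helper for the relevant result type.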

Result generation helpers

Breaking change (v0.5.0)

generate_pheval_result was replaced with:

generate_gene_result
generate_variant_result
generate_disease_result

Generating gene result files

Use generate_gene_result to write PhEval-standardised gene results from a Polars DataFrame.

from pheval.post_processing.post_processing import (
    generate_gene_result,
    SortOrder,
)

generate_gene_result(
    results=pheval_gene_result,      # Polars DataFrame (gene schema)
    sort_order=SortOrder.DESCENDING, # or SortOrder.ASCENDING
    output_dir=output_directory,     # typically self.output_dir
    result_path=result_path,         # path to raw tool output, stem MUST match phenopacket stem exactly
    phenopacket_dir=phenopacket_dir, # directory containing phenopackets
)

Generating variant result files

Use generate_variant_result to write PhEval-standardised variant results.

from pheval.post_processing.post_processing import (
    generate_variant_result,
    SortOrder,
)

generate_variant_result(
    results=pheval_variant_result,   # Polars DataFrame (variant schema)
    sort_order=SortOrder.DESCENDING,
    output_dir=output_directory,
    result_path=result_path,         # stem must match phenopacket stem
    phenopacket_dir=phenopacket_dir,
)

Generating disease result files

Use generate_disease_result to write PhEval-standardised disease results.

from pheval.post_processing.post_processing import (
    generate_disease_result,
    SortOrder,
)

generate_disease_result(
    results=pheval_disease_result,   # Polars DataFrame (disease schema)
    sort_order=SortOrder.DESCENDING,
    output_dir=output_directory,
    result_path=result_path,         # stem must match phenopacket stem
    phenopacket_dir=phenopacket_dir,
)

Important

The stem of result_path must exactly match the phenopacket stem. This often requires stripping tool-specific suffixes from raw output filenames.

Adding metadata to results.yml (optional)

PhEval writes a results.yml file to the output directory by default. You can add customised metadata by overriding construct_meta_data().

Example dataclass:

from dataclasses import dataclass

@dataclass
class CustomisedMetaData:
    customised_field: str

Runner implementation:

def construct_meta_data(self):
    self.meta_data.tool_specific_configuration_options = CustomisedMetaData(
        customised_field="customised_value"
    )
    return self.meta_data

Helper utilities (optional)

PhEval provides helper methods that can simplify runner implementations.

PhenopacketUtil

Useful for extracting observed phenotypes when tools do not accept phenopackets directly:

Class for retrieving data from a Phenopacket or Family object

Source code in src/pheval/utils/phenopacket_utils.py

class PhenopacketUtil:
    """Class for retrieving data from a Phenopacket or Family object"""

    def __init__(self, phenopacket_contents: Phenopacket | Family):
        """Initialise PhenopacketUtil

        Args:
            phenopacket_contents (Union[Phenopacket, Family]): Phenopacket or Family object
        """
        self.phenopacket_contents = phenopacket_contents

    def sample_id(self) -> str:
        """
        Retrieve the sample ID from a Phenopacket or proband of a Family

        Returns:
            str: Sample ID
        """
        if hasattr(self.phenopacket_contents, "proband"):
            return self.phenopacket_contents.proband.subject.id
        else:
            return self.phenopacket_contents.subject.id

    def phenotypic_features(self) -> list[PhenotypicFeature]:
        """
        Retrieve a list of all HPO terms

        Returns:
            List[PhenotypicFeature]: List of HPO terms
        """
        if hasattr(self.phenopacket_contents, "proband"):
            return self.phenopacket_contents.proband.phenotypic_features
        else:
            return self.phenopacket_contents.phenotypic_features

    def observed_phenotypic_features(self) -> list[PhenotypicFeature]:
        """
        Retrieve a list of all observed HPO terms

        Returns:
            List[PhenotypicFeature]: List of observed HPO terms
        """
        phenotypic_features = []
        all_phenotypic_features = self.phenotypic_features()
        for p in all_phenotypic_features:
            if p.excluded:
                continue
            phenotypic_features.append(p)
        return phenotypic_features

    def negated_phenotypic_features(self) -> list[PhenotypicFeature]:
        """
        Retrieve a list of all negated HPO terms

        Returns:
            List[PhenotypicFeature]: List of negated HPO terms
        """
        negated_phenotypic_features = []
        all_phenotypic_features = self.phenotypic_features()
        for p in all_phenotypic_features:
            if p.excluded:
                negated_phenotypic_features.append(p)
        return negated_phenotypic_features

    def diseases(self) -> list[Disease]:
        """
        Retrieve a list of Diseases associated with the proband

        Returns:
            List[Disease]: List of diseases
        """
        if hasattr(self.phenopacket_contents, "proband"):
            return self.phenopacket_contents.proband.diseases
        else:
            return self.phenopacket_contents.diseases

    def _diagnosis_from_interpretations(self) -> list[ProbandDisease]:
        """
        Retrieve a list of disease diagnoses associated with the proband from the interpretations object

        Returns:
            List[ProbandDisease]: List of diagnosed diseases
        """
        diagnoses = []
        interpretation = self.interpretations()
        for i in interpretation:
            (
                diagnoses.append(
                    ProbandDisease(
                        disease_name=i.diagnosis.disease.label,
                        disease_identifier=i.diagnosis.disease.id,
                    )
                )
                if i.diagnosis.disease.label != "" and i.diagnosis.disease.id != ""
                else None
            )
        return diagnoses

    def _diagnosis_from_disease(self) -> list[ProbandDisease]:
        """
        Retrieve a list of disease diagnoses associated with the proband from the diseases object

        Returns:
            List[ProbandDisease]: List of diagnosed diseases
        """
        diagnoses = []
        for disease in self.diseases():
            diagnoses.append(ProbandDisease(disease_name=disease.term.label, disease_identifier=disease.term.id))
        return diagnoses

    def diagnoses(self) -> list[ProbandDisease]:
        """
        Retrieve a unique list of disease diagnoses associated with the proband from a Phenopacket

        Returns:
            List[ProbandDisease]: List of diagnosed diseases
        """
        return list(set(self._diagnosis_from_interpretations() + self._diagnosis_from_disease()))

    def interpretations(self) -> list[Interpretation]:
        """
        Retrieve a list of interpretations from a Phenopacket

        Returns:
            List[Interpretation]: List of interpretations
        """
        if hasattr(self.phenopacket_contents, "proband"):
            return self.phenopacket_contents.proband.interpretations
        else:
            return self.phenopacket_contents.interpretations

    def causative_variants(self) -> list[ProbandCausativeVariant]:
        """
        Retrieve a list of causative variants listed in a Phenopacket

        Returns:
            List[ProbandCausativeVariant]: List of proband causative variants
        """
        all_variants = []
        interpretation = self.interpretations()
        for i in interpretation:
            for g in i.diagnosis.genomic_interpretations:
                vcf_record = g.variant_interpretation.variation_descriptor.vcf_record
                genotype = g.variant_interpretation.variation_descriptor.allelic_state
                variant_data = ProbandCausativeVariant(
                    self.phenopacket_contents.subject.id,
                    vcf_record.genome_assembly,
                    GenomicVariant(
                        vcf_record.chrom,
                        vcf_record.pos,
                        vcf_record.ref,
                        vcf_record.alt,
                    ),
                    genotype.label,
                    vcf_record.info,
                )
                all_variants.append(variant_data)
        return all_variants

    def files(self) -> list[File]:
        """
        Retrieve a list of files associated with a phenopacket

        Returns:
            List[File]: List of files associated with a phenopacket
        """
        return self.phenopacket_contents.files

    def vcf_file_data(self, phenopacket_path: Path, vcf_dir: Path) -> File:
        """
        Retrieve the genome assembly and VCF file name from a phenopacket.

        Args:
            phenopacket_path (Path): The path to the phenopacket file.
            vcf_dir (Path): The directory path where the VCF file is stored.

        Returns:
            File: The VCF file with updated URI pointing to the specified directory.

        Raises:
            IncorrectFileFormatError: If the provided file is not in .vcf or .vcf.gz format.
            IncompatibleGenomeAssemblyError: If the genome assembly of the VCF file is not compatible.

        Note:
            This function searches for a VCF file within the provided list of files, validates its format,
            and checks if the genome assembly is compatible. If the conditions are met, it updates the
            URI of the VCF file to the specified directory and returns the modified file object.
        """
        compatible_genome_assembly = ["GRCh37", "hg19", "GRCh38", "hg38"]
        vcf_data = next(file for file in self.files() if file.file_attributes["fileFormat"] == "vcf")
        if not Path(vcf_data.uri).name.endswith(".vcf") and not Path(vcf_data.uri).name.endswith(".vcf.gz"):
            raise IncorrectFileFormatError(Path(vcf_data.uri), ".vcf or .vcf.gz file")
        if vcf_data.file_attributes["genomeAssembly"] not in compatible_genome_assembly:
            raise IncompatibleGenomeAssemblyError(vcf_data.file_attributes["genomeAssembly"], phenopacket_path)
        vcf_data.uri = str(vcf_dir.joinpath(Path(vcf_data.uri).name))
        return vcf_data

    @staticmethod
    def _extract_diagnosed_gene(
        genomic_interpretation: GenomicInterpretation,
    ) -> ProbandCausativeGene:
        """
        Retrieve the disease causing genes from the variant descriptor field if not empty,
        otherwise, retrieves from the gene descriptor from a phenopacket.
        Args:
            genomic_interpretation (GenomicInterpretation): A genomic interpretation from a Phenopacket
        Returns:
            ProbandCausativeGene: The disease causing gene
        """
        if genomic_interpretation.variant_interpretation.ByteSize() != 0:
            return ProbandCausativeGene(
                genomic_interpretation.variant_interpretation.variation_descriptor.gene_context.symbol,
                genomic_interpretation.variant_interpretation.variation_descriptor.gene_context.value_id,
            )

        else:
            return ProbandCausativeGene(
                gene_symbol=genomic_interpretation.gene.symbol,
                gene_identifier=genomic_interpretation.gene.value_id,
            )

    def diagnosed_genes(self) -> list[ProbandCausativeGene]:
        """
        Retrieve the disease causing genes from a phenopacket.
        Returns:
            List[ProbandCausativeGene]: List of causative genes
        """
        pheno_interpretation = self.interpretations()
        genes = []
        for i in pheno_interpretation:
            for g in i.diagnosis.genomic_interpretations:
                genes.append(self._extract_diagnosed_gene(g))
                genes = list({gene.gene_symbol: gene for gene in genes}.values())
        return genes

    def diagnosed_variants(self) -> list[GenomicVariant]:
        """
        Retrieve a list of all known causative variants from a phenopacket.
        Returns:
            List[GenomicVariant]: List of causative variants
        """
        variants = []
        pheno_interpretation = self.interpretations()
        for i in pheno_interpretation:
            for g in i.diagnosis.genomic_interpretations:
                variant = GenomicVariant(
                    chrom=str(g.variant_interpretation.variation_descriptor.vcf_record.chrom.replace("chr", "")),
                    pos=int(g.variant_interpretation.variation_descriptor.vcf_record.pos),
                    ref=g.variant_interpretation.variation_descriptor.vcf_record.ref,
                    alt=g.variant_interpretation.variation_descriptor.vcf_record.alt,
                )
                variants.append(variant)
        return variants

    def check_incomplete_variant_record(self) -> bool:
        """
        Check if any variant record in the phenopacket has incomplete information.

        This method iterates through the diagnosed variant records and checks if any of them
        have missing or incomplete information such as empty chromosome, position, reference,
        or alternate allele.

        Returns:
            bool: True if any variant record is incomplete, False otherwise.
        """
        variants = self.diagnosed_variants()
        for variant in variants:
            if variant.chrom == "" or variant.pos in (0, "") or variant.ref == "" or variant.alt == "":
                return True
        return False

    def check_variant_alleles(self) -> bool:
        """
        Check if any variant record in the phenopacket has identical reference and alternate alleles.

        Returns:
            bool: True if the reference and alternate alleles are identical, False otherwise.
        """
        variants = self.diagnosed_variants()
        for variant in variants:
            if variant.ref == variant.alt:
                return True
        return False

    def check_incomplete_gene_record(self) -> bool:
        """
        Check if any gene record in the phenopacket has incomplete information.

        This method iterates through the diagnosed gene records and checks if any of them
        have missing or incomplete information such as gene name, or gene identifier.

        Returns:
            bool: True if any gene record is incomplete, False otherwise.
        """
        genes = self.diagnosed_genes()
        for gene in genes:
            if gene.gene_symbol == "" or gene.gene_identifier == "":
                return True
        return False

    def check_incomplete_disease_record(self) -> bool:
        """
        Check if any disease record in the phenopacket has incomplete information.

        This method iterates through the diagnosed disease records and checks if any of them
        have missing or incomplete information such as empty disease name, or disease identifier.

        Returns:
            bool: True if any disease record is incomplete, False otherwise.
        """
        if len(self.diagnoses()) == 0:
            return True
        return False

Example usage:

from pheval.utils.phenopacket_utils import phenopacket_reader, PhenopacketUtil

phenopacket = phenopacket_reader("/path/to/phenopacket.json")
phenopacket_util = PhenopacketUtil(phenopacket)

observed_phenotypes = phenopacket_util.observed_phenotypic_features()
observed_phenotypes_hpo_ids = [p.type.id for p in observed_phenotypes]

Testing your runner

Install dependencies:

uv sync

Run PhEval using your custom runner:

pheval run -i ./input_dir -t ./test_data_dir -r customrunner -o output_dir

Notes:

The -r/--runner value must match the entry point name (lowercase).
Confirm that standardised result files are produced and validate correctly.
Confirm that result file stems match the phenopacket file stems.
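
The stem check can be automated by comparing the two sets of stems. This is a minimal sketch; `compare_stems` is a hypothetical helper, and the `result_paths` argument is assumed to be the paths your runner passes as `result_path`.

```python
from pathlib import Path
from typing import Iterable


def compare_stems(
    phenopackets: Iterable[Path], result_paths: Iterable[Path]
) -> dict[str, list[str]]:
    """Report phenopacket stems without a result, and result stems without a phenopacket."""
    expected = {p.stem for p in phenopackets}
    found = {r.stem for r in result_paths}
    return {
        "missing_results": sorted(expected - found),
        "unexpected_results": sorted(found - expected),
    }
```

Both lists should be empty. The phenopacket paths can be collected with something like `Path("phenopackets").glob("*.json")`, where the directory name is a placeholder.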

Checklist before release

Runner implements prepare, run, post_process
Entry point registered under pheval.plugins
Standardised results conform to required schema(s)
Result filenames use phenopacket stem matching
Optional: grouping_id correctly set for grouped ranking scenarios
Optional: results.yml metadata populated where useful