Developing a PhEval plugin

This guide explains how to develop a PhEval plugin that exposes a runner and produces PhEval standardised results that can be benchmarked consistently.

Video walkthrough

If you prefer a guided walkthrough, start here:

Write your own PhEval runner

Key takeaways

  1. A runner must implement all PhEvalRunner methods (prepare, run, post_process).
  2. Your runner must write standardised result files with the required columns for the benchmark type.
  3. Result filenames must match phenopacket filenames (file stem matching) so PhEval can align outputs to cases.

Standardised result schemas (required)

PhEval benchmarking operates on standardised result files. Each result file must conform exactly to the required schema for the type of prioritisation being produced.

Schemas are validated during post-processing.
Missing or incorrectly named columns will cause validation to fail.


Gene prioritisation results

Each gene result must contain the following columns:

Column name     | Type       | Description
gene_symbol     | pl.String  | Gene symbol
gene_identifier | pl.String  | Gene identifier
score           | pl.Float64 | Tool-specific score
grouping_id     | pl.Utf8    | Optional grouping identifier

Variant prioritisation results

Each variant result must contain the following columns:

Column name | Type       | Description
chrom       | pl.String  | Chromosome
start       | pl.Int64   | Start position
end         | pl.Int64   | End position
ref         | pl.String  | Reference allele
alt         | pl.String  | Alternate allele
score       | pl.Float64 | Tool-specific score
grouping_id | pl.Utf8    | Optional grouping identifier

Disease prioritisation results

Each disease result must contain the following columns:

Column name        | Type       | Description
disease_identifier | pl.String  | Disease identifier
score              | pl.Float64 | Tool-specific score
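
Because a missing or misnamed column only surfaces when validation runs, a plugin may want to check column names itself before calling PhEval. The following is a minimal sketch, not PhEval's own validator: the REQUIRED_COLUMNS mapping and the missing_columns helper are hypothetical names, and the column lists are copied from the tables in this guide (grouping_id is left out because it is optional).

```python
# Required column names per benchmark type, copied from the schema tables.
# REQUIRED_COLUMNS and missing_columns are illustrative names, not PhEval APIs.
REQUIRED_COLUMNS = {
    "gene": ["gene_symbol", "gene_identifier", "score"],
    "variant": ["chrom", "start", "end", "ref", "alt", "score"],
    "disease": ["disease_identifier", "score"],
}


def missing_columns(columns, benchmark_type):
    """Return the required columns that are absent from `columns`."""
    return [name for name in REQUIRED_COLUMNS[benchmark_type] if name not in columns]


# A gene result that forgot gene_identifier.
gene_gaps = missing_columns(["gene_symbol", "score"], "gene")
print(gene_gaps)
```

With a Polars DataFrame, the first argument would typically be the frame's columns attribute. This check covers names only; it does not check the column types.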

The grouping_id column (optional but important)

grouping_id is optional. It enables joint ranking: entities that share a grouping_id are treated as a single unit, so members of a group are not penalised for occupying separate rank positions.

Typical examples include:

  • Compound heterozygous variants (multiple variants contributing together)
  • Grouped variant representations within the same gene
  • Polygenic or grouped signals where multiple items should be evaluated jointly

How to use it

  • Variants in the same group share the same grouping_id
  • Variants not in any group should each have a unique grouping_id

This preserves ranking semantics when benchmarking.
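
The two rules above can be sketched in plain Python. This is only an illustration under stated assumptions: assign_grouping_ids, the "variant_key" field, and the "group_N" identifier format are hypothetical and not part of PhEval; any scheme works as long as grouped variants share an identifier and ungrouped variants each get their own.

```python
from itertools import count


def assign_grouping_ids(variants, groups):
    """Return copies of the variant dicts with a grouping_id added.

    variants: list of dicts, each with a hypothetical "variant_key" field.
    groups: list of sets of variant keys that should be ranked together.
    """
    next_id = count(1)
    key_to_group = {}
    # Every member of a group receives the same identifier.
    for members in groups:
        group_id = f"group_{next(next_id)}"
        for key in members:
            key_to_group[key] = group_id
    labelled = []
    for variant in variants:
        group_id = key_to_group.get(variant["variant_key"])
        if group_id is None:
            # Ungrouped variants each receive a fresh, unique identifier.
            group_id = f"group_{next(next_id)}"
        labelled.append({**variant, "grouping_id": group_id})
    return labelled


variants = [
    {"variant_key": "1-100-A-G", "score": 0.9},
    {"variant_key": "1-250-C-T", "score": 0.9},
    {"variant_key": "2-500-G-A", "score": 0.4},
]
labelled = assign_grouping_ids(variants, groups=[{"1-100-A-G", "1-250-C-T"}])
for row in labelled:
    print(row["variant_key"], row["grouping_id"])
```

Here the first two variants stand in for a compound heterozygous pair and end up with the same grouping_id, while the third gets its own.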


Result file naming (required)

PhEval aligns result files to cases using filename stem matching.

Rule:

The result filename stem must exactly match the phenopacket filename stem.

Example:

  • Phenopacket: patient_001.json
  • Raw tool output filename: patient_001-exomiser.json
  • Processed result filename passed to PhEval: patient_001.json

If the stems do not match, PhEval cannot reliably associate results with phenopackets, and benchmarking may be incomplete or incorrect.

Recommendation:

Always derive result filenames programmatically from the phenopacket stem.
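
One way to follow this recommendation is to strip the tool-specific suffix from the raw output name with pathlib. The sketch below assumes the tool appends a fixed suffix such as "-exomiser" to the phenopacket stem; standardised_result_path and its tool_suffix parameter are hypothetical names used for illustration.

```python
from pathlib import Path


def standardised_result_path(raw_result: Path, tool_suffix: str) -> Path:
    """Return the raw result path with the tool suffix removed from its stem.

    tool_suffix is whatever the tool appends to the phenopacket stem,
    for example "-exomiser" (an assumption about the tool's naming).
    """
    stem = raw_result.stem.removesuffix(tool_suffix)
    return raw_result.with_name(stem + raw_result.suffix)


raw = Path("raw_results/patient_001-exomiser.json")
result_path = standardised_result_path(raw, "-exomiser")
print(result_path.name)
```

The returned path has the same stem as patient_001.json, so it can be matched against the phenopacket by stem. str.removesuffix requires Python 3.9 or later.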


Step-by-step plugin development

PhEval plugins are typically derived from the runner template and standardised tooling. The recommended approach uses the PhEval runner template, MkDocs, tox, and uv.

The template is available at https://github.com/monarch-initiative/pheval-runner-template


1. Scaffold a new plugin

Install cruft (used to create projects from the template and keep them up to date):

pip install cruft

Create a project using the template:

cruft create https://github.com/monarch-initiative/pheval-runner-template

2. Environment and dependencies

Install uv (if you do not already use it):

pip install uv

Install dependencies and activate the environment:

uv sync
source .venv/bin/activate

Run the test suite to confirm the setup:

uv run tox

Note

The template uses uv by default, but this is not required. You may use any packaging/dependency manager. PhEval only requires a valid pheval.plugins entry point.


3. Implement your custom runner

In the generated template, implement your runner in runner.py (under src/).

At minimum, implement prepare, run, and post_process:

"""Runner."""

from dataclasses import dataclass
from pathlib import Path

from pheval.runners.runner import PhEvalRunner


@dataclass
class CustomRunner(PhEvalRunner):
    """Runner class implementation."""

    input_dir: Path
    testdata_dir: Path
    tmp_dir: Path
    output_dir: Path
    config_file: Path
    version: str

    def prepare(self):
        """Prepare inputs."""
        print("preparing")

    def run(self):
        """Execute the tool."""
        print("running")

    def post_process(self):
        """Convert raw outputs to PhEval standardised results."""
        print("post processing")

4. Register the runner entry point

The template populates your pyproject.toml entry points. If you rename the runner class or move files, update this accordingly:

[project.entry-points."pheval.plugins"]
customrunner = "pheval_plugin_example.runner:CustomRunner"

Tip

The module path and class name are case-sensitive.
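
To confirm that the entry point is visible after installing the plugin, the standard library can list everything registered under the pheval.plugins group. This is a small sketch using importlib.metadata; registered_runners is a hypothetical helper name, and the output depends entirely on which plugins are installed in the current environment.

```python
from importlib.metadata import entry_points

GROUP = "pheval.plugins"


def registered_runners() -> dict:
    """Map each entry point name in the group to its "module:Class" target."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ returns an object with a select() method.
        selected = eps.select(group=GROUP)
    else:
        # Python 3.8/3.9 return a plain dict keyed by group name.
        selected = eps.get(GROUP, [])
    return {ep.name: ep.value for ep in selected}


runners = registered_runners()
print(runners)
```

If the plugin is installed, its entry point name (customrunner in the example above) should appear as a key; an empty dict suggests the package has not been installed into the active environment.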


Tool-specific configuration (config.yaml)

For pheval run to execute, the input directory must contain a config.yaml:

tool:
tool_version:
variant_analysis:
gene_analysis:
disease_analysis:
tool_specific_configuration_options:

  • variant_analysis, gene_analysis, disease_analysis must be booleans (true / false)
  • tool_specific_configuration_options is optional and may include plugin-specific configuration

Using pydantic can simplify parsing:

from pydantic import BaseModel, Field

class CustomisedConfigurations(BaseModel):
    environment: str = Field(...)

Then parse in your runner:

# parse_obj is the pydantic v1 API; under pydantic v2 use model_validate instead.
config = CustomisedConfigurations.parse_obj(
    self.input_dir_config.tool_specific_configuration_options
)
environment = config.environment

Post-processing: generating standardised results

PhEval can handle ranking and writing result files in the correct locations. Your runner’s post-processing must:

  1. Read tool-specific raw outputs
  2. Extract the required fields
  3. Construct a Polars DataFrame with the required schema
  4. Call the appropriate PhEval helper method to write standardised results
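
Steps 1 and 2 depend entirely on the tool's output format. As a rough sketch, assume a hypothetical tool that writes JSON with a "genes" list whose entries carry "symbol", "id", and "combined_score" keys; none of these names come from PhEval or any real tool. The helper below reads such a file and returns rows keyed by the gene schema column names.

```python
import json
import tempfile
from pathlib import Path


def rows_from_raw_output(raw_path: Path) -> list:
    """Read a hypothetical raw JSON output and return gene-schema rows."""
    with raw_path.open() as handle:
        raw = json.load(handle)
    rows = []
    for entry in raw["genes"]:
        rows.append(
            {
                "gene_symbol": str(entry["symbol"]),
                "gene_identifier": str(entry["id"]),
                # Cast explicitly so the score column is numeric.
                "score": float(entry["combined_score"]),
            }
        )
    return rows


# Demonstration with a temporary file standing in for real tool output.
with tempfile.TemporaryDirectory() as tmp:
    raw_path = Path(tmp) / "patient_001.json"
    payload = {"genes": [{"symbol": "GENE1", "id": "ID:0001", "combined_score": "0.97"}]}
    raw_path.write_text(json.dumps(payload))
    rows = rows_from_raw_output(raw_path)

print(rows)
```

For step 3, a list of dicts like this can be passed to the Polars DataFrame constructor, after which the frame is handed to the appropriate helper described in the next section.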

Result generation helpers

Breaking change (v0.5.0)

generate_pheval_result was replaced with:

  • generate_gene_result
  • generate_variant_result
  • generate_disease_result

Generating gene result files

Use generate_gene_result to write PhEval-standardised gene results from a Polars DataFrame.

from pheval.post_processing.post_processing import (
    generate_gene_result,
    SortOrder,
)

generate_gene_result(
    results=pheval_gene_result,      # Polars DataFrame (gene schema)
    sort_order=SortOrder.DESCENDING, # or SortOrder.ASCENDING
    output_dir=output_directory,     # typically self.output_dir
    result_path=result_path,         # path to raw tool output, stem MUST match phenopacket stem exactly
    phenopacket_dir=phenopacket_dir, # directory containing phenopackets
)

Generating variant result files

Use generate_variant_result to write PhEval-standardised variant results.

from pheval.post_processing.post_processing import (
    generate_variant_result,
    SortOrder,
)

generate_variant_result(
    results=pheval_variant_result,   # Polars DataFrame (variant schema)
    sort_order=SortOrder.DESCENDING,
    output_dir=output_directory,
    result_path=result_path,         # stem must match phenopacket stem
    phenopacket_dir=phenopacket_dir,
)

Generating disease result files

Use generate_disease_result to write PhEval-standardised disease results.

from pheval.post_processing.post_processing import (
    generate_disease_result,
    SortOrder,
)

generate_disease_result(
    results=pheval_disease_result,   # Polars DataFrame (disease schema)
    sort_order=SortOrder.DESCENDING,
    output_dir=output_directory,
    result_path=result_path,         # stem must match phenopacket stem
    phenopacket_dir=phenopacket_dir,
)

Important

The stem of result_path must exactly match the phenopacket stem. This often requires stripping tool-specific suffixes from raw output filenames.


Adding metadata to results.yml (optional)

PhEval writes a results.yml file to the output directory by default. You can add customised metadata by overriding construct_meta_data().

Example dataclass:

from dataclasses import dataclass

@dataclass
class CustomisedMetaData:
    customised_field: str

Runner implementation:

def construct_meta_data(self):
    self.meta_data.tool_specific_configuration_options = CustomisedMetaData(
        customised_field="customised_value"
    )
    return self.meta_data

Helper utilities (optional)

PhEval provides helper methods that can simplify runner implementations.

PhenopacketUtil

Useful for extracting observed phenotypes when tools do not accept phenopackets directly:

Class for retrieving data from a Phenopacket or Family object

Source code in src/pheval/utils/phenopacket_utils.py
class PhenopacketUtil:
    """Class for retrieving data from a Phenopacket or Family object"""

    def __init__(self, phenopacket_contents: Phenopacket | Family):
        """Initialise PhenopacketUtil

        Args:
            phenopacket_contents (Union[Phenopacket, Family]): Phenopacket or Family object
        """
        self.phenopacket_contents = phenopacket_contents

    def sample_id(self) -> str:
        """
        Retrieve the sample ID from a Phenopacket or proband of a Family

        Returns:
            str: Sample ID
        """
        if hasattr(self.phenopacket_contents, "proband"):
            return self.phenopacket_contents.proband.subject.id
        else:
            return self.phenopacket_contents.subject.id

    def phenotypic_features(self) -> list[PhenotypicFeature]:
        """
        Retrieve a list of all HPO terms

        Returns:
            List[PhenotypicFeature]: List of HPO terms
        """
        if hasattr(self.phenopacket_contents, "proband"):
            return self.phenopacket_contents.proband.phenotypic_features
        else:
            return self.phenopacket_contents.phenotypic_features

    def observed_phenotypic_features(self) -> list[PhenotypicFeature]:
        """
        Retrieve a list of all observed HPO terms

        Returns:
            List[PhenotypicFeature]: List of observed HPO terms
        """
        phenotypic_features = []
        all_phenotypic_features = self.phenotypic_features()
        for p in all_phenotypic_features:
            if p.excluded:
                continue
            phenotypic_features.append(p)
        return phenotypic_features

    def negated_phenotypic_features(self) -> list[PhenotypicFeature]:
        """
        Retrieve a list of all negated HPO terms

        Returns:
            List[PhenotypicFeature]: List of negated HPO terms
        """
        negated_phenotypic_features = []
        all_phenotypic_features = self.phenotypic_features()
        for p in all_phenotypic_features:
            if p.excluded:
                negated_phenotypic_features.append(p)
        return negated_phenotypic_features

    def diseases(self) -> list[Disease]:
        """
        Retrieve a list of Diseases associated with the proband

        Returns:
            List[Disease]: List of diseases
        """
        if hasattr(self.phenopacket_contents, "proband"):
            return self.phenopacket_contents.proband.diseases
        else:
            return self.phenopacket_contents.diseases

    def _diagnosis_from_interpretations(self) -> list[ProbandDisease]:
        """
        Retrieve a list of disease diagnoses associated with the proband from the interpretations object

        Returns:
            List[ProbandDisease]: List of diagnosed diseases
        """
        diagnoses = []
        interpretation = self.interpretations()
        for i in interpretation:
            (
                diagnoses.append(
                    ProbandDisease(
                        disease_name=i.diagnosis.disease.label,
                        disease_identifier=i.diagnosis.disease.id,
                    )
                )
                if i.diagnosis.disease.label != "" and i.diagnosis.disease.id != ""
                else None
            )
        return diagnoses

    def _diagnosis_from_disease(self) -> list[ProbandDisease]:
        """
        Retrieve a list of disease diagnoses associated with the proband from the diseases object

        Returns:
            List[ProbandDisease]: List of diagnosed diseases
        """
        diagnoses = []
        for disease in self.diseases():
            diagnoses.append(ProbandDisease(disease_name=disease.term.label, disease_identifier=disease.term.id))
        return diagnoses

    def diagnoses(self) -> list[ProbandDisease]:
        """
        Retrieve a unique list of disease diagnoses associated with the proband from a Phenopacket

        Returns:
            List[ProbandDisease]: List of diagnosed diseases
        """
        return list(set(self._diagnosis_from_interpretations() + self._diagnosis_from_disease()))

    def interpretations(self) -> list[Interpretation]:
        """
        Retrieve a list of interpretations from a Phenopacket

        Returns:
            List[Interpretation]: List of interpretations
        """
        if hasattr(self.phenopacket_contents, "proband"):
            return self.phenopacket_contents.proband.interpretations
        else:
            return self.phenopacket_contents.interpretations

    def causative_variants(self) -> list[ProbandCausativeVariant]:
        """
        Retrieve a list of causative variants listed in a Phenopacket

        Returns:
            List[ProbandCausativeVariant]: List of proband causative variants
        """
        all_variants = []
        interpretation = self.interpretations()
        for i in interpretation:
            for g in i.diagnosis.genomic_interpretations:
                vcf_record = g.variant_interpretation.variation_descriptor.vcf_record
                genotype = g.variant_interpretation.variation_descriptor.allelic_state
                variant_data = ProbandCausativeVariant(
                    self.phenopacket_contents.subject.id,
                    vcf_record.genome_assembly,
                    GenomicVariant(
                        vcf_record.chrom,
                        vcf_record.pos,
                        vcf_record.ref,
                        vcf_record.alt,
                    ),
                    genotype.label,
                    vcf_record.info,
                )
                all_variants.append(variant_data)
        return all_variants

    def files(self) -> list[File]:
        """
        Retrieve a list of files associated with a phenopacket

        Returns:
            List[File]: List of files associated with a phenopacket
        """
        return self.phenopacket_contents.files

    def vcf_file_data(self, phenopacket_path: Path, vcf_dir: Path) -> File:
        """
        Retrieve the genome assembly and VCF file name from a phenopacket.

        Args:
            phenopacket_path (Path): The path to the phenopacket file.
            vcf_dir (Path): The directory path where the VCF file is stored.

        Returns:
            File: The VCF file with updated URI pointing to the specified directory.

        Raises:
            IncorrectFileFormatError: If the provided file is not in .vcf or .vcf.gz format.
            IncompatibleGenomeAssemblyError: If the genome assembly of the VCF file is not compatible.

        Note:
            This function searches for a VCF file within the provided list of files, validates its format,
            and checks if the genome assembly is compatible. If the conditions are met, it updates the
            URI of the VCF file to the specified directory and returns the modified file object.
        """
        compatible_genome_assembly = ["GRCh37", "hg19", "GRCh38", "hg38"]
        vcf_data = next(file for file in self.files() if file.file_attributes["fileFormat"] == "vcf")
        if not Path(vcf_data.uri).name.endswith(".vcf") and not Path(vcf_data.uri).name.endswith(".vcf.gz"):
            raise IncorrectFileFormatError(Path(vcf_data.uri), ".vcf or .vcf.gz file")
        if vcf_data.file_attributes["genomeAssembly"] not in compatible_genome_assembly:
            raise IncompatibleGenomeAssemblyError(vcf_data.file_attributes["genomeAssembly"], phenopacket_path)
        vcf_data.uri = str(vcf_dir.joinpath(Path(vcf_data.uri).name))
        return vcf_data

    @staticmethod
    def _extract_diagnosed_gene(
        genomic_interpretation: GenomicInterpretation,
    ) -> ProbandCausativeGene:
        """
        Retrieve the disease causing genes from the variant descriptor field if not empty,
        otherwise, retrieves from the gene descriptor from a phenopacket.
        Args:
            genomic_interpretation (GenomicInterpretation): A genomic interpretation from a Phenopacket
        Returns:
            ProbandCausativeGene: The disease causing gene
        """
        if genomic_interpretation.variant_interpretation.ByteSize() != 0:
            return ProbandCausativeGene(
                genomic_interpretation.variant_interpretation.variation_descriptor.gene_context.symbol,
                genomic_interpretation.variant_interpretation.variation_descriptor.gene_context.value_id,
            )

        else:
            return ProbandCausativeGene(
                gene_symbol=genomic_interpretation.gene.symbol,
                gene_identifier=genomic_interpretation.gene.value_id,
            )

    def diagnosed_genes(self) -> list[ProbandCausativeGene]:
        """
        Retrieve the disease causing genes from a phenopacket.
        Returns:
            List[ProbandCausativeGene]: List of causative genes
        """
        pheno_interpretation = self.interpretations()
        genes = []
        for i in pheno_interpretation:
            for g in i.diagnosis.genomic_interpretations:
                genes.append(self._extract_diagnosed_gene(g))
                genes = list({gene.gene_symbol: gene for gene in genes}.values())
        return genes

    def diagnosed_variants(self) -> list[GenomicVariant]:
        """
        Retrieve a list of all known causative variants from a phenopacket.
        Returns:
            List[GenomicVariant]: List of causative variants
        """
        variants = []
        pheno_interpretation = self.interpretations()
        for i in pheno_interpretation:
            for g in i.diagnosis.genomic_interpretations:
                variant = GenomicVariant(
                    chrom=str(g.variant_interpretation.variation_descriptor.vcf_record.chrom.replace("chr", "")),
                    pos=int(g.variant_interpretation.variation_descriptor.vcf_record.pos),
                    ref=g.variant_interpretation.variation_descriptor.vcf_record.ref,
                    alt=g.variant_interpretation.variation_descriptor.vcf_record.alt,
                )
                variants.append(variant)
        return variants

    def check_incomplete_variant_record(self) -> bool:
        """
        Check if any variant record in the phenopacket has incomplete information.

        This method iterates through the diagnosed variant records and checks if any of them
        have missing or incomplete information such as empty chromosome, position, reference,
        or alternate allele.

        Returns:
            bool: True if any variant record is incomplete, False otherwise.
        """
        variants = self.diagnosed_variants()
        for variant in variants:
            if variant.chrom == "" or variant.pos in (0, "") or variant.ref == "" or variant.alt == "":
                return True
        return False

    def check_variant_alleles(self) -> bool:
        """
        Check if any variant record in the phenopacket has identical reference and alternate alleles.

        Returns:
            bool: True if the reference and alternate alleles are identical, False otherwise.
        """
        variants = self.diagnosed_variants()
        for variant in variants:
            if variant.ref == variant.alt:
                return True
        return False

    def check_incomplete_gene_record(self) -> bool:
        """
        Check if any gene record in the phenopacket has incomplete information.

        This method iterates through the diagnosed gene records and checks if any of them
        have missing or incomplete information such as gene name, or gene identifier.

        Returns:
            bool: True if any gene record is incomplete, False otherwise.
        """
        genes = self.diagnosed_genes()
        for gene in genes:
            if gene.gene_symbol == "" or gene.gene_identifier == "":
                return True
        return False

    def check_incomplete_disease_record(self) -> bool:
        """
        Check if any disease record in the phenopacket has incomplete information.

        This method iterates through the diagnosed disease records and checks if any of them
        have missing or incomplete information such as empty disease name, or disease identifier.

        Returns:
            bool: True if any disease record is incomplete, False otherwise.
        """
        if len(self.diagnoses()) == 0:
            return True
        return False

Example usage:

from pheval.utils.phenopacket_utils import phenopacket_reader, PhenopacketUtil

phenopacket = phenopacket_reader("/path/to/phenopacket.json")
phenopacket_util = PhenopacketUtil(phenopacket)

observed_phenotypes = phenopacket_util.observed_phenotypic_features()
observed_phenotypes_hpo_ids = [p.type.id for p in observed_phenotypes]

Testing your runner

Install dependencies:

uv sync

Run PhEval using your custom runner:

pheval run -i ./input_dir -t ./test_data_dir -r customrunner -o output_dir

Notes:

  • the -r/--runner value must match the entry point name (lowercase)
  • confirm that standardised result files are produced and validate correctly
  • confirm that result file stems match the phenopacket file stems
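
The last check can be automated by comparing stems across two directories. The sketch below makes several assumptions: unmatched_phenopackets is a hypothetical helper, phenopackets are taken to be .json files, and result_suffix is an optional placeholder for any suffix appended to result filenames. The directory passed as result_dir is whichever one holds the files you want to compare.

```python
import tempfile
from pathlib import Path


def unmatched_phenopackets(phenopacket_dir: Path, result_dir: Path, result_suffix: str = "") -> list:
    """Return phenopacket stems that have no result file with a matching stem."""
    phenopacket_stems = {path.stem for path in phenopacket_dir.glob("*.json")}
    result_stems = {
        path.stem.removesuffix(result_suffix)
        for path in result_dir.iterdir()
        if path.is_file()
    }
    return sorted(phenopacket_stems - result_stems)


# Demonstration with temporary directories and illustrative filenames.
with tempfile.TemporaryDirectory() as tmp:
    phenopackets = Path(tmp) / "phenopackets"
    results = Path(tmp) / "results"
    phenopackets.mkdir()
    results.mkdir()
    (phenopackets / "patient_001.json").touch()
    (phenopackets / "patient_002.json").touch()
    (results / "patient_001-result.tsv").touch()
    missing = unmatched_phenopackets(phenopackets, results, result_suffix="-result")

print(missing)
```

An empty list means every phenopacket has a result file with a matching stem; any names returned point to cases that would be missing from the benchmark.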

Checklist before release

  • Runner implements prepare, run, post_process
  • Entry point registered under pheval.plugins
  • Standardised results conform to required schema(s)
  • Result filenames use phenopacket stem matching
  • Optional: grouping_id correctly set for grouped ranking scenarios
  • Optional: results.yml metadata populated where useful