
Developing a PhEval Plugin

Description

PhEval is designed to be extensible through plugins. A plugin supplies a custom runner implementation that tells PhEval how to prepare the inputs for a particular tool, run it, and post-process its results. You can build one by following the step-by-step process below.

All custom runner implementations must implement every abstract PhEvalRunner method: prepare, run, and post_process.

Bases: ABC

PhEvalRunner Class

Source code in src/pheval/runners/runner.py
@dataclass
class PhEvalRunner(ABC):
    """PhEvalRunner Class"""

    input_dir: Path
    testdata_dir: Path
    tmp_dir: Path
    output_dir: Path
    config_file: Path
    version: str
    directory_path = None
    input_dir_config = None
    _meta_data = None
    __raw_results_dir = "raw_results/"
    __pheval_gene_results_dir = "pheval_gene_results/"
    __pheval_variant_results_dir = "pheval_variant_results/"
    __pheval_disease_results_dir = "pheval_disease_results/"
    __tool_input_commands_dir = "tool_input_commands/"
    __run_meta_data_file = "results.yml"

    def __post_init__(self):
        self.input_dir_config = parse_input_dir_config(self.input_dir)

    def _get_tool(self):
        return self.input_dir_config.tool

    def _get_variant_analysis(self):
        return self.input_dir_config.variant_analysis

    def _get_gene_analysis(self):
        return self.input_dir_config.gene_analysis

    def _get_disease_analysis(self):
        return self.input_dir_config.disease_analysis

    @property
    def tool_input_commands_dir(self):
        return Path(self.output_dir).joinpath(self.__tool_input_commands_dir)

    @tool_input_commands_dir.setter
    def tool_input_commands_dir(self, directory_path):
        self.directory_path = Path(directory_path)

    @property
    def raw_results_dir(self):
        return Path(self.output_dir).joinpath(self.__raw_results_dir)

    @raw_results_dir.setter
    def raw_results_dir(self, directory_path):
        self.directory_path = Path(directory_path)

    @property
    def pheval_gene_results_dir(self):
        return Path(self.output_dir).joinpath(self.__pheval_gene_results_dir)

    @pheval_gene_results_dir.setter
    def pheval_gene_results_dir(self, directory_path):
        self.directory_path = Path(directory_path)

    @property
    def pheval_variant_results_dir(self):
        return Path(self.output_dir).joinpath(self.__pheval_variant_results_dir)

    @pheval_variant_results_dir.setter
    def pheval_variant_results_dir(self, directory_path):
        self.directory_path = Path(directory_path)

    @property
    def pheval_disease_results_dir(self):
        return Path(self.output_dir).joinpath(self.__pheval_disease_results_dir)

    @pheval_disease_results_dir.setter
    def pheval_disease_results_dir(self, directory_path):
        self.directory_path = Path(directory_path)

    def build_output_directory_structure(self):
        """build output directory structure"""
        self.tool_input_commands_dir.mkdir(exist_ok=True)
        self.raw_results_dir.mkdir(exist_ok=True)
        if self._get_variant_analysis():
            self.pheval_variant_results_dir.mkdir(exist_ok=True)
        if self._get_gene_analysis():
            self.pheval_gene_results_dir.mkdir(exist_ok=True)
        if self._get_disease_analysis():
            self.pheval_disease_results_dir.mkdir(exist_ok=True)

    @property
    def meta_data(self):
        self._meta_data = BasicOutputRunMetaData(
            tool=self.input_dir_config.tool,
            tool_version=self.version,
            config=f"{Path(self.input_dir).parent.name}/{Path(self.input_dir).name}",
            run_timestamp=datetime.now().timestamp(),
            corpus=f"{Path(self.testdata_dir).parent.name}/{Path(self.testdata_dir).name}",
        )
        return self._meta_data

    @meta_data.setter
    def meta_data(self, meta_data):
        self._meta_data = meta_data

    @abstractmethod
    def prepare(self) -> str:
        """prepare"""

    @abstractmethod
    def run(self):
        """run"""

    @abstractmethod
    def post_process(self):
        """post_process"""

    def construct_meta_data(self):
        """Construct run output meta data"""
        return self.meta_data
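
Because PhEvalRunner derives from ABC and marks prepare, run, and post_process as abstract, Python will refuse to instantiate a subclass that leaves any of them out. The sketch below illustrates this with a stand-in base class: RunnerBase and the two subclasses are hypothetical names that only mimic the abstract interface shown above, and are not part of PhEval.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RunnerBase(ABC):
    """Hypothetical stand-in mimicking PhEvalRunner's three abstract methods."""

    output_dir: Path

    @abstractmethod
    def prepare(self):
        """prepare"""

    @abstractmethod
    def run(self):
        """run"""

    @abstractmethod
    def post_process(self):
        """post_process"""


@dataclass
class IncompleteRunner(RunnerBase):
    """Implements prepare and run but forgets post_process."""

    def prepare(self):
        return "prepared"

    def run(self):
        return "ran"


@dataclass
class CompleteRunner(IncompleteRunner):
    """Adds the missing method, so the class becomes instantiable."""

    def post_process(self):
        return "post-processed"


def can_instantiate(runner_class) -> bool:
    """Return True if the class can be instantiated, False if it is still abstract."""
    try:
        runner_class(output_dir=Path("output"))
        return True
    except TypeError:
        # Raised by Python when abstract methods are left unimplemented.
        return False
```

Calling can_instantiate(IncompleteRunner) returns False, while can_instantiate(CompleteRunner) returns True.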

Step-by-Step Plugin Development Process

The plugin structure is derived from a cookiecutter template, Sphintoxetry-cookiecutter, and it uses Sphinx, tox and poetry as core dependencies. This allows PhEval extensibility to be standardized in terms of documentation and dependency management.

1. Sphintoxetry-cookiecutter scaffold

First, install the cruft package. Cruft enables keeping projects up-to-date with future updates made to this original template.

Install the latest release of cruft from pip

pip install cruft

NOTE: You may encounter an error with the naming of the project layout if using an older release of cruft. To avoid this, make sure you have installed the latest release version.

Next, create a project using the sphintoxetry-cookiecutter template.

cruft create https://github.com/monarch-initiative/monarch-project-template

2. Further setup

Install poetry if you haven't already.

pip install poetry

Install dependencies

poetry install

Add PhEval dependency

poetry add pheval

Run tox to see if the setup works

poetry run tox

3. Implement PhEval Custom Runner

The runner name is arbitrary; the name CustomPhevalRunner was chosen purely for demonstration purposes.

Create a runner file inside the plugin project, e.g.:

"""Custom Pheval Runner."""
from dataclasses import dataclass
from pathlib import Path
from pheval.runners.runner import PhEvalRunner


@dataclass
class CustomPhevalRunner(PhEvalRunner):
    """CustomPhevalRunner Class."""

    input_dir: Path
    testdata_dir: Path
    tmp_dir: Path
    output_dir: Path
    config_file: Path
    version: str

    def prepare(self):
        """prepare method."""
        print("preparing")

    def run(self):
        """run method."""
        print("running with custom pheval runner")

    def post_process(self):
        """post_process method."""
        print("post processing")

4. Add PhEval Plugins section to the pyproject.toml file

[tool.poetry.plugins."pheval.plugins"]
customrunner = "pheval_plugin_example.runner:CustomPhevalRunner"

Replace the value above with the path to your custom runner class, in the form module.path:ClassName.
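
The section above registers the runner as a standard Python entry point in the pheval.plugins group. As a minimal sketch, assuming Python's standard importlib.metadata module, the following shows how such an entry point value is structured and how to list the plugins that are installed in the current environment. The helper names split_entry_point and registered_plugin_names are hypothetical, and this is not necessarily how PhEval itself resolves runners.

```python
from importlib.metadata import entry_points
from typing import List, Tuple


def split_entry_point(value: str) -> Tuple[str, str]:
    """Split 'package.module:ClassName' into (module path, class name)."""
    module_path, _, class_name = value.partition(":")
    if not module_path or not class_name:
        raise ValueError(f"expected 'module.path:ClassName', got {value!r}")
    return module_path, class_name


def registered_plugin_names(group: str = "pheval.plugins") -> List[str]:
    """Return the sorted names of all installed entry points in the group."""
    all_entry_points = entry_points()
    if hasattr(all_entry_points, "select"):
        # Python 3.10 and later
        selected = all_entry_points.select(group=group)
    else:
        # Python 3.8 and 3.9 return a plain dict keyed by group
        selected = all_entry_points.get(group, [])
    return sorted(entry_point.name for entry_point in selected)


module_path, class_name = split_entry_point(
    "pheval_plugin_example.runner:CustomPhevalRunner"
)
```

After running poetry install in the plugin project, registered_plugin_names() is a quick way to check that the entry point was picked up.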

5. Implementing PhEval helper methods

PhEval provides a number of public helper methods that can simplify the implementation of your custom runner.

Using these helper methods is optional, but they are designed to make the runner methods easier to write.

Utility methods

The PhenopacketUtil class is designed to aid in the collection of specific data from a Phenopacket.

Class for retrieving data from a Phenopacket or Family object

Source code in src/pheval/utils/phenopacket_utils.py
class PhenopacketUtil:
    """Class for retrieving data from a Phenopacket or Family object"""

    def __init__(self, phenopacket_contents: Union[Phenopacket, Family]):
        """Initialise PhenopacketUtil

        Args:
            phenopacket_contents (Union[Phenopacket, Family]): Phenopacket or Family object
        """
        self.phenopacket_contents = phenopacket_contents

    def sample_id(self) -> str:
        """
        Retrieve the sample ID from a Phenopacket or proband of a Family

        Returns:
            str: Sample ID
        """
        if hasattr(self.phenopacket_contents, "proband"):
            return self.phenopacket_contents.proband.subject.id
        else:
            return self.phenopacket_contents.subject.id

    def phenotypic_features(self) -> List[PhenotypicFeature]:
        """
        Retrieve a list of all HPO terms

        Returns:
            List[PhenotypicFeature]: List of HPO terms
        """
        if hasattr(self.phenopacket_contents, "proband"):
            return self.phenopacket_contents.proband.phenotypic_features
        else:
            return self.phenopacket_contents.phenotypic_features

    def observed_phenotypic_features(self) -> List[PhenotypicFeature]:
        """
        Retrieve a list of all observed HPO terms

        Returns:
            List[PhenotypicFeature]: List of observed HPO terms
        """
        phenotypic_features = []
        all_phenotypic_features = self.phenotypic_features()
        for p in all_phenotypic_features:
            if p.excluded:
                continue
            phenotypic_features.append(p)
        return phenotypic_features

    def negated_phenotypic_features(self) -> List[PhenotypicFeature]:
        """
        Retrieve a list of all negated HPO terms

        Returns:
            List[PhenotypicFeature]: List of negated HPO terms
        """
        negated_phenotypic_features = []
        all_phenotypic_features = self.phenotypic_features()
        for p in all_phenotypic_features:
            if p.excluded:
                negated_phenotypic_features.append(p)
        return negated_phenotypic_features

    def diseases(self) -> List[Disease]:
        """
        Retrieve a list of Diseases associated with the proband

        Returns:
            List[Disease]: List of diseases
        """
        if hasattr(self.phenopacket_contents, "proband"):
            return self.phenopacket_contents.proband.diseases
        else:
            return self.phenopacket_contents.diseases

    def _diagnosis_from_interpretations(self) -> List[ProbandDisease]:
        """
        Retrieve a list of disease diagnoses associated with the proband from the interpretations object

        Returns:
            List[ProbandDisease]: List of diagnosed diseases
        """
        diagnoses = []
        interpretation = self.interpretations()
        for i in interpretation:
            (
                diagnoses.append(
                    ProbandDisease(
                        disease_name=i.diagnosis.disease.label,
                        disease_identifier=i.diagnosis.disease.id,
                    )
                )
                if i.diagnosis.disease.label != "" and i.diagnosis.disease.id != ""
                else None
            )
        return diagnoses

    def _diagnosis_from_disease(self) -> List[ProbandDisease]:
        """
        Retrieve a list of disease diagnoses associated with the proband from the diseases object

        Returns:
            List[ProbandDisease]: List of diagnosed diseases
        """
        diagnoses = []
        for disease in self.diseases():
            diagnoses.append(
                ProbandDisease(disease_name=disease.term.label, disease_identifier=disease.term.id)
            )
        return diagnoses

    def diagnoses(self) -> List[ProbandDisease]:
        """
        Retrieve a unique list of disease diagnoses associated with the proband from a Phenopacket

        Returns:
            List[ProbandDisease]: List of diagnosed diseases
        """
        return list(set(self._diagnosis_from_interpretations() + self._diagnosis_from_disease()))

    def interpretations(self) -> List[Interpretation]:
        """
        Retrieve a list of interpretations from a Phenopacket

        Returns:
            List[Interpretation]: List of interpretations
        """
        if hasattr(self.phenopacket_contents, "proband"):
            return self.phenopacket_contents.proband.interpretations
        else:
            return self.phenopacket_contents.interpretations

    def causative_variants(self) -> List[ProbandCausativeVariant]:
        """
        Retrieve a list of causative variants listed in a Phenopacket

        Returns:
            List[ProbandCausativeVariant]: List of proband causative variants
        """
        all_variants = []
        interpretation = self.interpretations()
        for i in interpretation:
            for g in i.diagnosis.genomic_interpretations:
                vcf_record = g.variant_interpretation.variation_descriptor.vcf_record
                genotype = g.variant_interpretation.variation_descriptor.allelic_state
                variant_data = ProbandCausativeVariant(
                    self.phenopacket_contents.subject.id,
                    vcf_record.genome_assembly,
                    GenomicVariant(
                        vcf_record.chrom,
                        vcf_record.pos,
                        vcf_record.ref,
                        vcf_record.alt,
                    ),
                    genotype.label,
                    vcf_record.info,
                )
                all_variants.append(variant_data)
        return all_variants

    def files(self) -> List[File]:
        """
        Retrieve a list of files associated with a phenopacket

        Returns:
            List[File]: List of files associated with a phenopacket
        """
        return self.phenopacket_contents.files

    def vcf_file_data(self, phenopacket_path: Path, vcf_dir: Path) -> File:
        """
        Retrieve the genome assembly and VCF file name from a phenopacket.

        Args:
            phenopacket_path (Path): The path to the phenopacket file.
            vcf_dir (Path): The directory path where the VCF file is stored.

        Returns:
            File: The VCF file with updated URI pointing to the specified directory.

        Raises:
            IncorrectFileFormatError: If the provided file is not in .vcf or .vcf.gz format.
            IncompatibleGenomeAssemblyError: If the genome assembly of the VCF file is not compatible.

        Note:
            This function searches for a VCF file within the provided list of files, validates its format,
            and checks if the genome assembly is compatible. If the conditions are met, it updates the
            URI of the VCF file to the specified directory and returns the modified file object.
        """
        compatible_genome_assembly = ["GRCh37", "hg19", "GRCh38", "hg38"]
        vcf_data = [file for file in self.files() if file.file_attributes["fileFormat"] == "vcf"][0]
        if not Path(vcf_data.uri).name.endswith(".vcf") and not Path(vcf_data.uri).name.endswith(
            ".vcf.gz"
        ):
            raise IncorrectFileFormatError(Path(vcf_data.uri), ".vcf or .vcf.gz file")
        if vcf_data.file_attributes["genomeAssembly"] not in compatible_genome_assembly:
            raise IncompatibleGenomeAssemblyError(
                vcf_data.file_attributes["genomeAssembly"], phenopacket_path
            )
        vcf_data.uri = str(vcf_dir.joinpath(Path(vcf_data.uri).name))
        return vcf_data

    @staticmethod
    def _extract_diagnosed_gene(
        genomic_interpretation: GenomicInterpretation,
    ) -> ProbandCausativeGene:
        """
        Retrieve the disease causing genes from the variant descriptor field if not empty,
        otherwise, retrieves from the gene descriptor from a phenopacket.
        Args:
            genomic_interpretation (GenomicInterpretation): A genomic interpretation from a Phenopacket
        Returns:
            ProbandCausativeGene: The disease causing gene
        """
        if genomic_interpretation.variant_interpretation.ByteSize() != 0:
            return ProbandCausativeGene(
                genomic_interpretation.variant_interpretation.variation_descriptor.gene_context.symbol,
                genomic_interpretation.variant_interpretation.variation_descriptor.gene_context.value_id,
            )

        else:
            return ProbandCausativeGene(
                gene_symbol=genomic_interpretation.gene.symbol,
                gene_identifier=genomic_interpretation.gene.value_id,
            )

    def diagnosed_genes(self) -> List[ProbandCausativeGene]:
        """
        Retrieve the disease causing genes from a phenopacket.
        Returns:
            List[ProbandCausativeGene]: List of causative genes
        """
        pheno_interpretation = self.interpretations()
        genes = []
        for i in pheno_interpretation:
            for g in i.diagnosis.genomic_interpretations:
                genes.append(self._extract_diagnosed_gene(g))
                genes = list({gene.gene_symbol: gene for gene in genes}.values())
        return genes

    def diagnosed_variants(self) -> List[GenomicVariant]:
        """
        Retrieve a list of all known causative variants from a phenopacket.
        Returns:
            List[GenomicVariant]: List of causative variants
        """
        variants = []
        pheno_interpretation = self.interpretations()
        for i in pheno_interpretation:
            for g in i.diagnosis.genomic_interpretations:
                variant = GenomicVariant(
                    chrom=g.variant_interpretation.variation_descriptor.vcf_record.chrom.replace(
                        "chr", ""
                    ),
                    pos=g.variant_interpretation.variation_descriptor.vcf_record.pos,
                    ref=g.variant_interpretation.variation_descriptor.vcf_record.ref,
                    alt=g.variant_interpretation.variation_descriptor.vcf_record.alt,
                )
                variants.append(variant)
        return variants

    def check_incomplete_variant_record(self) -> bool:
        """
        Check if any variant record in the phenopacket has incomplete information.

        This method iterates through the diagnosed variant records and checks if any of them
        have missing or incomplete information such as empty chromosome, position, reference,
        or alternate allele.

        Returns:
            bool: True if any variant record is incomplete, False otherwise.
        """
        variants = self.diagnosed_variants()
        for variant in variants:
            if (
                variant.chrom == ""
                or variant.pos == 0
                or variant.pos == ""
                or variant.ref == ""
                or variant.alt == ""
            ):
                return True
        return False

    def check_incomplete_gene_record(self) -> bool:
        """
        Check if any gene record in the phenopacket has incomplete information.

        This method iterates through the diagnosed gene records and checks if any of them
        have missing or incomplete information such as gene name, or gene identifier.

        Returns:
            bool: True if any gene record is incomplete, False otherwise.
        """
        genes = self.diagnosed_genes()
        for gene in genes:
            if gene.gene_symbol == "" or gene.gene_identifier == "":
                return True
        return False

    def check_incomplete_disease_record(self) -> bool:
        """
        Check if any disease record in the phenopacket has incomplete information.

        This method iterates through the diagnosed disease records and checks if any of them
        have missing or incomplete information such as empty disease name, or disease identifier.

        Returns:
            bool: True if any disease record is incomplete, False otherwise.
        """
        if len(self.diagnoses()) == 0:
            return True
        return False

PhenopacketUtil is particularly useful when the tool you are writing a runner for does not accept Phenopackets directly as input and instead requires individual elements, such as HPO IDs, via the command-line interface (CLI). In that case, using PhenopacketUtil in the runner's prepare step lets you extract the observed phenotypic features from the Phenopacket input.

An example of how this could be implemented is outlined here:

from pheval.utils.phenopacket_utils import phenopacket_reader
from pheval.utils.phenopacket_utils import PhenopacketUtil

phenopacket = phenopacket_reader("/path/to/phenopacket.json")
phenopacket_util = PhenopacketUtil(phenopacket)
# To return a list of all observed phenotypes for a phenopacket
observed_phenotypes = phenopacket_util.observed_phenotypic_features()
# To extract just the HPO IDs as a list (the ID is held on each feature's `type` ontology class)
observed_phenotypes_hpo_ids = [
    observed_phenotype.type.id for observed_phenotype in observed_phenotypes
]
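
To make the CLI scenario described above more concrete, here is a small self-contained sketch that turns phenotypic features into command-line arguments. It uses stand-in objects shaped like Phenopacket PhenotypicFeature messages (an excluded flag and a type carrying the id) rather than real Phenopackets, and the --hpo-ids flag belongs to a hypothetical tool, so treat both as assumptions.

```python
from types import SimpleNamespace
from typing import List


def observed_hpo_ids(phenotypic_features) -> List[str]:
    """Collect the HPO IDs of features that are not marked as excluded."""
    return [
        feature.type.id for feature in phenotypic_features if not feature.excluded
    ]


def build_tool_arguments(hpo_ids: List[str]) -> List[str]:
    """Build arguments for a hypothetical tool taking a comma-separated --hpo-ids flag."""
    return ["--hpo-ids", ",".join(hpo_ids)]


# Stand-in features; in a runner these would come from
# PhenopacketUtil(...).phenotypic_features().
features = [
    SimpleNamespace(
        type=SimpleNamespace(id="HP:0001250", label="Seizure"), excluded=False
    ),
    SimpleNamespace(
        type=SimpleNamespace(id="HP:0001263", label="Global developmental delay"),
        excluded=True,
    ),
    SimpleNamespace(
        type=SimpleNamespace(id="HP:0000252", label="Microcephaly"), excluded=False
    ),
]

arguments = build_tool_arguments(observed_hpo_ids(features))
```

The excluded feature is dropped, so arguments holds only the two observed HPO IDs, joined into a single value for the flag.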

Additional tool-specific configurations

For the pheval run command to execute successfully, a config.yaml should be found within the input directory supplied on the CLI.

tool: 
tool_version: 
variant_analysis: 
gene_analysis: 
disease_analysis: 
tool_specific_configuration_options:

The tool_specific_configuration_options field is optional and can be populated with any variables specific to your runner implementation that are required to run your tool.

All other fields are required. The variant_analysis, gene_analysis, and disease_analysis fields are booleans that specify which types of analysis/prioritisation the tool outputs.
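
As an illustration of the rules above, the sketch below checks a configuration, represented as a plain Python dict as it might look after loading config.yaml, for missing required fields and non-boolean analysis flags. The function name config_problems and the example values are hypothetical; this is not PhEval's own configuration parser.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = (
    "tool",
    "tool_version",
    "variant_analysis",
    "gene_analysis",
    "disease_analysis",
)
BOOLEAN_FIELDS = ("variant_analysis", "gene_analysis", "disease_analysis")


def config_problems(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the config looks valid."""
    problems = []
    for field in REQUIRED_FIELDS:
        if config.get(field) is None:
            problems.append(f"missing required field: {field}")
    for field in BOOLEAN_FIELDS:
        value = config.get(field)
        if value is not None and not isinstance(value, bool):
            problems.append(f"{field} must be true or false, got {value!r}")
    return problems


# Example values are placeholders for illustration only.
example_config = {
    "tool": "customtool",
    "tool_version": "1.0.0",
    "variant_analysis": False,
    "gene_analysis": True,
    "disease_analysis": False,
    "tool_specific_configuration_options": {"environment": "local"},
}
```

Here config_problems(example_config) returns an empty list, whereas a config that only sets tool would report the four other required fields as missing.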

To populate tool_specific_configuration_options with customised data, we suggest using the pydantic package, as it can easily parse the data from the YAML structure.

e.g.,

Define a BaseModel class with the fields that will populate the tool_specific_configuration_options

from pydantic import BaseModel, Field

class CustomisedConfigurations(BaseModel):
    """
    Class for defining the customised configurations in the tool_specific_configuration_options field,
    within the input_dir config.yaml
    Args:
        environment (str): Environment to run
    """
    environment: str = Field(...)

Within your runner, parse the field into an object:

from dataclasses import dataclass
from pheval.runners.runner import PhEvalRunner
from pathlib import Path

@dataclass
class CustomPhevalRunner(PhEvalRunner):
    """CustomPhevalRunner Class."""

    input_dir: Path
    testdata_dir: Path
    tmp_dir: Path
    output_dir: Path
    config_file: Path
    version: str

    def prepare(self):
        """prepare method."""
        print("preparing")
        config = CustomisedConfigurations.parse_obj(
            self.input_dir_config.tool_specific_configuration_options
        )
        environment = config.environment

    def run(self):
        """run method."""
        print("running with custom pheval runner")

    def post_process(self):
        """post_process method."""
        print("post processing")

Post-processing methods

PhEval currently supports the benchmarking of gene, variant, and disease prioritisation results.

To benchmark these result types, PhEval TSV result files need to be generated.

PhEval can deal with the ranking and generation of these files to the correct location. However, the runner implementation must handle the extraction of essential data from the tool-specific raw results. This involves transforming them into a list comprising PhEval data classes, with each instance representing a result entry.
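
For orientation, the "correct location" follows from the PhEvalRunner source shown earlier: each results directory is a fixed subdirectory of the output directory, and the gene, variant, and disease directories are only created when the matching analysis is enabled. The sketch below reproduces that layout with a stand-in function (build_layout is a hypothetical name, not a PhEval API).

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Subdirectory names as they appear in the PhEvalRunner source.
ALWAYS_CREATED = ["tool_input_commands", "raw_results"]
PER_ANALYSIS = {
    "variant_analysis": "pheval_variant_results",
    "gene_analysis": "pheval_gene_results",
    "disease_analysis": "pheval_disease_results",
}


def build_layout(output_dir: Path, enabled_analyses: Dict[str, bool]) -> List[str]:
    """Create the subdirectories under output_dir and return their sorted names."""
    names = list(ALWAYS_CREATED)
    names += [
        directory
        for analysis, directory in PER_ANALYSIS.items()
        if enabled_analyses.get(analysis)
    ]
    for name in names:
        (output_dir / name).mkdir(exist_ok=True)
    return sorted(names)


with tempfile.TemporaryDirectory() as tmp:
    # Only gene prioritisation is enabled in this example.
    layout = build_layout(Path(tmp), {"gene_analysis": True})
```

With only gene_analysis enabled, layout contains pheval_gene_results, raw_results, and tool_input_commands.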

The dataclasses representing essential information extracted from tool-specific output for gene, variant, and disease prioritisation are defined as follows:

Bases: PhEvalResult

Minimal data required from tool-specific output for gene prioritisation result

Parameters:

Name | Type | Description | Default
gene_symbol | str | The gene symbol for the result entry | required
gene_identifier | str | The ENSEMBL gene identifier for the result entry | required
score | float | The score for the gene result entry | required

Notes

While we recommend providing the gene identifier in the ENSEMBL namespace, any matching format used in Phenopacket interpretations is acceptable for result matching purposes in the analysis.

Source code in src/pheval/post_processing/post_processing.py
@dataclass
class PhEvalGeneResult(PhEvalResult):
    """Minimal data required from tool-specific output for gene prioritisation result
    Args:
        gene_symbol (str): The gene symbol for the result entry
        gene_identifier (str): The ENSEMBL gene identifier for the result entry
        score (float): The score for the gene result entry
    Notes:
        While we recommend providing the gene identifier in the ENSEMBL namespace,
        any matching format used in Phenopacket interpretations is acceptable for result matching purposes
        in the analysis.
    """

    gene_symbol: str
    gene_identifier: str
    score: float

Bases: PhEvalResult

Minimal data required from tool-specific output for variant prioritisation

Parameters:

Name | Type | Description | Default
chromosome | str | The chromosome of the variant, recommended to be given as 1 to 22 for the autosomal chromosomes, X or Y for the sex chromosomes, or MT for the mitochondrial chromosome | required
start | int | The start position of the variant | required
end | int | The end position of the variant | required
ref | str | The reference allele of the variant | required
alt | str | The alternate allele of the variant | required
score | float | The score for the variant result entry | required

Notes

While we recommend providing the variant's chromosome in the specified format, any matching format used in Phenopacket interpretations is acceptable for result matching purposes in the analysis.

Source code in src/pheval/post_processing/post_processing.py
@dataclass
class PhEvalVariantResult(PhEvalResult):
    """Minimal data required from tool-specific output for variant prioritisation
    Args:
        chromosome (str): The chromosome position of the variant recommended to be provided in the following format.
        This includes numerical designations from 1 to 22 representing autosomal chromosomes,
        as well as the sex chromosomes X and Y, and the mitochondrial chromosome MT.
        start (int): The start position of the variant
        end (int): The end position of the variant
        ref (str): The reference allele of the variant
        alt (str): The alternate allele of the variant
        score (float): The score for the variant result entry
    Notes:
        While we recommend providing the variant's chromosome in the specified format,
        any matching format used in Phenopacket interpretations is acceptable for result matching purposes
        in the analysis.
    """

    chromosome: str
    start: int
    end: int
    ref: str
    alt: str
    score: float

Bases: PhEvalResult

Minimal data required from tool-specific output for disease prioritisation

Parameters:

Name | Type | Description | Default
disease_name | str | Disease name for the result entry | required
disease_identifier | str | Identifier for the disease result entry in the OMIM namespace | required
score | float | Score for the disease result entry | required

Notes

While we recommend providing the disease identifier in the OMIM namespace, any matching format used in Phenopacket interpretations is acceptable for result matching purposes in the analysis.

Source code in src/pheval/post_processing/post_processing.py
@dataclass
class PhEvalDiseaseResult(PhEvalResult):
    """Minimal data required from tool-specific output for disease prioritisation
    Args:
        disease_name (str): Disease name for the result entry
        disease_identifier (str): Identifier for the disease result entry in the OMIM namespace
        score (float): Score for the disease result entry
    Notes:
        While we recommend providing the disease identifier in the OMIM namespace,
        any matching format used in Phenopacket interpretations is acceptable for result matching purposes
        in the analysis.
    """

    disease_name: str
    disease_identifier: str
    score: float
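
As a rough sketch of the extraction step described above, the following maps raw tool output onto result entries. GeneResult is a stand-in with the same fields as PhEvalGeneResult, and the raw record keys (symbol, ensembl_id, combined_score) and values are hypothetical placeholders; your tool's output format will differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GeneResult:
    """Stand-in with the same three fields as PhEvalGeneResult."""

    gene_symbol: str
    gene_identifier: str
    score: float


def extract_gene_results(raw_results: List[Dict[str, str]]) -> List[GeneResult]:
    """Convert hypothetical raw tool records into a list of result entries."""
    return [
        GeneResult(
            gene_symbol=record["symbol"],
            gene_identifier=record["ensembl_id"],
            # Raw scores often arrive as strings, so cast them explicitly.
            score=float(record["combined_score"]),
        )
        for record in raw_results
    ]


# Placeholder records standing in for parsed tool-specific output.
raw_results = [
    {"symbol": "GENE_A", "ensembl_id": "ENSG_PLACEHOLDER_A", "combined_score": "0.83"},
    {"symbol": "GENE_B", "ensembl_id": "ENSG_PLACEHOLDER_B", "combined_score": "0.41"},
]
gene_results = extract_gene_results(raw_results)
```

In a real runner you would build a list of PhEvalGeneResult (or the variant or disease equivalents) in the same way.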

The generate_pheval_result() function can be called from your runner to write out the PhEval TSV results.

An example of how the method can be called is outlined here:

from pheval.post_processing.post_processing import generate_pheval_result

generate_pheval_result(
    pheval_result=pheval_gene_result, # this is the list of extracted PhEval result requirements
    sort_order_str="descending", # or can be ascending - this determines in which order the scores will be ranked
    output_dir=output_directory, # this can be accessed from the runner instance e.g., self.output_dir
    tool_result_path=tool_result_json # this is the path to the tool-specific raw results file
)
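
The sort order matters because it decides whether a high or a low score counts as the best result. The sketch below shows the idea with a simplified ranking function; rank_scores is a hypothetical helper, and it assigns consecutive ranks without any special handling of tied scores, which may differ from how PhEval ranks results.

```python
from typing import List, Tuple


def rank_scores(
    scores: List[float], sort_order: str = "descending"
) -> List[Tuple[float, int]]:
    """Return (score, rank) pairs, with rank 1 being the best-scoring entry."""
    if sort_order not in ("ascending", "descending"):
        raise ValueError(f"unknown sort order: {sort_order!r}")
    ordered = sorted(scores, reverse=(sort_order == "descending"))
    return [(score, rank) for rank, score in enumerate(ordered, start=1)]


# Higher is better, e.g. a combined score between 0 and 1.
descending_ranks = rank_scores([0.2, 0.9, 0.5], "descending")
# Lower is better, e.g. a p-value.
ascending_ranks = rank_scores([0.2, 0.9, 0.5], "ascending")
```

Choose "descending" when a higher score means a better candidate, and "ascending" when a lower score does.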

Adding metadata to the results.yml

By default, PhEval will write a results.yml to the output directory supplied on the CLI.

The results.yml contains basic metadata regarding the run configuration. There is also the option to add customised run metadata to the results.yml in the tool_specific_configuration_options field.

To achieve this, override the construct_meta_data() method in your runner implementation. This method is responsible for adding customised metadata, in the form of a defined dataclass, to the metadata object. It should return the entire metadata object once the addition is complete.

e.g.,

Defined customised metadata dataclass:

from dataclasses import dataclass

@dataclass
class CustomisedMetaData:
    customised_field: str

Example of implementation in the runner.

from dataclasses import dataclass
from pheval.runners.runner import PhEvalRunner
from pathlib import Path

@dataclass
class CustomPhevalRunner(PhEvalRunner):
    """CustomPhevalRunner Class."""

    input_dir: Path
    testdata_dir: Path
    tmp_dir: Path
    output_dir: Path
    config_file: Path
    version: str

    def prepare(self):
        """prepare method."""
        print("preparing")

    def run(self):
        """run method."""
        print("running with custom pheval runner")

    def post_process(self):
        """post_process method."""
        print("post processing")

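    def construct_meta_data(self):
        """Add metadata."""
        # The meta_data property builds a new object on every access, so keep
        # one instance, modify it, and return that same instance.
        meta_data = self.meta_data
        meta_data.tool_specific_configuration_options = CustomisedMetaData(
            customised_field="customised_value"
        )
        return meta_data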
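
One detail worth keeping in mind: in the PhEvalRunner source shown earlier, the meta_data property constructs a new metadata object each time it is read. The stand-in classes below (MetaData and Runner are hypothetical and only imitate that behaviour) illustrate why it is safer to hold the object in a local variable before modifying and returning it.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class MetaData:
    """Stand-in for the run metadata object."""

    tool: str
    tool_specific_configuration_options: Any = None


class Runner:
    """Stand-in whose meta_data property rebuilds the object on each access."""

    @property
    def meta_data(self) -> MetaData:
        return MetaData(tool="customtool")

    def construct_lost(self) -> MetaData:
        # The assignment lands on a temporary object...
        self.meta_data.tool_specific_configuration_options = {"customised_field": "x"}
        # ...and this second access returns a fresh one without it.
        return self.meta_data

    def construct_kept(self) -> MetaData:
        # Read the property once and keep working on the same instance.
        meta_data = self.meta_data
        meta_data.tool_specific_configuration_options = {"customised_field": "x"}
        return meta_data


lost = Runner().construct_lost()
kept = Runner().construct_kept()
```

Here lost.tool_specific_configuration_options is None, while kept carries the customised field.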

6. Test it

To make your custom PhEval runner implementation available to PhEval, first install the package:

poetry install

You should now be able to run PhEval, passing your custom runner as a parameter, e.g.:

pheval run -i ./input_dir -t ./test_data_dir -r 'customphevalrunner' -o output_dir

The -r parameter takes the name of your plugin runner class, written entirely in lowercase.

Output:

preparing
running with custom pheval runner
post processing

Note the "running with custom pheval runner" line: this is exactly the message printed by the run method of the CustomPhevalRunner example.
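
The order of the output suggests that the three runner methods are called in sequence: prepare, then run, then post_process. The sketch below imitates that sequence with a stand-in runner that records its messages instead of printing them, and shows how the lowercase name passed to -r relates to the class name. DemoRunner, execute, and runner_cli_name are hypothetical names, and PhEval's own dispatch code is not shown here.

```python
from typing import List


class DemoRunner:
    """Stand-in runner that records its messages instead of printing them."""

    def __init__(self) -> None:
        self.messages: List[str] = []

    def prepare(self) -> None:
        self.messages.append("preparing")

    def run(self) -> None:
        self.messages.append("running with custom pheval runner")

    def post_process(self) -> None:
        self.messages.append("post processing")


def execute(runner) -> List[str]:
    """Call the three lifecycle methods in the order seen in the output."""
    runner.prepare()
    runner.run()
    runner.post_process()
    return runner.messages


def runner_cli_name(runner_class) -> str:
    """Lowercase the class name, as required for the -r parameter."""
    return runner_class.__name__.lower()


messages = execute(DemoRunner())
cli_name = runner_cli_name(DemoRunner)
```

For the CustomPhevalRunner class from the earlier example, the same lowercasing gives customphevalrunner, which matches the value passed to -r above.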