Work plan

The goal of this pilot project is to create a Python package that simplifies the encoding of oncology clinical data using the GA4GH Phenopacket Schema <https://phenopacket-schema.readthedocs.io/en/latest/>_.

The pyphetools <https://github.com/monarch-initiative/pyphetools>_ project has a comparable code base targeted at rare disease.

This pilot project will use the API of the CDA project <https://github.com/CancerDataAggregator/cda-python.git>_ to access Cancer Data Aggregator <https://datacommons.cancer.gov/cancer-data-aggregator>_ resources and output patient data using the Phenopacket Schema. The C2P code can then be extended to ingest data from other sources.

GitHub project board

Let's use this project board to keep track of issues. The board is connected to the two repositories.

Work items

The first phase of work will be to provide and test ETL code to transform CDA data into collections of phenopackets.
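To make the shape of this ETL step concrete, here is a minimal sketch that turns one CDA-style subject row into a dictionary laid out like the JSON form of a phenopacket. It uses only the Python standard library. The input field names (subject_id, sex), the SEX_MAP table, the row_to_phenopacket function, and the "example-etl" creator string are all hypothetical and chosen for illustration; they are not taken from the CDA API or from this package. The output is not validated against the schema, and a complete phenopacket would also need the metaData resources list, which is omitted here.

```python
import json
from datetime import datetime, timezone

# Hypothetical mapping from CDA-style sex strings to Phenopacket Schema Sex enum names.
SEX_MAP = {"male": "MALE", "female": "FEMALE"}


def row_to_phenopacket(row: dict) -> dict:
    """Transform one CDA-style subject row (illustrative field names) into a
    dict shaped like the JSON representation of a phenopacket."""
    subject_id = row["subject_id"]
    # Anything missing or unrecognized falls back to UNKNOWN_SEX.
    sex = SEX_MAP.get(str(row.get("sex", "")).strip().lower(), "UNKNOWN_SEX")
    return {
        "id": f"{subject_id}-phenopacket",
        "subject": {
            "id": subject_id,
            "sex": sex,
            "taxonomy": {"id": "NCBITaxon:9606", "label": "Homo sapiens"},
        },
        "metaData": {
            "created": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "createdBy": "example-etl",  # assumption: placeholder creator name
            "phenopacketSchemaVersion": "2.0",
        },
    }


if __name__ == "__main__":
    # Made-up rows standing in for a CDA query result.
    rows = [
        {"subject_id": "SUBJ.1", "sex": "female"},
        {"subject_id": "SUBJ.2", "sex": None},
    ]
    for row in rows:
        print(json.dumps(row_to_phenopacket(row), indent=2))
```

In the real package, the classes listed in the table below would presumably play the role of this function, one per CDA entity, so that each mapping can be unit tested in isolation.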

Let's use the following table to keep track of our status.

| CDA ETL class | Oncoexporter class | Status |
|---|---|---|
| CdaIndividualFactory | OpIndividual | done, needs unit tests |