
Work plan

The goal of this pilot project is to create a Python package that simplifies encoding oncology clinical data with the `GA4GH Phenopacket Schema <https://phenopacket-schema.readthedocs.io/en/latest/>`_.

The `pyphetools <https://github.com/monarch-initiative/pyphetools>`_ project has a comparable code base targeted at rare disease.

This pilot project will use the API of the `CDA project <https://github.com/CancerDataAggregator/cda-python.git>`_ to access `Cancer Data Aggregator <https://datacommons.cancer.gov/cancer-data-aggregator>`_ resources and output patient data using the Phenopacket Schema. The code can later be extended to ingest data from other sources.

GitHub project board

Let's use this project board to keep track of issues. The board is connected to the oncopacket repository:

Work items

The first phase of work will be to provide and test ETL code to transform CDA data into collections of phenopackets.

Let's use the following table to keep track of our status.

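====================  ================  ======================
CDA ETL class         oncopacket class  Status
====================  ================  ======================
CdaIndividualFactory  OpIndividual      done, needs unit tests
====================  ================  ======================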
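To make the first row concrete, here is a minimal sketch of what a factory such as CdaIndividualFactory might do. It takes one CDA subject record and returns an individual whose sex is coded with the Sex enum values defined by the Phenopacket Schema (UNKNOWN_SEX, FEMALE, MALE, OTHER_SEX). Everything else is an assumption. The input is modelled as a plain dict rather than a cdapython query result, the column names `subject_id` and `sex` are guesses, and `CdaIndividualFactorySketch` and `OpIndividualSketch` are hypothetical stand-ins, not the actual oncopacket classes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Mapping


class Sex(Enum):
    """Mirrors the Sex enum values defined by the Phenopacket Schema."""
    UNKNOWN_SEX = 0
    FEMALE = 1
    MALE = 2
    OTHER_SEX = 3


@dataclass(frozen=True)
class OpIndividualSketch:
    """Hypothetical stand-in for oncopacket's OpIndividual (fields assumed)."""
    id: str
    sex: Sex


class CdaIndividualFactorySketch:
    """Hypothetical factory that converts one CDA subject row into an individual."""

    # Assumed CDA sex strings (compared case-insensitively); anything else
    # falls back to UNKNOWN_SEX.
    _SEX_MAP = {"female": Sex.FEMALE, "male": Sex.MALE}

    def to_individual(self, row: Mapping[str, Any]) -> OpIndividualSketch:
        # "subject_id" is an assumed column name for the CDA subject identifier.
        subject_id = row.get("subject_id")
        if not subject_id:
            raise ValueError("row is missing a subject_id")
        # Treat None or empty values as unknown rather than failing.
        raw_sex = str(row.get("sex") or "").strip().lower()
        sex = self._SEX_MAP.get(raw_sex, Sex.UNKNOWN_SEX)
        return OpIndividualSketch(id=str(subject_id), sex=sex)


if __name__ == "__main__":
    factory = CdaIndividualFactorySketch()
    print(factory.to_individual({"subject_id": "subject-001", "sex": "Female"}))
```

Because the factory takes plain mappings, it can be unit-tested without network access to CDA. For example, a test can assert that `{"subject_id": "s1", "sex": "Female"}` yields `Sex.FEMALE` and that a missing or empty sex value yields `Sex.UNKNOWN_SEX`.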