Load Phenopacket Store
Phenopacket Store Toolkit simplifies loading the Phenopacket Store cohorts. The toolkit removes the boilerplate required for downloading and keeping track of the release files, and for phenopacket I/O.
Load a single phenopacket cohort
Here we show how to load phenopackets of a SUOX cohort from Phenopacket Store release 0.1.18.
We start by importing the registry factory and configuring a registry:
>>> from ppktstore.registry import configure_phenopacket_registry
>>> registry = configure_phenopacket_registry()
We created a PhenopacketStoreRegistry to manage the release files.
By default, the registry stores the release ZIP files at $HOME/.phenopacket-store for Unix, or $HOME/phenopacket-store for Windows.
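The platform-dependent default can be sketched as follows. This only illustrates the rule stated above and is not the toolkit's own code: the function name `default_store_dir` and its `system` parameter are hypothetical, and treating every non-Windows system as Unix is an assumption.

```python
import platform
from pathlib import Path
from typing import Optional


def default_store_dir(system: Optional[str] = None) -> Path:
    """Return the default release directory described in the text.

    Hypothetical helper: it mirrors the documented rule, not the
    toolkit's actual implementation.
    """
    system = system or platform.system()
    # Windows: $HOME/phenopacket-store
    # Anything else (assumed Unix-like): $HOME/.phenopacket-store
    name = "phenopacket-store" if system == "Windows" else ".phenopacket-store"
    return Path.home() / name


print(default_store_dir())
```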
Now let’s load the phenopackets.
>>> with registry.open_phenopacket_store(release="0.1.18") as ps:
...     phenopackets = list(ps.iter_cohort_phenopackets("SUOX"))
>>> len(phenopackets)
35
We open Phenopacket Store with release 0.1.18. Behind the scenes, the registry checks if the ZIP file
has been previously downloaded. If absent, the ZIP file is fetched from GitHub.
This is followed by opening the ZIP file and creating PhenopacketStore (ps).
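The check-then-download step can be pictured with a small standard-library sketch. It is not the registry's implementation: `resolve_release_zip`, the `<release>.zip` file naming, and the injected `fetch` callable (standing in for the GitHub download) are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable


def resolve_release_zip(
    data_dir: Path,
    release: str,
    fetch: Callable[[str], bytes],
) -> Path:
    """Return the path of the release ZIP, fetching it only when missing."""
    zip_path = data_dir / f"{release}.zip"  # hypothetical file naming
    if not zip_path.is_file():
        # Not downloaded yet: create the directory and store the fetched bytes.
        data_dir.mkdir(parents=True, exist_ok=True)
        zip_path.write_bytes(fetch(release))
    # Already present (or just written): no further download is needed.
    return zip_path
```

Passing the download function as an argument keeps the sketch free of network access, so a second call with the same release never invokes `fetch` again.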
We can load phenopackets for a cohort name (e.g. SUOX). The phenopackets are loaded lazily,
and we collect them into a list.
We loaded 35 phenopackets!
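Lazy loading means that a phenopacket is read only when the iterator is advanced. The idea can be sketched with the standard library alone, assuming (hypothetically) an archive that holds one JSON file per phenopacket inside a folder named after the cohort. The helper `iter_cohort_json` and the archive layout are illustrative and not part of the toolkit.

```python
import io
import json
import zipfile
from typing import Iterator


def iter_cohort_json(zf: zipfile.ZipFile, cohort: str) -> Iterator[dict]:
    """Yield one parsed JSON document at a time for the given cohort folder."""
    for name in sorted(zf.namelist()):
        parts = name.split("/")
        # Assumed layout: <anything>/<cohort>/<file>.json
        if len(parts) >= 2 and parts[-2] == cohort and name.endswith(".json"):
            with zf.open(name) as fh:
                yield json.load(fh)  # parsed only when the caller asks for it


# Build a tiny in-memory archive to demonstrate the behaviour.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("store/SUOX/a.json", json.dumps({"id": "a"}))
    zf.writestr("store/SUOX/b.json", json.dumps({"id": "b"}))
    zf.writestr("store/OTHER/c.json", json.dumps({"id": "c"}))

with zipfile.ZipFile(buf) as zf:
    lazy = iter_cohort_json(zf, "SUOX")  # no entry has been parsed yet
    first = next(lazy)                   # parses only the first entry
    rest = list(lazy)                    # parses the remaining entries
```

Collecting into a list, as in the SUOX example, must happen while the archive is still open, because the generator reads from it on demand.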
Export a phenopacket cohort
Phenopackets of a cohort can easily be exported into a directory for further processing.
The export is implemented on the CohortInfo class of the Phenopacket Store API:
>>> outdir = "dev/SUOX"
>>> with registry.open_phenopacket_store(release="0.1.18") as ps:
...     cohort = ps.cohort_for_name("SUOX")
...     cohort.export_phenopackets_to_directory(outdir)
We open Phenopacket Store and get the CohortInfo for the SUOX cohort.
Then we export the phenopackets into a directory (e.g. "dev/SUOX").
We can check if the phenopackets were exported:
>>> import os
>>> paths = sorted(os.listdir(outdir))
>>> paths[:5]
['PMID_36303223_individual_10_PMID_12112661.json',
'PMID_36303223_individual_11_PMID_12112661.json',
'PMID_36303223_individual_12_PMID_12112661.json',
'PMID_36303223_individual_13_PMID_12112661.json',
'PMID_36303223_individual_14_PMID_11825068.json']
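For a closer look at the exported files, the JSON documents can be read back with the standard library. The sketch below relies only on the top-level `id` field that the Phenopacket Schema defines; `read_phenopacket_ids` is a hypothetical helper, not a toolkit function.

```python
import json
from pathlib import Path
from typing import Dict


def read_phenopacket_ids(directory: str) -> Dict[str, str]:
    """Map each exported JSON file name to the phenopacket's `id` field."""
    ids = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            doc = json.load(fh)
        # Fall back to an empty string if a file lacks the field.
        ids[path.name] = doc.get("id", "")
    return ids
```

Calling `read_phenopacket_ids(outdir)` after the export would return one entry per JSON file in the directory.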
By default, the phenopackets are stored in JSON format. However, Protobuf wire format is also supported:
>>> with registry.open_phenopacket_store(release="0.1.18") as ps:
...     cohort = ps.cohort_for_name("SUOX")
...     cohort.export_phenopackets_to_directory(outdir, format="pb")
>>> sorted(os.listdir(outdir))[:5]
['PMID_36303223_individual_10_PMID_12112661.pb',
'PMID_36303223_individual_11_PMID_12112661.pb',
'PMID_36303223_individual_12_PMID_12112661.pb',
'PMID_36303223_individual_13_PMID_12112661.pb',
'PMID_36303223_individual_14_PMID_11825068.pb']
We use the format option to export phenopackets as Protobuf files.
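A quick way to confirm which formats an export directory holds is to tally the file extensions. `count_by_suffix` is a hypothetical helper written for this illustration, not part of the toolkit.

```python
from collections import Counter
from pathlib import Path


def count_by_suffix(directory: str) -> Counter:
    """Count the regular files in `directory`, grouped by file extension."""
    return Counter(p.suffix for p in Path(directory).iterdir() if p.is_file())
```

For example, `count_by_suffix(outdir)[".pb"]` gives the number of Protobuf files found in the directory.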