Phenopacket Store Summary
phenopacket-store-toolkit provides functions to generate various summaries and visualizations of the phenopackets and cohorts contained in a release. We use these functions to generate a summary of each Phenopacket-Store release.
Interested users should study the notebook file to learn how to access the functions.
Display cohorts sorted by size
In addition to the functions shown in the notebook, the following code generates a pandas table with all cohorts sorted by size.
from ppktstore.model import PhenopacketStore
from ppktstore.release.stats import PPKtStoreStats
import zipfile
# Path to the release archive; adjust to your local copy
ppkt_zip_path = "../your/path/all_phenopackets.zip"
with zipfile.ZipFile(ppkt_zip_path) as zf:
    store = PhenopacketStore.from_release_zip(zf)
    stats = PPKtStoreStats(store)
    df = stats.get_disease_count_table()
df.head()
# ...
Note that some of the cohorts (which are usually gene-based) contain multiple disease entities. To get the total counts per cohort, the following code can be used.
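# Sum the per-disease counts within each cohort
df_grouped = df.groupby('cohort')['count'].sum().reset_index()
# Sort cohorts from largest to smallest and renumber the rows
df_sorted = df_grouped.sort_values(by='count', ascending=False)
df_sorted = df_sorted[["cohort", "count"]]
df_sorted.reset_index(drop=True, inplace=True)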
This produces a table similar to the following.
cohort | count |
---|---|
STXBP1 | 463 |
SCN2A | 393 |
ANKRD11 | 337 |
RPGRIP1 | 229 |
SATB2 | 158 |
TBX5 | 156 |
… | … |
MAF | 1 |
OCA2 | 1 |
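To try the grouping step without downloading a release, the following minimal sketch applies the same groupby-and-sort logic to a small made-up table. The `cohort` and `count` column names are taken from the code above; the `disease` column name, the `cohort_totals` helper, and all values are hypothetical and chosen only for illustration. They are not real Phenopacket-Store data.

```python
import pandas as pd

# Toy stand-in for the table returned by get_disease_count_table().
# Only 'cohort' and 'count' are known column names; 'disease' and every
# value below are invented for this example.
toy_df = pd.DataFrame(
    {
        "cohort": ["GENE_A", "GENE_A", "GENE_B", "GENE_C"],
        "disease": ["Disease 1", "Disease 2", "Disease 3", "Disease 4"],
        "count": [10, 5, 12, 1],
    }
)


def cohort_totals(df: pd.DataFrame) -> pd.DataFrame:
    """Sum counts per cohort and sort from largest to smallest (hypothetical helper)."""
    return (
        df.groupby("cohort", as_index=False)["count"]
        .sum()
        .sort_values(by="count", ascending=False)
        .reset_index(drop=True)
    )


totals = cohort_totals(toy_df)
print(totals)
```

Here GENE_A has two disease rows, so its total is the sum of both counts. Passing `as_index=False` keeps `cohort` as an ordinary column, which makes the separate column-selection step unnecessary.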