.. _summary:

=========================
Phenopacket Store Summary
=========================

phenopacket-store-toolkit provides functions to generate summaries and visualizations of the phenopackets and cohorts contained in a release.
We use these functions to generate a summary of each Phenopacket-Store release.
Interested users should study the notebook file to learn how to access the functions.


Display cohorts sorted by size
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In addition to the functions shown in the notebook, the following code generates a pandas DataFrame listing the disease counts of all cohorts.

::

    import zipfile

    from ppktstore.model import PhenopacketStore
    from ppktstore.release.stats import PPKtStoreStats

    # Path to the release archive; adjust to your local copy
    ppkt_zip_path = "../your/path/all_phenopackets.zip"
    with zipfile.ZipFile(ppkt_zip_path) as zf:
        store = PhenopacketStore.from_release_zip(zf)
        stats = PPKtStoreStats(store)
        df = stats.get_disease_count_table()
    df.head()
    # ...

Note that some of the cohorts (which are usually gene-based) contain multiple disease entities.
To get the total count per cohort, sorted by size, the following code can be used.

::

    # Sum the counts of all diseases that belong to the same cohort
    df_grouped = df.groupby('cohort')['count'].sum().reset_index()
    # Largest cohorts first
    df_sorted = df_grouped.sort_values(by='count', ascending=False)
    df_sorted = df_sorted[["cohort", "count"]]
    df_sorted.reset_index(drop=True, inplace=True)

This produces a table similar to the following.

+-----------+-------+
| cohort    | count |
+===========+=======+
| STXBP1    | 463   |
+-----------+-------+
| SCN2A     | 393   |
+-----------+-------+
| ANKRD11   | 337   |
+-----------+-------+
| RPGRIP1   | 229   |
+-----------+-------+
| SATB2     | 158   |
+-----------+-------+
| TBX5      | 156   |
+-----------+-------+
| ...       | ...   |
+-----------+-------+
| MAF       | 1     |
+-----------+-------+
| OCA2      | 1     |
+-----------+-------+
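
The grouping step can also be tried without a release archive. The following is a minimal sketch on a hand-made table: the cohort names, disease names, and counts are invented for illustration, and the ``disease`` column name is an assumption (only ``cohort`` and ``count`` are used in the code above).

```python
import pandas as pd

# Toy stand-in for the table returned by get_disease_count_table().
# All values are made up; the 'disease' column name is an assumption.
df = pd.DataFrame(
    {
        "cohort": ["GENE_A", "GENE_A", "GENE_B", "GENE_C"],
        "disease": ["Disease 1", "Disease 2", "Disease 3", "Disease 4"],
        "count": [5, 7, 20, 1],
    }
)

# Sum the counts over all diseases in a cohort, then order largest first.
df_sorted = (
    df.groupby("cohort", as_index=False)["count"]
    .sum()
    .sort_values(by="count", ascending=False)
    .reset_index(drop=True)
)
print(df_sorted)
```

Here the two ``GENE_A`` rows collapse into a single row whose count is their sum, which is the same effect the grouping has on gene-based cohorts with several disease entities.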