Phenopacket2Prompt
phenopacket2prompt is a Java application that creates prompts for Large Language Models (LLMs) on the basis of clinical data that has been encoded using the Global Alliance for Genomics and Health (GA4GH) Phenopacket Schema.
Optionally, the prompts can be generated in Czech, Dutch, German, Italian, and Spanish.
Running phenopacket2prompt
There are currently two use cases:
1. The creation of prompts (in several languages!), starting from phenopackets, intended for use with a Large Language Model (LLM) which is asked for a differential diagnosis.
2. The creation of phenopackets from case reports via text mining using the fenominal library.
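To make the first use case concrete, here is a minimal sketch of how feature labels taken from a phenopacket might be turned into a differential-diagnosis prompt. The class `PromptSketch`, the method `buildPrompt`, and the prompt wording are all hypothetical and chosen for illustration; they are not the template or API that phenopacket2prompt itself uses.

```java
import java.util.List;

final class PromptSketch {

    // Hypothetical helper: builds a plain-text prompt from observed and excluded
    // phenotypic feature labels. The wording is illustrative only and is NOT the
    // template emitted by phenopacket2prompt.
    static String buildPrompt(String ageAndSex, List<String> observed, List<String> excluded) {
        StringBuilder sb = new StringBuilder();
        sb.append("The patient is a ").append(ageAndSex).append(".\n");
        sb.append("Observed findings: ").append(String.join(", ", observed)).append(".\n");
        // Only mention excluded findings when there are any.
        if (!excluded.isEmpty()) {
            sb.append("Excluded findings: ").append(String.join(", ", excluded)).append(".\n");
        }
        sb.append("Please provide a ranked differential diagnosis.");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Example values are made up for demonstration.
        String prompt = buildPrompt(
                "4-year-old girl",
                List.of("Seizure", "Microcephaly"),
                List.of("Hearing impairment"));
        System.out.println(prompt);
    }
}
```

In the real tool the labels would come from the phenotypic features of each phenopacket, and the sentence templates would differ per output language.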
Running with Phenopackets
For this use case, follow the instructions in Set-up and Batch.
Running with case reports
Assuming the hp.json file has been downloaded as described in Set-up and all the case report text files are available in a directory at some/path/gptdocs, run
This command will create a new directory called gptOut (this can be adjusted using the -o option).
It will contain four subdirectories:
- phenopackets. GA4GH phenopackets derived from each case report
- phenopacket_based_queries. Feature-based query prompts for GPT-4 based on the information in the phenopackets
- txt_without_discussion. Query based on the original case report: the text as presented by the first discussant, up to but not including the text contributed by the second discussant or anything that follows
- txt_with_differential. Text that starts with the presentation by the first discussant up to and including the differential. This was used to check parsing but was not used in our analysis.
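After a run, it can be useful to confirm that the output directory has the layout listed above. The following is a minimal sketch, assuming the default directory name gptOut and the four subdirectory names given in the list; the class `OutputLayoutCheck` and its method are hypothetical helpers, not part of phenopacket2prompt.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class OutputLayoutCheck {

    // Subdirectory names as listed in this README.
    static final List<String> EXPECTED = List.of(
            "phenopackets",
            "phenopacket_based_queries",
            "txt_without_discussion",
            "txt_with_differential");

    // Returns the names of expected subdirectories that are absent from outDir.
    static List<String> missingSubdirectories(Path outDir) {
        List<String> missing = new ArrayList<>();
        for (String name : EXPECTED) {
            if (!Files.isDirectory(outDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: "gptOut" is the default; pass another path if -o was used.
        Path outDir = Paths.get(args.length > 0 ? args[0] : "gptOut");
        List<String> missing = missingSubdirectories(outDir);
        if (missing.isEmpty()) {
            System.out.println("All expected subdirectories are present in " + outDir);
        } else {
            System.out.println("Missing subdirectories in " + outDir + ": " + missing);
        }
    }
}
```

If the output location was changed with the -o option, pass that path as the first argument instead of relying on the default.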
Feedback
The best place to leave feedback, ask questions, and report bugs is the phenopacket2prompt Issue Tracker.