ppsc package
GA4GH Phenopacket Core is a library with wrapper classes and convenience methods for working with Phenopacket Schema.
The library simplifies handling of the building blocks of Phenopacket Schema by providing type information that IDEs can use for completion and checking.
The usage is best illustrated on a set of examples.
Create phenopacket programmatically
We recommend bringing the classes into scope all at once using a star import:
>>> from ppsc.v202 import *
Then, we can build a phenopacket from the individual building blocks.
Let’s start with the subject:
>>> subject = Individual(
... id='proband A',
... time_at_last_encounter=TimeElement(
... element=Age(iso8601duration='P6M'),
... ),
... sex=Sex.FEMALE,
... )
>>> subject.id
'proband A'
>>> subject.sex.name
'FEMALE'
The created subject represents a female proband who was 6 months old at the time of the last encounter.
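The age above is given as an ISO 8601 duration string (`'P6M'` for six months). As a minimal sketch, such strings can be assembled from numeric components with a small helper. `iso8601_age` is a hypothetical name used only for illustration; it is not part of ppsc and handles only whole years, months and days.

```python
def iso8601_age(years: int = 0, months: int = 0, days: int = 0) -> str:
    """Format an age as an ISO 8601 duration (hypothetical helper, not part of ppsc)."""
    if min(years, months, days) < 0:
        raise ValueError("age components must be non-negative")
    # Emit only the non-zero components, in the order required by ISO 8601.
    parts = "".join(
        f"{value}{unit}"
        for value, unit in ((years, "Y"), (months, "M"), (days, "D"))
        if value
    )
    # A duration needs at least one component; fall back to zero days.
    return "P" + (parts or "0D")


print(iso8601_age(months=6))           # P6M
print(iso8601_age(years=1))            # P1Y
print(iso8601_age(years=2, months=3))  # P2Y3M
```

The resulting string could then be passed as the `iso8601duration` argument of `Age`.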
We can update the fields using a simple assignment:
>>> subject.karyotypic_sex = KaryotypicSex.XX
>>> subject.karyotypic_sex.name
'XX'
We assigned the enum constant KaryotypicSex.XX to the previously unset karyotypic_sex attribute.
The same can be done with object attributes:
>>> subject.vital_status = VitalStatus(
... status=VitalStatus.Status.DECEASED,
... time_of_death=TimeElement(
... element=Age(iso8601duration='P1Y')
... ),
... cause_of_death=OntologyClass(
... id='NCIT:C7541', label='Retinoblastoma',
... ),
... )
We set the vital status to indicate that the proband died at 1 year of age due to Retinoblastoma.
Now we can create a phenopacket. A phenopacket requires an identifier and MetaData; the subject is optional.
>>> pp = Phenopacket(
... id='example.retinoblastoma.phenopacket.id',
... meta_data=MetaData(
... created=Timestamp.from_str('2021-05-14T10:35:00Z'),
... created_by='anonymous biocurator',
... ),
... )
To create a phenopacket, we must provide the id and meta_data fields since they are required by the Phenopacket Schema. The same applies to the created and created_by fields of MetaData.
MetaData ties the ontology classes used in the phenopacket, such as NCIT:C7541 Retinoblastoma, to a particular ontology, such as NCI Thesaurus. We can store the ontology resource in the MetaData.resources field:
>>> pp.meta_data.resources.append(
... Resource(
... id='ncit', name='NCI Thesaurus', url='http://purl.obolibrary.org/obo/ncit.owl',
... version='23.09d', namespace_prefix='NCIT', iri_prefix='http://purl.obolibrary.org/obo/NCIT_',
... ),
... )
All repeated elements, such as MetaData.resources, can be accessed via a list.
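The namespace_prefix and iri_prefix fields of the resource above describe how a compact identifier such as NCIT:C7541 maps to a full IRI. The following is a minimal sketch of that expansion using plain Python. The `expand_curie` function and the prefix dictionary are hypothetical and not part of ppsc; the prefix values are copied from the Resource shown above.

```python
def expand_curie(curie: str, prefix_to_iri: dict) -> str:
    """Expand a compact identifier into an IRI (hypothetical helper, not part of ppsc)."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not local_id:
        raise ValueError(f"not a compact identifier: {curie!r}")
    if prefix not in prefix_to_iri:
        raise ValueError(f"no resource declared for prefix {prefix!r}")
    # The IRI is the declared IRI prefix followed by the local identifier.
    return prefix_to_iri[prefix] + local_id


# namespace_prefix -> iri_prefix, as declared in the NCI Thesaurus resource above.
prefixes = {"NCIT": "http://purl.obolibrary.org/obo/NCIT_"}
print(expand_curie("NCIT:C7541", prefixes))
# http://purl.obolibrary.org/obo/NCIT_C7541
```

A term whose prefix has no matching resource raises an error in this sketch, which mirrors the reason for declaring every used ontology in MetaData.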
Read/write JSON and Protobuf
We can read and write phenopackets in JSON format using the JsonDeserializer and JsonSerializer classes:
>>> from ppsc.parse.json import JsonSerializer, JsonDeserializer
>>> serializer = JsonSerializer()
The serializer can write a Phenopacket Schema building block, such as OntologyClass or Phenopacket, into a file handle:
>>> from io import StringIO
>>> buf = StringIO()
>>> serializer.serialize(subject.vital_status, buf)
>>> buf.getvalue()
'{"status": "DECEASED", "timeOfDeath": {"age": {"iso8601duration": "P1Y"}}, "causeOfDeath": {"id": "NCIT:C7541", "label": "Retinoblastoma"}}'
and the JSON can be read back from a file handle:
>>> _ = buf.seek(0) # Rewind and ignore the result
>>> deserializer = JsonDeserializer()
>>> decoded = deserializer.deserialize(buf, VitalStatus)
>>> decoded == subject.vital_status
True
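Note that the JSON keys are in camelCase (`timeOfDeath`), while the Python attributes are in snake_case (`time_of_death`). The sketch below inspects the serialized string from above with the standard `json` module only. The `snake_to_camel` helper is hypothetical and is not how ppsc necessarily maps names; it also does not cover cases such as the time element, which appears under the `age` key in JSON.

```python
import json

# The JSON string produced by the serializer in the example above.
payload = (
    '{"status": "DECEASED", "timeOfDeath": {"age": {"iso8601duration": "P1Y"}}, '
    '"causeOfDeath": {"id": "NCIT:C7541", "label": "Retinoblastoma"}}'
)


def snake_to_camel(name: str) -> str:
    """Convert a snake_case name to camelCase (hypothetical helper)."""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


doc = json.loads(payload)
print(snake_to_camel("time_of_death"))                                 # timeOfDeath
print(doc[snake_to_camel("time_of_death")]["age"]["iso8601duration"])  # P1Y
print(doc[snake_to_camel("cause_of_death")]["label"])                  # Retinoblastoma
```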
The building block can also be written into Protobuf wire format. We can do a similar round-trip as above, but we will need a byte IO handle:
>>> from io import BytesIO
>>> byte_buf = BytesIO()
We can write the subject into the buffer and get the same data back:
>>> subject.dump_pb(byte_buf)
>>> _ = byte_buf.seek(0) # Rewind to start
>>> other = Individual.from_pb(byte_buf)
>>> subject == other
True
- class ppsc.Timestamp(seconds: int, nanos: int)
Bases: ToProtobuf, FromProtobuf
This Timestamp implementation is functionally equivalent to protobuf’s timestamp.
Per the protobuf API documentation, a Timestamp represents a point in time independent of any time zone or local calendar, encoded as a count of seconds and fractions of seconds at nanosecond resolution. The count is relative to an epoch at UTC midnight on January 1, 1970, in the proleptic Gregorian calendar, which extends the Gregorian calendar backwards to year one.
Consult the Phenopacket Schema documentation for more information.
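To make the encoding concrete, the following sketch derives the seconds and nanos pair from a timezone-aware datetime using only the standard library. `to_seconds_nanos` is a hypothetical function written for illustration; it is not the library's implementation.

```python
from datetime import datetime, timedelta, timezone

# The epoch used by the encoding: UTC midnight on January 1, 1970.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_seconds_nanos(dt: datetime) -> tuple:
    """Return (seconds, nanos) since the epoch (hypothetical helper, not part of ppsc)."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    delta = dt - EPOCH
    # timedelta keeps seconds and microseconds non-negative, even before the epoch.
    seconds = delta.days * 86_400 + delta.seconds
    nanos = delta.microseconds * 1_000
    return seconds, nanos


utc = datetime(1970, 1, 1, 0, 0, 30, tzinfo=timezone.utc)
edt = datetime(1969, 12, 31, 20, 0, 30, tzinfo=timezone(timedelta(hours=-4)))
print(to_seconds_nanos(utc))  # (30, 0)
print(to_seconds_nanos(edt))  # (30, 0)
```

Both inputs denote the same instant, so they yield the same pair regardless of the UTC offset they were written in.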
Examples
Here we show how to create a Timestamp from various inputs.
>>> from ppsc import Timestamp
Let’s create a timestamp from a date time string:
>>> ts = Timestamp.from_str('1970-01-01T00:00:30Z')
>>> ts.seconds, ts.nanos
(30, 0)
Note that we indicate that the timestamp is in UTC by adding the Z suffix.
We can also create a timestamp from a local time. Let’s create the same Timestamp but now in Eastern Daylight Time (EDT) which is 4 hours behind UTC:
>>> ts_local = Timestamp.from_str('1969-12-31T20:00:30-04:00')
>>> ts_local == ts
True
We can also create a timestamp from a datetime object:
>>> from datetime import datetime, date, time, timezone
>>> d = date(1970, 1, 1)
>>> t = time(0, 0, 30)
>>> dt = datetime.combine(d, t, tzinfo=timezone.utc)
>>> ts_dt = Timestamp.from_datetime(dt)
>>> ts_dt == ts
True
Last, we can create a timestamp directly from seconds and nanoseconds:
>>> ts_raw = Timestamp(30, 0)
>>> ts_raw == ts
True
and we can convert the timestamp to a UTC date time string:
>>> ts_raw.as_str()
'1970-01-01T00:00:30Z'
- as_datetime() → datetime
Convert timestamp into Python’s datetime object.
The datetime is always in UTC.
Example
>>> from ppsc import Timestamp
>>> ts = Timestamp(10, 500)
>>> dt = ts.as_datetime()
Now we can access the datetime components:
>>> dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second
(1970, 1, 1, 0, 0, 10)
including the time zone:
>>> dt.tzname()
'UTC'
- as_str(fmt: str = '%Y-%m-%dT%H:%M:%SZ') → str
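The conversion in the other direction can be sketched with the standard library as well. `to_utc_datetime` below is a hypothetical function, not the library's implementation. Since Python's datetime resolves microseconds only, this sketch truncates any sub-microsecond part of nanos; whether as_datetime truncates or rounds is not stated here.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc_datetime(seconds: int, nanos: int) -> datetime:
    """Build a UTC datetime from (seconds, nanos) (hypothetical helper, not part of ppsc)."""
    # Integer division drops the part of nanos finer than one microsecond.
    return EPOCH + timedelta(seconds=seconds, microseconds=nanos // 1_000)


dt = to_utc_datetime(10, 500)
print(dt.isoformat())  # 1970-01-01T00:00:10+00:00
print(dt.tzname())     # UTC
print(to_utc_datetime(0, 500_000).microsecond)  # 500
```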
Convert timestamp into a date time string.
Example
>>> from ppsc import Timestamp
>>> ts = Timestamp(0, 500_000)
>>> ts.as_str()
'1970-01-01T00:00:00Z'
We can use different formatting:
>>> ts.as_str('%Y-%m-%dT%H:%M:%S.%f%Z')
'1970-01-01T00:00:00.000500UTC'
- static from_str(val: str, fmt: str = '%Y-%m-%dT%H:%M:%S%z')
Create Timestamp from a date time string.
- Parameters:
val – the date time str.
fmt – the date time format string.
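The default fmt uses the same directives as Python's `datetime.strptime`. The sketch below shows how that format string behaves with the standard library alone, on Python 3.7 or later, where `%z` accepts both the `Z` suffix and offsets written with a colon. That from_str delegates to `strptime` is an assumption made for this illustration.

```python
from datetime import datetime

# The default format string of from_str, as documented above.
FMT = "%Y-%m-%dT%H:%M:%S%z"

utc = datetime.strptime("1970-01-01T00:00:30Z", FMT)
edt = datetime.strptime("1969-12-31T20:00:30-04:00", FMT)

# Aware datetimes compare by instant, not by the offset they were written in.
print(utc == edt)                        # True
print(utc.timestamp(), edt.timestamp())  # 30.0 30.0
```

A string without any offset would not match `%z` and `strptime` would raise a `ValueError`, so under this assumption a different fmt would be needed for such inputs.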
Subpackages
- ppsc.parse package
- ppsc.v202 package
Phenopacket
Individual
VitalStatus
Sex
KaryotypicSex
GeneDescriptor
AcmgPathogenicityClassification
TherapeuticActionability
VariantInterpretation
GenomicInterpretation
Diagnosis
Interpretation
ReferenceRange
Quantity
TypedQuantity
ComplexValue
Value
Measurement
TherapeuticRegimen
RadiationTherapy
DrugType
DoseInterval
Treatment
MedicalAction
Expression
Extension
VcfRecord
MoleculeContext
VariationDescriptor
PhenotypicFeature
Disease
Biosample
MetaData
Resource
Update
OntologyClass
ExternalReference
Evidence
Procedure
GestationalAge
Age
AgeRange
TimeInterval
TimeElement
Timestamp
File
Gene
Text
Number
IndefiniteRange
DefiniteRange
SimpleInterval
SequenceInterval
SequenceLocation
SequenceState
LiteralSequenceExpression
DerivedSequenceExpression
RepeatedSequenceExpression
CytobandInterval
ChromosomeLocation
Allele
Haplotype
CopyNumber
VariationSet
Variation