
CdaFactory

The CdaFactory class serves as a base class for the various factory classes that transform Cancer Data Aggregator (CDA) data into components of GA4GH Phenopackets.

Overview

This abstract factory class provides common functionality for its concrete subclasses:

- CdaIndividualFactory: transforms subject data into Individual objects
- CdaDiseaseFactory: transforms diagnosis data into Disease objects
- CdaBiosampleFactory: transforms specimen/sample data into Biosample objects
- CdaMutationFactory: transforms mutation data into Variant objects
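The subclass relationship above relies on Python's abstract base classes. A class whose to_ga4gh is still abstract cannot be instantiated, while a subclass that implements it can. The following is a minimal, self-contained sketch, not the oncopacket source. All of its names are hypothetical: FactoryBase stands in for CdaFactory, ToyIndividualFactory is a subclass, and subject_id is a column. A plain dict replaces the GA4GH Individual message so the sketch does not depend on the phenopackets package.

```python
import abc


class FactoryBase(metaclass=abc.ABCMeta):
    """Hypothetical stand-in for CdaFactory: declares the abstract to_ga4gh contract."""

    @abc.abstractmethod
    def to_ga4gh(self, row):
        """Transform one CDA row into a GA4GH message."""


class ToyIndividualFactory(FactoryBase):
    """Hypothetical concrete subclass (not part of oncopacket)."""

    def to_ga4gh(self, row):
        # A real subclass would build a phenopackets Individual message;
        # a dict is returned here to keep the sketch dependency-free.
        return {"id": row["subject_id"]}


def can_instantiate(cls) -> bool:
    """Return True if cls can be instantiated, i.e. it has no abstract methods left."""
    try:
        cls()
    except TypeError:
        return False
    return True


print(can_instantiate(FactoryBase))           # False: to_ga4gh is still abstract
print(can_instantiate(ToyIndividualFactory))  # True
```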

API Documentation

Superclass for the CDA Factory Classes

Each subclass must implement the to_ga4gh method, which transforms a row of a table from CDA to a GA4GH Message.
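Because to_ga4gh works on one row at a time, a caller would presumably map a factory over every row of a CDA table. The sketch below assumes rows are plain dicts. With pandas, DataFrame.iterrows() yields (index, Series) pairs whose Series could be passed in the same way, since a Series also supports `[]` on its labels. ToyDiseaseFactory, transform_table, and the column names are hypothetical.

```python
import abc
from typing import Any, Iterable, List, Mapping


class FactoryBase(metaclass=abc.ABCMeta):
    """Hypothetical stand-in for CdaFactory."""

    @abc.abstractmethod
    def to_ga4gh(self, row: Mapping[str, Any]) -> Any:
        ...


class ToyDiseaseFactory(FactoryBase):
    """Hypothetical subclass turning a diagnosis row into a dict 'message'."""

    def to_ga4gh(self, row):
        return {"term": row["primary_diagnosis"], "subject": row["subject_id"]}


def transform_table(factory: FactoryBase, rows: Iterable[Mapping[str, Any]]) -> List[Any]:
    """Apply factory.to_ga4gh to every row, preserving row order."""
    return [factory.to_ga4gh(row) for row in rows]


rows = [
    {"subject_id": "S1", "primary_diagnosis": "Adenocarcinoma"},
    {"subject_id": "S2", "primary_diagnosis": "Melanoma"},
]
messages = transform_table(ToyDiseaseFactory(), rows)
```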

Source code in src/oncopacket/cda/cda_factory.py
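# Imports the class relies on. `simple_float_pattern` (a compiled regex) is defined
# or imported at module level in cda_factory.py and is not shown here.
import abc
import math
import os
import typing

import pandas as pd


class CdaFactory(metaclass=abc.ABCMeta):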
    """Superclass for the CDA Factory Classes

    Each subclass must implement the to_ga4gh method, which transforms a row of a table from CDA to a GA4GH Message.
    """

    @abc.abstractmethod
    def to_ga4gh(self, row: pd.Series):
        """Return a message from the GA4GH Phenopacket Schema that corresponds to this row.

        :param row: A row from the CDA
        :type row: pd.Series
        :returns: a message from the GA4GH Phenopacket Schema
        :raises ValueError: if unable to parse
        """
        pass

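    def get_item(self, row, column_name):
        """Return the value of column_name in row, raising ValueError if it is absent."""
        if column_name not in row:
            # A pd.Series has no .columns attribute; its labels live in .index.
            raise ValueError(f"Expecting to find {column_name} in row but did not. These are the columns: {list(row.index)}")
        return row[column_name]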

    def get_items_from_row(self, row, column_names):
        """Return the values of the given columns, in the requested order, via get_item."""
        if not isinstance(column_names, list):
            raise ValueError(f"column_names argument must be a list but was {type(column_names)}")
        return [self.get_item(row, name) for name in column_names]

    @staticmethod
    def days_to_iso(days: typing.Union[int, float, str]) -> typing.Optional[str]:
        """
        Convert the number of days of life into an ISO 8601 period representing the age of an individual.

        Only the `D` designator is used, because converting to years or months would be lossy.

        `days` may be negative; the sign is dropped, so the period has the same absolute length.

        `None` is returned if a `str` input cannot be parsed as a number, or if a `float` input is NaN or infinite.

        :param days: the number of days of life, as an `int`, `float`, or numeric `str`.
        :raises ValueError: if `days` is not an `int`, `float`, or `str` (a `bool` is also rejected).
        """
        if type(days) is int:
            # In Python, `isinstance(True, int) == True`.
            # However, we don't want that here.
            pass
        elif isinstance(days, str):
            if simple_float_pattern.match(days):
                days = round(float(days))
            else:
                return None
        elif isinstance(days, float):
            if math.isfinite(days):
                days = round(days)
            else:
                return None
        else:
            raise ValueError(f"days argument must be an int, float, or str but was {type(days)}")

        return f'P{abs(days)}D'

    def get_local_share_directory(self, local_dir=None):
        """Return local_dir (default: ~/.oncoexporter), creating it if it does not exist."""
        if local_dir is None:
            local_dir = os.path.join(os.path.expanduser('~'), ".oncoexporter")
        if not os.path.exists(local_dir):
            os.makedirs(local_dir)
            print(f"[INFO] Created new directory for oncoexporter at {local_dir}")
        return local_dir
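get_item and get_items_from_row only need `in` and `[]` on the row. Their behavior can therefore be illustrated with a dict as well as a pd.Series. The functions below are a standalone sketch mirroring the two methods. The error message lists the available labels via row.keys(), which works for both a dict and a Series. The column names are made up for illustration.

```python
from typing import Any, List, Mapping


def get_item(row: Mapping[str, Any], column_name: str) -> Any:
    """Return row[column_name], raising ValueError if the column is absent."""
    if column_name not in row:
        raise ValueError(
            f"Expecting to find {column_name} in row but did not. "
            f"These are the columns: {list(row.keys())}"
        )
    return row[column_name]


def get_items_from_row(row: Mapping[str, Any], column_names: List[str]) -> List[Any]:
    """Return the values of several columns, in the order requested."""
    if not isinstance(column_names, list):
        raise ValueError(f"column_names argument must be a list but was {type(column_names)}")
    return [get_item(row, name) for name in column_names]


row = {"subject_id": "S1", "sex": "female", "days_to_birth": -20000}
values = get_items_from_row(row, ["sex", "subject_id"])  # ['female', 'S1']
```

A tuple of column names is rejected by the isinstance check, so callers must pass a list.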

days_to_iso(days) (staticmethod)

Convert the number of days of life into an ISO 8601 period representing the age of an individual.

Only the D designator is used, because converting to years or months would be lossy.

days may be negative; the sign is dropped, so the resulting period has the same absolute length.

None is returned if a str input cannot be parsed as a number, or if a float input is NaN or infinite.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| days | Union[int, float, str] | The number of days of life, as an int, float, or numeric str. | required |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If days is not an int, float, or str (a bool is also rejected). |
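The conversion rules above can be exercised with a standalone sketch of days_to_iso. The original method depends on a module-level `simple_float_pattern` that this page does not show. The SIMPLE_FLOAT regex below (optional sign, digits, optional decimal part) is an assumed stand-in and may not match the real pattern exactly.

```python
import math
import re
import typing

# ASSUMPTION: stand-in for oncopacket's `simple_float_pattern`, whose
# definition is not shown on this page.
SIMPLE_FLOAT = re.compile(r"^[+-]?\d+(\.\d+)?$")


def days_to_iso(days: typing.Union[int, float, str]) -> typing.Optional[str]:
    """Sketch of CdaFactory.days_to_iso: days of life -> ISO 8601 'PnD' period."""
    if type(days) is int:
        pass  # exact type check: a bool fails it and falls through to the else branch
    elif isinstance(days, str):
        if not SIMPLE_FLOAT.match(days):
            return None  # not a plain number
        days = round(float(days))
    elif isinstance(days, float):
        if not math.isfinite(days):
            return None  # NaN and +/-inf have no meaningful duration
        days = round(days)
    else:
        raise ValueError(f"days must be an int, float, or str but was {type(days)}")
    # Only the D designator is used; abs() drops the sign of negative inputs.
    return f"P{abs(days)}D"


examples = [days_to_iso(365), days_to_iso(-10), days_to_iso("12.6"), days_to_iso("abc")]
# -> ['P365D', 'P10D', 'P13D', None]
```

Python's built-in round() rounds halves to the nearest even integer, so "2.5" maps to P2D and "3.5" to P4D. The original method also calls round(), so it should share this behavior.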


to_ga4gh(row) (abstractmethod)

Return a message from the GA4GH Phenopacket Schema that corresponds to this row.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| row | Series | A row from a CDA table | required |

Returns:

| Type | Description |
|------|-------------|
| (not annotated) | A message from the GA4GH Phenopacket Schema |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If the row cannot be parsed |
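The Raises entry means callers should be prepared for rows that a factory cannot parse. One possible handling pattern is to catch ValueError per row and record which rows failed, rather than aborting the whole table. The sketch below uses a hypothetical ToyBiosampleFactory and a hypothetical transform_rows_lenient helper; neither is part of oncopacket.

```python
import abc
from typing import Any, List, Mapping, Tuple


class FactoryBase(metaclass=abc.ABCMeta):
    """Hypothetical stand-in for CdaFactory."""

    @abc.abstractmethod
    def to_ga4gh(self, row: Mapping[str, Any]) -> Any:
        ...


class ToyBiosampleFactory(FactoryBase):
    """Hypothetical subclass that requires a non-empty specimen_id."""

    def to_ga4gh(self, row):
        specimen_id = row.get("specimen_id")
        if not specimen_id:
            # Follows the documented contract: ValueError when a row cannot be parsed.
            raise ValueError(f"row has no usable specimen_id: {dict(row)}")
        return {"id": specimen_id}


def transform_rows_lenient(
    factory: FactoryBase, rows: List[Mapping[str, Any]]
) -> Tuple[List[Any], List[Tuple[int, str]]]:
    """Return (messages, errors); each error pairs a failing row index with its message."""
    messages: List[Any] = []
    errors: List[Tuple[int, str]] = []
    for i, row in enumerate(rows):
        try:
            messages.append(factory.to_ga4gh(row))
        except ValueError as exc:
            errors.append((i, str(exc)))
    return messages, errors


ok, failed = transform_rows_lenient(
    ToyBiosampleFactory(), [{"specimen_id": "B1"}, {"specimen_id": ""}, {}]
)
```

Whether to skip, log, or abort on a bad row is a policy choice for the caller; the abstract method only promises to signal the failure with ValueError.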
