Utils

Contains all pheval utility methods

rand(df, min_num, max_num, scramble_factor)

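Numeric scrambling: adds a random offset, drawn uniformly from [min_num, max_num] and scaled by scramble_factor, to the input. If the addition raises a TypeError (for example on non-numeric input), the input is logged and returned unchanged.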

Parameters:

Name             Type          Description                     Default
df               pd.DataFrame  dataframe records               required
min_num          int           minimum value of these records  required
max_num          int           maximum value of these records  required
scramble_factor  float         scalar scramble factor          required

Returns:

Name   Type   Description
float  float  randomized number

Source code in src/pheval/utils/utils.py
def rand(df: pd.DataFrame, min_num: int, max_num: int, scramble_factor: float) -> float:
    """
    Numeric scrambling
    Args:
        df (pd.DataFrame): dataframe records
        min_num (int): minimum value of these records
        max_num (int): maximum value of these records
        scramble_factor (float): scalar scramble factor
    Returns:
        float: randomized number
    """
    try:
        # Shift the input by a uniform draw from [min_num, max_num], scaled by scramble_factor.
        return df + (random.uniform(min_num, max_num) * scramble_factor)
    except TypeError as err:
        # The input could not be shifted: log it and return it unchanged.
        info_log.error(df, exc_info=err)
        return df
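
To make the arithmetic in rand concrete, here is a minimal sketch on plain floats. It uses only the standard library; scramble_value is a hypothetical stand-in written for this illustration, not part of pheval, and the numbers are made up.

```python
import random


def scramble_value(value: float, min_num: float, max_num: float, scramble_factor: float) -> float:
    """Hypothetical scalar stand-in for rand: add a scaled uniform offset."""
    return value + random.uniform(min_num, max_num) * scramble_factor


random.seed(0)  # fixed seed only so repeated runs give the same numbers

original = 0.4
unchanged = scramble_value(original, 0.1, 0.9, scramble_factor=0.0)
shifted = scramble_value(original, 0.1, 0.9, scramble_factor=0.5)

# With a factor of 0 the offset vanishes, so the value comes back as-is.
print(unchanged)
# With a factor of 0.5 the offset lies in [0.05, 0.45], so the result lies in [0.45, 0.85].
print(shifted)
```

Under this formula the offset is never negative when min_num is non-negative, so scrambled values only move upward from the original.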

semsim_scramble(input, output, columns_to_be_scrambled, scramble_factor=0.5)

Scrambles a semantic similarity profile by a magnitude between 0 and 1 (scramble_factor: 0 means no scrambling and 1 means complete randomisation). It reads the profile from input, randomises the scores in the selected columns to the degree set by scramble_factor, writes the result to output and returns the scrambled pandas dataframe.

Parameters:

Name                     Type       Description                                                     Default
input                    Path       tab-separated semsim file to read                               required
output                   Path       path the scrambled tab-separated file is written to             required
columns_to_be_scrambled  List[str]  columns to scramble in the semsim file (e.g. jaccard_similarity)  required
scramble_factor          float      scalar scramble factor                                          0.5

Returns:

Type          Description
pd.DataFrame  scrambled dataframe

Source code in src/pheval/utils/utils.py
def semsim_scramble(
    input: Path,
    output: Path,
    columns_to_be_scrambled: List[str],
    scramble_factor: float = 0.5,
) -> pd.DataFrame:
    """
    Scrambles a semantic similarity profile by a magnitude between 0 and 1
    (scramble_factor: 0 means no scrambling and 1 means complete randomisation).
    It randomises the scores in the selected columns to the degree set by
    scramble_factor, writes the result to output and returns the scrambled
    pandas dataframe.
    Args:
        input (Path): tab-separated semsim file to read
        output (Path): path the scrambled tab-separated file is written to
        columns_to_be_scrambled (List[str]):
            columns that will be scrambled in semsim file (e.g. jaccard_similarity).
        scramble_factor (float): scalar scramble factor
    Returns:
        pd.DataFrame: scrambled dataframe
    """
    semsim = pd.read_csv(input, sep="\t")
    dataframe = semsim_scramble_df(semsim, columns_to_be_scrambled, scramble_factor)
    dataframe.to_csv(output, sep="\t", index=False)
    # Return the scrambled dataframe, as the signature and docstring state.
    return dataframe
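
The read, scramble, write flow above can be sketched end to end on a throwaway file. This is an illustration under assumptions, not pheval's own code: the file names, the identifiers and the subject_id/object_id columns are made up, and the scrambling step is inlined rather than imported.

```python
import random
import tempfile
from pathlib import Path

import pandas as pd

random.seed(1)  # fixed seed only for repeatable output

with tempfile.TemporaryDirectory() as tmp:
    input_path = Path(tmp) / "semsim.tsv"  # hypothetical file names
    output_path = Path(tmp) / "semsim.scrambled.tsv"

    # Made-up profile; only jaccard_similarity is named in the documentation above.
    pd.DataFrame(
        {
            "subject_id": ["HP:0000001", "HP:0000002"],
            "object_id": ["MP:0000001", "MP:0000002"],
            "jaccard_similarity": [0.2, 0.8],
        }
    ).to_csv(input_path, sep="\t", index=False)

    # Same three steps as the function body: read, scramble, write.
    semsim = pd.read_csv(input_path, sep="\t")
    low = semsim["jaccard_similarity"].min()
    high = semsim["jaccard_similarity"].max()
    semsim["jaccard_similarity"] = semsim["jaccard_similarity"].apply(
        lambda score: score + random.uniform(low, high) * 0.5
    )
    semsim.to_csv(output_path, sep="\t", index=False)

    # Read the result back before the temporary directory is removed.
    scrambled = pd.read_csv(output_path, sep="\t")

print(scrambled)
```

Because the offset is added to each score, a scrambled value can leave the original range; in this made-up profile the second row can end up above 1.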

semsim_scramble_df(dataframe, columns_to_be_scrambled, scramble_factor)

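Scrambles the selected columns of a semsim dataframe. For each column, every value is passed to rand together with that column's minimum and maximum. The dataframe is modified in place and returned.

Parameters:

Name                     Type          Description                                 Default
dataframe                pd.DataFrame  dataframe that contains the semsim profile  required
columns_to_be_scrambled  List[str]     columns to scramble                         required
scramble_factor          float         scalar scramble factor                      required

Returns:

Type          Description
pd.DataFrame  scrambled dataframe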

Source code in src/pheval/utils/utils.py
def semsim_scramble_df(
    dataframe: pd.DataFrame,
    columns_to_be_scrambled: List[str],
    scramble_factor: float,
) -> pd.DataFrame:
    """Scrambles the selected columns of a semsim dataframe in place.
    Args:
        dataframe (pd.DataFrame): dataframe that contains semsim profile
        columns_to_be_scrambled (List[str]): columns to scramble
        scramble_factor (float): scalar scramble factor
    Returns:
        pd.DataFrame: scrambled dataframe
    """
    for col in columns_to_be_scrambled:
        # The column's own range bounds the random offset added to each value.
        min_num = dataframe[col].min()
        max_num = dataframe[col].max()
        dataframe[col] = dataframe[col].apply(rand, args=(min_num, max_num, scramble_factor))
    return dataframe
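
Since the loop assigns back into the dataframe it receives, callers who still need the unscrambled scores may want to pass a copy. The sketch below shows that pattern; scramble_columns is a hypothetical stand-in that mirrors the loop above, and the data is invented for illustration.

```python
import random
from typing import List

import pandas as pd


def scramble_columns(frame: pd.DataFrame, columns: List[str], scramble_factor: float) -> pd.DataFrame:
    """Hypothetical stand-in mirroring the loop in semsim_scramble_df."""
    for col in columns:
        low, high = frame[col].min(), frame[col].max()
        # Each value gets its own uniform draw from the column's range.
        frame[col] = frame[col].apply(
            lambda score: score + random.uniform(low, high) * scramble_factor
        )
    return frame


random.seed(42)  # fixed seed only for repeatable output
profile = pd.DataFrame({"jaccard_similarity": [0.1, 0.5, 0.9], "label": ["a", "b", "c"]})

# Pass a copy so the original profile keeps its scores.
scrambled = scramble_columns(profile.copy(), ["jaccard_similarity"], scramble_factor=0.5)

print(profile["jaccard_similarity"].tolist())    # original values, untouched
print(scrambled["jaccard_similarity"].tolist())  # each value shifted by its own offset
```

Columns not listed for scrambling, such as label here, pass through unchanged.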