Semsim utils
Contains all pheval utility methods
diff_semsim(semsim_left, semsim_right, score_column, absolute_diff)
Calculates score difference between two semantic similarity profiles
Parameters:
Name | Type | Description | Default |
---|---|---|---|
semsim_left | pd.DataFrame | first semantic similarity dataframe | required |
semsim_right | pd.DataFrame | second semantic similarity dataframe | required |
score_column | str | Score column that will be computed (e.g. jaccard_similarity) | required |
absolute_diff | bool | Whether the difference is absolute (True) or percentage (False). | required |
Returns:
Type | Description |
---|---|
pd.DataFrame | A dataframe with terms and their score differences |
Source code in src/pheval/utils/semsim_utils.py
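As an illustration of the absolute versus percentage modes described above, here is a minimal sketch and not the pheval implementation. The helper name `score_difference`, the identifier columns `subject_id` and `object_id`, the output column `diff`, and the sign convention (left minus right) are all assumptions made for the example.

```python
import pandas as pd

# Assumed identifier columns shared by both profiles (hypothetical).
KEYS = ["subject_id", "object_id"]


def score_difference(
    left: pd.DataFrame, right: pd.DataFrame, score_column: str, absolute_diff: bool
) -> pd.DataFrame:
    """Join two profiles on their term pair and compute a score difference."""
    # Overlapping non-key columns (the score) get the _left/_right suffixes.
    merged = left.merge(right, on=KEYS, suffixes=("_left", "_right"))
    left_scores = merged[f"{score_column}_left"]
    right_scores = merged[f"{score_column}_right"]
    if absolute_diff:
        # Plain subtraction of the two scores.
        merged["diff"] = left_scores - right_scores
    else:
        # Difference relative to the right-hand score, in percent.
        merged["diff"] = (left_scores - right_scores) / right_scores * 100
    return merged[KEYS + ["diff"]]


left = pd.DataFrame(
    {
        "subject_id": ["HP:1", "HP:2"],
        "object_id": ["MP:1", "MP:2"],
        "jaccard_similarity": [0.75, 0.5],
    }
)
right = pd.DataFrame(
    {
        "subject_id": ["HP:1", "HP:2"],
        "object_id": ["MP:1", "MP:2"],
        "jaccard_similarity": [0.5, 0.5],
    }
)

absolute = score_difference(left, right, "jaccard_similarity", absolute_diff=True)
relative = score_difference(left, right, "jaccard_similarity", absolute_diff=False)
print(absolute)
print(relative)
```

Term pairs that appear in only one profile are dropped by the inner join in this sketch; whether pheval treats them the same way is not stated here.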
filter_non_0_score(data, col)
Removes rows whose value in the column named by col is equal to 0
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | pd.DataFrame | Dirty dataframe | required |
col | str | Column to be filtered | required |
Returns:
Type | Description |
---|---|
pd.DataFrame | Filtered dataframe |
Source code in src/pheval/utils/semsim_utils.py
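The filtering described above can be expressed with a boolean mask in pandas. This is a minimal sketch under assumed names (`drop_zero_scores` and the example columns are hypothetical), not the pheval source.

```python
import pandas as pd


def drop_zero_scores(data: pd.DataFrame, col: str) -> pd.DataFrame:
    """Keep only the rows whose value in `col` is not 0."""
    return data[data[col] != 0]


scores = pd.DataFrame(
    {
        "subject_id": ["HP:1", "HP:2", "HP:3"],
        "jaccard_similarity": [0.0, 0.42, 0.9],
    }
)

filtered = drop_zero_scores(scores, "jaccard_similarity")
print(filtered)
```

The original row index is preserved in the result; call `reset_index(drop=True)` on it if a contiguous index is needed.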
get_percentage_diff(current_number, previous_number)
Gets the percentage difference between two numbers
Parameters:
Name | Type | Description | Default |
---|---|---|---|
current_number | float | second number in comparison | required |
previous_number | float | first number in comparison | required |
Returns:
Name | Type | Description |
---|---|---|
float | float | percentage difference between two numbers |
Source code in src/pheval/utils/semsim_utils.py
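For orientation, the conventional percentage change between two numbers is sketched below. The exact formula, rounding, and zero handling used by pheval are not shown in this reference, so the helper `percentage_change` and its behaviour for a zero baseline are assumptions.

```python
from typing import Optional


def percentage_change(current_number: float, previous_number: float) -> Optional[float]:
    """Change of current_number relative to previous_number, in percent."""
    if previous_number == 0:
        # The relative change is undefined for a zero baseline (assumed handling).
        return None
    return (current_number - previous_number) / previous_number * 100.0


increase = percentage_change(0.75, 0.5)
decrease = percentage_change(0.25, 0.5)
undefined = percentage_change(0.25, 0.0)
print(increase, decrease, undefined)
```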
parse_semsim(df, cols)
Parses a semantic similarity profile, converting the score column to numeric values and dropping rows where the score is null
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | pd.DataFrame | semantic similarity profile dataframe | required |
cols | list | list of columns that will be selected on semsim data | required |
Returns:
Type | Description |
---|---|
pd.DataFrame | parsed semantic similarity dataframe |
Source code in src/pheval/utils/semsim_utils.py
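The parsing steps described above (select columns, coerce the score to a number, drop nulls) might look like the following sketch. The helper `to_numeric_scores` takes the score column explicitly, which is an assumption: this reference does not say how `parse_semsim` decides which of `cols` holds the score.

```python
import pandas as pd


def to_numeric_scores(df: pd.DataFrame, cols: list, score_column: str) -> pd.DataFrame:
    """Select `cols`, coerce `score_column` to numbers, drop unparseable rows."""
    selected = df[cols].copy()
    # Values that cannot be parsed as numbers become NaN instead of raising.
    selected[score_column] = pd.to_numeric(selected[score_column], errors="coerce")
    return selected.dropna(subset=[score_column])


raw = pd.DataFrame(
    {
        "subject_id": ["HP:1", "HP:2", "HP:3"],
        "object_id": ["MP:1", "MP:2", "MP:3"],
        "jaccard_similarity": ["0.5", "n/a", "0.25"],
        "extra": [1, 2, 3],
    }
)

parsed = to_numeric_scores(
    raw, ["subject_id", "object_id", "jaccard_similarity"], "jaccard_similarity"
)
print(parsed)
```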
percentage_diff(semsim_left, semsim_right, score_column, output)
Compares two semantic similarity profiles
Parameters:
Name | Type | Description | Default |
---|---|---|---|
semsim_left | Path | File path of the first semantic similarity profile | required |
semsim_right | Path | File path of the second semantic similarity profile | required |
score_column | str | Score column that will be computed (e.g. jaccard_similarity) | required |
output | Path | Output path for the difference tsv file | required |
Source code in src/pheval/utils/semsim_utils.py
semsim_analysis(semsim_left, semsim_right, score_column, absolute_diff=True)
Computes the differences between two semantic similarity profiles
Parameters:
Name | Type | Description | Default |
---|---|---|---|
semsim_left | Path | File path of the first semantic similarity profile | required |
semsim_right | Path | File path of the second semantic similarity profile | required |
score_column | str | Score column that will be computed (e.g. jaccard_similarity) | required |
absolute_diff | bool | Whether the difference is absolute (True) or percentage (False). | True |
Returns:
Type | Description |
---|---|
pd.DataFrame | DataFrame with the differences between two semantic similarity profiles |
Source code in src/pheval/utils/semsim_utils.py
semsim_heatmap_plot(semsim_left, semsim_right, score_column)
Plots semantic similarity profiles heatmap
Parameters:
Name | Type | Description | Default |
---|---|---|---|
semsim_left | Path | File path of the first semantic similarity profile | required |
semsim_right | Path | File path of the second semantic similarity profile | required |
score_column | str | Score column that will be computed (e.g. jaccard_similarity) | required |
Source code in src/pheval/utils/semsim_utils.py
validate_semsim_file_comparison(semsim_left, semsim_right)
Checks if files exist and whether they're different
Parameters:
Name | Type | Description | Default |
---|---|---|---|
semsim_left | Path | File path of the first semantic similarity profile | required |
semsim_right | Path | File path of the second semantic similarity profile | required |
Raises:
Type | Description |
---|---|
Exception | FileNotFoundException |
Source code in src/pheval/utils/semsim_utils.py
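A check of this kind can be sketched with `pathlib` alone. The helper `check_profiles`, the exception types, and the messages below are assumptions for illustration; the reference above only states that an `Exception` is raised.

```python
import tempfile
from pathlib import Path


def check_profiles(left: Path, right: Path) -> None:
    """Raise if the two paths are identical or if either file is missing."""
    if left == right:
        raise ValueError("Both paths point to the same profile; choose two different files")
    for path in (left, right):
        if not path.is_file():
            raise FileNotFoundError(f"Semantic similarity profile not found: {path}")


with tempfile.TemporaryDirectory() as tmp:
    left = Path(tmp) / "left.tsv"
    right = Path(tmp) / "right.tsv"
    left.write_text("subject_id\tobject_id\tjaccard_similarity\n")

    # Only the left file exists at this point, so the check should fail.
    try:
        check_profiles(left, right)
        missing_outcome = "ok"
    except FileNotFoundError:
        missing_outcome = "missing file"

    # Once both files exist and the paths differ, the check passes silently.
    right.write_text("subject_id\tobject_id\tjaccard_similarity\n")
    check_profiles(left, right)
    valid_outcome = "ok"

    # Passing the same path twice is rejected before any file lookup.
    try:
        check_profiles(left, left)
        same_outcome = "ok"
    except ValueError:
        same_outcome = "same file"

print(missing_outcome, valid_outcome, same_outcome)
```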