OptionColumnMapper
Bases: ColumnMapper
Class to map the contents of a table cell to one or more options (HPO terms).
Use this mapper when the cells of a column contain one or more items from a defined set of strings, each representing an HPO term. Use the excluded_d argument if the column also includes excluded (negated) HPO terms.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column_name | str | name of the column in the pandas DataFrame | required |
concept_recognizer | | HpoConceptRecognizer for text mining | required |
option_d | TypedDict[str,str] | dictionary with key: string corresponding to original table, value: corresponding HPO term label | required |
excluded_d | TypedDict[str,str] | dictionary similar to option_d, but for excluded HPO terms (optional) | None |
omitSet | Set[str] | set of strings to be excluded from concept recognition | None |
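The roles of option_d, excluded_d, and omitSet can be illustrated with a minimal, library-free sketch. It is not the pyphetools implementation: the helper `lookup_options`, the example dictionaries, the comma delimiter, and the treatment of omitted strings as items to skip are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical example data: keys are strings as they appear in the source
# table, values are HPO term labels.
option_d: Dict[str, str] = {
    "seizures": "Seizure",
    "short": "Short stature",
}
excluded_d: Dict[str, str] = {
    "no seizures": "Seizure",
}


def lookup_options(
    cell: str,
    option_d: Dict[str, str],
    excluded_d: Optional[Dict[str, str]] = None,
    omit_set: Optional[Set[str]] = None,
    delimiter: str = ",",
) -> List[Tuple[str, bool]]:
    """Return (HPO label, observed) pairs for the items found in one cell.

    This is an illustrative helper, not part of pyphetools.
    """
    excluded_d = excluded_d or {}
    omit_set = omit_set or set()
    results: List[Tuple[str, bool]] = []
    for item in cell.split(delimiter):
        item = item.strip()
        if not item or item in omit_set:
            continue  # skip empty items and strings the caller asked to omit
        if item in excluded_d:
            results.append((excluded_d[item], False))  # excluded (negated) term
        elif item in option_d:
            results.append((option_d[item], True))  # observed term
    return results


print(lookup_options("short, no seizures", option_d, excluded_d))
```

In this sketch the excluded dictionary is checked first, so a string such as "no seizures" is reported as an excluded term rather than being missed.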
Source code in pyphetools/creation/option_column_mapper.py
__init__(column_name, concept_recognizer, option_d, excluded_d=None, omitSet=None)
Constructor
autoformat(df, hpo_cr, delimiter=',', omit_columns=None)
staticmethod
Autoformat code from the columns so that we can easily copy-paste and change it.
This method is intended to save time by preformatting the code used to create OptionColumnMapper objects. The following commands print skeleton Python code that can easily be adapted to create a mapper.
result = OptionColumnMapper.autoformat(df=dft, hpo_cr=hpo_cr, delimiter=",")
print(result)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | data frame with the data about the individuals | required |
hpo_cr | | HpoConceptRecognizer for text mining | required |
delimiter | str | the string used to delimit individual items in a cell (default: comma) | ',' |
omit_columns | List[str] | names of columns to omit from this search | None |
df_name | str | name of the variable that corresponds to the dataframe | required |
Returns:
Type | Description |
---|---|
str | a string that should be displayed using a print() command in the notebook - has info about automatically mapped columns |
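The idea of emitting skeleton code from the values found in a column can be sketched without the library. The function `skeleton_for_column`, the `PLACEHOLDER` value, and the `<column>_d` naming are hypothetical; the text that autoformat actually returns may look different.

```python
from typing import Iterable, List


def skeleton_for_column(column_name: str, cells: Iterable[str], delimiter: str = ",") -> str:
    """Build skeleton dictionary code for one column (illustrative helper).

    Every distinct item found in the cells becomes a key; the HPO label is
    left as a placeholder to be filled in by hand.
    """
    items: List[str] = []
    for cell in cells:
        for item in str(cell).split(delimiter):
            item = item.strip()
            if item and item not in items:
                items.append(item)  # keep first-seen order, drop duplicates
    lines = [f"{column_name}_d = {{"]
    for item in items:
        lines.append(f"    {item!r}: 'PLACEHOLDER',")
    lines.append("}")
    return "\n".join(lines)


print(skeleton_for_column("features", ["short, seizures", "seizures"]))
```

The printed text is valid Python, so it can be pasted into a notebook cell and edited, which is the copy-and-adapt workflow described above.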
map_cell(cell_contents)
Parse a single table cell.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cell_contents | str | contents of a cell of the original file | required |
Returns:
Type | Description |
---|---|
List[HpTerm] | list of HPO matches |
preview_column(df)
Generate a pandas DataFrame that summarizes how each entry of the column is parsed.
This method is intended for use in developing the code for ETL of an input column. It is only needed for development and debugging.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column | pd.Series | A single column from the input table |
required |
Returns:
Type | Description |
---|---|
pd.DataFrame | a pandas dataframe with one row for each entry of the input column |
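A preview with one row per column entry can be approximated in plain Python as follows. The helper `preview_rows`, the column names `original` and `mapped`, and the `n/a` marker are assumptions for illustration and do not reflect the exact layout produced by preview_column.

```python
from typing import Dict, List


def preview_rows(cells: List[str], option_d: Dict[str, str], delimiter: str = ",") -> List[Dict[str, str]]:
    """Return one summary row per cell: the raw text and the labels it maps to."""
    rows: List[Dict[str, str]] = []
    for cell in cells:
        labels = [
            option_d[item.strip()]
            for item in cell.split(delimiter)
            if item.strip() in option_d
        ]
        # Cells with no recognized item are marked so they stand out in review.
        rows.append({"original": cell, "mapped": "; ".join(labels) if labels else "n/a"})
    return rows


rows = preview_rows(
    ["short, seizures", "unknown"],
    {"short": "Short stature", "seizures": "Seizure"},
)
for row in rows:
    print(row)
```

A list of dictionaries like this can be passed to `pandas.DataFrame(rows)` to get a tabular view while developing and debugging a mapper.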