OptionColumnMapper
Bases: ColumnMapper
Class to map the contents of a table cell to one or more options (HPO terms).
This mapper should be used if the cells of a column contain one or more items (strings) from a defined set, each representing an HPO term. The excluded_d argument should be used if the column also includes excluded (negated) HPO terms.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column_name | str | name of the column in the pandas DataFrame | required |
concept_recognizer |  | HpoConceptRecognizer for text mining | required |
option_d | Dict[str, str] | dictionary with key: string as it appears in the original table, value: corresponding HPO term label | required |
excluded_d | Dict[str, str] | dictionary with the same structure as option_d, but for excluded HPO terms (optional) | None |
omitSet | Set[str] | set of strings to be excluded from concept recognition | None |
Source code in pyphetools/creation/option_column_mapper.py
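To make the roles of option_d, excluded_d, and omitSet concrete, here is a minimal, library-free sketch. It does not import pyphetools and is not the library's implementation. The example strings, the dictionaries, and the helper `classify_item` are hypothetical, and the way the omit set is applied is an assumption.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical dictionaries for a column describing cardiac findings.
# Keys are strings as they appear in the original table; values are HPO labels.
option_d: Dict[str, str] = {
    "VSD": "Ventricular septal defect",
    "ASD": "Atrial septal defect",
}
# Same shape as option_d, but a match means the feature was explicitly ruled out.
excluded_d: Dict[str, str] = {
    "no VSD": "Ventricular septal defect",
}
# Strings to ignore rather than map (assumed semantics of the omit set).
omit_set: Set[str] = {"n/a"}


def classify_item(item: str) -> Optional[Tuple[str, bool]]:
    """Return (HPO label, observed flag) for one item, or None if it is not mapped."""
    item = item.strip()
    if item in omit_set:
        return None
    if item in option_d:
        return option_d[item], True
    if item in excluded_d:
        return excluded_d[item], False
    return None


print(classify_item("VSD"))     # observed term
print(classify_item("no VSD"))  # excluded (negated) term
print(classify_item("n/a"))     # omitted string
```

In this sketch the observed flag is the only difference between a hit in option_d and a hit in excluded_d, which is why both dictionaries can point to the same HPO label.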
__init__(column_name, concept_recognizer, option_d, excluded_d=None, omitSet=None)
Constructor
autoformat(df, hpo_cr, delimiter=',', omit_columns=None)
staticmethod
Generate skeleton code from the columns of a data frame so that it can be copied, pasted, and adapted.
This method is intended to save time by preformatting the code needed to create OptionColumnMapper objects. The following commands print skeleton Python code that can easily be adapted to create a mapper.
result = OptionColumnMapper.autoformat(df=dft, hpo_cr=hpo_cr, delimiter=",")
print(result)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | data frame with the data about the individuals | required |
hpo_cr |  | HpoConceptRecognizer for text mining | required |
delimiter | str | the string used to delimit individual items in a cell (default: comma) | ',' |
omit_columns | List[str] | names of columns to omit from this search | None |
df_name | str | name of the variable that corresponds to the dataframe | required |
Returns:
Type | Description |
---|---|
str | a string that should be displayed using a print() command in the notebook - has info about automatically mapped columns |
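The idea of generating skeleton code can be sketched without pyphetools or pandas. In the following simplified, hypothetical example, a plain dict of column name to cell strings stands in for the DataFrame, and the function `autoformat_sketch` and the `PLACEHOLDER` values are assumptions. Unlike the real method, it performs no HPO concept recognition and leaves every label for manual curation.

```python
from typing import Dict, Iterable, List, Optional


def autoformat_sketch(columns: Dict[str, List[str]],
                      delimiter: str = ",",
                      omit_columns: Optional[Iterable[str]] = None) -> str:
    """Build skeleton code with one option dictionary per column."""
    omitted = set(omit_columns or [])
    lines: List[str] = []
    for name, cells in columns.items():
        if name in omitted:
            continue
        # Collect the distinct items of the column in first-seen order.
        items: List[str] = []
        for cell in cells:
            for raw in str(cell).split(delimiter):
                item = raw.strip()
                if item and item not in items:
                    items.append(item)
        var = name.lower().replace(" ", "_") + "_d"
        lines.append(f"{var} = {{")
        for item in items:
            # The HPO label is left as a placeholder to be filled in by hand.
            lines.append(f"    {item!r}: 'PLACEHOLDER',")
        lines.append("}")
    return "\n".join(lines)


# Hypothetical input: the "ID" column is skipped via omit_columns.
table = {"Heart": ["VSD, ASD", "ASD"], "ID": ["p1", "p2"]}
skeleton = autoformat_sketch(table, omit_columns=["ID"])
print(skeleton)
```

The printed text is itself Python source, so it can be pasted into a notebook cell and edited until each key points to the intended HPO label.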
map_cell(cell_contents)
Parse a single table cell.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cell_contents | str | contents of a cell of the original file | required |
Returns:
Type | Description |
---|---|
List[HpTerm] | list of HPO matches |
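The following sketch shows, under stated assumptions, how a cell with several items could be turned into a list of matches. `TermSketch` is a hypothetical stand-in for HpTerm, `map_cell_sketch` is not the library's method, and splitting on a single comma delimiter is an assumption about how items are separated.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TermSketch:
    """Hypothetical stand-in for HpTerm: an HPO label plus an observed flag."""
    label: str
    observed: bool


def map_cell_sketch(cell_contents: str,
                    option_d: Dict[str, str],
                    excluded_d: Dict[str, str],
                    delimiter: str = ",") -> List[TermSketch]:
    """Split one cell into items and look each item up in the two dictionaries."""
    terms: List[TermSketch] = []
    for raw in str(cell_contents).split(delimiter):
        item = raw.strip()
        if item in option_d:
            terms.append(TermSketch(option_d[item], observed=True))
        elif item in excluded_d:
            terms.append(TermSketch(excluded_d[item], observed=False))
        # Items found in neither dictionary are skipped in this sketch.
    return terms


matches = map_cell_sketch(
    "seizures, no hypotonia, other",
    option_d={"seizures": "Seizure"},
    excluded_d={"no hypotonia": "Hypotonia"},
)
for term in matches:
    print(term)
```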
preview_column(df)
Generate a pandas dataframe with a summary of parsing of the entire column
This method is intended for use in developing the code for ETL of an input column. It is only needed for development and debugging.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column | pd.Series | A single column from the input table | required |
Returns:
Type | Description |
---|---|
pd.DataFrame | a pandas dataframe with one row for each entry of the input column |
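A preview of this kind can be sketched as one summary row per input entry. The example below is a library-free approximation: `preview_sketch`, the `lookup` helper, and the column names `original` and `mapped` are hypothetical, and a list of dicts stands in for the returned DataFrame.

```python
from typing import Callable, Dict, List


def preview_sketch(column: List[str],
                   map_cell: Callable[[str], List[str]]) -> List[Dict[str, str]]:
    """Return one summary row per entry of the input column."""
    rows: List[Dict[str, str]] = []
    for cell in column:
        labels = map_cell(cell)
        rows.append({
            "original": cell,
            # Show "n/a" when nothing in the cell could be mapped.
            "mapped": "; ".join(labels) if labels else "n/a",
        })
    return rows


# Hypothetical mapping function: comma-split lookup in a small dictionary.
option_d = {"VSD": "Ventricular septal defect", "ASD": "Atrial septal defect"}


def lookup(cell: str) -> List[str]:
    return [option_d[i.strip()] for i in cell.split(",") if i.strip() in option_d]


preview = preview_sketch(["VSD, ASD", "none"], lookup)
for row in preview:
    print(row)
```

Rows with "n/a" point to cell strings that are still missing from the option dictionaries, which is the kind of gap a preview is meant to reveal during development.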