ontorunner.post package
Submodules
ontorunner.post.add_sentence module
Add sentences for understanding the context of matched terms.
- ontorunner.post.add_sentence.find_extensions(dr, ext) → List[str]
Find files with a specific extension.
- Parameters
dr – Directory path.
ext – Extension.
- Returns
List of relevant files.
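A lookup of this kind can be sketched with `pathlib`. This is a minimal illustration, not the package's implementation: the helper name `find_files_with_extension`, the recursive search, and the tolerance for a missing leading dot are all assumptions.

```python
from pathlib import Path
from typing import List


def find_files_with_extension(dr: str, ext: str) -> List[str]:
    """Return sorted paths of files under `dr` whose suffix equals `ext`."""
    # Accept both "tsv" and ".tsv" (an assumption made for this sketch).
    suffix = ext if ext.startswith(".") else f".{ext}"
    return sorted(
        str(path)
        for path in Path(dr).rglob("*")  # recursive walk; also an assumption
        if path.is_file() and path.suffix == suffix
    )
```

Calling `find_files_with_extension("data/output", "tsv")` would return every `.tsv` file below the hypothetical `data/output` directory, as strings in sorted order.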
- ontorunner.post.add_sentence.get_match_type(token1: str, token2: str) → str
Return type of token match.
- Parameters
token1 (str) – token from ‘matched_term’
token2 (str) – token from ‘preferred_term’
- Returns
Type of match [e.g.: ‘exact_match’ etc.]
- Return type
str
- ontorunner.post.add_sentence.parse(input_directory: str, output_directory: str, nodes_and_edges: str, need_ancestors: bool) → None
Parse OGER output and add sentences of tokenized terms.
- Parameters
input_directory – (str) Input directory path.
output_directory – (str) Output directory path.
nodes_and_edges – (str) Path to the directory containing the nodes and edges files.
need_ancestors – (bool) Whether ancestors are needed.
- Returns
None.
- ontorunner.post.add_sentence.sentencify(input_df, output_df, output_fn)
Add relevant sentences to the tokenized term in every row of a pandas DataFrame.
- Parameters
df – (DataFrame) pandas DataFrame.
- Returns
None
ontorunner.post.util module
Utility functions called after NER.
- ontorunner.post.util.ancestor_generator(df: DataFrame, obj_series: DataFrame) → List[str]
Return the list of ancestors of a CURIE.
- Parameters
df – KGX edges of source ontology in DataFrame form.
- Returns
List of CURIEs (ancestors).
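The idea of collecting ancestors from an edge table can be sketched as a breadth-first walk. This is a sketch under stated assumptions, not the package's implementation: the function name `collect_ancestors` is hypothetical, the edges are assumed to use KGX-style `subject`, `predicate` and `object` columns, and only `biolink:subclass_of` edges are followed.

```python
from typing import List

import pandas as pd


def collect_ancestors(edges: pd.DataFrame, curie: str) -> List[str]:
    """Walk subclass edges upward from `curie` and list what is reached."""
    # Assumed KGX-style columns: subject, predicate, object.
    is_a = edges[edges["predicate"] == "biolink:subclass_of"]
    parents = is_a.groupby("subject")["object"].apply(list).to_dict()

    ancestors: List[str] = []
    seen = {curie}
    queue = [curie]
    while queue:
        current = queue.pop(0)
        for parent in parents.get(current, []):
            if parent not in seen:  # guards against cycles and repeats
                seen.add(parent)
                ancestors.append(parent)
                queue.append(parent)
    return ancestors


# Illustrative edges with made-up CURIEs.
edges = pd.DataFrame(
    {
        "subject": ["EX:3", "EX:2", "EX:3"],
        "predicate": [
            "biolink:subclass_of",
            "biolink:subclass_of",
            "biolink:related_to",
        ],
        "object": ["EX:2", "EX:1", "EX:9"],
    }
)
print(collect_ancestors(edges, "EX:3"))  # ['EX:2', 'EX:1']
```

The `seen` set keeps the walk finite even if the edge table contains a cycle, and the non-subclass edge to `EX:9` is ignored.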
- ontorunner.post.util.consolidate_rows(df: DataFrame) → DataFrame
Group rows by all columns except “origin”.
This removes redundancies created by entity recognition from multiple sources/ontologies.
- Parameters
df (pd.DataFrame) – Input DataFrame
- Returns
Consolidated DataFrame
- Return type
pd.DataFrame
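Grouping by every column except “origin” can be sketched in pandas as follows. This is a minimal illustration, not the package's implementation: the function name `consolidate_by_origin` and the choice to join the distinct origins with `|` are assumptions.

```python
import pandas as pd


def consolidate_by_origin(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse rows that differ only in "origin" into a single row."""
    key_columns = [col for col in df.columns if col != "origin"]
    return (
        df.groupby(key_columns, dropna=False)["origin"]
        # Joining the distinct origins with "|" is an assumption of this sketch.
        .agg(lambda origins: "|".join(sorted(set(origins))))
        .reset_index()
    )


# Illustrative rows: the same match reported by two ontologies.
ner = pd.DataFrame(
    {
        "object_id": ["EX:1", "EX:1", "EX:2"],
        "matched_term": ["soil", "soil", "water"],
        "origin": ["envo", "pato", "envo"],
    }
)
print(consolidate_by_origin(ner))
```

The two `EX:1` rows collapse into one row whose `origin` is `envo|pato`, while the `EX:2` row is left as it was.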
- ontorunner.post.util.filter_synonyms(df: DataFrame) → DataFrame
Consolidate entities where ‘_SYNONYM’ object_id is a duplicate.
- Parameters
df (pd.DataFrame) – Input DataFrame
- Returns
Consolidated DataFrame
- Return type
pd.DataFrame
- ontorunner.post.util.get_ancestors(df: DataFrame, nodes_and_edges_dir: str = '/home/runner/work/ontorunner/ontorunner/data/nodes_and_edges') → DataFrame
Return a DataFrame with ‘ancestors’ column.
- Parameters
df – Input DataFrame containing the intermediate NER result.
nodes_and_edges_dir – Directory containing the KGX edges and nodes files (TSV).
- Returns
DataFrame with an ‘ancestors’ column.
- ontorunner.post.util.get_column_doc_ratio(df: DataFrame, column: str) → DataFrame
Get the term-to-document ratio for a given column of a pandas DataFrame.
- Parameters
df – pandas DataFrame.
column – Name of the column containing the term.
- Returns
pandas DataFrame with additional columns showing the term:document ratio.
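One possible reading of a term:document ratio is the share of documents in which each term appears. The sketch below illustrates that reading only and is not the package's implementation: the function name `add_doc_ratio`, the `document_id` column, and the names of the added columns are all assumptions.

```python
import pandas as pd


def add_doc_ratio(
    df: pd.DataFrame, column: str, doc_column: str = "document_id"
) -> pd.DataFrame:
    """Append, per row, how many documents mention the term and the share."""
    total_docs = df[doc_column].nunique()
    # Number of distinct documents in which each term occurs.
    docs_per_term = df.groupby(column)[doc_column].transform("nunique")
    out = df.copy()
    out[f"{column}_doc_count"] = docs_per_term
    out[f"{column}_doc_ratio"] = docs_per_term / total_docs
    return out


# Illustrative input: four documents, two distinct terms.
hits = pd.DataFrame(
    {
        "document_id": ["d1", "d2", "d3", "d4"],
        "matched_term": ["soil", "soil", "water", "soil"],
    }
)
print(add_doc_ratio(hits, "matched_term"))
```

Here “soil” appears in three of the four documents, so its ratio is 0.75, and “water” appears in one, so its ratio is 0.25.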
Module contents
Post-process the NER results.