ontorunner.post package
Submodules
ontorunner.post.add_sentence module
Add sentences for understanding the context of matched terms.
- ontorunner.post.add_sentence.find_extensions(dr, ext) → List[str]
- Find files with a specific extension.
- Parameters
- dr – Directory path. 
- ext – Extension. 
 
- Returns
- List of relevant files. 
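A minimal sketch of what find_extensions might do. It assumes a non-recursive listing of dr and a plain suffix match on ext. The documentation above does not say whether the real function returns bare file names or full paths, so this body is illustrative and not the ontorunner implementation.

```python
import os
from typing import List


def find_extensions(dr: str, ext: str) -> List[str]:
    """Return the sorted names of regular files in ``dr`` ending with ``ext``.

    Sketch only: non-recursive listing, plain suffix match (e.g. ext=".tsv").
    """
    return sorted(
        name
        for name in os.listdir(dr)
        if name.endswith(ext) and os.path.isfile(os.path.join(dr, name))
    )
```

Under these assumptions, find_extensions(some_dir, ".tsv") lists the TSV files sitting directly inside some_dir.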
 
- ontorunner.post.add_sentence.get_match_type(token1: str, token2: str) → str
- Return type of token match.
- Parameters
- token1 (str) – token from ‘matched_term’ 
- token2 (str) – token from ‘preferred_term’ 
 
- Returns
- Type of match (e.g., ‘exact_match’).
- Return type
- str 
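A hypothetical sketch of a token-match classifier in the spirit of get_match_type. Only ‘exact_match’ appears in the documentation above. The other labels (‘case_insensitive_match’, ‘partial_match’, ‘no_match’) and the order of the checks are assumptions made for illustration.

```python
def get_match_type(token1: str, token2: str) -> str:
    """Classify how a matched token relates to a preferred token.

    Hypothetical labels: only 'exact_match' is confirmed by the docs.
    """
    if token1 == token2:
        return "exact_match"
    t1, t2 = token1.lower(), token2.lower()
    if t1 == t2:
        return "case_insensitive_match"
    # One token contained in the other, ignoring case.
    if t1 in t2 or t2 in t1:
        return "partial_match"
    return "no_match"
```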
 
- ontorunner.post.add_sentence.parse(input_directory: str, output_directory: str, nodes_and_edges: str, need_ancestors: bool) → None
- Parse OGER output and add the sentences that contain the tokenized terms.
- Parameters
- input_directory – (str) Input directory path. 
- output_directory – (str) Output directory path. 
- nodes_and_edges – (str) Nodes and edges file directory path. 
 
- Returns
- None. 
 
- ontorunner.post.add_sentence.sentencify(input_df, output_df, output_fn)
- Add relevant sentences to the tokenized term in every row of a pandas DataFrame.
- Parameters
- input_df – (DataFrame) pandas DataFrame. 
- Returns
- None 
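A rough sketch of the idea behind sentencify: attach to each row the sentence that contains its matched term. Several things here are assumptions: the helper name add_sentence_column, the ‘matched_term’ input column and ‘sentence’ output column, the single source text, and the naive regex sentence splitter. The real function takes input_df, output_df and output_fn and may work quite differently.

```python
import re

import pandas as pd


def add_sentence_column(df: pd.DataFrame, text: str) -> pd.DataFrame:
    """Return a copy of ``df`` with a 'sentence' column (hypothetical helper).

    For each row, store the first sentence of ``text`` that contains the
    row's 'matched_term' (case-insensitive), or '' if none does.
    """
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    def first_sentence(term: str) -> str:
        for sentence in sentences:
            if term.lower() in sentence.lower():
                return sentence
        return ""

    out = df.copy()
    out["sentence"] = out["matched_term"].map(first_sentence)
    return out
```

Working on a copy keeps the input DataFrame unchanged, which makes the step easy to rerun.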
 
ontorunner.post.util module
Utility functions called after NER.
- ontorunner.post.util.ancestor_generator(df: DataFrame, obj_series: DataFrame) → List[str]
- Return an ancestor list of a CURIE.
- Parameters
- df – KGX edges of source ontology in DataFrame form. 
- Returns
- List of CURIEs (ancestors) 
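One way to compute the ancestor list of a CURIE from KGX edges, sketched under assumptions. The edges DataFrame is assumed to have KGX-style ‘subject’, ‘predicate’ and ‘object’ columns, with hierarchy edges using the biolink:subclass_of predicate. The helper ancestors_of is hypothetical and takes a single CURIE rather than the obj_series argument of ancestor_generator.

```python
from collections import deque
from typing import Dict, List

import pandas as pd


def ancestors_of(edges: pd.DataFrame, curie: str) -> List[str]:
    """Return all ancestors of ``curie``, nearest first (hypothetical helper).

    Assumes 'biolink:subclass_of' edges point from child (subject)
    to parent (object).
    """
    is_a = edges[edges["predicate"] == "biolink:subclass_of"]
    parents: Dict[str, List[str]] = {}
    for child, parent in zip(is_a["subject"], is_a["object"]):
        parents.setdefault(child, []).append(parent)

    # Breadth-first walk up the hierarchy; 'visited' guards against cycles.
    ancestors: List[str] = []
    visited = {curie}
    queue = deque(parents.get(curie, []))
    while queue:
        node = queue.popleft()
        if node in visited:
            continue
        visited.add(node)
        ancestors.append(node)
        queue.extend(parents.get(node, []))
    return ancestors
```

Building the child-to-parents dictionary once keeps each lookup cheap when many CURIEs share one edges table.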
 
- ontorunner.post.util.consolidate_rows(df: DataFrame) → DataFrame
- Group rows by all columns except “origin”.
- This removes redundancies created by entity recognition from multiple sources/ontologies.
- Parameters
- df (pd.DataFrame) – Input DataFrame 
- Returns
- Consolidated DataFrame 
- Return type
- pd.DataFrame 
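A minimal sketch of the grouping described above. It assumes duplicated rows differ only in ‘origin’ and that the merged origins are joined with ‘|’. Both the delimiter and the helper name consolidate are assumptions, not the ontorunner code.

```python
import pandas as pd


def consolidate(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse rows that differ only in 'origin' (hypothetical helper).

    The distinct origins of each group are sorted and joined with '|'.
    """
    keys = [col for col in df.columns if col != "origin"]
    # dropna=False keeps groups whose key columns contain NaN (pandas >= 1.1);
    # sort=False preserves the order in which groups first appear.
    merged = df.groupby(keys, sort=False, dropna=False)["origin"].agg(
        lambda origins: "|".join(sorted(set(origins)))
    )
    return merged.reset_index()
```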
 
- ontorunner.post.util.filter_synonyms(df: DataFrame) → DataFrame
- Consolidate entities whose ‘_SYNONYM’ object_id is a duplicate.
- Parameters
- df (pd.DataFrame) – Input DataFrame 
- Returns
- Consolidated DataFrame 
- Return type
- pd.DataFrame 
 
- ontorunner.post.util.get_ancestors(df: DataFrame, nodes_and_edges_dir: str = '/home/runner/work/ontorunner/ontorunner/data/nodes_and_edges') → DataFrame
- Return a DataFrame with ‘ancestors’ column.
- Parameters
- df – Input DataFrame containing the intermediate NER result. 
- nodes_and_edges_dir – Directory containing the KGX nodes and edges files (TSV). 
 
- Returns
- DataFrame with an ‘ancestors’ column. 
 
- ontorunner.post.util.get_column_doc_ratio(df: DataFrame, column: str) → DataFrame
- Get the term-to-document ratio of the given column in a pandas DataFrame.
- Parameters
- df – Pandas DataFrame 
- column – Column name of the term 
 
- Returns
- Pandas DataFrame with additional columns showing term:document ratio 
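The term:document ratio can be read in more than one way. This sketch assumes it means the number of distinct documents in which a term occurs, divided by the total number of distinct documents. The ‘document_id’ column, the helper name column_doc_ratio and the output column names are all hypothetical.

```python
import pandas as pd


def column_doc_ratio(
    df: pd.DataFrame, column: str, doc_col: str = "document_id"
) -> pd.DataFrame:
    """Add '<column>_doc_count' and '<column>_doc_ratio' columns (hypothetical helper).

    The ratio is the number of distinct documents in which a value of
    ``column`` occurs divided by the number of distinct documents overall.
    """
    out = df.copy()
    total_docs = out[doc_col].nunique()
    # transform() broadcasts each group's distinct-document count back to its rows.
    out[f"{column}_doc_count"] = out.groupby(column)[doc_col].transform("nunique")
    out[f"{column}_doc_ratio"] = out[f"{column}_doc_count"] / total_docs
    return out
```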
 
Module contents
Post process the NER results.