ontorunner.post package

Submodules

ontorunner.post.add_sentence module

Add sentences for understanding the context of matched terms.

ontorunner.post.add_sentence.find_extensions(dr, ext) → List[str]

Find files with a specific extension.

Parameters
  • dr – Directory path.

  • ext – Extension.

Returns

List of relevant files.
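A minimal sketch of this kind of lookup, not the package's own code. The helper name find_files_with_extension is hypothetical, and the sketch assumes a non-recursive search that accepts the extension with or without a leading dot.

```python
from pathlib import Path
from typing import List


def find_files_with_extension(directory: str, extension: str) -> List[str]:
    """Return sorted paths of files in `directory` with the given extension.

    Hypothetical helper for illustration only; behaviour of the real
    find_extensions may differ (e.g. it may search recursively).
    """
    # Accept both "tsv" and ".tsv".
    suffix = extension if extension.startswith(".") else f".{extension}"
    return sorted(
        str(path)
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix == suffix
    )
```

Calling find_files_with_extension("some_dir", "tsv") would list only the top-level .tsv files of some_dir.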

ontorunner.post.add_sentence.get_match_type(token1: str, token2: str) → str

Return type of token match.

Parameters
  • token1 (str) – token from ‘matched_term’

  • token2 (str) – token from ‘preferred_term’

Returns

Type of match (e.g. ‘exact_match’).

Return type

str
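As a rough illustration of classifying a token pair, the sketch below compares the two strings directly. Only the ‘exact_match’ label comes from the documentation above; the function name classify_match and the other two labels are assumptions made for this example and may not correspond to what get_match_type returns.

```python
def classify_match(matched_token: str, preferred_token: str) -> str:
    """Classify how `matched_token` relates to `preferred_token`.

    Illustrative only: "exact_match" is documented, the remaining
    labels are hypothetical placeholders.
    """
    if matched_token == preferred_token:
        return "exact_match"
    if matched_token.lower() == preferred_token.lower():
        return "case_insensitive_match"  # hypothetical label
    return "no_match"  # hypothetical label
```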

ontorunner.post.add_sentence.parse(input_directory: str, output_directory: str, nodes_and_edges: str, need_ancestors: bool) → None

Parse OGER output and add sentences of tokenized terms.

Parameters
  • input_directory – (str) Input directory path.

  • output_directory – (str) Output directory path.

  • nodes_and_edges – (str) Path to the directory containing the nodes and edges files.

  • need_ancestors – (bool) Whether to include ancestors.

Returns

None.

ontorunner.post.add_sentence.sentencify(input_df, output_df, output_fn)

Add relevant sentences to the tokenized term in every row of a pandas DataFrame.

Parameters

df – (DataFrame) pandas DataFrame.

Returns

None
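The idea of attaching a sentence to a matched term can be sketched without pandas. The example assumes that each match carries a character offset into the source text and that sentences end with ".", "!" or "?" followed by whitespace; both are assumptions, and the helper name sentence_containing is hypothetical. The real sentencify may use a different tokenizer.

```python
import re
from typing import Optional

# Naive sentence boundary: ".", "!" or "?" followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentence_containing(text: str, start: int) -> Optional[str]:
    """Return the sentence of `text` that covers character offset `start`.

    Returns None when the offset falls between sentences or outside the text.
    """
    position = 0
    for sentence in _SENTENCE_END.split(text):
        # Locate the sentence in the original text to recover its span.
        begin = text.index(sentence, position)
        end = begin + len(sentence)
        if begin <= start < end:
            return sentence
        position = end
    return None
```

Applied row by row, such a helper could fill a sentence column from a text and an offset column.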

ontorunner.post.util module

Utility functions called after NER.

ontorunner.post.util.ancestor_generator(df: DataFrame, obj_series: DataFrame) → List[str]

Return an ancestor list of a CURIE.

Parameters

df – KGX edges of source ontology in DataFrame form.

Returns

List of CURIEs (ancestors)
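Collecting ancestors from an edge list amounts to a graph traversal. The sketch below is a generic breadth-first walk over (subject, predicate, object) triples, the three core columns of a KGX edges file. It is not the implementation of ancestor_generator: the function name collect_ancestors, the default predicate biolink:subclass_of, and the EX: CURIEs in the usage are assumptions for illustration.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)


def collect_ancestors(
    edges: Iterable[Edge],
    curie: str,
    predicate: str = "biolink:subclass_of",  # assumed hierarchy predicate
) -> List[str]:
    """Return all ancestors of `curie`, nearest first, without duplicates."""
    # Map each subject to its direct parents for the chosen predicate.
    parents: Dict[str, List[str]] = {}
    for subject, pred, obj in edges:
        if pred == predicate:
            parents.setdefault(subject, []).append(obj)

    ancestors: List[str] = []
    seen = {curie}
    queue = deque([curie])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, []):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                ancestors.append(parent)
                queue.append(parent)
    return ancestors
```

With edges EX:3 → EX:2 → EX:1, collect_ancestors(edges, "EX:3") yields ["EX:2", "EX:1"].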

ontorunner.post.util.consolidate_rows(df: DataFrame) → DataFrame

Group rows by all columns except “origin”.

This is done to remove redundancies created by entity recognition from multiple sources/ontologies.

Parameters

df (pd.DataFrame) – Input DataFrame

Returns

Consolidated DataFrame

Return type

pd.DataFrame
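Grouping on every column except “origin” can be sketched with a pandas groupby. How the grouped “origin” values are combined is not stated above; joining them with a separator is an assumption of this example, as are the function name consolidate_by_origin and the sample column names.

```python
import pandas as pd


def consolidate_by_origin(df: pd.DataFrame, sep: str = "|") -> pd.DataFrame:
    """Collapse rows that differ only in 'origin' into a single row.

    Assumption: the distinct origins of each group are joined with `sep`.
    """
    group_columns = [column for column in df.columns if column != "origin"]
    return df.groupby(
        group_columns, as_index=False, sort=False, dropna=False
    ).agg({"origin": lambda values: sep.join(sorted(set(values)))})


# Usage with made-up data: the two "soil" rows differ only in origin.
example = pd.DataFrame(
    {
        "matched_term": ["soil", "soil", "water"],
        "object_id": ["EX:1", "EX:1", "EX:2"],
        "origin": ["ontoA", "ontoB", "ontoA"],
    }
)
consolidated = consolidate_by_origin(example)
```

Passing dropna=False keeps rows whose grouping columns contain missing values, which groupby would otherwise drop silently.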

ontorunner.post.util.filter_synonyms(df: DataFrame) → DataFrame

Consolidate entities where ‘_SYNONYM’ object_id is a duplicate.

Parameters

df (pd.DataFrame) – Input DataFrame

Returns

Consolidated DataFrame

Return type

pd.DataFrame

ontorunner.post.util.get_ancestors(df: DataFrame, nodes_and_edges_dir: str = '/home/runner/work/ontorunner/ontorunner/data/nodes_and_edges') → DataFrame

Return a DataFrame with ‘ancestors’ column.

Parameters
  • df – Input DataFrame containing intermediate NER result.

  • nodes_and_edges_dir – Directory containing the KGX nodes and edges files (TSV).

Returns

DataFrame with an ‘ancestors’ column.

ontorunner.post.util.get_column_doc_ratio(df: DataFrame, column: str) → DataFrame

Get the term-to-document ratio for a given column of a pandas DataFrame.

Parameters
  • df – Pandas DataFrame

  • column – Column name of the term

Returns

Pandas DataFrame with additional columns showing term:document ratio
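One plausible reading of a term:document ratio is the share of documents in which each value of the column occurs. The sketch below implements that reading only. The function name add_doc_ratio, the document_id column, and the names of the added columns are all assumptions; the actual get_column_doc_ratio may define the ratio differently.

```python
import pandas as pd


def add_doc_ratio(
    df: pd.DataFrame, column: str, doc_column: str = "document_id"
) -> pd.DataFrame:
    """Add per-term document counts and ratios as new columns.

    Assumption: ratio = documents containing the term / all documents.
    """
    total_docs = df[doc_column].nunique()
    # Number of distinct documents in which each term of `column` appears.
    docs_per_term = df.groupby(column)[doc_column].transform("nunique")
    result = df.copy()
    result[f"{column}_doc_count"] = docs_per_term
    result[f"{column}_doc_ratio"] = docs_per_term / total_docs
    return result


# Usage with made-up data: "soil" occurs in all three documents.
matches = pd.DataFrame(
    {
        "document_id": ["d1", "d1", "d2", "d3"],
        "term": ["soil", "water", "soil", "soil"],
    }
)
with_ratio = add_doc_ratio(matches, "term")
```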

Module contents

Post process the NER results.