ontorunner package

Submodules

ontorunner.oger_module module

Run OGER.

ontorunner.oger_module.run_oger(content='data/input', termlist='data/terms/DICT.tsv', output='data/output/', output_format='tsv', settings='settings.ini', workers=1, nodes_and_edges='/home/runner/work/ontorunner/ontorunner/data/nodes_and_edges', need_ancestors=False) None

Run OGER.

Parameters
  • content – Input file OR folder containing txt files.

  • termlist – Path to the dictionary (TSV format).

  • output – Path to save the output file.

  • output_format – Format of the output file (default: tsv).

  • settings – Path to the settings file (default: 'settings.ini'). If this is provided, all other arguments are provided in this file and are hence optional. Make changes to this file according to project needs.

  • workers – Number of parallel threads (default: 1).

  • nodes_and_edges – Directory for the KGX nodes and edges TSV files.

  • need_ancestors – Whether ancestors should be present in the output (default: False).

Returns

None.
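
A call to run_oger might look like the sketch below. The keyword names and defaults are taken from the signature above; the wrapper name annotate_folder, the workers value, and the paths are hypothetical. The import is deferred so the snippet loads even where ontorunner is not installed.

```python
# Keyword arguments mirroring the documented signature of run_oger.
# The paths are placeholders for illustration only.
OGER_KWARGS = {
    "content": "data/input",
    "termlist": "data/terms/DICT.tsv",
    "output": "data/output/",
    "output_format": "tsv",
    "settings": "settings.ini",
    "workers": 2,
    "need_ancestors": False,
}


def annotate_folder(**overrides):
    """Hypothetical wrapper: call run_oger with OGER_KWARGS plus overrides."""
    # Deferred import: requires the ontorunner package to be installed.
    from ontorunner.oger_module import run_oger

    kwargs = {**OGER_KWARGS, **overrides}
    return run_oger(**kwargs)  # documented to return None
```

With ontorunner installed, annotate_folder(need_ancestors=True) would pass the same arguments with that one flag switched on.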

ontorunner.spacy_module module

Run Spacy.

ontorunner.spacy_module.explode_df(df: DataFrame) DataFrame

Explode multiple DataFrames in a single row into multiple rows.

Parameters

df – DataFrame to be exploded.

Returns

Exploded DataFrame where each row corresponds to a row in the DataFrame.
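
The general idea of exploding rows can be illustrated without pandas. The following is a minimal pure-Python sketch, not the implementation of explode_df: the helper explode_rows and the column names are hypothetical, and it assumes that all list-valued cells within one row have the same length.

```python
def explode_rows(rows):
    """Expand rows whose cells hold lists into one row per list element."""
    exploded = []
    for row in rows:
        # Collect the lengths of the list-valued cells in this row.
        lengths = {len(v) for v in row.values() if isinstance(v, list)}
        if not lengths:
            # No list-valued cells: keep the row unchanged.
            exploded.append(dict(row))
            continue
        if len(lengths) != 1:
            raise ValueError("list-valued cells in a row must share one length")
        n = lengths.pop()
        for i in range(n):
            # Scalars are repeated; list cells contribute their i-th element.
            exploded.append(
                {k: (v[i] if isinstance(v, list) else v) for k, v in row.items()}
            )
    return exploded


rows = [{"doc_id": "d1", "term": ["acetate", "sediment"], "start": [10, 42]}]
result = explode_rows(rows)
```

Here the single input row becomes two rows, one per term, each carrying the shared doc_id.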

ontorunner.spacy_module.export_tsv(df: DataFrame, data_dir: str, fn: str) None

Export pandas DataFrame object into a TSV file.

Parameters
  • df – Pandas DataFrame.

  • data_dir – Destination directory for export.

  • fn – Filename.
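
To show what writing a TSV file into data_dir under the name fn amounts to, here is a standard-library sketch. It is not the implementation of export_tsv: the helper write_tsv is hypothetical, it takes a list of dictionaries rather than a DataFrame, and it assumes at least one row is present.

```python
import csv
import tempfile
from pathlib import Path


def write_tsv(rows, data_dir, fn):
    """Write a non-empty list of dicts to data_dir/fn as tab-separated values."""
    path = Path(data_dir) / fn
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as handle:
        # Column order follows the keys of the first row.
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]), delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
    return path


# Write into a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    out = write_tsv([{"term": "acetate", "start": 10}], tmp, "annotations.tsv")
    content = out.read_text(encoding="utf-8")
```

For a pandas DataFrame, the usual equivalent is DataFrame.to_csv with sep set to a tab character.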

ontorunner.spacy_module.get_knowledge_base_enitities(doc: Doc, onto_ruler_obj: OntoRuler) DataFrame

Get information from the SciSpacy pipeline.

Parameters
  • doc – Doc object.

  • onto_ruler_obj – OntoRuler object.

Returns

Pandas DataFrame.

ontorunner.spacy_module.get_token_info(doc: Doc) DataFrame

Get metadata associated with spans within a document.

Parameters

doc – Doc object.

Returns

Pandas DataFrame.

ontorunner.spacy_module.onto_tokenize(doc: Doc, onto_ruler_obj: OntoRuler) Doc

Set custom span information from the Doc object.

Parameters
  • doc – Doc object.

  • onto_ruler_obj – OntoRuler object.

Returns

Doc object.

ontorunner.spacy_module.run_spacy(data_dir: Path = '/home/runner/work/ontorunner/ontorunner/data', settings_file: Path = '/home/runner/work/ontorunner/ontorunner/ontorunner/settings.ini', linker: str = 'umls', to_pickle: bool = True, need_ancestors: bool = False, viz: bool = False) OntoRuler

Run spacy with sciSpacy pipeline.

Parameters
  • data_dir – Path to the data directory.

  • settings_file – Path to settings.ini file.

  • linker – Type of sciSpacy linker desired: umls (default) or mesh.

  • to_pickle – Pickle intermediate files. (True/False)

  • need_ancestors – Include ancestors of annotated terms. (True/False)

  • viz – Include visualizations (png and svg) in output. (True/False)

Returns

OntoRuler object.
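
A call to run_spacy could be wrapped as in the sketch below. The keyword names come from the signature above; the wrapper annotate_with_spacy and the up-front linker check are hypothetical additions, and the import is deferred because it requires ontorunner and its sciSpacy dependencies.

```python
from pathlib import Path

# The two linker options named in the parameter list above.
VALID_LINKERS = ("umls", "mesh")


def annotate_with_spacy(data_dir, settings_file, linker="umls"):
    """Hypothetical wrapper around run_spacy that checks the linker name first."""
    if linker not in VALID_LINKERS:
        raise ValueError(f"linker must be one of {VALID_LINKERS}, got {linker!r}")
    # Deferred import: requires ontorunner to be installed.
    from ontorunner.spacy_module import run_spacy

    return run_spacy(
        data_dir=Path(data_dir),
        settings_file=Path(settings_file),
        linker=linker,
        to_pickle=True,
        need_ancestors=False,
        viz=False,
    )
```

Checking the linker before the import means that a mistyped name fails immediately, before any pipeline is loaded.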

ontorunner.spacy_module.run_viz(input_text: str = 'A bacterial isolate, designated strain SZ,was obtained from noncontaminated creek sediment microcosms based on its ability to derive energy from acetate oxidation coupled to tetrachloroethene.', obj: Optional[OntoRuler] = None)

Text that needs to be annotated.

Parameters

input_text – Text to be annotated, defaults to DEFAULT_TEXT.

Module contents

Constants.