ontorunner package
Subpackages
Submodules
ontorunner.oger_module module
Run OGER.
- ontorunner.oger_module.run_oger(content='data/input', termlist='data/terms/DICT.tsv', output='data/output/', output_format='tsv', settings='settings.ini', workers=1, nodes_and_edges='/home/runner/work/ontorunner/ontorunner/data/nodes_and_edges', need_ancestors=False) None
Run OGER.
- Parameters
content – Input file, or folder containing .txt files.
termlist – Path to the dictionary (TSV format).
output – Path where the output is saved.
output_format – Output format (default: 'tsv').
settings – Path to the settings file (default: 'settings.ini'). If provided, all other arguments are read from this file and are therefore optional. Edit this file according to project needs.
workers – Number of parallel threads (default: 1).
nodes_and_edges – Directory containing the KGX nodes and edges TSV files.
need_ancestors – Whether to include ancestors of annotated terms in the output (default: False).
- Returns
None.
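The precedence rule for settings (values in the settings file make the other arguments optional) can be sketched with the standard-library configparser. This is a hypothetical illustration, not run_oger's implementation: the resolve_options helper, the [Main] section name, and the key names are assumptions, and the project's settings.ini may be structured differently.

```python
import configparser
from typing import Dict, Optional

# Hypothetical defaults mirroring the keyword arguments of run_oger().
DEFAULTS = {
    "content": "data/input",
    "termlist": "data/terms/DICT.tsv",
    "output": "data/output/",
    "output_format": "tsv",
    "workers": "1",
}


def resolve_options(settings_text: Optional[str], **kwargs: str) -> Dict[str, str]:
    """Merge defaults, keyword arguments, and a settings file.

    Precedence (lowest to highest): DEFAULTS, keyword arguments, then values
    from the hypothetical [Main] section of the settings file.
    """
    options = {**DEFAULTS, **kwargs}
    if settings_text:
        parser = configparser.ConfigParser()
        parser.read_string(settings_text)
        if parser.has_section("Main"):
            # configparser always returns strings, so values stay as str here.
            options.update(parser["Main"])
    return options
```

For example, resolve_options("[Main]\nworkers = 4\n", workers="2") yields "4" for workers, because the settings file wins over the keyword argument.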
ontorunner.spacy_module module
Run spaCy.
- ontorunner.spacy_module.explode_df(df: DataFrame) DataFrame
Explode multiple DataFrames in a single row into multiple rows.
- Parameters
df – Dataframe to be exploded.
- Returns
Exploded DataFrame where each row corresponds to a row in the DataFrame.
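explode_df works on a pandas DataFrame; the standard-library sketch below only illustrates what "exploding" a row means, using a list of dicts in place of a DataFrame. It is not the module's implementation, and explode_rows is a hypothetical name. In pandas, DataFrame.explode performs the same transformation on list-like columns.

```python
from typing import Any, Dict, List


def explode_rows(rows: List[Dict[str, Any]], column: str) -> List[Dict[str, Any]]:
    """Return one output row per element of the list stored in ``column``.

    Non-list values pass through unchanged; an empty list becomes a single
    row with None, so every input row yields at least one output row.
    """
    exploded = []
    for row in rows:
        value = row[column]
        if isinstance(value, list):
            items = value or [None]  # mirror pandas: empty list -> one missing value
        else:
            items = [value]
        for item in items:
            # Copy the other fields and replace the exploded column's value.
            exploded.append({**row, column: item})
    return exploded
```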
- ontorunner.spacy_module.export_tsv(df: DataFrame, data_dir: str, fn: str) None
Export pandas DataFrame object into a TSV file.
- Parameters
df – Pandas DataFrame.
data_dir – Destination directory for export.
fn – Filename.
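A minimal standard-library sketch of the export step, assuming rows held as a list of dicts instead of a DataFrame. The name export_rows_tsv is hypothetical, and whether export_tsv creates data_dir or writes the DataFrame index is not documented here. With pandas, the usual equivalent is df.to_csv(os.path.join(data_dir, fn), sep="\t", index=False).

```python
import csv
import os
from typing import Dict, List


def export_rows_tsv(rows: List[Dict[str, str]], data_dir: str, fn: str) -> str:
    """Write rows to <data_dir>/<fn> as tab-separated values and return the path."""
    os.makedirs(data_dir, exist_ok=True)  # assumption: create the directory if missing
    path = os.path.join(data_dir, fn)
    fieldnames = list(rows[0]) if rows else []
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
    return path
```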
- ontorunner.spacy_module.get_knowledge_base_enitities(doc: Doc, onto_ruler_obj: OntoRuler) DataFrame
Get knowledge-base entity information from the scispaCy pipeline.
- Parameters
doc – Doc object.
onto_ruler_obj – OntoRuler object.
- Returns
Pandas DataFrame.
- ontorunner.spacy_module.get_token_info(doc: Doc) DataFrame
Get metadata associated with spans within a document.
- Parameters
doc – Doc object.
- Returns
Pandas DataFrame.
- ontorunner.spacy_module.onto_tokenize(doc: Doc, onto_ruler_obj: OntoRuler) Doc
Set custom span information from the Doc object.
- Parameters
doc – Doc object.
onto_ruler_obj – OntoRuler object.
- Returns
Doc object.
- ontorunner.spacy_module.run_spacy(data_dir: Path = '/home/runner/work/ontorunner/ontorunner/data', settings_file: Path = '/home/runner/work/ontorunner/ontorunner/ontorunner/settings.ini', linker: str = 'umls', to_pickle: bool = True, need_ancestors: bool = False, viz: bool = False) OntoRuler
Run spaCy with the scispaCy pipeline.
- Parameters
data_dir – Path to the data directory.
settings_file – Path to the settings.ini file.
linker – Type of scispaCy entity linker: 'umls' (default) or 'mesh'.
to_pickle – Whether to pickle intermediate files (default: True).
need_ancestors – Whether to include ancestors of annotated terms (default: False).
viz – Whether to include visualizations (PNG and SVG) in the output (default: False).
- Returns
OntoRuler object.
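The need_ancestors option supplements annotated terms with their ancestors. How the module computes them is not documented here; the sketch below assumes ancestors are derived from biolink:subclass_of edges in a KGX-style edges TSV (columns subject, predicate, object), such as the files in the nodes_and_edges directory used by run_oger. The functions load_parents and ancestors are hypothetical names.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set


def load_parents(edges_tsv: Iterable[str], predicate: str = "biolink:subclass_of") -> Dict[str, Set[str]]:
    """Map each subject CURIE to its direct parents from a KGX-style edges TSV."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(edges_tsv, delimiter="\t"):
        if row["predicate"] == predicate:
            parents[row["subject"]].add(row["object"])
    return parents


def ancestors(curie: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return all transitive ancestors of ``curie`` (iterative DFS that terminates on cycles)."""
    seen: Set[str] = set()
    stack = list(parents.get(curie, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen
```

An iterative traversal is used instead of recursion so that deep ontology hierarchies cannot hit Python's recursion limit.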
- ontorunner.spacy_module.run_viz(input_text: str = 'A bacterial isolate, designated strain SZ,was obtained from noncontaminated creek sediment microcosms based on its ability to derive energy from acetate oxidation coupled to tetrachloroethene.', obj: Optional[OntoRuler] = None)
Annotate the input text and visualize the result.
- Parameters
input_text – Text to be annotated (default: DEFAULT_TEXT).
obj – Optional OntoRuler object (default: None).
Module contents
Constants.