ontoRunNER
This is a wrapper project around the following named entity recognition (NER) tools:
- OGER. 
Setup
To set up ontoRunNER:
For users
Activate your virtual environment (poetry or conda or venv etc.)
pip install ontorunner
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_ner_craft_md-0.5.0.tar.gz
python -m spacy download en_core_web_sm
Note: If you’re using poetry outside of a poetry shell, precede all CLI commands with poetry run.
Ontology to KGX TSV
Generate nodes.tsv and edges.tsv files from your OBO JSON ontology file.
CLI
onto-util json2tsv -i ontology.json -o output
Python
from ontorunner.pre.util import json2tsv
json2tsv('ontology.json', 'output.tsv')
Preparing term-list
Generate a term list from the output_nodes.tsv file produced in the previous step.
CLI
The conversion can be done as follows:
onto-util prepare-termlist -i output_nodes.tsv -o termlist.tsv
Python
from ontorunner.pre.util import prepare_termlist
prepare_termlist('output_nodes.tsv', 'termlist.tsv')
Running OGER.
Note: Make sure the output directory data/output is empty before every run.
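Emptying the output directory by hand is easy to forget, so it can be scripted. The following is a minimal sketch using only the Python standard library; clear_directory is a hypothetical helper, not part of ontoRunNER, and it permanently deletes whatever the directory contains.

```python
import shutil
from pathlib import Path


def clear_directory(directory: str) -> int:
    """Delete everything inside `directory` but keep the directory itself.

    Hypothetical helper (not part of ontoRunNER).
    Returns the number of top-level entries that were removed.
    """
    path = Path(directory)
    path.mkdir(parents=True, exist_ok=True)  # create the directory if it is missing
    removed = 0
    for entry in path.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a subdirectory and its contents
        else:
            entry.unlink()  # remove a regular file or a symlink
        removed += 1
    return removed


# Example call before a run (data/output is the directory named in the note):
# clear_directory("data/output")
```

Keeping the directory itself in place means a later run can still write into it.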
You can run OGER against a text document as follows:
CLI
ontoger run -c abstract.txt -t termlist.tsv -o out.json -f bioc_json
Note: This command only demonstrates how to run OGER. For more use cases, refer to the OGER documentation.
Running OGER using a ‘settings.ini’ file
You can run OGER using a ‘settings’ file as follows:
CLI
ontoger run -s settings.ini
Python
from ontorunner import oger_module
settings_file = 'settings.ini'  # path to your settings file (assumed to be passed as a path)
oger_module.run_oger(settings=settings_file)
The settings.ini file provides all relevant arguments to OGER. More information on the parameter list can be found in the OGER GitHub repository.
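Before a run, it can help to confirm which values a settings file actually contains. Below is a minimal sketch that reads an INI file with Python's standard configparser and returns its sections as plain dictionaries. load_settings is a hypothetical helper, not part of ontoRunNER or OGER, and the sketch makes no assumption about which sections or keys OGER expects.

```python
import configparser


def load_settings(path: str) -> dict:
    """Return {section: {key: value}} for an INI file.

    Hypothetical helper for inspection only. Interpolation is disabled so
    that values are shown exactly as written in the file. Note that
    configparser lower-cases key names by default.
    """
    parser = configparser.ConfigParser(interpolation=None)
    files_read = parser.read(path, encoding="utf-8")
    if not files_read:  # parser.read() silently skips missing files
        raise FileNotFoundError(path)
    return {section: dict(parser[section]) for section in parser.sections()}


# Example: print every key/value pair grouped by section.
# for section, values in load_settings("settings.ini").items():
#     print(section, values)
```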
There will be two output tsv files generated:
- An output file whose filename matches the input filename (say, docs.tsv). This is the raw output from OGER.
- Another file named docs_ontoRunNER.tsv, which contains more results because it is the outcome of some postprocessing.
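One quick way to see how the two files differ is to compare their row counts. This is a minimal sketch, assuming both files are tab-separated with one result per row and a header row; count_rows and its has_header parameter are hypothetical names, not part of ontoRunNER, and the file paths in the usage comment are examples only.

```python
import csv


def count_rows(tsv_path: str, has_header: bool = True) -> int:
    """Count non-empty data rows in a TSV file.

    Hypothetical helper. Set has_header=False if the file has no header row.
    """
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE treats quote characters in the text as ordinary characters.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        if has_header:
            next(reader, None)  # skip the header row if present
        return sum(1 for row in reader if row)


# Example comparison (paths are illustrative):
# raw = count_rows("data/output/docs.tsv")
# post = count_rows("data/output/docs_ontoRunNER.tsv")
# print(f"raw OGER rows: {raw}, post-processed rows: {post}")
```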
Running spaCy.
For now, spaCy (within ontoRunNER) can only process documents prepared as one or more tsv files with two columns:
- id 
- text 
By default, these files are expected to be in the data/input directory. If not, then the user can provide the path of the data directory using the -d or --data-dir parameter.
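An input file in this two-column layout can be produced with Python's standard csv module. The following is a minimal sketch; write_input_tsv is a hypothetical helper, not part of ontoRunNER, and writing a header row named id and text is an assumption based on the column names listed above.

```python
import csv
from pathlib import Path
from typing import Mapping


def write_input_tsv(documents: Mapping[str, str], path: str) -> None:
    """Write {id: text} pairs to a two-column TSV file.

    Hypothetical helper. The `id`/`text` header row is an assumption.
    """
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create data/input if needed
    with out.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["id", "text"])
        for doc_id, text in documents.items():
            # Collapse tabs and newlines so each document stays on one row.
            writer.writerow([doc_id, " ".join(text.split())])


# Example: place one document in the default input directory.
# write_input_tsv({"doc1": "A bacterial isolate, designated strain SZ."},
#                 "data/input/docs.tsv")
```

Collapsing whitespace keeps stray tabs or line breaks inside a document from being read as extra columns or rows.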
The settings.ini file used in OGER above is also used by spaCy for some of its parameters.
CLI
ontospacy run
Python
from ontorunner import spacy_module
spacy_module.run_spacy()
There will be two output tsv files generated:
- ontology_ontoRunNER.tsv: This file is the output with the ontology termlists (generated above) as the dictionary for entity recognition.
- umls_ontoRunNER.tsv: This file is the output derived by using scispaCy’s EntityLinker. By default the linker is umls, but you can provide others supported by scispaCy.
Visualization using spaCy.displaCy.
SpaCy visualizers are also available through ontoRunNER! There are two types of visualizers offered by displaCy:
- Displays dependencies 
- Highlights entities 
Both are rendered using one command - run-viz.
CLI
ontospacy viz -t **some text**
Python
from ontorunner import spacy_module
text = """A bacterial isolate, designated \
strain SZ, was obtained from noncontaminated creek \
sediment microcosms based on its ability to derive \
energy from acetate oxidation coupled to tetrachloroethene."""
spacy_module.run_viz(text)