# 🛠️ Tools
These lists focus on tools developed and/or used by Monarch members; tools marked ‘True’ in the Internal column were built by Monarch in whole or in part.
## General Purpose Tools
| Name | Description | Repo | Docs | Internal |
|---|---|---|---|---|
| AIO | Artificial Intelligence Ontology | GitHub | arXiv | True |
| Aurelian | Agentic Universal Research Engine for Literature, Integration, Annotation, and Navigation. | GitHub | Docs | True |
| Datasette LLM library | Or llm for short. A CLI utility and Python library for interacting with Large Language Models, both via remote APIs and models that can be installed and run on your own machine. | GitHub | Docs | False |
| LangChain | A framework for developing applications powered by language models. Supports connecting a language model to sources of context and enabling reasoning. | GitHub | Docs | False |
| LiteLLM | A framework for accessing LLMs and their APIs in the OpenAI format, for drop-in replacement and other convenient integrations. | GitHub | Docs | False |
| Logfire | An observability platform and a set of tools for collecting structured logs. For LLMs, this provides a way to track input prompts, parameters, and generated outputs. | GitHub | Docs | False |
| Ollama | A framework for running LLMs locally, with GPU support. | GitHub | Site | False |
| OntoGPT | A tool for linking unstructured data to structured vocabularies with consistent identifiers. Uses SPIRES and TALISMAN methods. | GitHub | Docs | True |
| Ontology Access Toolkit (OAK) | Python library for common ontology operations over a variety of backends. OAK has its own TextAnnotator, but it is very simple. OntoGPT uses OAK for term retrieval, labeling, mapping, etc. | GitHub | Docs | True |
| Phenomics Assistant | An AI chatbot with access to the Monarch Initiative biomedical knowledgebase. See demo at https://phenomics-assistant.streamlit.app/ | GitHub | bioRxiv | True |
| Pydantic.ai | A Python agent framework for working with LLMs. | GitHub | Docs | False |
## Data Preparation and Modeling Tools
| Name | Description | Repo | Docs | Internal |
|---|---|---|---|---|
| LinkML | A modeling language and framework for describing, working with, and validating data in a variety of formats. OntoGPT uses LinkML to define extraction schemas. | GitHub | Docs; draft | True |
| PaperQA | A package for doing high-accuracy retrieval augmented generation (RAG) on PDFs or text files, with a focus on the scientific literature. | GitHub | arXiv | False |
| phenopacket2prompt | A tool for transforming data in the GA4GH Phenopacket standard into LLM-ready prompts. | GitHub | Docs | True |
## Evaluation Tools
| Name | Description | Repo | Docs | Internal |
|---|---|---|---|---|
| DeepEval | An LLM evaluation framework built around unit tests. | GitHub | Docs | False |
| llm-matrix | A tool for running, evaluating, and comparing different language models across a matrix of hyperparameters. It allows systematic testing of models for accuracy, consistency, and performance on specific tasks. | GitHub | | True |
| LangSmith | A framework for building LLM applications, including evaluations. Can be used with or without LangChain. | GitHub | Docs | False |
| Metacoder | A unified interface for command line AI coding assistants (claude code, gemini-cli, codex, goose, qwen-coder). | GitHub | Docs | True |
## Visualization and Interface Building Tools
| Name | Description | Repo | Docs | Internal |
|---|---|---|---|---|
| Gradio | Tools for building an interface for Python projects, including those interfacing with LLMs. | GitHub | Docs | False |
| Streamlit | A framework for building web apps. | GitHub | Docs | False |
## Agentic Coding and Ontology Development Tools
| Name | Description | Repo | Docs | Internal |
|---|---|---|---|---|
| aider | An agentic coding tool capable of working with a variety of LLM APIs and local models. | GitHub | Docs | False |
| Claude Code | An agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows, all through natural language commands. | GitHub | Docs | False |
| ODK-AI | A Docker container that extends the ODK image to use Claude Code and other LLM-powered tools with ontologies. It is designed to be executed either interactively or in "headless" mode. | GitHub | Docs | True |
| Goose | An open source AI agent for automating coding tasks. Supports a variety of LLMs. Can be used through an app or a CLI. | GitHub | Docs | False |
| Roo Code | An AI-powered autonomous coding agent that lives in your editor. | GitHub | Docs | False |
| Cherry Studio | A desktop client that supports multiple LLM providers, available on Windows, Mac, and Linux. | GitHub | | False |
| dragon-ai-agent | An automated AI agent specifically designed to assist with ontology curation and maintenance tasks. | GitHub | Docs | True |
| github-ai-integrations | A Copier template for augmenting GitHub repos with AI capabilities. | GitHub | Docs | True |
## Model Context Protocol (MCP) Tools
| Name | Description | Repo | Docs | Internal |
|---|---|---|---|---|
| landuse-mcp | A Model Context Protocol (MCP) server for retrieving land use data for given geographical locations using the National Land Cover Database (NLCD) and other geospatial datasets. | GitHub | | True |
| oak-mcp | A Model Context Protocol (MCP) server to help agents interact with ontologies and the Ontology Access Toolkit (OAK). | GitHub | | True |
| ols-mcp | A Model Context Protocol (MCP) server for retrieving information from the Ontology Lookup Service (OLS). | GitHub | | True |
| artl-mcp | An MCP server for retrieving scientific literature metadata and content using PMIDs, DOIs, and other identifiers. | GitHub | | True |
| fitness-mcp | A FastMCP server for analyzing fitness data from barcoded Agrobacterium mutant libraries grown in mixed cultures across different conditions. | GitHub | | True |
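To use any of the servers above, an MCP client needs to know how to launch it. As an illustrative sketch, here is the `mcpServers` configuration convention used by clients such as Claude Desktop; the `uvx oak-mcp` launch command is an assumption for this example, so check each server's README for its actual command:

```json
{
  "mcpServers": {
    "oak-mcp": {
      "command": "uvx",
      "args": ["oak-mcp"]
    }
  }
}
```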
## Tool-specific Guides
### Accessing Monarch data with LLMs
#### Using LLMs with the Ontology Access Kit
- OAK documentation: https://incatools.github.io/ontology-access-kit/howtos/use-llms
#### OntoGPT
#### CurateGPT
### Guides to Using LLMs for Ontology Curation and Semantic Engineering
- https://ai4curation.github.io/aidocs/ - a growing collection of how-tos and reference guides for curators and maintainers of knowledge bases to integrate AI into their workflows.
- AI-assisted ontology editing workflows, Part 1
- OBO Academy article on Leveraging ChatGPT for ontology curation: https://oboacademy.github.io/obook/lesson/chatgpt-ontology-curation/
- Introduction to developing agentic workflows for semantic engineers
- Tutorial materials here: https://github.com/cmungall/agent-tutorial
- Applications of Agentic AI for the GO
### dragon-ai-agent
- Setup
- https://ai4curation.github.io/aidocs/how-tos/set-up-github-actions/
- Examples of use
- An issue and PR in the MONDO repository
- An issue and PR in GO
### MCPs
- Where to find MCPs?
- MCP servers | Glama
- GitHub - modelcontextprotocol/servers: Model Context Protocol Servers
- MCP Registry Registry
- Security when using MCPs
- The Vulnerable MCP Project
- What 17,845 GitHub Repos Taught Us About Malicious MCP Servers ~ VirusTotal Blog
- Model Context Protocol (MCP): Understanding security risks and controls
## Using Open Models
Some LLMs can be run on local hardware (e.g., your own laptop) rather than through a remote API. This is not possible for the largest models, and even moderately sized models may be slow to produce results, but local use costs less than commercial services and offers greater flexibility in the choice of models.
The Ollama framework is a good place to start.
Models may be retrieved from the popular HuggingFace platform.
Other options:
- The Datasette LLM library has multiple plugins available for running local models. See the plugin directory.
- LangChain is also capable of running local models.
- It can work with Ollama, among other frameworks.
- Example: Running a Hugging Face Large Language Model (LLM) locally on my laptop | Mark Needham
- Video version: Running a Hugging Face LLM on your laptop
- See also: openplayground (a GUI for chat completion with many different LLMs, both open and not)
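As a minimal sketch of local use (assuming Ollama is installed and running on its default port, and that an example model such as `llama3.2` has already been pulled), a prompt can be sent to Ollama's local REST API with only the Python standard library:

```python
import json
import urllib.request

# Ollama's default local endpoint for single-turn generation
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Example (requires a running Ollama server; the model name is illustrative):
req = build_request("llama3.2", "In one sentence, what is an ontology?")
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

The same request shape works for any model Ollama has pulled; swap the model name for whatever `ollama list` shows locally.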
## Using LBNL CBORG
CBORG is a service from the Berkeley Lab’s IT Division and Science IT staff that provides access to AI models. If you work for LBNL, you may use CBORG. Models may be accessed in three ways:
- An in-browser chat interface (https://chat.cborg.lbl.gov/)
- Over an API
- FAQ: https://cborg.lbl.gov/api_faq/
- Full documentation: https://api.cborg.lbl.gov/
- Through your favorite agentic platform (https://cborg.lbl.gov/tools_ai_101/)
### Get a CBORG API Key
Need a CBORG API key? See this page.
Or, for more detail, follow these instructions:
- Go here: https://chat.cborg.lbl.gov/login
- Login with your LDAP credentials (your LBL username and password)
- Some people report that they had to refresh the page (or go through the sign-in process) a few times before it worked.
- Request a CBORG API key: https://cborg.lbl.gov/api_request/
- You may only see the key immediately after requesting it!
- Save it in a password safe.
- If you lose your key, you can contact CBORG support and ask for a new one
- Email: ScienceIT
- CBORG Google Group: https://chat.google.com/room/AAQAqGsqgfQ?cls=7
Need a supplemental CBORG API key for a specific project with a defined spending limit or timeframe? Use this form.
### Using CBORG Models
The CBORG API is OpenAI-compatible, meaning it handles requests in much the same way as the OpenAI API. Tools and applications designed to work with OpenAI models will generally work with CBORG, with the caveat that all models are different and some have features others lack (e.g., functionality for using tools). So, in the absence of more specific instructions, you may be able to get CBORG working with your chosen software by:
- Specifying a new model or API endpoint as OpenAI-compatible
- Providing the API base (https://api.cborg.lbl.gov) and API key (see above)
- Specifying a model name, like “lbl/cborg-chat:latest”
- See the full list here, though you may have to scroll down to see the specific names to pass to the API
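The steps above can be sketched as a raw request in Python. This is a minimal sketch: it assumes CBORG exposes the standard OpenAI `/v1/chat/completions` route under the API base (check the CBORG API docs for the exact path), and it uses the example model name from above:

```python
import json
import os
import urllib.request

CBORG_BASE = "https://api.cborg.lbl.gov"

def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request against the CBORG API."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{CBORG_BASE}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The API key goes in the standard OpenAI-style Bearer header
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# Example (requires a valid CBORG API key, e.g. from an environment variable):
req = build_chat_request(
    "lbl/cborg-chat:latest", "Hello!", os.environ.get("CBORG_API_KEY", "")
)
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because the format is OpenAI-compatible, the official `openai` Python client can be pointed at the same base URL and key instead of building requests by hand.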
CBORG also provides proxy utilities for accessing their API. The immediate benefit of this is convenience: the proxy can automatically provide your API key along with each request. If you’re using the CBORG API from multiple applications, the proxy can also help to manage all the resulting connections. Find it on GitHub here: https://github.com/lbnl-science-it/cborg-client
### Managing CBORG Usage
View your key budget here: https://api.cborg.lbl.gov/key/manage
Alternatively, use this shell function to get the same information in your terminal: https://gist.github.com/pkalita-lbl/eb9065e03157844ba3130449f0de8433
By default, each user is allocated $50 per month, unless you get additional grant-based funding. (Sierra says this lasts a while.)
Note that the open on-premises models (those with model names preceded by “lbl”) may be used at no monetary cost to you.