# ▶️ Getting Started
## Onboarding
Ensure you are onboarded to Monarch.
- Contact Sarah Gehrke (email: sarah@tislab.org) if you are not sure.
- This will get you access to the email list and Slack channels.
## API Access
🔷 LBNL team: Use the lab-provided CBORG router service to avoid charging LLM usage to Monarch grants. See BBOP onboarding and the section on using CBORG for details.
⚠️ Warning: Commercial APIs are pay-per-use, so please consult Chris if you plan on running large jobs, e.g., querying a single model thousands of times.
### OpenAI API
Complete this form to receive an OpenAI API key.
You must use the form; no API keys will be issued without it!
## Necessary Tools
Install basic, general purpose tools:
- llm (GitHub: simonw/llm): access large language models from the command line
- Be familiar with shell pipes and redirection, e.g., you can run a command like `cat myfile.py | llm -s "Explain this code"`
- Ontology Access Kit (GitHub: INCATools/ontology-access-kit)
See a larger list of relevant tools here.
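A concrete sketch of the pipe pattern mentioned above. The actual llm calls are shown as comments, since they require the tool to be installed and an API key configured; the example file and the `wc` stand-in are purely illustrative:

```shell
# Shell pipes feed a file or command output to llm's stdin, e.g.:
#   cat myfile.py | llm -s "Explain this code"
#   git diff | llm -s "Summarize these changes"
# (-s sets the system prompt; requires llm installed and an API key configured.)

# The same plumbing, with a deterministic command standing in for llm:
printf 'def add(a, b):\n    return a + b\n' > /tmp/myfile.py
cat /tmp/myfile.py | wc -l   # counts the lines that would be sent to the model
```

Anything that writes to stdout (a diff, a log, a query result) can be piped to llm the same way.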
## Necessary Background Knowledge
🔷 Curators: Please see the AI4Curation documents here.
🔷 LBNL team: Review the BLAM workshop resources. These include webinars and exercises for a range of skill levels. Some are focused on LBNL resources like CBORG.
- See the Monarch Standard Operating Procedures. Note the AI Guidelines:
- Every commit and PR created by an AI agent / tool MUST be created by a dedicated AI user account. Human user accounts MUST NOT be used for AI-generated content.
- AI-generated content (especially content that has not been reviewed by humans) MUST be flagged with appropriate metadata wherever it is asserted (in ontologies or KGs). For example, axioms generated by an AI tool MUST be clearly marked as such by (a) providing a reference to the AI tool used and (b) indicating that the content was generated by an AI tool (e.g., MONDO:AI_GENERATED).
- Look over the Tools and Methods and Experiments tables (they are in other tabs of this document) to familiarize yourself with the available resources.
- Understand what Retrieval Augmented Generation (RAG) is.
- See the original RAG paper: “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks” (arXiv:2005.11401)
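A toy illustration of the retrieve-then-generate flow in shell. Real RAG systems retrieve with vector embeddings rather than keyword grep; the file path, notes, and question here are invented, and the llm call is commented out since it needs an API key:

```shell
# 1. A tiny "document store" (contents invented for illustration).
cat > /tmp/notes.txt <<'EOF'
Monarch integrates disease and phenotype knowledge.
CBORG is the LBNL-provided LLM router service.
OAK provides command-line access to ontologies.
EOF

# 2. Retrieval: find passages relevant to the question.
question="What is CBORG?"
context=$(grep -i 'cborg' /tmp/notes.txt)

# 3. Augmented generation: hand the retrieved context to the model.
echo "Context: $context"
# llm "Context: $context  Question: $question"
```

The point is the shape of the pipeline: the model answers from retrieved passages rather than from its weights alone.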
- Understand what “AI Slop” means - stay “slop adjacent.” Don’t make me wear boots to review your code.
- For example, if the AI writes tests for you and they all pass (verify them yourself), take 10 minutes to check whether you need all the tests it produced. Are they adding value to your repo, or could some editing achieve the same coverage in fewer lines of code?
- If the AI generates a commit message, edit it down to something you wouldn’t mind reading in a “drive-by” code review.
- Understand what “agents” or “agentic” systems mean in the context of LLMs.
- In brief, these are systems in which an LLM has access to tools and can use them dynamically to accomplish specific tasks.
- Article: “Building Effective AI Agents” (Anthropic)
- Introduction to developing agentic workflows for semantic engineers - OBO Academy tutorial by Chris Mungall, 2025-04-15
- Using AI Coding Apps for Ontology Development - OBO Academy tutorial by Chris Mungall, 2025-06-09
- Know about protocols for enabling LLMs to work with tools.
- MCP (Model Context Protocol)
- Google developed a competing standard: Agent2Agent (A2A)
- Chris’ slides on MCP core concepts: MCP Core Concepts - BERtron slides
- Chris’ talk on MCPifying Earth Science data: MCPifying Earth System Data
- Standard operating procedure for making our MCPs searchable/findable
- Use the GitHub topic “mcp”
- This makes MCPs searchable, e.g. to search for Monarch MCPs:
- https://github.com/search?q=org%3Amonarch-initiative%20topic%3Amcp&type=repositories
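The search URL above can be assembled for any org/topic pair; a small sketch (the `gh` command in the comment assumes you have the GitHub CLI installed, and `+` is used here as an equivalent encoding of the space):

```shell
# Build a GitHub search URL for repositories in an org tagged with a topic.
org="monarch-initiative"
topic="mcp"
url="https://github.com/search?q=org%3A${org}+topic%3A${topic}&type=repositories"
echo "$url"

# Equivalent query with the GitHub CLI, if installed:
#   gh search repos --owner="$org" --topic="$topic"
```

Swap in any other org or topic to find, e.g., MCPs tagged across collaborating organizations.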
For more background information, see the Background Knowledge section.