@wiggitywhitney
Created February 26, 2026 20:44
Making GenAI Observable with OpenTelemetry

Associated Thunder episode: Making GenAI Observable with OpenTelemetry


What is OpenTelemetry?

OpenTelemetry is a model for how to translate system events into useful data for observability.

OpenTelemetry is a standard, a set of software, and a specification. (opentelemetry.io)

OTel is made up of:

  • APIs
  • SDKs
  • Tools (Collector, etc)
  • Protocol (OTLP) — a shared "glossary" for telemetry data
  • Semantic Conventions

OpenTelemetry provides an observability pattern you can use directly.


OTel and GenAI (so hot right now!)

3 Angles:

  1. Building/training Large Language Models (or any machine learning stuff) — not many doing this

  2. Building GenAI/LLM features in your software — e.g. seeing new furniture in your own home before buying

  3. Using GenAI/LLM in your coding workflow — e.g. Claude Code, GenAI agents (very helpful here!)


GenAI Semantic Conventions

For example, user-GenAI chat conversations have standardized trace data.

  1. Provide a consistent way to see GenAI data
  2. Describe GenAI actions with consistent patterns and rules
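To make the conventions concrete, here is a plain-Python sketch of the attributes a chat span might carry. Attribute names like `gen_ai.system` and `gen_ai.usage.input_tokens` come from the published OTel GenAI semantic conventions; the values (and the model name) are illustrative only.

```python
# A chat span shaped per the OTel GenAI semantic conventions. Keys such as
# "gen_ai.system" and "gen_ai.usage.input_tokens" are convention-defined;
# the specific values here are made up for illustration.
chat_span = {
    "name": "chat gpt-4o",  # convention: "{operation} {model}"
    "attributes": {
        "gen_ai.operation.name": "chat",
        "gen_ai.system": "openai",         # which provider served the call
        "gen_ai.request.model": "gpt-4o",  # model the caller asked for
        "gen_ai.usage.input_tokens": 214,  # prompt size
        "gen_ai.usage.output_tokens": 98,  # completion size
    },
}

# Because every instrumented app uses the same keys, a backend can aggregate
# token usage across services without per-vendor parsing logic.
total_tokens = (
    chat_span["attributes"]["gen_ai.usage.input_tokens"]
    + chat_span["attributes"]["gen_ai.usage.output_tokens"]
)
print(total_tokens)  # → 312
```

This is the payoff of semantic conventions: any backend that understands the keys can reason about any instrumented app.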

Building GenAI-Powered Features: How Can OTel Help?

OTel turns GenAI data into data you can understand and reason about (just like any other data).

Deterministic vs non-deterministic? Just an implementation detail!

GenAI features need user feedback alongside performance data.

OTel offers ways to store this feedback in metadata alongside system data.

GenAI success = happy user — as opposed to system success being something like a successful database write.
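One way to picture "feedback in metadata alongside system data" is a span event carrying a user rating. The event shape below mimics OTel's span events (a name, a timestamp, and attributes), but the `user_feedback` event name and its attribute keys are assumptions for illustration, not a published convention.

```python
import time

def record_feedback(span_events, rating, comment=""):
    """Attach a user-feedback event to an in-flight span's event list.
    The event name and attribute keys are hypothetical, not a standard."""
    span_events.append({
        "name": "user_feedback",
        "timestamp": time.time_ns(),
        "attributes": {
            "feedback.rating": rating,    # e.g. 1 = thumbs up, -1 = thumbs down
            "feedback.comment": comment,
        },
    })

events = []
record_feedback(events, rating=1, comment="answer was helpful")
# The feedback now travels in the same trace as latency and token counts,
# so "happy user" can be correlated with system behavior.
print(events[0]["attributes"]["feedback.rating"])  # → 1
```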

Key Definitions:

  • Tools: what LLMs use to access external systems — MCP servers, plug-ins
  • Agent: LLM using tools in a loop — Claude Code, Cursor, Windsurf, Handmade
  • LLM: Large Language Model
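The "LLM using tools in a loop" definition can be sketched in a few lines. `fake_llm` and the `get_time` tool are stand-ins so the sketch runs without a real model or provider SDK; a real agent would call an LLM API at that step.

```python
# Minimal agent loop: an LLM calling tools in a loop until it can answer.
def get_time(_args):
    return "14:32"  # stand-in tool; a real one would hit an external system

TOOLS = {"get_time": get_time}

def fake_llm(history):
    """Pretend LLM: asks for a tool once, then answers using its output."""
    if not any(m["role"] == "tool" for m in history):
        return {"tool_call": {"name": "get_time", "args": {}}}
    return {"answer": f"The time is {history[-1]['content']}."}

def agent(question, max_steps=5):
    history = [{"role": "user", "content": question}]
    for _ in range(max_steps):       # the loop is what makes it an "agent"
        reply = fake_llm(history)
        if "answer" in reply:
            return reply["answer"]
        call = reply["tool_call"]    # the LLM chose a tool; run it
        result = TOOLS[call["name"]](call["args"])
        history.append({"role": "tool", "content": result})
    return "gave up"

print(agent("What time is it?"))  # → The time is 14:32.
```

Each pass through that loop is a natural place for a span, which is why agents and OTel fit together so well.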

Using GenAI in Your Coding Workflow: How Can OTel Help?

Fundamental Challenge of Doing Observability on GenAI:

There is a lot of implicit knowledge LLMs don't know.

  1. OTel and semantic conventions have been around a long time and have wide adoption — LLMs were trained on this! They know it! It also makes instrumentation easier.

  2. More code and less-understood code in production — OMG! You really need OTel!

  3. Use an MCP server to feed OTel data back into coding agents — use a vendored one! Or write your own! Easier with OTel!

  4. OTel data piped back into the coding environment can help with code verification, debugging, and coding agents' "architectural blindness"
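Points 3 and 4 describe a feedback loop: condense recent trace data into text a coding agent can read. The span records and the summary format below are hypothetical; a real setup would pull spans from an observability backend, e.g. via an MCP server.

```python
# Sketch: turn recent spans into an agent-readable debugging summary.
# The record shape and field names are assumptions for illustration.
def summarize_spans(spans):
    """Return a short text report of failing spans for an agent's context."""
    lines = []
    for s in spans:
        if s["status"] == "ERROR":
            lines.append(f"{s['name']}: {s['status']} ({s['duration_ms']} ms)")
    return "\n".join(lines) or "no errors in window"

recent = [
    {"name": "GET /checkout", "status": "ERROR", "duration_ms": 1840},
    {"name": "GET /health", "status": "OK", "duration_ms": 3},
]
print(summarize_spans(recent))  # → GET /checkout: ERROR (1840 ms)
```

Handing the agent this summary instead of raw telemetry keeps its context small while grounding it in what production actually did.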

AND SO MUCH MORE — THIS IS THE BLEEDING EDGE
