Associated Thunder episode: Making GenAI Observable with OpenTelemetry
OpenTelemetry is a model for how to translate system events into useful data for observability.
OpenTelemetry is a standard, software, and a specification. (opentelemetry.io)
OTel is made up of:
- APIs
- SDKs
- Tools (Collector, etc)
- Protocol (OTLP) — "glossary"
- Semantic Conventions
Observability pattern: direct use
-
Building/training Large Language Models (or any machine learning stuff) — not many doing this
-
Building GenAI/LLM features in your software — i.e. see new furniture in your own home before buying
-
Using GenAI/LLM in your coding workflow — i.e. Claude Code, GenAI Agent — very helpful here!
For example, user-GenAI chat conversations have standardized trace data.
- Provides consistent ways to see GenAI data
- Use GenAI actions with consistent patterns and rules
OTel turns GenAI data into data you can understand and reason about (just like any other data).
Deterministic vs non-deterministic? Just an implementation detail!?
GenAI feature needs user feedback alongside performance data. OTel offers ways to store this feedback in metadata alongside system data.
GenAI success = happy user. As opposed to system success being something like a successful database write.
- Tools — LLMs use to access external systems: MCP servers, plug-ins
- Agent — LLM using tools in a loop: Claude Code, Cursor, Windsurf, Handmade
- LLM — Large Language Model
There is a lot of implicit knowledge LLMs don't know.
-
OTel and semantic conventions have been around a long time and have wide adoption. LLMs were trained on this. They know it! Also makes instrumenting easier.
-
More code and less-understood code in production. OMG! You really need OTel!
-
Use an MCP server to feed OTel data back into coding agents. Use a vendored one! Or write your own! Easier with OTel!
-
OTel data piped back into coding env can help to code verification, debugging, and coding agents' "architectural blindness."
AND SO MUCH MORE. THIS IS THE BLEEDING EDGE!
