Making GenAI Observable with OpenTelemetry

Associated Thunder episode: Making GenAI Observable with OpenTelemetry



What is OpenTelemetry?

OpenTelemetry is a model for how to translate system events into useful data for observability.

OpenTelemetry is at once a standard, a specification, and software. (opentelemetry.io)

OTel is made up of:

  • APIs
  • SDKs
  • Tools (Collector, etc)
  • Protocol (OTLP) — the wire protocol for shipping telemetry (see the OTel "glossary")
  • Semantic Conventions

Observability pattern: direct use


OTel and GenAI (so hot right now!)

3 Angles:

  1. Building/training Large Language Models (or any machine learning systems) — not many teams are doing this

  2. Building GenAI/LLM features into your software — e.g. letting users see new furniture in their own home before buying

  3. Using GenAI/LLMs in your coding workflow — e.g. Claude Code or another GenAI agent — very helpful here!


GenAI Semantic Conventions

For example, user-GenAI chat conversations have standardized trace data.

  1. Provides a consistent way to see GenAI data
  2. Lets you act on GenAI data with consistent patterns and rules
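As a concrete sketch, here are span attributes for one user-to-LLM chat call using attribute names from OTel's (still incubating) GenAI semantic conventions; the specific values are invented, and the exact attribute names may shift as the conventions stabilize.

```python
# Illustrative span attributes for a single chat completion, named per
# the incubating gen_ai.* semantic conventions.
chat_span_attributes = {
    "gen_ai.operation.name": "chat",
    "gen_ai.system": "anthropic",             # which provider served the call
    "gen_ai.request.model": "claude-sonnet",  # model the caller asked for
    "gen_ai.usage.input_tokens": 812,         # prompt-side token count
    "gen_ai.usage.output_tokens": 147,        # completion-side token count
}

# Any backend that understands the convention can aggregate these the
# same way, no matter which provider or SDK emitted them.
total_tokens = (chat_span_attributes["gen_ai.usage.input_tokens"]
                + chat_span_attributes["gen_ai.usage.output_tokens"])
print(total_tokens)
```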

Building GenAI-Powered Features: How Can OTel Help?

OTel turns GenAI data into data you can understand and reason about (just like any other data).

Deterministic vs non-deterministic? Just an implementation detail!?

GenAI feature needs user feedback alongside performance data. OTel offers ways to store this feedback in metadata alongside system data.
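A toy sketch of that idea, using a hand-rolled span record rather than the real SDK: the `user.feedback.*` attribute names are an assumption for illustration, not a published convention.

```python
from dataclasses import dataclass, field

@dataclass
class SpanRecord:
    """Toy stand-in for a trace span: system data plus attributes."""
    name: str
    duration_ms: float
    attributes: dict = field(default_factory=dict)

def attach_feedback(span: SpanRecord, rating: int, comment: str) -> None:
    # Hypothetical attribute names -- the point is that user feedback
    # lives in the same record as latency/token data, so you can ask
    # "slow AND disliked" in a single query.
    span.attributes["user.feedback.rating"] = rating
    span.attributes["user.feedback.comment"] = comment

span = SpanRecord("chat-completion", duration_ms=2300.0)
attach_feedback(span, rating=1, comment="answer ignored my question")
print(span.attributes["user.feedback.rating"])
```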

GenAI success = a happy user, as opposed to system success, which looks more like a successful database write.

Key Concepts:

  • Tools — what LLMs use to access external systems: MCP servers, plug-ins
  • Agent — an LLM using tools in a loop: Claude Code, Cursor, Windsurf, or hand-rolled
  • LLM — Large Language Model
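The "LLM using tools in a loop" definition can be sketched in a few lines. The "LLM" here is stubbed as a fixed rule, and the tool and function names are invented for illustration.

```python
# Toy agent loop: a stubbed "LLM" picks a tool each turn until it
# decides it is done.
def fake_llm(goal: str, observations: list) -> str:
    """Stub policy: look something up once, then finish."""
    return "search" if not observations else "done"

TOOLS = {
    # A tool is the LLM's doorway to an external system (here, faked).
    "search": lambda goal: f"results for {goal!r}",
}

def run_agent(goal: str, max_steps: int = 5) -> list:
    observations = []
    for _ in range(max_steps):          # the loop that makes it an "agent"
        action = fake_llm(goal, observations)
        if action == "done":
            break
        observations.append(TOOLS[action](goal))  # tool call
    return observations

print(run_agent("otel genai semconv"))
```

Each iteration of that loop (model call, tool call, observation) is a natural unit to capture as a span, which is exactly where the GenAI semantic conventions aim.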

Using GenAI in Your Coding Workflow: How Can OTel Help?

Fundamental Challenge of Doing Observability on GenAI:

There is a lot of implicit knowledge about your system that LLMs don't have.

  1. OTel and its semantic conventions have been around a long time and have wide adoption. LLMs were trained on this. They know it! That also makes instrumenting easier.

  2. GenAI puts more code, and less-understood code, into production. OMG! You really need OTel!

  3. Use an MCP server to feed OTel data back into coding agents. Use a vendor-provided one, or write your own! Easier with OTel!

  4. OTel data piped back into the coding environment can help with code verification, debugging, and coding agents' "architectural blindness."
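Points 3 and 4 above boil down to: reduce trace data to something compact an agent can read. A hedged sketch, with simplified stand-in field names rather than real SDK span objects:

```python
# Sketch: summarize finished spans into a short text report that an
# MCP server could hand to a coding agent. Field names are simplified
# stand-ins for real OTel span fields.
def summarize_spans(spans: list) -> str:
    lines = []
    for s in spans:
        status = "ERROR" if s.get("error") else "ok"
        lines.append(f"{s['name']}: {s['duration_ms']:.0f} ms [{status}]")
    return "\n".join(lines)

spans = [
    {"name": "db.query", "duration_ms": 1840.2, "error": False},
    {"name": "cache.get", "duration_ms": 3.1, "error": True},
]
print(summarize_spans(spans))
```

With something like this behind an MCP tool, the agent that just wrote `db.query` can also see that it took 1.8 seconds in production — closing the loop between coding and observing.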

AND SO MUCH MORE. THIS IS THE BLEEDING EDGE!
