@nagkumar91
Last active March 9, 2026 18:42
GenAI Memory Operations — Non-Normative Implementation Spec (companion to open-telemetry/semantic-conventions#3250)

Memory Operations for GenAI (Revised Implementation Proposal)

1. Why Memory Matters for Observability

  1. Debugging: Understanding what an agent "remembers" during failures is crucial for root cause analysis.
  2. Performance: Memory retrieval latency directly impacts agent response times.
  3. Privacy/Compliance: Tracking what's stored helps with data retention compliance.
  4. Cost Optimization: Memory storage and retrieval operations have measurable costs.

2. Proposed Semantic Conventions

2.1 Revised Operation Names

Add the following values to the gen_ai.operation.name enum:

- id: search_memory
  value: "search_memory"
  brief: "Search/query memories"
  stability: development

- id: update_memory
  value: "update_memory"
  brief: "Create or update (upsert) memory items"
  stability: development

- id: delete_memory
  value: "delete_memory"
  brief: "Delete memory items"
  stability: development

- id: create_memory_store
  value: "create_memory_store"
  brief: "Create/initialize a memory store"
  stability: development

- id: delete_memory_store
  value: "delete_memory_store"
  brief: "Delete/deprovision a memory store"
  stability: development

Notes:

  • store_memory is removed; use update_memory for both create and update to avoid ambiguity.
  • update_memory is an upsert. If the underlying system distinguishes create vs update, instrumentations MAY add a system-specific attribute to capture that outcome, but SHOULD keep gen_ai.operation.name as update_memory.

2.2 Revised Memory Attributes

- id: gen_ai.memory.store.id
  stability: development
  type: string
  brief: The unique identifier of the memory store.
  examples: ["ms_abc123", "user-preferences-store"]

- id: gen_ai.memory.store.name
  stability: development
  type: string
  brief: Human-readable name of the memory store.
  examples: ["Customer Support Memory", "Shopping Preferences"]

- id: gen_ai.memory.record.id
  stability: development
  type: string
  brief: The unique identifier of a memory item.
  examples: ["mem_5j66UpCpwteGg4YSxUnt7lPY"]

- id: gen_ai.memory.scope
  stability: development
  type:
    members:
      - id: user
        value: "user"
        brief: "Scoped to a specific user"
        stability: development
      - id: conversation
        value: "conversation"
        brief: "Scoped to a conversation/thread. Context within a single conversation."
        stability: development
      - id: agent
        value: "agent"
        brief: "Scoped to a specific agent"
        stability: development
      - id: team
        value: "team"
        brief: "Shared across a team of agents"
        stability: development
  brief: The scope of the memory store or memory operation.
  examples: ["user", "conversation", "agent"]

- id: gen_ai.memory.record.content
  stability: development
  type: any
  brief: The content/value of the memory item.
  note: |
    > [!WARNING]
    > This attribute may contain sensitive information including user/PII data.
    >
    > Instrumentations SHOULD NOT capture this by default.
  examples:
    - '{"preference": "dark_mode", "value": true}'

- id: gen_ai.memory.query
  stability: development
  type: string
  brief: The search query used to retrieve memories.
  note: |
    > [!WARNING]
    > This attribute may contain sensitive information.
  examples: ["user dietary preferences", "past flight bookings"]


- id: gen_ai.memory.search.result.count
  stability: development
  type: int
  brief: Number of memory items returned from a search operation.
  examples: [3, 10]

- id: gen_ai.memory.search.similarity.threshold
  stability: development
  type: double
  brief: Minimum similarity score threshold used for memory search.
  examples: [0.7, 0.85]

- id: gen_ai.memory.expiration_date
  stability: development
  type: string
  brief: Expiration date for the memory in ISO 8601 format.
  examples: ["2025-12-31", "2026-01-15T00:00:00Z"]

- id: gen_ai.memory.importance
  stability: development
  type: double
  brief: Importance score of the memory (0.0 to 1.0).
  examples: [0.8, 0.95]
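Because `gen_ai.memory.record.content` and `gen_ai.memory.query` are opt-in and may carry PII, an instrumentation would typically gate them behind explicit operator configuration. A minimal sketch of that gate (the environment variable name here is a hypothetical choice, not part of this proposal):

```python
import os

# Opt-in attributes that may contain sensitive data; per the notes above,
# instrumentations SHOULD NOT capture these by default.
SENSITIVE_ATTRIBUTES = {"gen_ai.memory.record.content", "gen_ai.memory.query"}

def capture_content_enabled() -> bool:
    # Illustrative env var; a real instrumentation defines its own switch.
    return os.environ.get("OTEL_GENAI_MEMORY_CAPTURE_CONTENT", "false").lower() == "true"

def filter_attributes(attributes: dict) -> dict:
    """Drop opt-in sensitive attributes unless capture is explicitly enabled."""
    if capture_content_enabled():
        return dict(attributes)
    return {k: v for k, v in attributes.items() if k not in SENSITIVE_ATTRIBUTES}

attrs = {
    "gen_ai.operation.name": "search_memory",
    "gen_ai.memory.query": "user dietary preferences",  # opt-in, PII risk
    "gen_ai.memory.search.result.count": 3,
}
print(filter_attributes(attrs))  # query dropped unless opted in
```

The non-sensitive attributes (operation name, result count) always survive filtering, so dashboards keep working even with content capture disabled.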

2.3 Memory Spans

Create Memory Store Span

- id: span.gen_ai.create_memory_store.client
  type: span
  stability: development
  span_kind: client
  brief: >
    Describes creation/initialization of a memory store.
  note: |
    The `gen_ai.operation.name` SHOULD be `create_memory_store`.

    **Span name** SHOULD be `create_memory_store {gen_ai.memory.store.name}`
    or `create_memory_store` if store name is not available.
  attributes:
    - ref: gen_ai.operation.name
      requirement_level: required
    - ref: gen_ai.provider.name
      requirement_level: required
    - ref: gen_ai.memory.store.id
      requirement_level:
        conditionally_required: when returned by the operation
    - ref: gen_ai.memory.store.name
      requirement_level: recommended
    - ref: gen_ai.memory.scope
      requirement_level: required
    - ref: error.type
      requirement_level:
        conditionally_required: if the operation ended in an error

Search Memory Span

- id: span.gen_ai.search_memory.client
  type: span
  stability: development
  span_kind: client
  brief: >
    Describes a memory search/retrieval operation - querying a memory store
    for relevant memories.
  note: |
    The `gen_ai.operation.name` SHOULD be `search_memory`.

    **Span name** SHOULD be `search_memory {gen_ai.memory.store.name}`
    or `search_memory` if store name is not available.
  attributes:
    - ref: gen_ai.operation.name
      requirement_level: required
    - ref: gen_ai.provider.name
      requirement_level: required
    - ref: gen_ai.memory.store.id
      requirement_level:
        conditionally_required: if applicable
    - ref: gen_ai.memory.store.name
      requirement_level: recommended
    - ref: gen_ai.memory.query
      requirement_level: opt_in
    - ref: gen_ai.memory.search.result.count
      requirement_level: recommended
    - ref: gen_ai.memory.search.similarity.threshold
      requirement_level:
        conditionally_required: when similarity filtering is used
    - ref: gen_ai.agent.id
      requirement_level:
        conditionally_required: when searching agent-scoped memory
    - ref: gen_ai.conversation.id
      requirement_level:
        conditionally_required: when searching conversation-scoped memory
    - ref: error.type
      requirement_level:
        conditionally_required: if the operation ended in an error

Update Memory Span (Upsert)

- id: span.gen_ai.update_memory.client
  type: span
  stability: development
  span_kind: client
  brief: >
    Describes a memory update operation (upsert) - creating or modifying
    memory items in a memory store.
  note: |
    The `gen_ai.operation.name` SHOULD be `update_memory`.

    This operation is an upsert to avoid ambiguity between create vs update.

    **Span name** SHOULD be `update_memory {gen_ai.memory.store.name}`
    or `update_memory` if store name is not available.
  attributes:
    - ref: gen_ai.operation.name
      requirement_level: required
    - ref: gen_ai.provider.name
      requirement_level: required
    - ref: gen_ai.memory.store.id
      requirement_level:
        conditionally_required: if applicable
    - ref: gen_ai.memory.store.name
      requirement_level: recommended
    - ref: gen_ai.memory.record.id
      requirement_level:
        conditionally_required: when available (provided or returned)
    - ref: gen_ai.memory.record.content
      requirement_level: opt_in
    - ref: gen_ai.memory.expiration_date
      requirement_level:
        conditionally_required: if expiration is set
    - ref: gen_ai.memory.importance
      requirement_level: recommended
    - ref: gen_ai.agent.id
      requirement_level:
        conditionally_required: when operating on agent-scoped memory
    - ref: gen_ai.conversation.id
      requirement_level:
        conditionally_required: when operating on conversation-scoped memory
    - ref: error.type
      requirement_level:
        conditionally_required: if the operation ended in an error

Delete Memory Span

- id: span.gen_ai.delete_memory.client
  type: span
  stability: development
  span_kind: client
  brief: >
    Describes a memory deletion operation - removing one or more memory items.
  note: |
    The `gen_ai.operation.name` SHOULD be `delete_memory`.

    **Span name** SHOULD be `delete_memory {gen_ai.memory.store.name}`
    or `delete_memory` if store name is not available.

    Deletion semantics SHOULD be interpreted as follows:

    - If `gen_ai.memory.record.id` is set, delete a specific memory item.
    - If `gen_ai.memory.record.id` is not set, delete all memory items in the specified scope (bulk deletion).
  attributes:
    - ref: gen_ai.operation.name
      requirement_level: required
    - ref: gen_ai.provider.name
      requirement_level: required
    - ref: gen_ai.memory.store.id
      requirement_level:
        conditionally_required: if applicable
    - ref: gen_ai.memory.store.name
      requirement_level: recommended
    - ref: gen_ai.memory.scope
      requirement_level: required
    - ref: gen_ai.memory.record.id
      requirement_level:
        conditionally_required: when deleting a specific memory item
    - ref: gen_ai.agent.id
      requirement_level:
        conditionally_required: when deleting agent-scoped memory
    - ref: gen_ai.conversation.id
      requirement_level:
        conditionally_required: when deleting conversation-scoped memory
    - ref: error.type
      requirement_level:
        conditionally_required: if the operation ended in an error
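The item-vs-bulk deletion semantics above can be sketched as a small helper. `delete_memory_attributes` is hypothetical; it only illustrates which attributes distinguish the two modes:

```python
def delete_memory_attributes(provider, scope, store_id=None, record_id=None):
    """Build delete_memory span attributes; record_id selects item vs bulk mode."""
    attrs = {
        "gen_ai.operation.name": "delete_memory",
        "gen_ai.provider.name": provider,
        "gen_ai.memory.scope": scope,  # required on delete_memory spans
    }
    if store_id is not None:
        attrs["gen_ai.memory.store.id"] = store_id
    if record_id is not None:
        # Specific-item deletion: record id is conditionally required.
        attrs["gen_ai.memory.record.id"] = record_id
    return attrs

# Item deletion vs scope-based bulk deletion:
item = delete_memory_attributes("pinecone", "user",
                                "store_user_alex_789_prefs", "pref_embarrassing_001")
bulk = delete_memory_attributes("pinecone", "user", "store_user_alex_789_prefs")
```

The absence of `gen_ai.memory.record.id` on the bulk span is what signals "delete everything in this scope" to a trace consumer.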

Delete Memory Store Span

- id: span.gen_ai.delete_memory_store.client
  type: span
  stability: development
  span_kind: client
  brief: >
    Describes deletion/deprovisioning of a memory store.
  note: |
    The `gen_ai.operation.name` SHOULD be `delete_memory_store`.

    **Span name** SHOULD be `delete_memory_store {gen_ai.memory.store.name}`
    or `delete_memory_store` if store name is not available.
  attributes:
    - ref: gen_ai.operation.name
      requirement_level: required
    - ref: gen_ai.provider.name
      requirement_level: required
    - ref: gen_ai.memory.store.id
      requirement_level:
        conditionally_required: if applicable
    - ref: gen_ai.memory.store.name
      requirement_level: recommended
    - ref: error.type
      requirement_level:
        conditionally_required: if the operation ended in an error

3. Real-World Stories: Memory Spans in Action

The following 6 stories demonstrate how the proposed memory spans and attributes work in real-world scenarios. Each story shows the exact trace hierarchy and span attributes that instrumentations would produce.

Story 1: Customer Support Agent

Scenario: Sarah contacts TechCorp support about a billing issue. The support agent maintains conversation context within a session while retrieving relevant information from past interactions.

Trace:

invoke_agent "CustomerSupportBot"                                    (1500ms)
├── create_memory_store "session-context"                            (15ms)
│   ├── gen_ai.operation.name: "create_memory_store"
│   ├── gen_ai.provider.name: "pinecone"
│   ├── gen_ai.memory.store.id: "store_session_abc123"
│   ├── gen_ai.memory.store.name: "session-context"
│   ├── gen_ai.memory.scope: "conversation"
│   └── gen_ai.conversation.id: "conv_session_abc123"
│
├── search_memory "user-history"                                     (45ms)
│   ├── gen_ai.operation.name: "search_memory"
│   ├── gen_ai.provider.name: "pinecone"
│   ├── gen_ai.memory.store.id: "store_user_sarah_123_history"
│   ├── gen_ai.memory.store.name: "user-history"
│   ├── gen_ai.memory.query: "billing issue duplicate charge"             [opt-in]
│   ├── gen_ai.memory.search.similarity.threshold: 0.7
│   ├── gen_ai.memory.search.result.count: 3
│   └── gen_ai.conversation.id: "conv_session_abc123"
│
├── chat "gpt-4"                                                     (1200ms)
│   ├── gen_ai.operation.name: "chat"
│   ├── gen_ai.usage.input_tokens: 1500
│   └── gen_ai.usage.output_tokens: 250
│
├── update_memory "session-context"                                  (20ms)
│   ├── gen_ai.operation.name: "update_memory"
│   ├── gen_ai.provider.name: "pinecone"
│   ├── gen_ai.memory.store.name: "session-context"
│   ├── gen_ai.memory.record.id: "turn_001"
│   ├── gen_ai.memory.expiration_date: "2026-02-25T17:30:00Z"       (24h TTL)
│   └── gen_ai.conversation.id: "conv_session_abc123"
│
│   ... (additional conversation turns: search → chat → update) ...
│
└── delete_memory "session-context"                                  (25ms)
    ├── gen_ai.operation.name: "delete_memory"
    ├── gen_ai.provider.name: "pinecone"
    ├── gen_ai.memory.store.id: "store_session_abc123"
    ├── gen_ai.memory.store.name: "session-context"
    ├── gen_ai.memory.scope: "conversation"
    └── gen_ai.conversation.id: "conv_session_abc123"

Key spans demonstrated: create_memory_store, search_memory, update_memory, delete_memory

Why observability matters: If Sarah says "I already told you my account number" and the agent asks again, engineers can check the search_memory span — was result_count = 0? Was similarity.threshold too high? Did the session memory expire (expiration_date)?


Story 2: Personal Shopping Assistant

Scenario: Mike uses ShopSmart, which learns his preferences over time. He explicitly states he prefers sustainable products, the system infers he likes minimalist designs from browsing, and eventually he exercises GDPR deletion rights.

Trace (preference learning):

invoke_agent "ShoppingAssistant"                                     (2000ms)
├── create_memory_store "user-preferences"                           (20ms)
│   ├── gen_ai.operation.name: "create_memory_store"
│   ├── gen_ai.provider.name: "pinecone"
│   ├── gen_ai.memory.store.id: "store_user_mike_456_prefs"
│   ├── gen_ai.memory.store.name: "user-preferences"
│   └── gen_ai.memory.scope: "user"
│
├── update_memory "user-preferences"                                 (30ms)
│   ├── gen_ai.operation.name: "update_memory"
│   ├── gen_ai.memory.store.name: "user-preferences"
│   ├── gen_ai.memory.record.id: "pref_sustainable_001"
│   ├── gen_ai.memory.importance: 0.9                                (explicit preference)
│   └── gen_ai.memory.record.content: '{"preference": "sustainable_products"}' [opt-in]
│
├── update_memory "user-preferences"                                 (25ms)
│   ├── gen_ai.operation.name: "update_memory"
│   ├── gen_ai.memory.store.name: "user-preferences"
│   ├── gen_ai.memory.record.id: "pref_minimalist_002"
│   └── gen_ai.memory.importance: 0.75
│
├── search_memory "user-preferences"                                 (40ms)
│   ├── gen_ai.operation.name: "search_memory"
│   ├── gen_ai.memory.store.name: "user-preferences"
│   ├── gen_ai.memory.query: "laptop recommendations"                    [opt-in]
│   ├── gen_ai.memory.search.similarity.threshold: 0.6
│   └── gen_ai.memory.search.result.count: 5
│
├── chat "gpt-4"                                                     (1100ms)
│
├── update_memory "user-preferences"                                 (25ms)
│   ├── gen_ai.operation.name: "update_memory"
│   ├── gen_ai.memory.record.id: "pref_sustainable_001"
│   └── gen_ai.memory.importance: 0.1                                (user downgraded)
│
└── delete_memory "user-preferences"                                 (35ms)
    ├── gen_ai.operation.name: "delete_memory"                       (GDPR request)
    ├── gen_ai.memory.store.id: "store_user_mike_456_prefs"
    ├── gen_ai.memory.store.name: "user-preferences"
    └── gen_ai.memory.scope: "user"                                  (bulk delete all)

Key spans demonstrated: create_memory_store, update_memory (with importance + merge strategy), search_memory (with similarity threshold), delete_memory (GDPR bulk)

Why observability matters: When Mike says "Why did you recommend leather boots? I said I prefer sustainable products!" — engineers can trace the update_memory span that lowered importance to 0.1 for pref_sustainable_001, and the search_memory span to see if the threshold filtered it out.


Story 3: Multi-Agent Research Crew

Scenario: ResearchCo deploys a team of specialized agents — Researcher, Analyst, Writer — to produce an EV market research report. They share findings via team-scoped memory while maintaining private procedural memory.

Trace:

invoke_agent "ResearchCrew"                                          (8500ms)
│
├── create_memory_store "ev-research-team"                           (20ms)
│   ├── gen_ai.operation.name: "create_memory_store"
│   ├── gen_ai.provider.name: "milvus"
│   ├── gen_ai.memory.store.id: "store_team_ev_research_2025"
│   ├── gen_ai.memory.store.name: "ev-research-team"
│   └── gen_ai.memory.scope: "team"
│
├── invoke_agent "Researcher"                                        (3000ms)
│   ├── chat "gpt-4"                                                (1800ms)
│   │
│   ├── update_memory "ev-research-team"                             (30ms)
│   │   ├── gen_ai.operation.name: "update_memory"
│   │   ├── gen_ai.memory.store.name: "ev-research-team"
│   │   ├── gen_ai.memory.record.id: "finding_market_size_001"
│   │   └── gen_ai.agent.id: "researcher_agent"                     (attribution)
│   │
│   └── update_memory "ev-research-team"                             (25ms)
│       ├── gen_ai.operation.name: "update_memory"
│       ├── gen_ai.memory.record.id: "finding_regional_data_002"
│       └── gen_ai.agent.id: "researcher_agent"
│
├── invoke_agent "Analyst"                                           (2500ms)
│   ├── create_memory_store "analyst-procedures"                     (15ms)
│   │   ├── gen_ai.operation.name: "create_memory_store"
│   │   ├── gen_ai.memory.store.name: "analyst-procedures"
│   │   └── gen_ai.memory.scope: "agent"                            (private)
│   │
│   ├── update_memory "analyst-procedures"                           (20ms)
│   │   ├── gen_ai.operation.name: "update_memory"
│   │   ├── gen_ai.memory.store.name: "analyst-procedures"
│   │   ├── gen_ai.memory.scope: "agent"
│   │   └── gen_ai.agent.id: "analyst_agent"
│   │
│   ├── chat "gpt-4"                                                (1500ms)
│   │
│   └── update_memory "ev-research-team"                             (30ms)
│       ├── gen_ai.operation.name: "update_memory"
│       ├── gen_ai.memory.store.name: "ev-research-team"
│       ├── gen_ai.memory.record.id: "analysis_growth_projection_003"
│       └── gen_ai.agent.id: "analyst_agent"
│
└── invoke_agent "Writer"                                            (2800ms)
    ├── search_memory "ev-research-team"                             (50ms)
    │   ├── gen_ai.operation.name: "search_memory"
    │   ├── gen_ai.memory.store.name: "ev-research-team"
    │   ├── gen_ai.memory.query: "EV market size growth projections"      [opt-in]
    │   ├── gen_ai.memory.search.result.count: 8
    │   └── gen_ai.agent.id: "writer_agent"
    │
    └── chat "gpt-4"                                                 (2500ms)

Key spans demonstrated: create_memory_store (team + agent scope), update_memory (with gen_ai.agent.id attribution and append strategy), search_memory (cross-agent retrieval)

Why observability matters: If the final report is missing the Analyst's growth projections, engineers can trace: Did analyst_agent actually call update_memory with strategy: append? Did writer_agent's search_memory return the expected 8 results? The gen_ai.agent.id attribute enables attribution — which agent contributed what.


Story 4: Enterprise Multi-Tenant SaaS

Scenario: CloudAssist is a B2B SaaS platform providing AI assistants to enterprise customers. Each tenant (ACME Corp, TechCo) has isolated data via namespaces, but all share access to global product documentation.

Trace (tenant onboarding → usage → offboarding):

# Tenant Onboarding
create_memory_store "tenant-store"                                   (50ms)
├── gen_ai.operation.name: "create_memory_store"
├── gen_ai.provider.name: "pinecone"
├── gen_ai.memory.store.id: "store_tenant_acme"
├── gen_ai.memory.store.name: "tenant-store"
└── gen_ai.memory.scope: "global"
# Tenant employee stores data
update_memory "tenant-store"                                         (25ms)
├── gen_ai.operation.name: "update_memory"
├── gen_ai.memory.store.name: "tenant-store"
├── gen_ai.memory.record.id: "acme_q4_projection_001"
# Tenant-scoped search (returns only ACME data)
search_memory "tenant-store"                                         (40ms)
├── gen_ai.operation.name: "search_memory"
├── gen_ai.memory.store.name: "tenant-store"
├── gen_ai.memory.query: "Q4 revenue projections"                        [opt-in]
└── gen_ai.memory.search.result.count: 12

# Global search (shared product docs, no namespace)
search_memory "global-docs"                                          (35ms)
├── gen_ai.operation.name: "search_memory"
├── gen_ai.memory.store.name: "global-docs"
├── gen_ai.memory.query: "CloudAssist API rate limits"                   [opt-in]
└── gen_ai.memory.search.result.count: 3                             (no namespace = global)

# Tenant offboarding (complete removal)
delete_memory_store "tenant-store"                                   (100ms)
├── gen_ai.operation.name: "delete_memory_store"
├── gen_ai.memory.store.id: "store_tenant_acme"
└── gen_ai.memory.store.name: "tenant-store"

Key spans demonstrated: create_memory_store (namespaced), update_memory, search_memory (namespaced vs global), delete_memory_store (tenant offboarding)

Why observability matters: To verify tenant data isolation, query traces for search_memory spans: a search with namespace: "tenant_acme" should never return results from tenant_techco. The search.result.count across namespaces provides isolation verification. During offboarding, the delete_memory_store span confirms complete data removal.


Story 5: Compliance Audit & Debugging

Scenario: Three real-world debugging and compliance scenarios that demonstrate WHY memory observability matters.

Scenario A — Agent "forgot" context (debugging):

invoke_agent "CustomerSupportBot"                                    (2000ms)
└── search_memory "conversation-history"                             (45ms)
    ├── gen_ai.operation.name: "search_memory"
    ├── gen_ai.memory.store.name: "conversation-history"
    ├── gen_ai.memory.search.result.count: 0                         ← BUG: No results!
    ├── gen_ai.memory.search.similarity.threshold: 0.95              ← Root cause: too strict
    └── gen_ai.conversation.id: "conv_xyz789"

Fix: Lower similarity.threshold from 0.95 to 0.7. Relevant memories scored 0.7–0.85 but were filtered out.

Scenario B — Compliance audit (who accessed what):

-- Query traces by conversation_id to get full audit trail
SELECT timestamp, operation_name, agent_id, memory_store_id, namespace, result_count
FROM traces
WHERE conversation_id = 'conv_audit_12345'
  AND namespace = 'ns_user_sarah_123'
ORDER BY timestamp;

Scenario C — Slow responses (performance):

invoke_agent "ShoppingAssistant"                                     (5200ms)
├── search_memory "user-preferences"                                 (850ms)  ✓
├── search_memory "product-catalog"                                  (3800ms) ✗ SLOW
│   ├── gen_ai.memory.search.result.count: 50000                     ← Too many results!
│   └── (no similarity.threshold set)                                ← Missing filter!
└── chat "gpt-4"                                                     (450ms)  ✓

Fix: Add similarity.threshold: 0.8 and limit results. Latency drops from 3800ms to 200ms.

Key insight: The search.result.count and similarity.threshold attributes on search_memory spans enable root-cause analysis for all three scenarios without any code changes.
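As an illustration, these failure modes can be flagged from exported spans alone. A hypothetical sketch over spans represented as plain dicts (any real pipeline would read from a trace backend instead):

```python
def diagnose_search_span(span: dict) -> list[str]:
    """Flag common search_memory problems from span attributes alone."""
    findings = []
    attrs = span.get("attributes", {})
    if attrs.get("gen_ai.operation.name") != "search_memory":
        return findings
    count = attrs.get("gen_ai.memory.search.result.count")
    threshold = attrs.get("gen_ai.memory.search.similarity.threshold")
    # Scenario A: nothing retrieved because the threshold filtered everything out.
    if count == 0 and threshold is not None and threshold >= 0.9:
        findings.append("zero results with strict threshold (Scenario A)")
    # Scenario C: huge result set with no similarity filter at all.
    if count is not None and count > 10000 and threshold is None:
        findings.append("unbounded result set, no similarity filter (Scenario C)")
    return findings

slow = {"attributes": {
    "gen_ai.operation.name": "search_memory",
    "gen_ai.memory.search.result.count": 50000,
}}
print(diagnose_search_span(slow))
```

The thresholds used here (0.9, 10000) are illustrative cut-offs, not values mandated by the conventions.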


Story 6: GDPR Data Lifecycle (Right to Be Forgotten)

Scenario: Alex is a DataAware user who exercises various GDPR deletion rights — from selective item deletion to complete account removal.

Trace (cascading deletion):

# Phase 1: Selective deletion (single item)
delete_memory "user-preferences"                                     (20ms)
├── gen_ai.operation.name: "delete_memory"
├── gen_ai.memory.store.id: "store_user_alex_789_prefs"
├── gen_ai.memory.store.name: "user-preferences"
├── gen_ai.memory.record.id: "pref_embarrassing_001"                        (specific item)
└── gen_ai.memory.scope: "user"

# Phase 2: Delete by scope (all conversation history)
delete_memory "conversation-history"                                 (35ms)
├── gen_ai.operation.name: "delete_memory"
├── gen_ai.memory.store.id: "store_user_alex_789_history"
├── gen_ai.memory.store.name: "conversation-history"
└── gen_ai.memory.scope: "user"

# Phase 3: Bulk delete all items (scope-based)
delete_memory "user-preferences"                                     (50ms)
├── gen_ai.operation.name: "delete_memory"
├── gen_ai.memory.store.id: "store_user_alex_789_prefs"
└── gen_ai.memory.scope: "user"                                      (all items in scope)

delete_memory "conversation-history"                                 (45ms)
├── gen_ai.operation.name: "delete_memory"
├── gen_ai.memory.store.id: "store_user_alex_789_history"
└── gen_ai.memory.scope: "user"

# Phase 4: Delete the stores themselves (complete removal)
delete_memory_store "user-preferences"                               (30ms)
├── gen_ai.operation.name: "delete_memory_store"
├── gen_ai.memory.store.id: "store_user_alex_789_prefs"
└── gen_ai.memory.store.name: "user-preferences"

delete_memory_store "conversation-history"                           (25ms)
├── gen_ai.operation.name: "delete_memory_store"
├── gen_ai.memory.store.id: "store_user_alex_789_history"
└── gen_ai.memory.store.name: "conversation-history"

delete_memory_store "personal-data"                                  (25ms)
├── gen_ai.operation.name: "delete_memory_store"
├── gen_ai.memory.store.id: "store_user_alex_789_personal"
└── gen_ai.memory.store.name: "personal-data"

Key spans demonstrated: delete_memory (by ID, by scope) and delete_memory_store (complete removal)

Why observability matters: GDPR requires proving data was actually deleted. The trace provides:

  • Selective deletion: gen_ai.memory.record.id identifies the exact item removed
  • Scope-based deletion: gen_ai.memory.scope: "user" confirms all user data was removed
  • Store deletion: delete_memory_store spans confirm complete deprovisioning
  • Audit trail: Timestamps on all spans provide the compliance timeline
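One concrete payoff is that the deletion trail can be checked mechanically. A hypothetical sketch over exported spans (as dicts) that reports any of a user's stores lacking a `delete_memory_store` span:

```python
def unverified_stores(spans, user_store_ids):
    """Return store ids for which no delete_memory_store span was recorded."""
    deleted = {
        s["attributes"].get("gen_ai.memory.store.id")
        for s in spans
        if s["attributes"].get("gen_ai.operation.name") == "delete_memory_store"
    }
    return sorted(set(user_store_ids) - deleted)

spans = [
    {"attributes": {"gen_ai.operation.name": "delete_memory_store",
                    "gen_ai.memory.store.id": "store_user_alex_789_prefs"}},
    {"attributes": {"gen_ai.operation.name": "delete_memory_store",
                    "gen_ai.memory.store.id": "store_user_alex_789_history"}},
]
print(unverified_stores(spans, [
    "store_user_alex_789_prefs",
    "store_user_alex_789_history",
    "store_user_alex_789_personal",
]))
# ['store_user_alex_789_personal']
```

An empty result means every known store for the user was deprovisioned; any remaining id is a compliance gap to investigate.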

Span Coverage Summary

| Story | create_memory_store | search_memory | update_memory | delete_memory | delete_memory_store |
|---|---|---|---|---|---|
| 1. Customer Support | ✅ session scope | ✅ similarity threshold, long_term | ✅ expiration_date, short_term | ✅ session cleanup | |
| 2. Shopping Assistant | ✅ user scope | ✅ similarity threshold | ✅ importance, merge strategy | ✅ GDPR bulk (scope) | |
| 3. Multi-Agent Research | ✅ team + agent scope | ✅ cross-agent retrieval | ✅ agent attribution, append strategy | | |
| 4. Multi-Tenant SaaS | ✅ namespace isolation | ✅ namespaced + global | ✅ namespaced | | ✅ tenant offboarding |
| 5. Compliance & Debug | | ✅ threshold debugging, audit trail | ✅ audit trail | | |
| 6. GDPR Lifecycle | | | | ✅ by ID, by scope | ✅ complete removal |

Key Attribute Coverage Across Stories

| Attribute | Stories | Example Values |
|---|---|---|
| gen_ai.memory.store.id | 1–6 | "store_session_abc123", "store_tenant_acme" |
| gen_ai.memory.store.name | 1–6 | "session-context", "user-preferences", "ev-research-team" |
| gen_ai.memory.record.id | 1, 2, 3, 5, 6 | "turn_001", "pref_sustainable_001", "finding_market_size_001" |
| gen_ai.memory.scope | 1–4, 6 | "conversation", "user", "agent", "team" |
| gen_ai.memory.query | 1–5 | "billing issue", "laptop recommendations" (opt-in) |
| gen_ai.memory.record.content | 2 | '{"preference": "sustainable_products"}' (opt-in) |
| gen_ai.memory.importance | 2 | 0.9, 0.75, 0.1 |
| gen_ai.memory.expiration_date | 1 | "2026-02-25T17:30:00Z" |
| gen_ai.memory.search.result.count | 1–5 | 3, 5, 8, 12, 50000 |
| gen_ai.memory.search.similarity.threshold | 1, 2, 5 | 0.7, 0.6, 0.95 |
| gen_ai.agent.id | 3, 5 | "researcher_agent", "analyst_agent", "writer_agent" |
| gen_ai.conversation.id | 1, 5 | "conv_session_abc123", "conv_xyz789" |

4. LangChain Framework Instrumentation Mappings

This section shows how popular LangChain memory classes map to the proposed semantic conventions, providing guidance for instrumentation authors.

LangChain Memory Class → Span Mapping

| LangChain Class | gen_ai.operation.name | gen_ai.memory.scope | Notes |
|---|---|---|---|
| ConversationBufferMemory.save_context() | update_memory | conversation | Each turn stored with conversation context |
| ConversationBufferMemory.load_memory_variables() | search_memory | conversation | Retrieves conversation history |
| ConversationBufferMemory.clear() | delete_memory | conversation | Clears session memory |
| VectorStoreRetrieverMemory | search_memory | user | Similarity-based retrieval with similarity.threshold |
| EntityMemory.save_context() | update_memory | user | Extracts entities; maps to importance scoring |
| EntityMemory.load_memory_variables() | search_memory | user | Entity lookup with similarity.threshold |
| ConversationSummaryMemory.save_context() | update_memory | conversation | Maps to update.strategy: merge |
| Shared VectorStore (multi-agent) | search_memory / update_memory | team | Use gen_ai.agent.id for attribution |

Example: Instrumenting ConversationBufferMemory.save_context()

# What LangChain does internally:
memory.save_context(
    inputs={"input": "What's my order status?"},
    outputs={"output": "Your order #1234 shipped yesterday."}
)

# What the instrumentation emits:
# Span: update_memory "session-context"
#   gen_ai.operation.name: "update_memory"
#   gen_ai.memory.store.name: "session-context"
#   gen_ai.memory.scope: "conversation"
#   gen_ai.memory.record.id: "turn_001"
#   gen_ai.conversation.id: "conv_abc123"
#   gen_ai.memory.expiration_date: "2026-02-25T18:00:00Z"

Example: Instrumenting EntityMemory with Importance

# EntityMemory extracts entities and scores them
# High importance = explicit user preference
# Medium importance = inferred from behavior

# Span: update_memory "user-preferences"
#   gen_ai.operation.name: "update_memory"
#   gen_ai.memory.store.name: "user-preferences"
#   gen_ai.memory.scope: "user"
#   gen_ai.memory.importance: 0.9
#   gen_ai.memory.record.content: '{"entity": "sustainability", "preference": true}'  [opt-in]

Example: Multi-Agent Shared VectorStore

# In CrewAI/LangGraph patterns, agents share a VectorStore backend.
# Each agent's writes are attributed via gen_ai.agent.id.

# Agent "researcher" writes to shared memory:
# Span: update_memory "team-knowledge"
#   gen_ai.operation.name: "update_memory"
#   gen_ai.memory.store.name: "team-knowledge"
#   gen_ai.memory.scope: "team"
#   gen_ai.agent.id: "researcher_agent"

# Agent "writer" reads from shared memory:
# Span: search_memory "team-knowledge"
#   gen_ai.operation.name: "search_memory"
#   gen_ai.memory.store.name: "team-knowledge"
#   gen_ai.memory.scope: "team"
#   gen_ai.memory.search.result.count: 8
#   gen_ai.agent.id: "writer_agent"

Other Framework Mappings

| Framework | Memory API | gen_ai.operation.name |
| --- | --- | --- |
| Mem0 | client.add() | update_memory |
| Mem0 | client.search() | search_memory |
| Mem0 | client.delete() | delete_memory |
| CrewAI | memory.remember() | update_memory |
| CrewAI | memory.recall() | search_memory |
| CrewAI | memory.forget() | delete_memory |
| CrewAI | memory.reset() | delete_memory (scope-based) |
| AutoGen | memory.add() | update_memory |
| AutoGen | memory.query() | search_memory |
| AutoGen | memory.clear() | delete_memory |
| Letta/MemGPT | blocks.create() | update_memory |
| Letta/MemGPT | blocks.retrieve() | search_memory |
| Letta/MemGPT | blocks.delete() | delete_memory |
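The mapping above can be encoded as a simple dispatch table in an instrumentation layer. A sketch (the lookup helper itself is hypothetical; the method names are as they appear in the table):

```python
# (framework, method) -> gen_ai.operation.name, per the mapping table above
FRAMEWORK_OPERATION_MAP = {
    ("mem0", "add"): "update_memory",
    ("mem0", "search"): "search_memory",
    ("mem0", "delete"): "delete_memory",
    ("crewai", "remember"): "update_memory",
    ("crewai", "recall"): "search_memory",
    ("crewai", "forget"): "delete_memory",
    ("crewai", "reset"): "delete_memory",
    ("autogen", "add"): "update_memory",
    ("autogen", "query"): "search_memory",
    ("autogen", "clear"): "delete_memory",
    ("letta", "create"): "update_memory",
    ("letta", "retrieve"): "search_memory",
    ("letta", "delete"): "delete_memory",
}

def operation_name(framework: str, method: str) -> str:
    """Resolve the gen_ai.operation.name for an intercepted memory call."""
    return FRAMEWORK_OPERATION_MAP[(framework.lower(), method)]
```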

Provider Mapping: gen_ai.memory.scope

This section documents how each major framework implements memory scoping and how their concepts map to the gen_ai.memory.scope enum values.

Cross-Provider Scope Support Matrix

| Scope Value | Google ADK | AWS Bedrock AgentCore | Azure AI Foundry | Mem0 | CrewAI | Letta/MemGPT |
| --- | --- | --- | --- | --- | --- | --- |
| user | user_id param (required on all APIs) | actorId param (short-term) / namespace /actors/{id}/ (long-term) | scope="user_123" or scope="{{$userId}}" (auto-resolved from auth) | user_id param on add()/search() | scope("/user/alice") or source="user:alice" | ⚠️ human block label convention (per-agent, no cross-agent user concept) |
| conversation | ⚠️ session_id exists but is ignored in search — memory is cross-conversation by design | sessionId param on create_event (short-term memory only) | ❌ Not a first-class concept — memory is explicitly cross-session | run_id param | ❌ No explicit session/conversation param | ⚠️ Implicit via agent conversation history |
| agent | ❌ No concept — all agents in same app_name share memory | memoryId (one memory resource per agent) or namespace segment | memory_store_name (one store per agent) | agent_id param | memory.scope("/agent/researcher") | ✅ Blocks + archival scoped to agent (primary isolation unit) |
| team | ❌ No concept (could encode in app_name) | ❌ Convention only (shared memoryId + broad namespace prefix) | scope="team_alpha" — docs mention "a team, or another identifier" | ❌ No built-in (filter composition across org_id) | memory.slice(scopes=["/agent/a", "/agent/b"]) | ✅ Shared memory block attached to N agents |

Provider-Specific Details

Google ADK

Scoping is via two required parameters on every API call: app_name (application-level isolation) and user_id (user-level isolation). There is no agent-level or team-level scoping — all agents within the same app_name share memory for a given user_id.

# BaseMemoryService contract — scoping is always (app_name, user_id)
await memory.search_memory(app_name="myapp", user_id="user_123", query="preferences")

# session_id is available but NOT used for search scoping
# InMemoryService stores by session_id but searches across ALL sessions for a user
# VertexAiMemoryBankService discards session_id entirely

Instrumentation guidance: Set gen_ai.memory.scope to user for all ADK memory operations since user_id is the primary and only meaningful isolation boundary.

AWS Bedrock AgentCore

Uses a hierarchical namespace-path system for long-term memory and explicit actorId/sessionId for short-term memory (events).

# Short-term: explicit actor + session scoping
client.create_event(memoryId='mem-abc', actorId='User1', sessionId='conv-001', ...)

# Long-term: namespace-path based scoping
client.retrieve_memory_records(
    memoryId='mem-abc',
    namespace='/strategies/semantic1/actors/User1/',  # prefix-matched
    searchCriteria={'searchQuery': 'preferences', 'topK': 5}
)

# Broad retrieval (all actors)
client.retrieve_memory_records(memoryId='mem-abc', namespace='/', ...)

Instrumentation guidance: Map actorId → user scope, sessionId → conversation scope. The memoryId implicitly provides agent-level isolation. Team scope would require convention-based namespace paths.
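This guidance can be sketched as a scope resolver for intercepted AgentCore calls. The precedence (sessionId narrower than actorId, falling back to agent-level isolation via memoryId) is an assumption consistent with the narrowest-scope rule used elsewhere in this spec; the helper itself is hypothetical:

```python
def bedrock_scope(actor_id=None, session_id=None, namespace=None):
    """Derive gen_ai.memory.scope for a Bedrock AgentCore memory call.

    sessionId is the narrowest signal (short-term events); actorId or an
    /actors/{id}/ namespace segment indicates user scope; otherwise fall
    back to agent, since memoryId isolates one memory resource per agent.
    """
    if session_id is not None:
        return "conversation"
    if actor_id is not None or (namespace and "/actors/" in namespace):
        return "user"
    return "agent"
```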

Azure AI Foundry

Uses a single scope string parameter on every API call. The value is developer-defined — there is no predefined enum.

# User-scoped (manual)
client.memory_stores.search_memories(name="store", scope="user_123", ...)

# User-scoped (auto-resolved from auth token)
client.memory_stores.search_memories(name="store", scope="{{$userId}}", ...)

# Team-scoped (developer-defined)
client.memory_stores.search_memories(name="store", scope="team_alpha", ...)

# GDPR deletion by scope
client.memory_stores.delete_scope(name="store", scope="user_123")

Instrumentation guidance: The scope parameter value directly maps to gen_ai.memory.scope. When the value matches a well-known pattern (e.g., contains "user", matches {{$userId}}), use user. For team identifiers, use team. The memory_store_name provides agent-level isolation.
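Because the Azure scope string is free-form, the mapping is necessarily heuristic. A sketch of the pattern matching described above (the classifier and its fallback to agent scope are conventions proposed here, not part of the Azure API):

```python
import re

def azure_scope(scope_value: str) -> str:
    """Heuristically classify an Azure AI Foundry scope string.

    The scope parameter is developer-defined, so this pattern matching
    is a best-effort convention. Unrecognized values fall back to agent,
    since memory_store_name already provides agent-level isolation.
    """
    if scope_value == "{{$userId}}" or re.search(r"user", scope_value, re.IGNORECASE):
        return "user"
    if re.search(r"team", scope_value, re.IGNORECASE):
        return "team"
    return "agent"
```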

Mem0

Uses flat entity-ID tags as parameters on every API call. No hierarchy — isolation is filter-based.

# User-scoped
client.add(messages, user_id="user_123")
client.search(query, user_id="user_123")

# Agent-scoped
client.add(messages, agent_id="support_bot")
client.search(query, agent_id="support_bot")

# Run/conversation-scoped
client.add(messages, run_id="conv_456")

# Combined scoping
client.search(query, user_id="user_123", agent_id="support_bot")

Instrumentation guidance: Map the primary scoping parameter used in the API call: user_id → user, agent_id → agent, run_id → conversation. When multiple are present, use the narrowest applicable scope.
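The narrowest-scope rule can be expressed as a simple precedence check. A sketch (the ordering conversation < agent < user, narrowest first, is the assumption stated above; the helper is hypothetical):

```python
def mem0_scope(user_id=None, agent_id=None, run_id=None):
    """Pick the narrowest applicable gen_ai.memory.scope for a Mem0 call.

    Precedence, narrowest first: run_id (one conversation) < agent_id
    (one agent) < user_id (all of a user's memories).
    """
    if run_id is not None:
        return "conversation"
    if agent_id is not None:
        return "agent"
    if user_id is not None:
        return "user"
    return None  # no scoping parameter supplied
```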

CrewAI

Uses hierarchical path-based scopes with an expressive tree model.

# Agent-scoped
scope = memory.scope("/agent/researcher")
scope.save(content="Found 3 relevant papers")
results = scope.search("papers")

# User-scoped with privacy
memory.save(content="User prefers dark mode", source="user:alice", private=True)

# Team-scoped via slice (cross-scope view)
team_view = memory.slice(scopes=["/agent/researcher", "/agent/writer"], read_only=True)

Instrumentation guidance: Parse the scope path to determine the scope value: /user/* → user, /agent/* → agent. For slices spanning multiple agents, use team.
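A sketch of that path parsing, covering both single scopes and multi-agent slices (the helper is hypothetical; only the path conventions shown in the examples above are assumed):

```python
def crewai_scope(paths):
    """Map CrewAI scope path(s) to a gen_ai.memory.scope value.

    `paths` is a single scope path string, or a list of paths as passed
    to memory.slice(scopes=[...]).
    """
    if isinstance(paths, str):
        paths = [paths]
    agent_paths = {p for p in paths if p.startswith("/agent/")}
    if len(agent_paths) > 1:
        return "team"  # slice spanning multiple agents
    if any(p.startswith("/user/") for p in paths):
        return "user"
    if agent_paths:
        return "agent"
    return None
```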

Letta/MemGPT

Uses an agent-centric block model where each agent owns labeled memory blocks.

# Agent-scoped (default — blocks belong to the agent)
agent = client.create_agent(memory_blocks=[
    {"label": "persona", "value": "I am a helpful assistant"},
    {"label": "human", "value": "User preferences: dark mode"},
])

# Team-scoped (shared block across agents)
shared_block = client.create_block(label="team_context", value="Project goals...")
agent_a = client.create_agent(block_ids=[shared_block.id])
agent_b = client.create_agent(block_ids=[shared_block.id])

Instrumentation guidance: Default to agent scope since blocks are agent-owned. When a shared block is detected (attached to multiple agents), use team.
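The shared-block detection reduces to counting how many agents a block is attached to. A trivial sketch (the attachment count would come from the instrumentation's own bookkeeping, not a documented Letta field):

```python
def letta_scope(attached_agent_count: int) -> str:
    """Blocks are agent-owned by default; a block attached to more than
    one agent is treated as shared team memory."""
    return "team" if attached_agent_count > 1 else "agent"
```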

Summary

The user scope is the most universally supported (5/6 frameworks have explicit support). The agent scope is well-supported (5/6). The conversation scope is partial (3/6 — Google ADK and Azure explicitly design memory as cross-session). The team scope is emergent (3/6 have native support, others can approximate via workarounds).

All four enum values (user, conversation, agent, team) have concrete mappings in at least 3 frameworks, validating the proposed scope enum.

Rationale: Why Separate Memory Spans?

This section addresses reviewer feedback about whether memory operations should be separate spans or an expansion of the agent span with gen_ai.operation.name: memory.

Recommendation: Keep Separate Spans

Memory operations are modeled as separate client spans rather than expanding the agent span for the following reasons:

1. Follows Database Pattern

OTel database conventions define separate client spans per operation with db.operation.name distinguishing them (e.g., SELECT, INSERT). Memory operations follow this same established pattern with gen_ai.operation.name values like search_memory, update_memory, etc.

2. Agent Span is Orchestration

The agent span (invoke_agent) represents the orchestration layer. Memory operations are I/O operations that happen within an agent invocation, similar to how database calls happen within a service request.

3. Trace Hierarchy

Separate spans enable clear trace hierarchy:

invoke_agent (agent span)
├── search_memory (memory span - retrieves context)
├── chat (inference span - LLM call)
└── update_memory (memory span - stores result)

4. Enables Correlation

Separate spans allow:

  • Duration metrics per operation type
  • Error rates per operation
  • Performance analysis (which memory operation is slow?)

Contrast with "memory" as Operation Name

If we used gen_ai.operation.name: memory on the agent span:

  • We would lose granularity (cannot distinguish search vs update vs delete)
  • We would need sub-attributes like gen_ai.memory.operation anyway
  • It would be inconsistent with database conventions

Rationale: Memory vs Database Conventions

This section addresses why memory operations need dedicated gen_ai.memory.* attributes rather than reusing db.* conventions.

Comparison Table

| Aspect | Database (db.*) | GenAI Memory (gen_ai.memory.*) | Unique to Memory? |
| --- | --- | --- | --- |
| System | db.system.name (postgresql) | gen_ai.provider.name (pinecone) | No |
| Operation | db.operation.name (SELECT) | gen_ai.operation.name (search_memory) | No |
| Target | db.collection.name (users) | gen_ai.memory.store.name | No |
| Query | db.query.text | gen_ai.memory.query.text | No |
| Scope | N/A | gen_ai.memory.scope (user, conversation, agent, team) | YES |
| Importance | N/A | gen_ai.memory.importance | YES |
| Agent Context | N/A | gen_ai.agent.id, gen_ai.conversation.id | YES |
| Similarity Search | N/A | gen_ai.memory.search.similarity.threshold | YES |

Why Not Just Use db.* Attributes?

  1. Semantic vs Physical Scope: Memory uses semantic isolation (user, conversation, agent) vs database physical isolation (schema, namespace).

  2. AI Context: Memory carries gen_ai.agent.id, gen_ai.conversation.id - meaningless for databases.

  3. Importance Scoring: Memory items have gen_ai.memory.importance (0.0-1.0) affecting retrieval and retention.

  4. Similarity-Based Retrieval: gen_ai.memory.search.similarity.threshold is fundamental to vector-based memory retrieval.

  5. Memory is an Abstraction: Memory providers vary widely - some use vector databases (Pinecone, Chroma), others use in-memory stores, key-value caches, or custom backends. Not all memory providers use a database at all.

Hybrid Approach

Instrumentations:

  1. SHOULD emit gen_ai.memory.* attributes for AI-specific observability
  2. MAY additionally emit db.* attributes when the underlying storage is a database, for infrastructure-level correlation
  3. Memory spans carry GenAI-specific semantic meaning that db.* alone cannot express
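The hybrid approach can be sketched as an attribute builder that always emits gen_ai.memory.* and conditionally layers db.* on top (the helper is illustrative; only attribute names defined in this spec and the db conventions are used):

```python
def memory_span_attributes(operation, store_name, scope,
                           db_system=None, db_collection=None):
    """Emit gen_ai.memory.* attributes, plus db.* attributes when the
    backing store is a database (the hybrid approach above)."""
    attrs = {
        "gen_ai.operation.name": operation,
        "gen_ai.memory.store.name": store_name,
        "gen_ai.memory.scope": scope,
    }
    if db_system is not None:  # underlying storage is a database
        attrs["db.system.name"] = db_system
        if db_collection is not None:
            attrs["db.collection.name"] = db_collection
    return attrs
```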

Rationale: Memory vs Retrieval Operations

This section clarifies the distinction between search_memory and the existing retrieval operation in GenAI semantic conventions.

Comparison

| Aspect | Retrieval (gen_ai.retrieval.*) | Memory (gen_ai.memory.*) |
| --- | --- | --- |
| Purpose | Fetch grounding context from external sources | Manage persistent agent state |
| Data Source | External documents, knowledge bases | Agent-owned context |
| Lifecycle | Read-only (fetch) | Full CRUD (create, read, update, delete) |
| Scope | Global knowledge | User/conversation/agent-specific |
| Persistence | External system manages | Agent manages lifecycle |
| Example | RAG from documentation | Remember user preferences |

Key Differences

1. Retrieval is Read-Only, Memory is CRUD

# Retrieval: Only fetches
retrieval → documents

# Memory: Full lifecycle
create_memory_store → store created
search_memory → results (like retrieval)
update_memory → item stored
delete_memory → item removed
delete_memory_store → store removed

2. Retrieval is External, Memory is Agent-Owned

  • Retrieval: "What does the documentation say about X?"

    • Source: External knowledge base
    • Agent does NOT modify the source
  • Memory: "What did this user tell me before?"

    • Source: Agent-owned persistent state
    • Agent creates, updates, and deletes

3. Retrieval is Stateless, Memory is Stateful

  • Retrieval: Same query → same results (assuming static docs)
  • Memory: Results change based on prior agent interactions

When to Use Which

| Scenario | Operation |
| --- | --- |
| Search product documentation | retrieval |
| Query external API for facts | retrieval |
| RAG from knowledge base | retrieval |
| Find a user's past preferences | search_memory |
| Recall conversation context | search_memory |
| Multi-agent shared state | search_memory (team scope) |

Overlap: search_memory ≈ retrieval

search_memory IS similar to retrieval:

  • Both query for relevant context
  • Both return results with scores
  • Both use similarity thresholds

BUT search_memory operates on agent-managed memory, not external knowledge.

Cross-Framework Guidance

Not all frameworks distinguish memory retrieval from general retrieval at the API level. The correct operation depends on the framework's abstraction:

Use search_memory when the framework has explicit memory CRUD operations:

  • Mem0: client.search() / client.add() / client.delete()
  • LangMem (LangChain): create_search_memory_tool() / create_manage_memory_tool()
  • CrewAI: memory.recall() / memory.remember() / memory.forget()
  • AutoGen: memory.query() / memory.add() / memory.clear()
  • Letta/MemGPT: blocks.retrieve() / blocks.create() / blocks.delete()

Use retrieval when the framework uses a generic retrieval interface that does not distinguish memory from external knowledge (e.g., LangChain's on_retriever_start callback). In this case, supplemental gen_ai.memory.* attributes MAY be added to the retrieval span when metadata indicates the retriever is backed by agent memory.
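For the generic-retriever case, the decision reduces to a metadata check. A sketch (the memory_backed key is a hypothetical instrumentation convention, not a LangChain field):

```python
def classify_retrieval(metadata: dict) -> str:
    """Decide between 'search_memory' and 'retrieval' for a generic
    retriever callback, based on instrumentation-supplied metadata."""
    return "search_memory" if metadata.get("memory_backed") else "retrieval"
```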

Rationale: Why GenAI-Specific Namespace?

This section explains why memory operations belong in the gen_ai.* namespace rather than using generic db.* conventions.

1. Memory is About AI Context, Not Data Storage

Database Operation Memory Operation
Store customer record Remember user preference
Query orders table Recall relevant context
Update inventory count Learn from interaction
Delete old logs Forget outdated information

Memory operations have semantic intent (remember, recall, learn, forget) that databases do not capture.

2. Memory Operations Are AI-Native

Memory spans carry AI context (conversation, agent, similarity) that is meaningless for databases:

  • gen_ai.agent.id - Which agent accessed memory
  • gen_ai.conversation.id - Links to conversation flow
  • gen_ai.memory.importance - Semantic importance score
  • gen_ai.memory.search.similarity.threshold - Vector similarity cutoff

3. Memory Crosses Multiple Storage Systems

A single "memory" might involve:

  • Vector database (Pinecone) for semantic search
  • Key-value store (Redis) for session state
  • Document database (MongoDB) for user profiles

Memory is an abstraction over storage, not a storage system itself.

4. Memory Has Lifecycle Semantics

  • Expiration: Memory items expire based on semantic rules (24h session, 30d preference)
  • Importance: Items have importance scores affecting retention
  • Scope propagation: Deleting user scope cascades to all related items

These are AI-specific lifecycle concerns not present in database conventions.
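As an illustration of scope-driven expiration, a sketch that computes gen_ai.memory.expiration_date from a retention policy (the 24h/30d values mirror the example above; the policy table and helper are illustrative, not normative):

```python
from datetime import datetime, timedelta, timezone

# Illustrative retention policy: 24h for conversation state,
# 30d for user preferences (matching the examples above).
RETENTION = {
    "conversation": timedelta(hours=24),
    "user": timedelta(days=30),
}

def expiration_for(scope: str, now=None):
    """Compute gen_ai.memory.expiration_date for a new record, or None
    when the scope has no automatic expiry."""
    ttl = RETENTION.get(scope)
    if ttl is None:
        return None
    now = now or datetime.now(timezone.utc)
    return (now + ttl).isoformat()
```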

5. Different Consumers

  • Database metrics: Infrastructure teams, DBAs
  • Memory metrics: AI engineers, ML ops

AI engineers need memory-specific dashboards, not generic database monitoring. Memory operations must correlate with gen_ai.* spans (chat, invoke_agent), not with generic service requests.
