@andrewh
Last active February 23, 2026 08:50
motel how-to guides, preview demo, and traffic charts

motel: Preview Traffic

2026-02-22T09:00:00Z

motel preview renders the effective traffic rate over time as an SVG chart. This is useful for verifying bursty patterns, scenario overrides, and ramp-up shapes before sending traffic to a collector.

Uniform traffic

A flat rate produces a horizontal line across the chart.

motel preview --duration 5m docs/examples/traffic-patterns.yaml -o /tmp/uniform.svg
file /tmp/uniform.svg | grep -c 'SVG'
1

Uniform traffic at 50/s — flat horizontal line

Diurnal traffic

A sine wave oscillating between trough (0.5x) and peak (1.5x) over a 24-hour period. The preview shows the full cycle.

cat > /tmp/diurnal.yaml << 'EOF'
version: 1
services:
  api:
    operations:
      request:
        duration: 10ms +/- 3ms
        calls:
          - database.query
  database:
    operations:
      query:
        duration: 5ms +/- 2ms
traffic:
  rate: 50/s
  pattern: diurnal
EOF
motel preview --duration 24h /tmp/diurnal.yaml -o /tmp/diurnal.svg
file /tmp/diurnal.svg | grep -c 'SVG'
1

Diurnal traffic — sine wave over 24 hours

Bursty traffic

Alternates between a base rate and periodic high-rate bursts. This example bursts to 5x every minute for 10 seconds.

cat > /tmp/bursty.yaml << 'EOF'
version: 1
services:
  api:
    operations:
      request:
        duration: 10ms +/- 3ms
        calls:
          - database.query
  database:
    operations:
      query:
        duration: 5ms +/- 2ms
traffic:
  rate: 50/s
  pattern: bursty
  burst_interval: 1m
  burst_duration: 10s
EOF
motel preview --duration 5m /tmp/bursty.yaml -o /tmp/bursty.svg
file /tmp/bursty.svg | grep -c 'SVG'
1

Bursty traffic — periodic spikes to 250/s

Bursty traffic with scenario overrides

The stress-test topology combines a bursty base pattern (500/s with 10x bursts) with two scenario overrides: a sustained peak at 5,000/s and an extreme burst at 1,000/s with 15x multiplier. Scenario windows appear as shaded rectangles.

motel preview --duration 3m docs/examples/stress-test.yaml -o /tmp/stress-test.svg
grep 'class="scenario-rect"' /tmp/stress-test.svg | wc -l | tr -d ' '
2

Stress-test topology — bursty base with scenario overrides

Inferred duration

Without --duration, motel infers a preview window from the topology's scenarios — the latest scenario end time plus a 10% buffer. Without scenarios, it defaults to 5 minutes.
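
As a worked example of that rule (plain arithmetic, not motel output): if the latest scenario ends 150 seconds into the run, the inferred window is that end time plus 10%.

```shell
# Inferred preview window: latest scenario end time plus a 10% buffer.
awk 'BEGIN {
  end_s = 150                      # latest scenario ends at 2m30s
  printf "inferred window: %ds\n", int(end_s * 1.1)
}'
```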

motel preview docs/examples/stress-test.yaml -o /tmp/inferred.svg
head -1 /tmp/inferred.svg | grep -c '<svg'
1

Output options

Write to a file with -o, or pipe stdout to a file or viewer.

motel preview --duration 1m docs/examples/traffic-patterns.yaml | head -1 | grep -c '<svg'
1

The SVG is self-contained with inline styles and no external fonts. It renders on GitHub, in browsers, and in most image viewers.

Model Your Services as a Topology

This guide covers two approaches to creating a topology that matches your real system: writing one by hand, or importing from existing trace data. Choose whichever fits your situation — or combine both.

Option A: Write a topology by hand

Start with what you know about your system's call graph: which services exist, what operations they expose, and who calls whom.

1. Sketch the call graph

Map your services and their dependencies. For example, a typical web application:

web-gateway
  └─ GET /products → product-service.list
                       └─ database.query
  └─ POST /orders → order-service.create
                       ├─ database.insert
                       └─ payment-service.charge

Each arrow is a call. Each box is a service with one or more operations.

2. Translate to YAML

Turn the sketch into a topology file. Start minimal — you can add detail later:

version: 1

services:
  web-gateway:
    operations:
      GET /products:
        duration: 50ms +/- 15ms
        calls:
          - product-service.list

      POST /orders:
        duration: 80ms +/- 20ms
        error_rate: 2%
        calls:
          - order-service.create

  product-service:
    operations:
      list:
        duration: 20ms +/- 5ms
        calls:
          - database.query

  order-service:
    operations:
      create:
        duration: 40ms +/- 10ms
        error_rate: 1%
        calls:
          - database.insert
          - payment-service.charge

  database:
    operations:
      query:
        duration: 5ms +/- 2ms
      insert:
        duration: 8ms +/- 3ms
        error_rate: 0.1%

  payment-service:
    operations:
      charge:
        duration: 200ms +/- 50ms
        error_rate: 3%

traffic:
  rate: 50/s

3. Validate and iterate

motel validate my-topology.yaml

Generate a short burst and inspect the output:

motel run --stdout --duration 2s my-topology.yaml |
  jq -r .Name | sort | uniq -c | sort -rn

  48 query
  48 list
  48 GET /products
  46 POST /orders
  46 insert
  46 create
  46 charge

Check that the operation names and relative counts look right. Adjust durations and error rates until the traces look realistic.

Tips for hand-written topologies

  • Start small. Get two or three services working, then add more. Validate after each change.
  • Use real latency numbers. Check your existing dashboards for p50 and standard deviation. 30ms +/- 10ms is better than guessing 30ms.
  • Error rates are per-span. A 1% error rate means roughly 1 in 100 spans will be marked as errors. Child errors cascade upward, so effective error rates are higher at the root.
  • Calls are parallel by default. If your service makes sequential downstream calls, set call_style: sequential on the operation.
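
To see what that cascade means numerically, here is a small sketch (pure shell arithmetic, not motel itself), assuming failures are independent and any child error marks the parent span as an error:

```shell
# Effective root error rate when child errors cascade upward.
# Assumes independent failures and that any child error propagates.
awk 'BEGIN {
  root  = 0.02   # configured error_rate on the root operation (2%)
  child = 0.01   # configured error_rate on its child (1%)
  effective = 1 - (1 - root) * (1 - child)
  printf "effective root error rate: %.2f%%\n", effective * 100
}'
```

With these numbers the root reports errors on roughly 2.98% of traces, not the configured 2%.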

Option B: Import from existing traces

If you already have trace data — from a staging environment, production sampling, or a test run — you can infer a topology automatically.

1. Collect trace data

You need spans in one of two formats:

  • stdouttrace — one JSON span per line, as produced by motel run --stdout or the OpenTelemetry Go SDK's stdout exporter
  • OTLP JSON — the standard OTLP export format with resourceSpans arrays

Export spans from your collector, or capture them directly:

# Generate sample data
motel run --stdout --duration 30s topology.yaml > traces.jsonl

# Or use traces from another source
cat exported-traces.json

More traces produce better statistical accuracy. The import command warns if you have fewer traces than --min-traces (default: 1).

2. Run import

# From a file
motel import traces.jsonl

# From stdin (e.g. piped from another tool)
cat traces.jsonl | motel import

The output is a YAML topology written to stdout. Redirect it to a file:

motel import traces.jsonl > inferred-topology.yaml

The generated YAML includes a comment header noting how many traces and spans were analysed.

3. Review and adjust

The inferred topology is a starting point, not a finished product. Review it for:

  • Duration distributions — import calculates mean and standard deviation from the observed spans. Check that these match your expectations.
  • Error rates — derived from the proportion of error spans. Small sample sizes produce noisy estimates.
  • Call style — import votes on parallel vs sequential based on child span timing overlap. Verify this matches your service's actual behaviour.
  • Missing services — import can only infer what it sees. If some call paths are rare, they may not appear in a small sample.
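
To get a feel for that noise, the standard error of a proportion, sqrt(e(1-e)/n), shows how the estimate tightens with sample size (basic binomial statistics, nothing motel-specific):

```shell
# Sampling noise in an inferred error rate at different sample sizes.
awk 'BEGIN {
  e = 0.01                             # true error rate (1%)
  for (n = 100; n <= 100000; n *= 10)
    printf "n=%6d  estimate 1.0%% +/- %.2f%%\n", n, sqrt(e * (1 - e) / n) * 100
}'
```

At 100 spans the estimate is barely better than a guess; at 100,000 spans it is tight to a few hundredths of a percent.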

Validate the inferred topology and generate traces from it:

motel validate inferred-topology.yaml
motel run --stdout --duration 5s inferred-topology.yaml | head -20

Combining both approaches

A practical workflow is to import a rough topology from traces, then refine it by hand:

  1. Collect traces from your real system
  2. Run motel import to get a baseline topology
  3. Edit the YAML to fix any inaccuracies, add missing services, or adjust distributions
  4. Add scenarios for failure modes you want to simulate (latency spikes, error injection)
  5. Validate and iterate

Further reading


Stress-Test Collector Queue and Retry Configuration

This guide shows how to use motel to push a collector to its limits — finding queue overflow thresholds, verifying retry behaviour, and measuring data loss under load.

Prerequisites

  • A running OpenTelemetry Collector with persistent queue and retry settings you want to test
  • A backend (Jaeger, Tempo, or similar) to receive traces from the collector
  • motel installed

1. Start with a baseline

Before stressing anything, establish a baseline at a comfortable rate. This confirms your pipeline works end-to-end:

motel run --endpoint http://localhost:4318 --protocol http/protobuf \
  --duration 30s docs/examples/stress-test.yaml

Check that traces flow through the collector to your backend. If they do not arrive at low volume, fix the pipeline before increasing load.

2. Use bursty traffic to test queue overflow

Bursty traffic sends a steady rate most of the time, then spikes periodically. This is the pattern that most commonly triggers queue overflow — sustained high throughput is easier to size for than sudden spikes.

The example topology at docs/examples/stress-test.yaml uses these traffic settings:

traffic:
  rate: 500/s
  pattern: bursty
  burst_multiplier: 10
  burst_interval: 30s
  burst_duration: 5s

This sends 500 traces/s normally, spiking to 5,000 traces/s for 5 seconds every 30 seconds. Each trace produces multiple spans (one per operation in the call graph), so the actual span rate is several times higher.
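
The trace-to-span multiplier can be sketched with plain arithmetic (the spans-per-trace figure below is illustrative, not measured from stress-test.yaml):

```shell
# Peak span rate = base rate x burst multiplier x spans per trace.
awk 'BEGIN {
  rate = 500; mult = 10
  spans_per_trace = 4              # illustrative; depends on your call graph
  printf "peak trace rate: %d/s\n", rate * mult
  printf "peak span rate:  %d/s\n", rate * mult * spans_per_trace
}'
```

A 10x trace burst at four spans per trace means the collector must absorb 20,000 spans/s at peak.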

To visualise the traffic shape before sending it, use motel preview:

motel preview --duration 3m docs/examples/stress-test.yaml -o traffic.svg

Traffic rate over time for stress-test.yaml

Run it for long enough to see multiple burst cycles:

motel run --endpoint http://localhost:4318 --protocol http/protobuf \
  --duration 5m docs/examples/stress-test.yaml

Watch the collector's own metrics during the run. Key signals:

  • otelcol_exporter_queue_size — current queue depth
  • otelcol_exporter_queue_capacity — configured maximum
  • otelcol_exporter_enqueue_failed_spans — spans dropped because the queue was full

If the queue never fills, increase burst_multiplier or burst_duration.

3. Ramp up to find the throughput ceiling

Use scenarios to progressively increase the rate and find where the collector starts dropping data:

scenarios:
  - name: sustained peak
    at: +1m
    duration: 30s
    traffic:
      rate: 5000/s
      pattern: uniform

Alternatively, use custom traffic with segments for a manual ramp:

traffic:
  rate: 100/s
  pattern: custom
  segments:
    - rate: 500/s
      until: 1m
    - rate: 2000/s
      until: 2m
    - rate: 5000/s
      until: 3m
    - rate: 10000/s
      until: 4m

Run it and note the rate at which otelcol_exporter_enqueue_failed_spans begins climbing. That is your collector's effective ceiling for this configuration.

4. Verify retry behaviour

To test retries specifically, make the backend intermittently unavailable while motel is running. For example, if your backend is behind a proxy, briefly block traffic:

# In one terminal — generate load
motel run --endpoint http://localhost:4318 --protocol http/protobuf \
  --duration 5m docs/examples/stress-test.yaml

# In another terminal — simulate backend outage after 30 seconds
sleep 30 && docker stop tempo && sleep 15 && docker start tempo

During the outage, the collector should queue spans and retry. After the backend returns, check:

  • otelcol_exporter_send_failed_spans — spans that failed export (should spike during outage)
  • otelcol_receiver_refused_spans — spans the collector could not accept
  • Queue size should drain back to zero after the outage ends

If the queue fills and spans are dropped during a 15-second outage, your queue size or retry settings need tuning.
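
A rough sizing check for that outage window (illustrative numbers; substitute your measured span rate and your configured capacity — note the collector's queue_size is counted in batches, so convert accordingly):

```shell
# Spans that must queue during an outage vs available queue capacity.
awk 'BEGIN {
  span_rate = 2000                 # measured spans/s into the exporter
  outage_s  = 15                   # outage length in seconds
  capacity  = 25000                # queue capacity expressed in spans
  needed = span_rate * outage_s
  printf "spans queued: %d / capacity %d -> %s\n", needed, capacity, (needed <= capacity ? "fits" : "overflow")
}'
```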

5. Measure data loss

Compare what motel sent against what your backend received. Motel logs the total number of spans it generates:

motel run --endpoint http://localhost:4318 --protocol http/protobuf \
  --duration 2m docs/examples/stress-test.yaml 2>&1 | grep spans

Then query your backend for the same time window and count the spans received. The difference is your data loss under that load profile.

For precise counting, capture an identical run to stdout and count the lines (stdouttrace emits one span per line):

motel run --stdout --duration 2m docs/examples/stress-test.yaml | wc -l

This gives you the exact number of spans motel produced. Compare against your backend's span count for the same period.
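
Once you have both counts, the loss calculation is simple (illustrative numbers):

```shell
# Data loss = spans sent minus spans received, as a percentage.
sent=54000                         # from motel's stdout line count
received=53100                     # from your backend's span count
awk -v s="$sent" -v r="$received" 'BEGIN {
  printf "data loss: %d spans (%.2f%%)\n", s - r, (s - r) / s * 100
}'
```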

Tips

  • Start low, increase gradually. It is easier to find the breaking point by ramping up than by guessing a high number.
  • Monitor the collector, not just the backend. Queue depth and enqueue failures tell you what is happening before data loss shows up downstream.
  • Test with realistic topologies. A single flat service produces uniform spans. Real systems have variable fan-out and latency, which affects batching efficiency. Use a topology that matches your production call graph.
  • Run long enough. Short runs may not trigger queue overflow if the burst fits within available headroom. Run for at least 5 minutes with multiple burst cycles.
  • Check both protocols. gRPC and HTTP/protobuf have different performance characteristics. Test whichever your production agents use:
    motel run --endpoint http://localhost:4317 --protocol grpc ...
    motel run --endpoint http://localhost:4318 --protocol http/protobuf ...

Further reading


Test Alert Thresholds

This guide shows how to use motel to verify that your alerting rules fire when they should. You will set up a baseline topology, inject a degradation with a scenario, and confirm the alert triggers within the expected window.

Set up a baseline topology

Start with a topology that represents your service under normal conditions. The key parameters are a realistic error rate, typical latency, and enough traffic to produce a statistically meaningful signal.

version: 1

services:
  web-gateway:
    operations:
      GET /api:
        duration: 45ms +/- 12ms
        error_rate: 0.1%
        calls:
          - backend.handle-request

  backend:
    operations:
      handle-request:
        duration: 30ms +/- 8ms
        error_rate: 0.05%

traffic:
  rate: 100/s

A rate of 100/s is a reasonable starting point. At lower rates (say 1-5/s), per-second error rate calculations become noisy and alerts may flicker. Higher rates produce smoother metrics but more data. Match the rate to what your real service handles, or at least keep it high enough that a 5-minute evaluation window sees thousands of spans.
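
To check a candidate rate against your evaluation window, the arithmetic is just rate times window length:

```shell
# Root spans seen by a 5-minute evaluation window at a given rate.
awk 'BEGIN {
  rate = 100; window_s = 300       # 100/s, 5-minute window
  printf "root spans per window: %d\n", rate * window_s
}'
```

30,000 root spans per window is comfortably enough for a stable error-rate signal.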

Validate and run a short burst to check the output looks right:

motel validate alert-test.yaml
motel run --stdout --duration 5s alert-test.yaml | head -20

Inject a degradation with a scenario

Scenarios overlay time-windowed overrides onto the baseline. To test an error rate alert, inject elevated errors at a known offset:

scenarios:
  - name: error spike
    at: +2m
    duration: 8m
    override:
      web-gateway.GET /api:
        error_rate: 10%

This keeps the first two minutes at baseline (0.1% errors), then raises the error rate to 10% for eight minutes.

You can test latency alerts the same way:

scenarios:
  - name: latency spike
    at: +2m
    duration: 8m
    override:
      backend.handle-request:
        duration: 500ms +/- 100ms

Add the scenario block to your topology file under the scenarios: key.

Match scenario duration to alerting windows

An alert that evaluates over a 5-minute window needs the degraded condition to persist for at least 5 minutes. If your scenario is shorter than the evaluation window, the alert may never fire.

Consider the full timeline:

  1. Startup settling -- the first few seconds of motel output may not represent steady state as spans complete at different times. Allow a buffer before the scenario begins.
  2. Scenario onset -- the degradation starts at the at offset.
  3. Evaluation window fills -- the alerting backend needs a full window of degraded data before it can trigger.
  4. Evaluation frequency -- if the alert checks every 60 seconds with a 5-minute window, the condition must hold across multiple evaluation cycles.

A safe rule of thumb: set the scenario duration to at least 1.5 times the evaluation window. For a 5-minute window, use 8 minutes of degraded traffic.

Account for sampling

Alerts fire on metrics, but those metrics can come from different points in the pipeline — before sampling, after sampling, or from application-level instrumentation. The point at which metrics are derived determines what error rate motel needs to produce.

Metrics from sampled traces

If your pipeline derives RED metrics (rate, errors, duration) from traces after a tail sampler, the sampler distorts what the metrics see. A tail sampler that keeps all error traces will overrepresent errors in the derived metrics. A 1% true error rate might appear as 10% or higher, depending on the sampling policy. To test a specific threshold, you need to work backwards from the sampler's behaviour to determine what error rate motel should produce.
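
As a sketch of that distortion, assume a hypothetical tail-sampling policy that keeps every error trace and a fraction p of non-error traces (the formula and numbers below are an illustration, not a property of any particular sampler):

```shell
# Observed error rate after tail sampling that keeps all errors
# and fraction p of non-error traces: e / (e + p * (1 - e)).
awk 'BEGIN {
  e = 0.01                         # true error rate (1%)
  p = 0.10                         # keep rate for non-error traces
  observed = e / (e + p * (1 - e))
  printf "true %.1f%% -> observed %.2f%%\n", e * 100, observed * 100
}'
```

A 1% true error rate appears as roughly 9% in the sampled stream. Working backwards, solve the same formula for e to find the rate motel should be configured to produce.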

Span metrics connector (pre-sampling)

If you use the OpenTelemetry Collector's span metrics connector before the sampling stage, metrics reflect the true rate. Motel's configured error_rate maps directly to what your alerts observe. This is the simplest case.

Application-level metrics

Motel can generate metrics directly, bypassing trace sampling entirely:

motel run --signals metrics --duration 15m --endpoint http://localhost:4318 alert-test.yaml

This is the most predictable path for alert testing. The configured error rate is exactly what your alerting backend sees, with no sampling distortion.

Recommendation: understand where in your pipeline metrics are derived. If metrics come from sampled traces, calibrate motel's error rates to account for the sampler. If metrics come from a pre-sampling connector or directly from motel, use the target error rate directly.

Worked example

Suppose you have an alert rule: "fire when the error rate for web-gateway GET /api exceeds 5% for 5 minutes."

1. Write the topology

version: 1

services:
  web-gateway:
    operations:
      GET /api:
        duration: 45ms +/- 12ms
        error_rate: 0.1%
        calls:
          - backend.handle-request

  backend:
    operations:
      handle-request:
        duration: 30ms +/- 8ms
        error_rate: 0.05%

traffic:
  rate: 100/s

scenarios:
  - name: error spike
    at: +2m
    duration: 8m
    override:
      web-gateway.GET /api:
        error_rate: 10%

The baseline error rate (0.1%) is well below the 5% threshold. The scenario injects 10% errors -- comfortably above the threshold to avoid borderline cases.

2. Run motel

Send telemetry to your collector for the full duration of baseline plus scenario:

motel run --duration 12m --endpoint http://localhost:4318 alert-test.yaml

Use --label-scenarios if you want scenario names attached to spans for debugging:

motel run --duration 12m --label-scenarios --endpoint http://localhost:4318 alert-test.yaml

3. Predict when the alert fires

  • T+0m to T+2m: baseline at 0.1% errors. No alert.
  • T+2m: scenario begins, error rate rises to 10%.
  • T+2m to T+7m: the 5-minute evaluation window fills with degraded data.
  • Around T+7m: the alert should fire, depending on evaluation frequency and any pending-period configuration.

If your alert has a "for" / pending duration of 2 minutes, add that to the expected time: the alert fires around T+9m.
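
The expected fire time is just the sum of those stages (assuming the evaluation window must fill completely before the pending timer starts):

```shell
# Expected alert time = scenario onset + evaluation window + pending period.
awk 'BEGIN {
  onset = 2; window = 5; pending = 2   # minutes
  printf "expected alert time: ~T+%dm\n", onset + window + pending
}'
```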

4. Verify

Check your alerting backend (Prometheus Alertmanager, Grafana, PagerDuty, or wherever alerts route) to confirm the alert fired within the expected window. If it did not:

  • Check that metrics are arriving in your backend (--signals traces,metrics or confirm your span metrics connector is running).
  • Verify the alert rule's label matchers correspond to the service and operation names motel produces.
  • Review the sampling section above if derived error rates do not match expectations.

Further reading

Test Backend Integrations

This guide covers using motel to verify that an OTLP-compatible backend accepts, stores, and displays traces correctly. The same approach works for initial setup, multi-backend routing, and backend migrations.

Prerequisites

  • motel installed
  • A topology file (see Model your services to create one)
  • One or more OTLP-compatible backends running and reachable

Verify a new backend

Send a short burst of traces to confirm the backend accepts OTLP data and renders it correctly.

1. Create a test topology

Use a small topology that exercises the features you care about — multiple services, varying durations, and some errors:

version: 1

services:
  web-gateway:
    attributes:
      deployment.environment: staging
      service.version: 1.0.0
    operations:
      GET /healthz:
        duration: 5ms +/- 2ms
        calls:
          - api-server.healthcheck

  api-server:
    attributes:
      deployment.environment: staging
      service.version: 2.3.1
    operations:
      healthcheck:
        duration: 2ms +/- 1ms
      process-order:
        duration: 80ms +/- 20ms
        error_rate: 5%
        calls:
          - database.query

  database:
    attributes:
      deployment.environment: staging
      db.system: postgresql
    operations:
      query:
        duration: 10ms +/- 3ms
        error_rate: 1%

traffic:
  rate: 10/s

Save this as backend-test.yaml.

2. Send traces to the backend

Point motel at your backend's OTLP endpoint:

motel run --endpoint http://localhost:4318 --protocol http/protobuf \
  --duration 10s backend-test.yaml

For gRPC endpoints:

motel run --endpoint localhost:4317 --protocol grpc \
  --duration 10s backend-test.yaml

3. Check the results

Open your backend's UI and verify:

  • All three services appear (web-gateway, api-server, database)
  • Traces show the expected call hierarchy: web-gateway calls api-server, which calls database
  • Span durations are in the expected ranges
  • Some spans on api-server.process-order and database.query are marked as errors
  • Resource attributes (deployment.environment, service.version, db.system) are visible

If traces do not appear, check that the endpoint URL and protocol match your backend's configuration. Use --stdout to confirm motel is generating valid data:

motel run --stdout --duration 2s backend-test.yaml | head -5

Test multi-backend routing

A common pattern is routing traces to different backends based on content — for example, sending error traces to a dedicated backend for alerting. motel does not route traces itself, but it generates consistent traffic that you can use to verify your collector's routing rules.

1. Configure your collector

Set up an OpenTelemetry Collector with routing logic. For example, a collector config that sends error spans to one backend and everything to another:

exporters:
  otlphttp/primary:
    endpoint: http://primary-backend:4318
  otlphttp/errors:
    endpoint: http://errors-backend:4318

processors:
  filter/errors:
    error_mode: ignore
    traces:
      span:
        # filter conditions DROP matching spans, so drop the non-errors
        - 'status.code != STATUS_CODE_ERROR'

service:
  pipelines:
    traces/all:
      receivers: [otlp]
      exporters: [otlphttp/primary]
    traces/errors:
      receivers: [otlp]
      processors: [filter/errors]
      exporters: [otlphttp/errors]

2. Send traffic through the collector

Point motel at the collector's intake:

motel run --endpoint http://localhost:4318 --protocol http/protobuf \
  --duration 30s backend-test.yaml

Use a topology with a meaningful error rate (the example above uses 5% on process-order) so that both backends receive data.

3. Verify routing

  • Primary backend: should contain all traces
  • Errors backend: should contain only traces with error spans

Check that error spans in the errors backend carry the same trace IDs, attributes, and timing as their counterparts in the primary backend.

Verify attribute handling

Different backends handle attributes differently — some index specific keys, some have length limits, some drop unknown attribute types. Use motel to send traces with a range of attribute shapes and verify they survive the round trip.

1. Add varied attributes to your topology

services:
  attribute-test:
    attributes:
      deployment.environment: production
      service.version: 3.1.4
      cloud.provider: aws
      cloud.region: eu-west-1
    operations:
      varied-attributes:
        duration: 20ms +/- 5ms
        attributes:
          http.method:
            value: GET
          http.status_code:
            range: [200, 599]
          http.route:
            values: {"/api/users": 50, "/api/orders": 30, "/api/products": 20}
          request.id:
            sequence: "req-{n}"

traffic:
  rate: 5/s

2. Send and inspect

motel run --endpoint http://localhost:4318 --protocol http/protobuf \
  --duration 10s attribute-test.yaml

In your backend, verify:

  • Resource attributes appear at the service level (deployment.environment, cloud.provider)
  • Span attributes appear on individual spans (http.method, http.status_code)
  • Numeric ranges produce varied integer values, not strings
  • Weighted values produce the expected distribution (roughly 50/30/20 across routes)
  • Sequence values increment correctly (req-1, req-2, ...)
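
You can script the sequence check instead of eyeballing it. The sketch below uses a hand-written sample in place of real motel output, and assumes the stdouttrace attribute shape (an Attributes array of Key/Value objects):

```shell
# Extract sequence-generated request IDs from stdouttrace-style spans.
# The sample file stands in for `motel run --stdout` output.
cat > /tmp/attr-sample.jsonl << 'EOF'
{"Name":"varied-attributes","Attributes":[{"Key":"request.id","Value":{"Type":"STRING","Value":"req-1"}}]}
{"Name":"varied-attributes","Attributes":[{"Key":"request.id","Value":{"Type":"STRING","Value":"req-2"}}]}
{"Name":"varied-attributes","Attributes":[{"Key":"request.id","Value":{"Type":"STRING","Value":"req-3"}}]}
EOF
jq -r '.Attributes[] | select(.Key == "request.id") | .Value.Value' /tmp/attr-sample.jsonl
```

The same pipeline pointed at captured motel output lists every generated ID in order, making gaps or resets easy to spot.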

Smoke test a backend migration

When migrating from one backend to another, use motel to send identical traffic to both and compare the results side by side.

1. Send the same traffic to both backends

Run motel twice with the same topology and duration. Use --stdout to capture the data, then replay it to each backend:

motel run --stdout --duration 30s backend-test.yaml > traces.jsonl

Send to the old backend:

motel run --endpoint http://old-backend:4318 --protocol http/protobuf \
  --duration 30s backend-test.yaml

Send to the new backend:

motel run --endpoint http://new-backend:4318 --protocol http/protobuf \
  --duration 30s backend-test.yaml

2. Compare

Check both backends for:

  • Same number of services and operations visible
  • Consistent attribute rendering (no truncation, no missing keys)
  • Similar trace visualisation (waterfall view, service maps)
  • Error spans displayed and filterable

The trace IDs will differ between runs, but the structure, timing distributions, and attribute values should be consistent. What matters is that both backends handle the same shape of data identically.

3. Use scenarios to stress-test

Add a scenario to simulate a latency spike and verify both backends handle it:

scenarios:
  - name: database latency spike
    at: +5s
    duration: 10s
    override:
      database.query:
        duration: 500ms +/- 100ms
        error_rate: 15%

Run with the scenario and confirm both backends display the spike correctly in their dashboards and alerting views.

Further reading

Test OTTL Transformations

This guide shows how to use motel to test OpenTelemetry Transformation Language (OTTL) rules with fast, repeatable feedback. You will design a topology that produces spans with attributes worth transforming, run those spans through a collector with OTTL processors, and verify the output.

Prerequisites

  • motel installed
  • An OpenTelemetry Collector binary — download otelcol-contrib from the collector releases, which includes the transform processor

1. Design a topology with messy attributes

Real telemetry is messy. Attributes use inconsistent naming conventions, carry values that belong at a different level, or contain compound strings that should be split. Design a topology that reproduces these problems so you have concrete data to transform.

The example topology at docs/examples/ottl-transforms.yaml generates spans with several common issues:

  • Mixed naming conventions — httpStatusCode (camelCase) alongside http.request.method (dotted)
  • Compound values — request.metadata contains "region=eu-west-1;priority=high" that should be separate attributes
  • Wrong attribute level — datacenter appears on spans but belongs on the service resource
  • Inconsistent conventions — notification_type (underscores) vs deployment.environment (dots)
  • PII leakage — user.email that should be redacted

2. Capture baseline output

Generate spans to stdout to see what the raw attributes look like before any transformation:

motel run --stdout --duration 5s docs/examples/ottl-transforms.yaml | head -20

Pick a few representative spans and note the attributes you want to change. For example, you might see:

{
  "Name": "POST /api/checkout",
  "Attributes": [
    {"Key": "httpStatusCode", "Value": {"Type": "STRING", "Value": "200"}},
    {"Key": "request.metadata", "Value": {"Type": "STRING", "Value": "region=eu-west-1;priority=high"}},
    {"Key": "customer.id", "Value": {"Type": "STRING", "Value": "cust-42"}}
  ]
}

3. Write a collector config with OTTL rules

Create a collector configuration that receives OTLP, applies transforms, and exports to stdout so you can inspect the results. Save this as collector-ottl.yaml:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  transform:
    trace_statements:
      - context: span
        statements:
          # Rename camelCase attribute to dotted convention
          - set(span.attributes["http.response.status_code"], span.attributes["httpStatusCode"])
            where span.attributes["httpStatusCode"] != nil
          - delete_key(span.attributes, "httpStatusCode")
            where span.attributes["http.response.status_code"] != nil

          # Rename underscore attributes to dotted convention
          - set(span.attributes["notification.type"], span.attributes["notification_type"])
            where span.attributes["notification_type"] != nil
          - delete_key(span.attributes, "notification_type")
            where span.attributes["notification.type"] != nil
          - set(span.attributes["notification.channel"], span.attributes["notification_channel"])
            where span.attributes["notification_channel"] != nil
          - delete_key(span.attributes, "notification_channel")
            where span.attributes["notification.channel"] != nil

          # Redact PII
          - replace_pattern(span.attributes["user.email"], "^.*$", "REDACTED")
            where span.attributes["user.email"] != nil

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [transform]
      exporters: [debug]

4. Run motel through the collector

Start the collector, then send motel's output to it:

# Terminal 1: start the collector
otelcol-contrib --config collector-ottl.yaml

# Terminal 2: send traces
motel run --endpoint localhost:4317 --protocol grpc \
  --duration 5s docs/examples/ottl-transforms.yaml

The collector's debug exporter prints transformed spans to its stderr. Check that:

  • httpStatusCode is now http.response.status_code
  • notification_type is now notification.type
  • user.email shows REDACTED

5. Compare before and after

For a structured comparison, export both the raw and transformed output. Either add a file exporter to the collector config to capture transformed spans, or compare motel's stdout output against the collector's debug output:

# Raw output (before transforms)
motel run --stdout --duration 5s docs/examples/ottl-transforms.yaml \
  | jq -r '.Attributes[].Key' | sort -u > attrs-before.txt

# Send through collector, capture its debug output
otelcol-contrib --config collector-ottl.yaml 2> collector-output.txt &
COLLECTOR_PID=$!
motel run --endpoint localhost:4317 --protocol grpc \
  --duration 5s docs/examples/ottl-transforms.yaml
kill $COLLECTOR_PID

# Check that renamed attributes appear and originals are gone
grep "http.response.status_code" collector-output.txt
grep "httpStatusCode" collector-output.txt  # should find nothing

6. Iterate on your rules

The feedback loop is fast because motel generates deterministic, controllable traffic:

  1. Edit the transform processor statements in your collector config
  2. Restart the collector
  3. Run motel again for a short burst (--duration 2s is usually enough)
  4. Inspect the output

Tips for iterating

  • Start with one rule at a time. Add a single statement, verify it works, then add the next. OTTL errors are easier to diagnose in isolation.
  • Use --duration 2s and a low traffic rate (the example uses 20/s) so you get enough spans to verify without drowning in output.
  • Use weighted values to test conditional logic. The example topology generates httpStatusCode with weighted values including "500" — you can write OTTL rules that behave differently for error status codes.
  • Test edge cases with attribute generators. Use sequence to produce predictable IDs, range for numeric boundaries, and values with weights to control the distribution of attribute values your OTTL rules will encounter.
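As a sketch of that conditional-logic idea, a rule keyed on the weighted status attribute from the example topology might look like the following. The error.flagged attribute name is hypothetical, introduced only for illustration:

```yaml
processors:
  transform:
    trace_statements:
      - context: span
        statements:
          # Hypothetical: tag spans carrying the weighted "500" value so you
          # can verify the condition fires at roughly the configured weight
          - set(span.attributes["error.flagged"], true)
            where span.attributes["httpStatusCode"] == "500"
```

Run a short burst through this rule and check that only error spans pick up the new attribute.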

Further reading

Test Tail Sampling Policies

This guide shows how to use motel to generate traces that exercise tail sampling policies in an OpenTelemetry Collector, so you can verify your sampling rules before deploying them against production traffic.

What you need

  • motel installed
  • An OpenTelemetry Collector binary — download otelcol-contrib from the collector releases, which includes the tail sampling processor

1. Create a topology with varied trace characteristics

Tail sampling decisions depend on trace properties: duration, error status, attributes. To test policies effectively, your topology should produce a predictable mix of these characteristics.

The example topology at docs/examples/tail-sampling-test.yaml generates four categories of traces:

  • Normal traces (majority) -- fast, successful requests through a six-service call graph
  • Error traces -- payment failures and database errors at low but measurable rates
  • Slow traces -- scenario-driven latency spikes in database and payment services
  • VIP traces -- a customer.tier: vip attribute on 10-15% of requests, useful for attribute-based sampling

The topology also includes two scenarios that create time windows of degraded behaviour, giving you both steady-state and incident conditions to sample against.

2. Configure the collector with tail sampling

Create a collector configuration that receives OTLP traces from motel and applies tail sampling policies. Save this as collector-config.yaml:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  tail_sampling:
    decision_wait: 5s
    num_traces: 1000
    policies:
      # Keep all traces with errors
      - name: errors
        type: status_code
        status_code:
          status_codes:
            - ERROR

      # Keep traces slower than 500ms
      - name: slow-traces
        type: latency
        latency:
          threshold_ms: 500

      # Keep all VIP customer traces
      - name: vip-customers
        type: string_attribute
        string_attribute:
          key: customer.tier
          values:
            - vip

      # Sample 5% of remaining traces
      - name: baseline
        type: probabilistic
        probabilistic:
          sampling_percentage: 5

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [debug]

This configuration applies four policies in order. A trace is kept if any policy matches -- errors, slow traces, and VIP traces are always kept, and 5% of everything else is sampled.

3. Run motel against the collector

Start the collector:

otelcol-contrib --config collector-config.yaml

In a separate terminal, run motel against it:

motel run --endpoint localhost:4317 --protocol grpc \
  --duration 15s docs/examples/tail-sampling-test.yaml

The 15-second duration covers both scenarios in the topology (slow database at +3s and payment errors at +8s), so you will see traces that match the latency and error policies.

4. Verify what gets sampled

The debug exporter logs every trace that passes the sampling filter. Look at the collector output and check:

  • Error traces appear. Search for spans with status.code: Error. These should be present even at low error rates.
  • Slow traces appear. Look for traces with root span durations above 500ms. These should cluster around the scenario windows.
  • VIP traces appear. Search for customer.tier: vip. Roughly 10-15% of the original traffic should match.
  • Normal traces are sparse. Fast, successful, non-VIP traces should appear at roughly 5% of their original rate.

For a quick count, pipe motel's stdout output through jq to see the raw distribution before sampling:

motel run --stdout --duration 15s docs/examples/tail-sampling-test.yaml | \
  jq -r 'select(.Parent.SpanID == "0000000000000000") | .Status.Code' | \
  sort | uniq -c

Compare this against the collector's debug output to confirm the sampling ratios match your expectations.

5. Adjust the topology for edge cases

Once baseline policies work, modify the topology to test boundary conditions.

What if all traces are slow?

Override the root operation's duration to push every trace above the latency threshold:

scenarios:
  - name: everything slow
    at: +0s
    duration: 30s
    override:
      api-gateway.GET /search:
        duration: 1000ms +/- 200ms
      api-gateway.POST /checkout:
        duration: 1500ms +/- 300ms

With this override, the latency policy keeps 100% of traces. This tests whether your collector handles the load when tail sampling stops reducing volume.

What if error rates spike?

Raise the error rate across all services to simulate a widespread outage:

scenarios:
  - name: mass errors
    at: +0s
    duration: 30s
    override:
      api-gateway.GET /search:
        error_rate: 50%
      api-gateway.POST /checkout:
        error_rate: 50%
      payment-service.charge:
        error_rate: 80%

What if VIP traffic dominates?

Change the customer.tier attribute weights so most traffic is VIP:

attributes:
  customer.tier:
    values:
      standard: 10
      vip: 90

This verifies that your probabilistic baseline still applies when the attribute-based policy matches most traces.

Test with scenarios labelled

Use the --label-scenarios flag to add synth.scenarios attributes to spans, so you can see which scenario was active when a trace was generated:

motel run --stdout --duration 15s --label-scenarios \
  docs/examples/tail-sampling-test.yaml | \
  jq -r '(.Attributes[] | select(.Key == "synth.scenarios") | .Value.Value) as $v |
    if ($v | length) == 0 then "baseline" else ($v | join(",")) end' | \
  sort | uniq -c

Further reading

Understand Attribute Placement and Cardinality

This guide covers how motel models resource attributes and span attributes, how to experiment with moving attributes between levels, and how to use attribute generators to explore cardinality impact before deploying changes to production.

Prerequisites

  • motel installed
  • A topology file (see Model your services)
  • A tracing backend or the --stdout flag for local inspection

Resource attributes vs span attributes

motel distinguishes two levels of attributes, matching the OpenTelemetry data model:

  • Resource attributes are defined under services.<name>.attributes. They describe the service itself and are attached to every span the service produces. These are static string key-value pairs.
  • Span attributes are defined under services.<name>.operations.<op>.attributes. They describe individual operations and can vary per span using attribute generators.

services:
  gateway:
    attributes:                        # resource attributes
      deployment.environment: production
      service.namespace: demo
    operations:
      GET /users:
        duration: 30ms +/- 10ms
        attributes:                    # span attributes
          http.request.method:
            value: GET
          http.response.status_code:
            values: {"200": 95, "404": 3, "500": 2}

Resource attributes appear once per service resource in the exported telemetry. Span attributes appear on each individual span. This distinction matters for storage cost, query performance, and how your backend indexes data.

Experiment: move an attribute between levels

A practical way to understand the difference is to move an attribute from one level to the other and observe the result.

Start with a resource attribute

Create a file called placement-test.yaml:

version: 1

services:
  api:
    attributes:
      deployment.environment: staging
    operations:
      handle:
        duration: 10ms

traffic:
  rate: 10/s

Generate traces and inspect the output:

motel run --stdout --duration 3s placement-test.yaml | head -20

Notice that deployment.environment appears on every span from the api service with the same value. Because it is defined at the service level, it is automatically attached to all operations — you do not need to repeat it on each operation.

Move it to a span attribute

Now move deployment.environment from the service level to the operation level:

version: 1

services:
  api:
    operations:
      handle:
        duration: 10ms
        attributes:
          deployment.environment:
            value: staging

traffic:
  rate: 10/s

Run the same command:

motel run --stdout --duration 3s placement-test.yaml | head -20

The --stdout output looks similar in both cases — motel attaches all attributes to each span. The difference matters when the telemetry reaches a real backend. In the OpenTelemetry data model, service-level attributes belong on the Resource and are stored once per service, while span-level attributes are stored on every individual span. Placing a constant value at the span level increases storage cost and may change how you query the attribute.
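To see the two levels side by side without a backend, you can poke at the JSON shape directly. The sample span below is hand-written to mirror the stdouttrace field names used elsewhere in this guide (real spans come from motel run --stdout), so treat it as an illustration of the shape rather than actual motel output:

```shell
# Hand-written sample matching the stdouttrace shape used in this guide:
# Resource holds service-level attributes, Attributes holds span-level ones.
sample='{"Resource":[{"Key":"deployment.environment","Value":{"Type":"STRING","Value":"staging"}}],"Attributes":[{"Key":"http.request.method","Value":{"Type":"STRING","Value":"GET"}}]}'
echo "$sample" | jq '{resource_keys: [.Resource[].Key], span_keys: [.Attributes[].Key]}'
```

The same jq filter works on real motel output if your build uses these field names.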

To see this distinction in practice, send the two versions to a collector and compare how your backend indexes them:

motel run --endpoint localhost:4318 --duration 10s placement-test.yaml

Rule of thumb: attributes that are constant for a service belong at the service level. Attributes that vary per request belong at the operation level.

Attribute generators and cardinality

Span attributes in motel use generators that control how many distinct values an attribute produces. This directly maps to cardinality — the number of unique values a backend must index.

Low cardinality: value and values

A value generator always produces the same string — cardinality of 1:

http.request.method:
  value: GET

A values generator picks from a fixed set with weighted probability — cardinality equals the number of choices:

http.response.status_code:
  values:
    "200": 95
    "404": 3
    "500": 2

These are safe for most backends. The set of distinct values is small and bounded.

High cardinality: sequence

A sequence generator produces a unique value for every span:

user.id:
  sequence: "user-{n}"

This creates user-1, user-2, user-3, and so on — unbounded cardinality. Add this to a topology and send traffic to your backend to see how it handles high-cardinality attributes.
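A quick back-of-envelope check of why this is unbounded: a sequence generator mints one new value per span, so the distinct-value count tracks the span count. Simulated here with seq instead of motel:

```shell
# One new value per span: 100 spans produce 100 distinct attribute values
printf 'user-%d\n' $(seq 1 100) | sort -u | wc -l
```

At 50 spans per second, an hour of traffic adds 180,000 unique values to your backend's index.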

Numeric range: range

A range generator produces random integers within bounds:

http.response.content_length:
  range: [0, 50000]

Cardinality is bounded by the range size but can still be high. A range of [0, 50000] produces up to 50,001 distinct values.

Controlled distribution: distribution

A distribution generator samples from a normal distribution:

queue.depth:
  distribution:
    mean: 100
    stddev: 20

Values cluster around the mean but the theoretical range is unbounded.

Boolean: probability

A probability generator produces true/false with the given probability — cardinality of 2:

cache.hit:
  probability: 0.8

Test cardinality impact on your backend

Combine these generators in a topology to simulate realistic and adversarial attribute patterns. Save this as cardinality-test.yaml:

version: 1

services:
  api:
    attributes:
      deployment.environment: staging
    operations:
      handle:
        duration: 15ms +/- 5ms
        attributes:
          http.request.method:
            values: {"GET": 70, "POST": 20, "PUT": 10}
          user.id:
            sequence: "user-{n}"
          http.response.status_code:
            values: {"200": 90, "400": 5, "500": 5}
          response.size:
            range: [100, 10000]

traffic:
  rate: 50/s

Send this to your backend and monitor:

motel run --endpoint localhost:4318 --duration 60s cardinality-test.yaml

Watch for:

  • Index growth — high-cardinality attributes like user.id cause index bloat in most tracing backends
  • Query performance — try querying by user.id vs http.request.method and compare response times
  • Storage cost — compare the data volume with and without the user.id attribute

To isolate the effect of a single attribute, run the topology twice — once with the high-cardinality attribute and once without — and compare the results in your backend.

Semantic conventions and correct placement

The --semconv flag helps ensure attributes are placed correctly according to OpenTelemetry semantic conventions. It validates that attribute names and placements match convention definitions.

motel validate --semconv /path/to/semconv topology.yaml

The --semconv flag points to a directory containing semantic convention YAML files. motel loads the embedded OpenTelemetry conventions by default and merges any additional conventions you provide.

This catches common mistakes:

  • Using a deprecated attribute name when a replacement exists
  • Placing an attribute at the wrong level (resource vs span)
  • Misspelling a well-known attribute name
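For example, the OpenTelemetry conventions deprecate http.status_code in favour of http.response.status_code, so a topology fragment like the sketch below would be expected to fail validation:

```yaml
# Sketch of a mistake --semconv should flag:
attributes:
  http.status_code:              # deprecated name
    values: {"200": 95, "500": 5}
# The current convention is http.response.status_code
```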

You can also use --semconv with motel run to validate at generation time:

motel run --stdout --semconv /path/to/semconv --duration 5s topology.yaml

Further reading

Use motel with otel-cli

otel-cli is a command-line tool for creating OpenTelemetry spans from shell scripts. motel generates synthetic telemetry from topology definitions. The two tools complement each other — otel-cli instruments real commands, motel simulates entire distributed systems.

This guide covers practical ways to use them together.

Prerequisites

  • motel installed
  • otel-cli installed (releases)

Save this as topology.yaml to use with the examples below:

version: 1

services:
  web-gateway:
    operations:
      GET /products:
        duration: 50ms +/- 15ms
        calls:
          - product-service.list

  product-service:
    operations:
      list:
        duration: 20ms +/- 5ms
        calls:
          - database.query

  database:
    operations:
      query:
        duration: 5ms +/- 2ms

traffic:
  rate: 1/s

Use otel-cli's TUI as a trace viewer

otel-cli can run as a local OTLP server with a terminal UI that displays incoming traces. This gives you a zero-setup way to visually inspect what motel generates — no collector, no Jaeger, no Grafana needed.

In one terminal, start the TUI server:

otel-cli server tui

This listens for gRPC on localhost:4317. In another terminal, point motel at it:

motel run --endpoint localhost:4317 --protocol grpc --duration 10s topology.yaml

The TUI displays each span as it arrives. This is particularly useful when you are authoring a new topology and want to see whether the call graph, durations, and error rates look right before sending traffic to a real backend.

Tips for the TUI workflow

  • Keep the rate low. The TUI is meant for inspection, not throughput. A rate of 5/s or 10/s is plenty.
  • Use a short duration. A few seconds of traffic gives you enough spans to check the shape without flooding the display.
  • Watch for errors. Spans with error status stand out in the TUI, making it easy to verify that your error rates and cascading failures behave as expected.

Mix real and synthetic traces

You can use otel-cli to instrument a real shell command alongside motel's synthetic traffic, with otel-cli's TUI server as the receiver.

In the first terminal, start the TUI server:

otel-cli server tui

In the second terminal, start motel generating background traffic. Use a topology with a low traffic rate (e.g. rate: 1/s) so the TUI stays readable:

motel run --endpoint localhost:4317 --protocol grpc --duration 5m topology.yaml

In a third terminal, use otel-cli to instrument a real command. The distinct service name my-deploy-script makes it easy to spot among motel's synthetic spans:

otel-cli exec \
  --service my-deploy-script \
  --name "curl homepage" \
  --endpoint localhost:4317 \
  -- curl -sS https://example.com -o /dev/null

Look for the my-deploy-script span in the TUI — it appears alongside motel's synthetic spans from your topology.

Limitations

otel-cli's JSON server output is not compatible with motel import. The otel-cli server json command writes individual protobuf-marshaled spans in a directory tree ({traceId}/{spanId}/span.json). motel's import command expects either stdouttrace format (one JSON span per line) or OTLP JSON (resourceSpans arrays). You cannot pipe otel-cli's JSON output directly into motel import.

If you want to capture real traces and import them into motel, use a collector with a file exporter configured to write OTLP JSON, then feed that output to motel import.
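A minimal sketch of that collector setup, assuming the contrib file exporter's default JSON output and an illustrative output path:

```yaml
exporters:
  file:
    path: /tmp/real-traces.json   # illustrative path

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [file]
```

Once real traffic has flowed through, feed the captured file to motel import.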

Further reading

Validate a Collector Pipeline

This guide covers using motel to verify that an OpenTelemetry Collector accepts, processes, and forwards telemetry correctly. It works for any collector configuration — whether you are setting up a new pipeline, debugging a broken one, or testing changes before deployment.

Prerequisites

  • motel installed
  • A topology file (even a minimal one works — see Model your services)
  • An OpenTelemetry Collector running and reachable from your machine

1. Establish a baseline with stdout

Before involving the network, confirm that motel generates the traces you expect:

motel run --stdout --duration 5s topology.yaml | head -20

If this produces JSON spans, motel and your topology are working. Any problems you encounter later are between motel and the collector.

2. Point motel at your collector

Send traffic to the collector's OTLP receiver:

motel run --endpoint localhost:4318 --duration 10s topology.yaml

By default motel uses HTTP/protobuf on port 4318. If your collector listens on a different port or protocol, adjust accordingly:

# gRPC (typically port 4317)
motel run --endpoint localhost:4317 --protocol grpc --duration 10s topology.yaml

# HTTP/protobuf on a non-standard port
motel run --endpoint localhost:9090 --protocol http/protobuf --duration 10s topology.yaml

A clean run with no errors means the collector accepted every export request.

3. Isolate connection problems

When motel reports errors, narrow down the cause by switching one variable at a time.

Protocol mismatch

If you see errors like 405 Method Not Allowed or unexpected EOF, you may be sending gRPC to an HTTP receiver or vice versa. Try the other protocol:

# If http/protobuf fails, try gRPC
motel run --endpoint localhost:4317 --protocol grpc --duration 5s topology.yaml

# If gRPC fails, try http/protobuf
motel run --endpoint localhost:4318 --protocol http/protobuf --duration 5s topology.yaml

TLS errors

Errors mentioning tls: first record does not look like a TLS handshake or certificate signed by unknown authority indicate a TLS mismatch. Check whether your collector expects TLS and whether the endpoint URL matches.

Connection refused or timeout

DEADLINE_EXCEEDED or connection refused means motel cannot reach the collector at all. Verify:

  • The collector process is running
  • The port is correct and not blocked by a firewall
  • The hostname resolves from the machine running motel

A quick connectivity check:

curl -v http://localhost:4318/v1/traces

You should get a response (even an error like 405 or 400) rather than a connection failure.

4. Verify end-to-end pipeline flow

Confirming that the collector accepts traffic is only half the picture. You also need to verify that spans reach your backend.

Step 1: Send a short, identifiable burst

Use a minimal topology with a distinctive service name so spans are easy to find. Save this as pipeline-test.yaml:

version: 1

services:
  pipeline-test:
    operations:
      validate:
        duration: 10ms

traffic:
  rate: 5/s

motel run --endpoint localhost:4318 --duration 10s pipeline-test.yaml

Step 2: Check your backend

Search your tracing backend for spans from the pipeline-test service within the last minute. If they appear, your full pipeline — motel to collector to backend — is working.

If spans do not appear:

  • Check the collector's own logs for export errors
  • Verify the collector's exporter configuration (endpoint, authentication, TLS)
  • Add a debug exporter to the collector pipeline temporarily to confirm spans arrive at the collector
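A sketch of that temporary debug step: keep your real exporter in the pipeline and add debug alongside it. The otlphttp exporter and endpoint below are placeholders for whatever your pipeline actually uses:

```yaml
exporters:
  debug:
    verbosity: basic
  otlphttp:                              # placeholder for your real exporter
    endpoint: https://backend.example.com

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug, otlphttp]
```

If spans show up in the debug output but not in the backend, the problem is in the exporter configuration, not the receiver.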

Step 3: Test other signals

If your pipeline carries metrics or logs as well as traces, verify those separately:

# Metrics only
motel run --endpoint localhost:4318 --signals metrics --duration 10s topology.yaml

# All signals
motel run --endpoint localhost:4318 --signals traces,metrics,logs --duration 10s topology.yaml

Common failure modes

| Symptom | Likely cause | What to try |
| --- | --- | --- |
| connection refused | Collector not running or wrong port | Check collector process and port |
| DEADLINE_EXCEEDED | Network timeout, firewall, or DNS | Verify connectivity with curl |
| 405 Method Not Allowed | Protocol mismatch (gRPC vs HTTP) | Switch --protocol |
| certificate signed by unknown authority | TLS certificate not trusted | Check TLS configuration |
| tls: first record does not look like a TLS handshake | Sending TLS to a non-TLS endpoint | Use the correct scheme |
| motel succeeds but spans missing in backend | Collector pipeline misconfigured | Check collector logs and exporter config |

Further reading
