Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Select an option

  • Save jeje-andal/7561b99c8c04d0686f4103c07c504c04 to your computer and use it in GitHub Desktop.

Select an option

Save jeje-andal/7561b99c8c04d0686f4103c07c504c04 to your computer and use it in GitHub Desktop.
PR Review: AK.Server #163 - Distributed Tracing/OpenTelemetry Implementation

PR Review Metadata

Field Value
File C:\Documents\Notes\AI Answer\PR-Review-Gitea-AKServer-AKServer-163-20250316-154500.md
PR https://git.andalsoftware.com/Andal.Kharisma/AK.Server/pulls/163
Title general/mvp6.3/tracing_id
Author Mahdi
Decision REQUEST CHANGES
Files Changed 23 files
Risk Level MEDIUM-HIGH

Use this path for next step: C:\Documents\Notes\AI Answer\PR-Review-Gitea-AKServer-AKServer-163-20250316-154500.md


PR Review Report: Distributed Tracing/OpenTelemetry Implementation

Executive Summary

This PR implements distributed tracing across the AK.Server platform using OpenTelemetry and W3C Trace Context. The changes span three stacks (Attendance, Orchestrator, Persistence) and enable end-to-end trace propagation through background tasks, Dapr workflows, and HTTP requests.

Overall Assessment: The implementation is architecturally sound with proper W3C standard compliance. However, there are critical gaps in test coverage and potential PII exposure in trace tags that must be addressed before merge.


Stack Analysis

1. Attendance Stack (4 Files)

Files Changed:

  • src/Attendance/GraphQL/appsettings.Development.json
  • src/Attendance/Infrastructure/QueueChannel/BackgroundTaskQueueService.cs
  • src/Attendance/Infrastructure/QueueChannel/QueuedHostedService.cs
  • src/Attendance/Infrastructure/QueueChannel/WorkItem.cs

Analysis:

  • Proper implementation of ActivityContext capture in BackgroundTaskQueueService
  • Correct restoration of trace context in QueuedHostedService using ActivitySource.StartActivity() with parent context
  • Issue: Activity creation happens AFTER the log message, missing trace context for the initial log

Code Quality: GOOD

2. Orchestrator Stack (10 Files)

Files Changed:

  • Workflow Activities (6): FinalizeProcess, GetCustomerData, ProcessAttendance, ProcessHRMS, ProcessPayroll, SendSSE
  • src/Orchestrator/GraphQL/Program.cs
  • Service implementations (3): DateProcessingTool, HealthCheck, QueryGraphQL
  • appsettings files

Analysis:

  • Consistent use of QueueTracePropagator.StartQueueActivity() across activities
  • Proper exception recording with QueueTracePropagator.RecordException()
  • Dapr configuration improvements with environment variable support
  • Issue: Inline middleware for response headers should be extracted to a dedicated class

Code Quality: GOOD

3. Persistence Stack (9 Files)

Files Changed:

  • src/Persistence/Infrastructure/Tracing/DaprTracePropagationHandler.cs (NEW)
  • src/Persistence/Infrastructure/Tracing/GraphQLTracingInterceptor.cs
  • src/Persistence/Infrastructure/Tracing/QueueTracePropagator.cs
  • src/Persistence/Infrastructure/Tracing/W3CTraceContextMiddleware.cs
  • src/Persistence/Helper/ConstantData.cs
  • src/Persistence/Helper/PostGraphQLRequest.cs
  • src/Persistence/Infrastructure/Workflow/WorkflowInput.cs

Analysis:

  • CRITICAL FIX: QueueTracePropagator.GetTraceParent() now generates proper W3C format (00-{traceId}-{spanId}-{flags}) instead of just returning activity.Id
  • New DaprTracePropagationHandler correctly propagates trace context to Dapr sidecar
  • Improved error handling in W3CTraceContextMiddleware with try/finally pattern
  • BREAKING CHANGE: Configuration schema changed from "Otlp" to "OpenTelemetry"

Code Quality: VERY GOOD


Cross-Stack Impact Analysis

Dependency Graph

┌─────────────────────────────────────────────────────────────────┐
│                     CROSS-STACK DEPENDENCIES                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐         ┌──────────────────┐                 │
│  │  Attendance  │────────▶│   Persistence    │                 │
│  │  (Queue)     │         │  (QueueTrace     │                 │
│  │              │         │   Propagator)    │                 │
│  └──────────────┘         └──────────────────┘                 │
│           │                         ▲                          │
│           │                         │                          │
│           ▼                         │                          │
│  ┌──────────────┐                   │                          │
│  │ Orchestrator │───────────────────┘                          │
│  │ (Workflows)  │                                              │
│  └──────────────┘                                              │
│                                                                 │
│  Shared Infrastructure:                                         │
│  • W3CTraceContextMiddleware                                    │
│  • GraphQLTracingInterceptor                                    │
│  • DaprTracePropagationHandler                                  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Breaking Changes Assessment

Change Severity Impact Mitigation
WorkItem.ActivityContext property NONE Non-breaking (nullable) N/A
WorkflowInput.TraceParent property NONE Non-breaking (default null) N/A
traceparent format (W3C standard) MEDIUM External integrations may expect old format Document change, verify consumers
appsettings schema (Otlp → OpenTelemetry) HIGH Deployment scripts may break Update deployment configs

Security Review

PII Exposure in Trace Tags (HIGH SEVERITY)

Issue: Customer IDs and user IDs are being added to trace tags, which may be exported to external observability platforms.

Affected Locations:

// FinalizeProcessActivity.cs, GetCustomerDataActivity.cs, etc.
activity.SetTag("customer.id", input.customerId);

// QueuedHostedService.cs
activity.SetTag("workItem.customerId", workItem.CustomerId);

// GraphQLTracingInterceptor.cs
activity.SetTag("user.id", userId);
activity.SetTag("tenant.id", tenantId);

Risk:

  • Customer/tenant identifiers exposed in distributed traces
  • Potential compliance issues (GDPR, data privacy)
  • Traces may persist in external systems (Jaeger, Zipkin, etc.)

Recommendation:

  1. Hash or anonymize identifiers before adding to tags
  2. Or use a configuration flag to disable PII tagging in production
  3. Document which tags contain PII for data retention policies

Security Positives

  • Proper W3C Trace Context implementation
  • No hardcoded secrets or credentials
  • Environment variable configuration for sensitive endpoints
  • No header injection vulnerabilities detected

Test Coverage Assessment

Critical Gap: No Tests for Tracing Infrastructure

Missing Test Coverage:

Component Test Type Priority
QueueTracePropagator Unit Tests CRITICAL
DaprTracePropagationHandler Unit Tests CRITICAL
W3CTraceContextMiddleware Unit/Integration Tests CRITICAL
GraphQLTracingInterceptor Unit Tests CRITICAL
BackgroundTaskQueueService trace capture Integration Tests HIGH
Cross-service trace propagation Integration Tests HIGH

Required Test Cases:

  1. W3C Format Validation:

    // Verify format: 00-{traceId}-{spanId}-{flags}
    var traceParent = QueueTracePropagator.GetTraceParent();
    Assert.Matches(@"^00-[0-9a-f]{32}-[0-9a-f]{16}-(00|01)$", traceParent);
  2. ActivityContext Parsing:

    // Verify ParseTraceParent handles valid/invalid inputs
  3. Trace Propagation Chain:

    // Verify trace ID is preserved across HTTP calls

Prioritized Recommendations

CRITICAL (Must Fix Before Merge)

  1. Add Unit Tests for Tracing Infrastructure

    • Create Persistence.UnitTest/Infrastructure/Tracing/QueueTracePropagatorTests.cs
    • Create Persistence.UnitTest/Infrastructure/Tracing/DaprTracePropagationHandlerTests.cs
    • Minimum 80% coverage for new tracing code
  2. Address PII in Trace Tags

    // Option A: Hash the customer ID
    activity.SetTag("customer.id.hash", ComputeHash(input.customerId));
    
    // Option B: Configuration-driven
    if (_tracingOptions.IncludePiiTags)
    {
        activity.SetTag("customer.id", input.customerId);
    }
  3. Verify Configuration Migration

    • Update deployment scripts to use new "OpenTelemetry" schema
    • Ensure backward compatibility or migration path

HIGH (Should Fix)

  1. Extract Inline Middleware to Class

    // Current: Inline middleware in Program.cs
    // Recommended: Create TraceResponseHeadersMiddleware class
  2. Fix Activity Timing in QueuedHostedService

    // Move activity creation BEFORE the log message
    using var activity = workItem.ParentActivityContext.HasValue
        ? ActivitySource.StartActivity(...)
        : ActivitySource.StartActivity(...);
    
    _logger.LogInformation($"{workItem.Message} processed!");

MEDIUM (Nice to Have)

  1. Add Sampling Configuration Documentation

    • Document how to configure sampling rate in production
    • Add performance impact notes
  2. Consider Trace Correlation IDs

    • Add custom correlation ID for easier debugging
    • Propagate through all logs

Quality Metrics Dashboard

Metric Score Notes
Code Quality 85/100 Clean implementation, proper patterns
Test Coverage 0/100 NO TESTS ADDED - Critical gap
Security 70/100 PII exposure in traces
Documentation 60/100 XML docs present, but no architecture docs
Breaking Changes 75/100 Config schema change documented

Quality Score Calculation

Base Score: 100
- Critical Issues (no tests): -20
- High Issues (PII exposure): -10
- Medium Issues (config breaking): -5

Final Score: 65/100 (ACCEPTABLE WITH ISSUES)

Detailed Code Review Comments

File: src/Persistence/Infrastructure/Tracing/QueueTracePropagator.cs

Line 15-24: Good improvement to generate proper W3C format.

Before:

return activity?.Id;  // Non-standard format

After:

var traceId = activity.TraceId.ToString();    // 32 hex chars
var spanId = activity.SpanId.ToString();      // 16 hex chars (W3C compliant)
var flags = (activity.ActivityTraceFlags & ActivityTraceFlags.Recorded) != 0 ? "01" : "00";
return $"00-{traceId}-{spanId}-{flags}";  // W3C standard

File: src/Attendance/Infrastructure/QueueChannel/QueuedHostedService.cs

Lines 76-86: Activity creation should happen before logging.

// CURRENT (Issue):
_logger.LogInformation($"{workItem.Message} processed!");  // No trace context
using var activity = ...;  // Activity created after log

// RECOMMENDED:
using var activity = ...;  // Create activity first
_logger.LogInformation($"{workItem.Message} processed!");  // Now has trace context

File: src/Orchestrator/GraphQL/Program.cs

Lines 398-411: Inline middleware should be extracted.

// CURRENT: Inline middleware
app.Use(async (context, next) =>
{
    await next();
    // ... header logic
});

// RECOMMENDED: Dedicated middleware class
app.UseMiddleware<TraceResponseHeadersMiddleware>();

Next Steps

Before Merge:

  1. Add unit tests for QueueTracePropagator
  2. Add unit tests for DaprTracePropagationHandler
  3. Add integration tests for trace propagation
  4. Address PII exposure (hash or configure)
  5. Update deployment configuration for new appsettings schema
  6. Fix activity timing in QueuedHostedService

After Merge:

  1. Monitor trace volume in production
  2. Configure appropriate sampling rates
  3. Document troubleshooting guide for distributed traces
  4. Set up alerts for trace export failures

Conclusion

This PR implements a solid foundation for distributed tracing with proper W3C standard compliance. The cross-stack integration is well-designed and the breaking changes are manageable. However, the lack of test coverage and PII exposure in trace tags are blockers that must be resolved before merge.

Recommendation: REQUEST CHANGES - Address critical issues and re-review.


Review completed: 2026-03-16 Reviewer: Claude Code - Multi-Stack PR Orchestrator v5.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment