
@mrpollo
Last active March 11, 2026 23:29
Flight Review Next: Architecture & Implementation Plan - Replacement for logs.px4.io

Flight Review Next: Architecture & Implementation Plan

Executive Summary

Flight Review (logs.px4.io) has served the PX4 community for nearly 10 years since its first commit in October 2016. Built on Tornado/Bokeh/SQLite with an S3 FUSE mount, it processes 350k+ ULog flight logs but suffers from significant infrastructure pain: every page view re-parses the raw ULog file to regenerate all 35+ Bokeh plots, the S3 FUSE mount is fragile, the SQLite database locks under concurrent access, and the Python/Bokeh stack makes the frontend difficult to extend. This document proposes a ground-up replacement designed for performance, extensibility, and operational simplicity at any scale.


Current State Assessment

What Exists Today

  • Age: 9.5 years, 664 commits, actively maintained but receiving only reliability fixes
  • Stack: Python 3.11, Tornado (async web server), Bokeh 3.8.2 (plotting), SQLite (WAL mode), Bootstrap 5, Leaflet (maps), Cesium.js (3D)
  • Infrastructure: Single EC2 instance (15GB RAM, 19GB disk), nginx reverse proxy, S3 FUSE mount at /data_s3, 2 Tornado worker processes
  • Database: SQLite at ~219MB storing only scalar metadata (14 fields per log in LogsGenerated table). No time-series data stored.
  • Processing model: Every page view triggers a full ULog parse (~1-5s for typical logs) and regeneration of all plots. An in-memory LRU cache of parsed ULog objects provides some relief.
  • ULog topics loaded: ~45 of 100+ available topics per log file
  • Plots generated: 35+ Bokeh plots including time-series, FFT spectrograms, PSD analysis, GPS maps, parameter tables, and system diagnostics
  • S3 storage: ~350k ULog files in s3://px4-flight-review/flight_review/log_files/

Key Pain Points

  1. Performance: Re-parsing ULog files on every page view is the #1 bottleneck. A 50MB log takes 1-5 seconds to parse in Python before any plots render.
  2. S3 FUSE mount: s3fs is fragile, adds latency, creates kernel-level failure modes, and doesn't support concurrent access patterns well.
  3. SQLite concurrency: Single-writer limitation causes lock contention with 2 worker processes (mitigated by WAL mode but not eliminated).
  4. Bokeh server-side rendering: Plots are generated server-side in Python, creating CPU-heavy page loads and making the frontend hard to extend.
  5. Monolithic architecture: Upload, processing, storage, and visualization are tightly coupled in a single process.
  6. No caching layer: Computed plot data is discarded after each request (aside from in-memory ULog cache).
  7. Single-instance deployment: No horizontal scaling, no container support, manual deployment via shell script.
  8. No authentication: The public instance is fully open; no mechanism for private deployments.

What Works Well (Keep These)

  • ULog processing pipeline: The analysis logic (35+ plot types, PID analysis, FFT/PSD spectrograms, vibration analysis) represents years of domain expertise
  • S3 as primary storage: Object storage for raw logs is the right pattern
  • Browse/search/statistics pages: The metadata-driven browse experience is useful
  • CloudFront CDN: Recently added for /dbinfo endpoint, works well
  • Overview image generation: Static PNG map thumbnails for browse page

Design Principles

  1. Process once, serve many: Parse and analyze each ULog file exactly once at upload time. Store results. Never re-parse for viewing.
  2. S3 API, not FUSE: Use the S3 SDK directly for all object storage operations. Pre-signed URLs for client-side downloads.
  3. Client-side rendering: Ship processed data to the browser; let the client render plots. Server serves JSON, not HTML.
  4. Plugin-friendly visualization: Make it trivial to add new plot types, key facts, and analysis modules without touching core code.
  5. Scale-agnostic deployment: Same codebase runs as a single Docker container for a team of 5 or as a distributed service for 350k+ logs.
  6. Offline-capable processing: The ULog processing engine should work as a standalone CLI tool, not just as part of the web service.

Recommended Architecture

                                    ┌─────────────────────┐
                                    │   CloudFront CDN    │
                                    │  (static assets +   │
                                    │   pre-signed S3)    │
                                    └──────────┬──────────┘
                                               │
┌──────────────┐                    ┌──────────▼──────────┐
│   Browser    │◄───────────────────│    API Gateway /    │
│              │     REST + WS      │    Reverse Proxy    │
│  SPA Client  │───────────────────►│       (nginx)       │
└──────────────┘                    └──────────┬──────────┘
                                               │
                          ┌────────────────────┼────────────────────┐
                          │                    │                    │
                ┌─────────▼────────┐  ┌────────▼────────┐  ┌────────▼───────┐
                │   API Service    │  │  Upload Worker  │  │  Processing    │
                │  (Rust / Axum)   │  │  (async upload  │  │  Worker(s)     │
                │                  │  │   + S3 put)     │  │  (ULog parse   │
                │  - Auth          │  │                 │  │   + analysis)  │
                │  - Browse/Search │  └────────┬────────┘  └────────┬───────┘
                │  - Serve plot    │           │                    │
                │    data (JSON)   │           │                    │
                │  - Pre-signed    │      ┌────▼────────────────────▼────┐
                │    S3 URLs       │      │          S3 Bucket           │
                └─────────┬────────┘      │                              │
                          │               │  /raw/{id}.ulg               │
                          │               │  /processed/{id}.json.zst    │
                    ┌─────▼──────┐        │  /thumbnails/{id}.png        │
                    │ PostgreSQL │        │  /cache/{id}/plots.json      │
                    │            │        └──────────────────────────────┘
                    │ - Logs     │
                    │ - Metadata │
                    │ - Summary  │
                    │   stats    │
                    │ - Users    │
                    │ - Tokens   │
                    └────────────┘

Component Breakdown

1. Backend: Rust with Axum

Why Rust: ULog parsing is CPU-bound binary processing -- exactly where Rust excels. A Rust backend can parse a 50MB ULog file in ~100-500ms vs 1-5s in Python (10x improvement). The single-binary deployment model eliminates Python dependency management headaches. Axum provides async HTTP with tower middleware for auth, rate limiting, and observability.

Why not other options:

  • Python (FastAPI): Would perpetuate the parsing performance problem. Fine for the API layer but not for processing.
  • Go: Good alternative, but lacks the zero-cost abstractions that make binary parsing ergonomic. No existing ULog parser ecosystem.
  • Node.js: Wrong tool for CPU-bound binary processing.

Key crates:

  • axum -- HTTP framework
  • aws-sdk-s3 -- S3 API (native, not FUSE)
  • sqlx -- PostgreSQL async driver
  • serde / serde_json -- serialization
  • tokio -- async runtime
  • ULog parser: Either port pyulog to Rust, extend ulog-rs/yule_log, or write a new one using mavsim-viewer's C parser as reference (439 LOC, clean architecture, already handles all message types)
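Whichever parser route is taken, the binary work starts with the 16-byte ULog file header. A std-only sketch of that first validation step, following the published ULog format (7-byte magic, version byte, little-endian start timestamp in microseconds) -- illustrative only, since a real parser continues into the definition and data sections:

```rust
#[derive(Debug, PartialEq)]
struct UlogHeader {
    version: u8,
    start_timestamp_us: u64,
}

// "ULog" followed by 0x01 0x12 0x35, per the ULog file format spec.
const ULOG_MAGIC: [u8; 7] = [0x55, 0x4C, 0x6F, 0x67, 0x01, 0x12, 0x35];

fn parse_ulog_header(bytes: &[u8]) -> Result<UlogHeader, String> {
    if bytes.len() < 16 {
        return Err("file shorter than 16-byte ULog header".into());
    }
    if bytes[..7] != ULOG_MAGIC {
        return Err("bad magic: not a ULog file".into());
    }
    let version = bytes[7];
    let mut ts = [0u8; 8];
    ts.copy_from_slice(&bytes[8..16]);
    Ok(UlogHeader {
        version,
        start_timestamp_us: u64::from_le_bytes(ts),
    })
}
```

Because only the first 16 bytes are needed, the same check can run in the upload path as soon as the first chunk arrives, rejecting non-ULog files before the full transfer completes.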

API surface:

POST   /api/upload              -- Upload ULog file (multipart)
GET    /api/logs                -- Browse/search/filter logs
GET    /api/logs/{id}           -- Log metadata + summary stats
GET    /api/logs/{id}/plots     -- Pre-computed plot data (JSON)
GET    /api/logs/{id}/download  -- Pre-signed S3 URL for raw .ulg
GET    /api/logs/{id}/pid       -- PID analysis data
GET    /api/logs/{id}/3d        -- 3D trajectory data
GET    /api/stats               -- Aggregate statistics
POST   /api/auth/login          -- Authentication (optional)
GET    /api/health              -- Health check

2. Database: PostgreSQL

Why PostgreSQL over alternatives:

  • Not TimescaleDB/InfluxDB/ClickHouse: We are NOT storing time-series data in the database. The raw ULog stays in S3, and processed plot data goes to S3 as compressed JSON. The database stores only scalar metadata, summary statistics, and user/auth data -- a perfect fit for vanilla PostgreSQL.
  • Not SQLite: Concurrent write access from multiple workers, proper connection pooling, full-text search, JSONB for flexible metadata, and proven horizontal scaling path.
  • Not DuckDB: Interesting for analytical queries but overkill for metadata storage. Could be used as an embedded engine in the processing worker for on-the-fly analysis, but PostgreSQL covers the primary need.

Schema (simplified):

-- users comes first so that logs.owner_id can reference it
CREATE TABLE users (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email           TEXT UNIQUE,
    password_hash   TEXT,
    role            TEXT DEFAULT 'user',  -- user, admin
    created_at      TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE logs (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    -- Upload metadata
    title           TEXT,
    description     TEXT,
    original_filename TEXT,
    uploaded_at     TIMESTAMPTZ DEFAULT NOW(),
    source          TEXT,        -- "webui", "qgc", "cli"
    email           TEXT,
    public          BOOLEAN DEFAULT TRUE,
    allow_analysis  BOOLEAN DEFAULT TRUE,
    -- User-provided context
    wind_speed      SMALLINT,
    rating          TEXT,
    feedback        TEXT,
    video_url       TEXT,
    error_labels    TEXT,
    -- Processing status
    status          TEXT DEFAULT 'pending',  -- pending, processing, ready, failed
    processed_at    TIMESTAMPTZ,
    -- S3 references
    s3_raw_key      TEXT NOT NULL,
    s3_processed_key TEXT,
    s3_thumbnail_key TEXT,
    -- Access control
    token           TEXT UNIQUE,
    owner_id        UUID REFERENCES users(id)
);

CREATE TABLE log_metadata (
    log_id          UUID PRIMARY KEY REFERENCES logs(id),
    -- Extracted from ULog (computed once at processing time)
    duration_s      INTEGER,
    mav_type        TEXT,
    estimator       TEXT,
    autostart_id    INTEGER,
    hardware        TEXT,
    software_version TEXT,
    software_hash   TEXT,
    vehicle_uuid    TEXT,
    start_time_utc  TIMESTAMPTZ,
    -- Error/warning counts
    num_errors      INTEGER DEFAULT 0,
    num_warnings    INTEGER DEFAULT 0,
    has_hardfault   BOOLEAN DEFAULT FALSE,
    file_corrupted  BOOLEAN DEFAULT FALSE,
    -- Flight modes
    flight_modes    JSONB,       -- [{mode: "POSCTL", duration_s: 120}, ...]
    -- Summary statistics (previously computed on every page view)
    total_distance_m    REAL,
    max_altitude_diff_m REAL,
    avg_speed_ms        REAL,
    max_speed_ms        REAL,
    max_speed_horiz_ms  REAL,
    max_speed_up_ms     REAL,
    max_speed_down_ms   REAL,
    max_tilt_deg        REAL,
    max_rotation_dps    REAL,
    avg_current_a       REAL,
    max_current_a       REAL,
    -- Vibration summary
    max_vibe_level      REAL,
    vibe_status         TEXT,    -- "good", "warning", "critical"
    -- GPS quality summary
    avg_satellites      REAL,
    min_fix_type        SMALLINT,
    -- Dropout summary
    dropout_count       INTEGER,
    dropout_total_ms    INTEGER,
    -- Searchable metadata (JSONB for flexibility)
    parameters          JSONB,   -- Non-default parameters
    info_messages       JSONB,   -- Key info messages
    -- Full-text search
    search_vector       TSVECTOR
);

CREATE TABLE vehicles (
    uuid            TEXT PRIMARY KEY,
    name            TEXT,
    total_flight_time_s BIGINT DEFAULT 0,
    latest_log_id   UUID REFERENCES logs(id)
);

3. Storage: S3 via API

Upload flow:

  1. Client requests a pre-signed upload URL from the API
  2. Client uploads directly to S3 (bypasses backend for large files)
  3. Client notifies API that upload is complete
  4. API enqueues processing job

For small deployments: MinIO provides S3-compatible object storage that runs alongside the app in a single Docker Compose setup.

Storage layout:

s3://bucket/
├── raw/
│   └── {log_id}.ulg                    # Original ULog file (immutable)
├── processed/
│   └── {log_id}.json.zst               # All plot data, compressed (~200-500KB)
├── thumbnails/
│   └── {log_id}.png                    # Overview map image
└── exports/
    └── {log_id}.kml                    # Optional KML export
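To keep this layout from being re-derived ad hoc in each service, one helper can own the key scheme. A small sketch -- the `ArtifactKind`/`s3_key` names are illustrative, not an existing API:

```rust
// Single source of truth for the S3 object layout shown above.
enum ArtifactKind {
    Raw,
    Processed,
    Thumbnail,
    KmlExport,
}

fn s3_key(log_id: &str, kind: ArtifactKind) -> String {
    match kind {
        ArtifactKind::Raw => format!("raw/{log_id}.ulg"),
        ArtifactKind::Processed => format!("processed/{log_id}.json.zst"),
        ArtifactKind::Thumbnail => format!("thumbnails/{log_id}.png"),
        ArtifactKind::KmlExport => format!("exports/{log_id}.kml"),
    }
}
```

The API service, upload worker, and processing worker would all call this one function, so a layout change is a one-line edit.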

Processed data format (the key innovation -- compute once, serve forever):

{
  "version": 1,
  "log_id": "abc-123",
  "computed_at": "2026-03-11T22:00:00Z",
  "plots": {
    "attitude_roll": {
      "type": "timeseries",
      "title": "Roll Angle",
      "unit": "deg",
      "series": [
        {"label": "Estimated", "timestamps": [...], "values": [...], "color": "#1f77b4"},
        {"label": "Setpoint", "timestamps": [...], "values": [...], "color": "#ff7f0e"}
      ],
      "flight_modes": [{"start": 0.0, "end": 12.5, "mode": "MANUAL"}, ...],
      "annotations": [{"time": 5.2, "text": "Param change: MC_ROLL_P=6.5"}]
    },
    "fft_actuator_controls": {
      "type": "spectrogram",
      "title": "Actuator Controls FFT",
      "frequencies": [...],
      "magnitudes": [...],
      "markers": [{"freq": 80, "label": "MC_DTERM_CUTOFF"}]
    },
    "gps_track": {
      "type": "map",
      "coordinates": [[lat, lon, alt], ...],
      "flight_mode_segments": [...]
    },
    "trajectory_3d": {
      "type": "trajectory",
      "positions": [[x, y, z], ...],
      "quaternions": [[w, x, y, z], ...],
      "timestamps": [...],
      "vehicle_type": "quadrotor"
    }
  },
  "key_facts": {
    "vibration": {"status": "good", "max_level": 3.2, "unit": "m/s^2"},
    "gps_quality": {"avg_sats": 14, "min_fix": 3},
    "battery": {"voltage_start": 16.2, "voltage_end": 14.8, "mah_used": 1200},
    "flight_modes": [{"mode": "POSCTL", "duration_s": 120, "pct": 80}],
    "errors": [],
    "warnings": ["High vibration on IMU 2"]
  },
  "tables": {
    "parameters": [...],
    "messages": [...],
    "perf_counters": [...]
  }
}

This pre-computed JSON eliminates the need to ever re-parse the ULog for viewing. At ~200-500KB compressed (vs 2-90MB raw ULog), it's fast to fetch and cheap to store. For 350k logs, total cache size would be ~70-175GB -- trivial for S3.
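The client dispatches rendering on each plot's "type" discriminator, so the processing side should reject unknown kinds at write time rather than ship them to browsers. A minimal std-only sketch of that strict mapping (names illustrative):

```rust
// Strict mapping of the processed-JSON "type" field to renderer kinds.
#[derive(Debug, PartialEq, Clone, Copy)]
enum PlotKind {
    Timeseries,
    Spectrogram,
    Map,
    Trajectory,
}

fn parse_plot_kind(s: &str) -> Option<PlotKind> {
    match s {
        "timeseries" => Some(PlotKind::Timeseries),
        "spectrogram" => Some(PlotKind::Spectrogram),
        "map" => Some(PlotKind::Map),
        "trajectory" => Some(PlotKind::Trajectory),
        _ => None, // unknown kind: fail the processing run, don't emit it
    }
}
```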

4. Frontend: React + TypeScript SPA

Why React: Largest ecosystem for data visualization components, strong TypeScript support, and the most contributors will be familiar with it. Vue or Svelte are viable alternatives but React maximizes contributor pool for an open-source project.

Charting: Apache ECharts

After evaluating options:

Library          Bundle Size             Max Points  GPU Accel      Extensibility  Community
uPlot            45KB                    10M+        No (Canvas2D)  Low            Small
Apache ECharts   300KB (tree-shakeable)  10M+        Yes (WebGL)    High           Very large
Plotly.js        3.5MB                   100K        Limited        Medium         Large
D3               90KB                    Varies      Manual         Very high      Very large

Apache ECharts wins because:

  • WebGL renderer handles millions of points smoothly (critical for high-rate IMU/FIFO data)
  • Built-in support for linked/synchronized time axes across multiple plots
  • Native support for spectrograms, heatmaps, and scatter plots
  • Large-array optimization with sampling and progressive rendering
  • Extensive theming and customization
  • Tree-shakeable: only import the chart types you use
  • Active development with strong community (Apache Foundation)

uPlot is faster and lighter but lacks spectrogram support and has limited extensibility. Plotly.js is too heavy and struggles with >100K points.

3D Flight Replay: Three.js Web Component

Drawing from mavsim-viewer's architecture (which cleanly separates data processing from rendering):

  • Port mavsim-viewer's ULog replay engine logic (~470 LOC) to TypeScript
  • Use Three.js for 3D rendering (most mature WebGL library)
  • Port the vehicle model registry (8 models across 6 types) as glTF assets
  • Implement the dead-reckoning interpolation for smooth 60fps playback
  • Trail rendering with speed-based coloring (ring buffer, adaptive sampling)
  • Chase camera + FPV camera modes
  • Expose as a <flight-replay> web component or React component

The mavsim-viewer C codebase provides exact specifications for:

  • Coordinate transforms (NED to rendering frame)
  • Quaternion handling and interpolation
  • Flight mode transition tracking (up to 256 changes)
  • Playback controls (0.25x to 16x speed, seek, loop)
  • Trail sampling parameters (1800 points, 16ms interval, 1cm distance threshold)

5. Plugin / Extension System

Borrowing from PlotJuggler's architecture (which supports 20+ plugins across data loaders, transforms, and visualizers), the frontend should support a simple plugin registry:

// Plugin definition
interface FlightReviewPlugin {
  id: string;
  name: string;
  version: string;
  // What data this plugin needs from the processed JSON
  requiredPlots?: string[];
  requiredKeyFacts?: string[];
  // Components
  panels?: PanelPlugin[];        // Full panel in the plot area
  keyFacts?: KeyFactPlugin[];    // Cards in the summary section
  transforms?: TransformPlugin[]; // Client-side data transforms
}

interface PanelPlugin {
  id: string;
  title: string;
  category: string;  // "attitude", "position", "power", "sensors", "custom"
  component: React.ComponentType<{data: PlotData, config: any}>;
  // Optional: server-side processing hint
  processorId?: string;
}

interface KeyFactPlugin {
  id: string;
  title: string;
  component: React.ComponentType<{facts: KeyFacts}>;
  priority: number;  // Display order
}

// Registration
registerPlugin({
  id: "vibration-analysis",
  name: "Vibration Analysis",
  panels: [{
    id: "vibe-spectrum",
    title: "Vibration Spectrum",
    category: "sensors",
    component: VibrationSpectrumPanel,
  }],
  keyFacts: [{
    id: "vibe-summary",
    title: "Vibration Health",
    component: VibrationSummaryCard,
    priority: 10,
  }],
});

For the backend processing pipeline, a similar plugin system allows adding new analysis modules:

// Backend processing plugin trait
trait AnalysisPlugin: Send + Sync {
    fn id(&self) -> &str;
    fn name(&self) -> &str;
    /// Which ULog topics this plugin needs
    fn required_topics(&self) -> &[&str];
    /// Process ULog data and return plot data + key facts
    fn process(&self, ulog: &ParsedULog) -> Result<PluginOutput>;
}

struct PluginOutput {
    plots: HashMap<String, PlotData>,
    key_facts: HashMap<String, serde_json::Value>,
    tables: HashMap<String, TableData>,
}

Built-in plugins would cover all current flight review functionality (attitude, position, power, sensors, FFT, PID analysis, etc.), and the community could add new ones without modifying core code.
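To make the trait concrete, here is a self-contained toy plugin with the ULog and serde types stubbed out as plain std types. Every name here is illustrative of the shape, not the real pipeline API:

```rust
use std::collections::HashMap;

// Stub: topic name -> series of (timestamp_s, value) samples.
struct ParsedUlog {
    topics: HashMap<String, Vec<(f64, f64)>>,
}

struct PluginOutput {
    key_facts: HashMap<String, String>,
}

trait AnalysisPlugin {
    fn id(&self) -> &str;
    fn required_topics(&self) -> &[&str];
    fn process(&self, ulog: &ParsedUlog) -> Result<PluginOutput, String>;
}

// Toy plugin: report the maximum value seen on one topic as a key fact.
struct MaxTiltPlugin;

impl AnalysisPlugin for MaxTiltPlugin {
    fn id(&self) -> &str { "max-tilt" }
    fn required_topics(&self) -> &[&str] { &["vehicle_attitude"] }

    fn process(&self, ulog: &ParsedUlog) -> Result<PluginOutput, String> {
        let series = ulog
            .topics
            .get("vehicle_attitude")
            .ok_or("missing topic vehicle_attitude")?;
        let max_tilt = series.iter().map(|&(_, v)| v).fold(f64::MIN, f64::max);
        let mut key_facts = HashMap::new();
        key_facts.insert("max_tilt_deg".into(), format!("{max_tilt:.1}"));
        Ok(PluginOutput { key_facts })
    }
}
```

The worker would hold a `Vec<Box<dyn AnalysisPlugin>>`, skip plugins whose required topics are absent from the log, and merge the outputs into the processed JSON.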

6. Authentication & Multi-tenancy

For the public Dronecode instance: Anonymous uploads continue as today, with optional user accounts for managing your own logs.

For private deployments: Simple auth with configurable backends:

# config.yaml
auth:
  enabled: true
  provider: "local"           # local, oidc, ldap
  require_login_to_view: true
  require_login_to_upload: true
  # For OIDC (Google, GitHub, Okta, etc.)
  oidc:
    issuer: "https://accounts.google.com"
    client_id: "..."
    client_secret: "..."

Implementation: JWT-based session tokens. The users table is optional -- when auth is disabled, the system behaves exactly like today's public instance.

7. Deployment

Single-container deployment (small teams):

# docker-compose.yml
services:
  flight-review:
    image: ghcr.io/px4/flight-review-next:latest
    ports:
      - "8080:8080"
    environment:
      DATABASE_URL: "postgres://fr:fr@db/flight_review"
      S3_ENDPOINT: "http://minio:9000"
      S3_BUCKET: "flight-review"
      S3_ACCESS_KEY: "minioadmin"
      S3_SECRET_KEY: "minioadmin"
    depends_on:
      - db
      - minio

  db:
    image: postgres:16-alpine
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: flight_review
      POSTGRES_USER: fr
      POSTGRES_PASSWORD: fr

  minio:
    image: minio/minio
    command: server /data
    volumes:
      - s3data:/data

volumes:
  pgdata:
  s3data:

One docker compose up and you have a fully functional private instance. No nginx, no FUSE mounts, no shell scripts.

Production deployment (Dronecode scale):

  • Same containers, but PostgreSQL on RDS, S3 on AWS, and multiple API/worker replicas behind an ALB
  • Horizontal scaling: add more processing workers for upload bursts
  • CloudFront for static assets and pre-signed S3 URLs
  • Optional: Redis/SQS for job queue (or use PostgreSQL LISTEN/NOTIFY for simplicity)

Processing Pipeline Detail

Upload Flow

Client                    API                      S3                    Worker
  │                        │                       │                       │
  │── POST /api/upload ───►│                       │                       │
  │◄── presigned URL ──────│                       │                       │
  │                        │                       │                       │
  │── PUT (direct S3) ────────────────────────────►│                       │
  │                        │                       │                       │
  │── POST /api/upload/ ──►│                       │                       │
  │   complete             │                       │                       │
  │   {s3_key, metadata}   │                       │                       │
  │                        │── INSERT log ────────►│ (PostgreSQL)          │
  │                        │── enqueue job ───────────────────────────────►│
  │◄── 202 {log_id} ───────│                       │                       │
  │                        │                       │                       │
  │                        │                       │     ┌────────────┐    │
  │                        │                       │◄────│ Download   │    │
  │                        │                       │     │ raw .ulg   │    │
  │                        │                       │     │            │    │
  │                        │                       │     │ Parse ULog │    │
  │                        │                       │     │            │    │
  │                        │                       │     │ Run all    │    │
  │                        │                       │     │ analysis   │    │
  │                        │                       │     │ plugins    │    │
  │                        │                       │     │            │    │
  │                        │                       │◄────│ Upload     │    │
  │                        │                       │     │ processed  │    │
  │                        │                       │     │ JSON + PNG │    │
  │                        │                       │     └────────────┘    │
  │                        │◄── UPDATE status='ready' ─────────────────────│
  │                        │                       │                       │

Processing Steps (per log)

  1. Download raw ULog from S3 (~2-90MB, up to ~2.7GB for extreme cases like 15-hour flights)
  2. Parse ULog header, definitions, subscriptions (streaming parser for large files)
  3. Extract metadata: vehicle type, hardware, software, parameters, info messages
  4. Compute summary statistics: distance, speed, altitude, tilt, current, vibration levels
  5. Generate time-series plot data: For each of the ~35 plot types, extract the relevant topic data, apply transforms (unit conversion, filtering, FFT), and produce downsampled series at multiple resolution tiers
  6. Generate spectrogram data: FFT/PSD for actuator controls, angular velocity, angular acceleration
  7. Generate map data: GPS coordinates with flight mode segments
  8. Generate 3D trajectory data: Positions, quaternions, timestamps for the flight replay component
  9. Generate overview thumbnail: Static map image (can use server-side rendering or delegate to a headless browser)
  10. Compress and upload processed JSON (zstd compression) + thumbnail to S3
  11. Update database with metadata, summary stats, and status='ready'
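The steps above walk each log through the status lifecycle from the schema (pending → processing → ready/failed). A minimal transition guard, std-only and illustrative -- note the Failed → Pending edge is an assumed retry path, not something the plan above specifies; in practice this check would live in the UPDATE query's WHERE clause:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum LogStatus {
    Pending,
    Processing,
    Ready,
    Failed,
}

// Legal moves through the processing lifecycle.
fn can_transition(from: LogStatus, to: LogStatus) -> bool {
    use LogStatus::*;
    matches!(
        (from, to),
        (Pending, Processing)
            | (Processing, Ready)
            | (Processing, Failed)
            | (Failed, Pending) // assumed retry edge
    )
}
```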

Total processing time target: <5 seconds for a typical 10-minute flight log. For very long logs (1h+), <30 seconds. For extreme logs (15h), <2 minutes.

Handling Very Large Logs (Up to 15 Hours)

The largest known log in the current dataset is a 15-hour flight. At ~50 KB/s default logging rate, this produces a ~2.7 GB ULog file with ~173 million data points across ~45 topics (13.5M points per 250Hz topic). This has significant implications:

Processing worker requirements:

  • The Rust ULog parser MUST use streaming/mmap parsing -- loading 2.7 GB entirely into RAM is unacceptable
  • Processing worker memory budget: 4 GB max, enforced via configurable limit
  • Configurable max file size (default 5 GB) to reject pathological inputs
  • Large logs should be priority-queued separately to avoid blocking the worker for short uploads
  • Processing time scales roughly linearly: ~10-20s in Rust for a 15h log (vs 2-5 minutes in Python)

Current system has zero guards for large logs:

  • Nginx limit: 100 MB (client_max_body_size), Tornado buffer: 300 MB -- both would reject even a 1-hour log
  • No memory guards in pyulog parsing (loads everything into RAM)
  • LRU cache of 8 parsed ULog objects has no size-in-bytes awareness -- eight 15h logs would OOM the server
  • Downsampling uses naive every-Nth-sample decimation, not LTTB
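The cache failure mode in particular is cheap to rule out in the new design: evict by total bytes, not entry count. A std-only sketch of a byte-budgeted LRU (illustrative; a real cache would also index entries by key and refresh recency on reads):

```rust
use std::collections::VecDeque;

// LRU with a byte budget: inserting evicts oldest entries until the
// new entry fits, so eight 15-hour logs can never exhaust memory.
struct ByteLru {
    budget_bytes: usize,
    used_bytes: usize,
    entries: VecDeque<(String, usize)>, // (log id, parsed size), oldest first
}

impl ByteLru {
    fn new(budget_bytes: usize) -> Self {
        Self { budget_bytes, used_bytes: 0, entries: VecDeque::new() }
    }

    fn insert(&mut self, key: &str, size_bytes: usize) {
        while self.used_bytes + size_bytes > self.budget_bytes {
            match self.entries.pop_front() {
                Some((_, evicted)) => self.used_bytes -= evicted,
                // Single entry larger than the whole budget: don't cache it.
                None => return,
            }
        }
        self.entries.push_back((key.to_string(), size_bytes));
        self.used_bytes += size_bytes;
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```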

Upload Paths & UX

Important context: The majority of users upload through the web form on the website, not via QGroundControl auto-upload. The upload flow must prioritize the web UI experience:

  1. Web form upload (primary): Multipart POST directly to the backend API. For files under ~100 MB (the vast majority of uploads), this is simple and fast. For large files (>100 MB), use chunked upload with progress indication.
  2. QGroundControl auto-upload (secondary): Must maintain API compatibility with the current QGC upload endpoint format.
  3. Pre-signed S3 upload (for very large files only): For files >500 MB, the API can optionally provide a pre-signed S3 URL for direct upload, bypassing the backend. This is an optimization, not the default path.

The web upload form should show:

  • Upload progress bar with speed and ETA
  • File validation (is it a valid ULog?) as soon as the header bytes arrive
  • Processing status ("Uploading... → Processing... → Ready") with live updates via SSE or polling
  • Link to the log page as soon as processing completes

Downsampling Strategy

Raw ULog data can have millions of points (e.g., sensor_combined at 250Hz for 15 hours = 13.5M points per axis). The processed JSON should contain intelligently downsampled data at multiple resolution tiers:

  • LTTB (Largest Triangle Three Buckets): Preserves visual shape while aggressively reducing point count. The point budget scales with log duration:
    • Logs up to 10 minutes: 4,000 points per series
    • Longer logs: min(max(4000, duration_minutes * 35), 30000) points
    • 15-hour log: hits the 30,000-point cap per series (~5 MB per series, ~75 MB total uncompressed, ~12 MB compressed)
  • Hierarchical tiers (pre-computed, stored in S3):
    • Tier 1 (overview): LTTB-downsampled as above -- used for initial page load
    • Tier 2 (medium zoom): 10x the overview point count, capped at 200K per series
    • Tier 3 (full resolution): on-demand endpoint that reads the raw ULog for a specific time range
  • Full resolution on demand: For zoomed-in views, the client requests a specific time range at full resolution from a secondary endpoint. This avoids storing full-res data (which would be ~400-600 MB compressed for a 15h log) while still supporting deep inspection.
  • FFT/PSD data: Store at native resolution (frequency domain is already compact).
  • Map coordinates: Downsample to ~1000 points using Ramer-Douglas-Peucker.
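The point budget and the Tier 1 LTTB pass can be sketched in a few dozen lines of std-only Rust. This is an illustrative port of the standard LTTB algorithm over (timestamp, value) samples, not tuned production code:

```rust
// Budget: 4,000 points up to 10 minutes, then min(max(4000, min*35), 30000).
fn point_budget(duration_minutes: f64) -> usize {
    if duration_minutes <= 10.0 {
        return 4_000;
    }
    (duration_minutes * 35.0).max(4_000.0).min(30_000.0) as usize
}

/// Largest Triangle Three Buckets: keep `threshold` visually representative
/// points, always retaining the first and last samples.
fn lttb(data: &[(f64, f64)], threshold: usize) -> Vec<(f64, f64)> {
    let n = data.len();
    if threshold >= n || threshold < 3 {
        return data.to_vec();
    }
    let mut out = Vec::with_capacity(threshold);
    out.push(data[0]);
    // Bucket width between the fixed first and last points.
    let every = (n - 2) as f64 / (threshold - 2) as f64;
    let mut a = 0usize; // index of the previously selected point
    for i in 0..threshold - 2 {
        // Average of the *next* bucket serves as the third triangle vertex.
        let next_start = ((i + 1) as f64 * every) as usize + 1;
        let next_end = (((i + 2) as f64 * every) as usize + 1).min(n);
        let next = &data[next_start..next_end];
        let avg_t = next.iter().map(|p| p.0).sum::<f64>() / next.len() as f64;
        let avg_v = next.iter().map(|p| p.1).sum::<f64>() / next.len() as f64;
        // Keep the point in the current bucket with the largest triangle area.
        let start = (i as f64 * every) as usize + 1;
        let end = (((i + 1) as f64 * every) as usize + 1).min(n - 1);
        let (pa_t, pa_v) = data[a];
        let mut best = start;
        let mut best_area = -1.0;
        for j in start..end {
            let area = ((pa_t - avg_t) * (data[j].1 - pa_v)
                - (pa_t - data[j].0) * (avg_v - pa_v))
                .abs();
            if area > best_area {
                best_area = area;
                best = j;
            }
        }
        out.push(data[best]);
        a = best;
    }
    out.push(data[n - 1]);
    out
}
```

Because LTTB maximizes triangle area against the previously kept point, isolated spikes (exactly the samples a flight-log reviewer cares about) survive even aggressive reduction, unlike every-Nth decimation.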

Migration Strategy

Phase 1: Core Infrastructure (Months 1-2)

  • Rust backend with Axum: health check, upload, S3 integration (API-based)
  • PostgreSQL schema and migrations
  • ULog parser in Rust (port from pyulog/mavsim-viewer reference)
  • Basic processing pipeline: parse ULog, extract metadata, store to DB
  • Docker Compose setup with MinIO
  • CI/CD pipeline

Phase 2: Processing Engine (Months 2-4)

  • Port all 35+ plot types from configured_plots.py to Rust analysis plugins
  • Implement FFT/PSD analysis (use rustfft crate)
  • Implement summary statistics computation
  • Pre-computed JSON generation with LTTB downsampling
  • Overview thumbnail generation
  • Processing worker with job queue

Phase 3: Frontend MVP (Months 3-5)

  • React SPA with TypeScript
  • Browse/search page with filtering
  • Log detail page with all core plots (ECharts)
  • Synchronized time axes across plots
  • Flight mode background coloring
  • GPS map view (Leaflet or Mapbox GL)
  • Parameter table, logged messages
  • Responsive design

Phase 4: Advanced Features (Months 5-7)

  • 3D flight replay component (Three.js, ported from mavsim-viewer)
  • PID analysis page
  • Plugin system for frontend panels and key facts
  • Authentication system (local + OIDC)
  • Full-resolution zoom endpoint
  • KML export
  • Statistics/analytics page
  • Dark mode

Phase 5: Production Migration (Months 7-8)

  • Bulk re-process existing 350k logs (parallel workers on AWS)
  • Data migration from SQLite to PostgreSQL
  • DNS cutover with nginx redirect for old URLs
  • Monitoring and alerting setup
  • Documentation and contributor guide

Parallel Workstream: Data Migration

The 350k existing logs can be re-processed in parallel. At 5 seconds per log with 10 workers, this takes ~48 hours. The migration can run alongside the old system, with a read-only bridge serving old logs until re-processing completes.


Key Design Decisions Borrowed from PlotJuggler

PlotJuggler's architecture (C++/Qt, 20+ plugins, handles millions of points at 60fps) provides several patterns worth adopting:

  1. Lazy range computation: Don't compute min/max for all data upfront. Cache ranges and invalidate on data change. Critical for responsive zoom/pan.
  2. Deque-based storage with dirty flags: PlotJuggler uses std::deque with lazy range caching. The web equivalent: typed arrays with cached bounds, recomputed only when the visible window changes.
  3. Plugin architecture: PlotJuggler's DataLoader, DataStreamer, TransformFunction, and StatePublisher interfaces cleanly separate concerns. Our AnalysisPlugin (backend) and PanelPlugin (frontend) follow the same pattern.
  4. Transform composition: PlotJuggler supports chaining transforms (derivative -> moving average -> outlier removal). ECharts supports client-side transforms, and the backend can pre-compute common ones.
  5. Group-based organization: PlotJuggler groups related series (e.g., all IMU measurements) with shared visibility controls. The frontend should do the same.
  6. WASM plugin potential: PlotJuggler is experimenting with WASM plugins. A future version of Flight Review Next could support user-provided WASM analysis modules that run in the browser.

Key Design Decisions Borrowed from mavsim-viewer

mavsim-viewer's clean C architecture (~5,560 LOC total) provides exact specifications for the 3D replay component:

  1. Data source abstraction: Polymorphic data_source_t with vtable. Port directly to TypeScript abstract class with ReplayDataSource and potential LiveDataSource implementations.
  2. Dead-reckoning interpolation: Essential for smooth 60fps playback from 5-10Hz position data. Linear interpolation: pos = pos_last + vel * dt.
  3. Adaptive trail sampling: Ring buffer of 1800 points, sampled at 16ms intervals with 1cm minimum distance. Prevents memory bloat while maintaining visual fidelity.
  4. Speed ribbon coloring: Trail colored by speed (blue=slow, green=medium, red=fast). Normalized against running max speed.
  5. Seek index: Sparse timestamp index (1 entry per second) enables O(log n) seeking in large logs. Build during initial parse.
  6. Vehicle model registry: 8 models across 6 types with per-model scale and orientation offsets. Ship as glTF assets for the web version.
  7. Camera modes: Chase (orbit around vehicle) and FPV (vehicle-mounted gimbal). Both transfer directly to Three.js camera controls.
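Items 2 and 5 above can be sketched in TypeScript. This is a hedged illustration of the described patterns, not ported mavsim-viewer code; the `Sample` shape and function names are assumptions made for this sketch.

```typescript
// Dead-reckoning and sparse seek index, as described above (names hypothetical).
interface Sample {
  t: number;                       // seconds
  pos: [number, number, number];   // meters, local frame
  vel: [number, number, number];   // m/s
}

/** Dead-reckon from the last received sample: pos = pos_last + vel * dt. */
function deadReckon(last: Sample, t: number): [number, number, number] {
  const dt = t - last.t;
  return [
    last.pos[0] + last.vel[0] * dt,
    last.pos[1] + last.vel[1] * dt,
    last.pos[2] + last.vel[2] * dt,
  ];
}

/** Sparse seek index: one sample offset per second of log time. */
function buildSeekIndex(samples: Sample[]): number[] {
  const index: number[] = [];
  let nextSecond = 0;
  for (let i = 0; i < samples.length; i++) {
    if (samples[i].t >= nextSecond) {
      index.push(i);
      nextSecond = Math.floor(samples[i].t) + 1;
    }
  }
  return index;
}

/** O(log n) seek: largest index entry whose timestamp is <= t. */
function seek(index: number[], samples: Sample[], t: number): number {
  let lo = 0;
  let hi = index.length - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if (samples[index[mid]].t <= t) lo = mid;
    else hi = mid - 1;
  }
  return index[lo];
}
```

With 5-10Hz position data, `deadReckon` fills the gaps between samples at the 60fps render rate, and `seek` jumps the playback cursor without scanning the whole log.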

Frontend Component Architecture

<App>
├── <Header>
│   ├── <SearchBar>
│   └── <UserMenu>
├── <Routes>
│   ├── <BrowsePage>
│   │   ├── <FilterSidebar>
│   │   ├── <LogGrid>
│   │   │   └── <LogCard> (thumbnail, key facts, duration, vehicle type)
│   │   └── <Pagination>
│   ├── <LogDetailPage>
│   │   ├── <KeyFactsBar>
│   │   │   ├── <VibrationCard>
│   │   │   ├── <GPSQualityCard>
│   │   │   ├── <BatteryCard>
│   │   │   ├── <FlightModesCard>
│   │   │   └── <PluginKeyFactCards...>
│   │   ├── <InfoTable>
│   │   ├── <PlotContainer>
│   │   │   ├── <TimeSeriesPlot>       (ECharts, synchronized axes)
│   │   │   ├── <SpectrogramPlot>      (ECharts heatmap)
│   │   │   ├── <MapPanel>             (Leaflet/Mapbox)
│   │   │   ├── <FlightReplay3D>       (Three.js)
│   │   │   └── <PluginPanels...>
│   │   ├── <ParameterTable>
│   │   ├── <MessagesTable>
│   │   └── <CollapsibleSections>
│   │       ├── <PerfCounters>
│   │       └── <BootConsole>
│   ├── <PIDAnalysisPage>
│   ├── <StatisticsPage>
│   └── <UploadPage>
└── <Footer>

Key UX Improvements Over Current Flight Review

  1. Instant page loads: Pre-computed data loads in <500ms vs 3-10 seconds today
  2. Synchronized cursors: Hover on one plot, see the corresponding time on all plots and the 3D view
  3. Key facts dashboard: At-a-glance vibration health, GPS quality, battery status, flight modes -- visible immediately without scrolling through 35 plots
  4. Collapsible plot categories: Users see what they care about first (attitude, position, power) and can expand advanced sections (FFT, PSD, estimator flags)
  5. 3D flight replay: Interactive replay with playback controls, not just a static 3D trajectory view
  6. Deep linking: Every plot section has a URL hash for sharing specific views
  7. Mobile-responsive: Card-based layout that works on tablets and phones
  8. Dark mode: Because developers love dark mode

Comparison with Alternatives

Foxglove

Foxglove is a commercial robotics visualization platform that supports ULog. It's excellent for interactive exploration but:

  • Commercial product (free tier has limits)
  • Not self-hostable (cloud-only for team features)
  • General-purpose (not PX4-specific key facts and analysis)
  • No community-driven analysis logic (FFT cutoff markers, vibration thresholds, PID analysis)

Flight Review Next would complement Foxglove: users who want deep PX4-specific analysis use Flight Review; users who want general-purpose exploration can export to Foxglove.

PlotJuggler

Excellent desktop tool but:

  • Desktop-only (no web sharing)
  • No persistent storage or team collaboration
  • No PX4-specific key facts or summary statistics
  • No automated analysis pipeline

Flight Review Next would serve a different need: cloud-first, shareable, with automated PX4-specific analysis.

Grafana-Based Approach: Full Analysis

A Grafana-based solution was evaluated as an alternative to building a custom frontend. Two variants were considered: (A) using Grafana as-is with existing panels, and (B) building custom Grafana panels for the missing visualization types.

What Grafana provides out of the box

  • Built-in time-series panels are polished and performant (~60% of Flight Review's plots)
  • Synchronized crosshairs across all panels work natively (Single/All tooltip modes)
  • Dashboard JSON model + provisioning API: one JSON template serves all logs via ?var-log_id=XXX
  • Geomap panel handles GPS tracks with route layers
  • Annotation system can represent flight mode changes as colored regions
  • Table panel handles parameter tables and logged messages
  • Built-in auth with OAuth2, LDAP, SAML, org-based multi-tenancy, role-based permissions
  • Dashboard sharing, snapshot export, alerting
  • Battle-tested: 67.5k GitHub stars, 25M+ users
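The dashboard-as-code point above can be sketched as a dashboard JSON fragment: one template serves every log by parameterizing a `log_id` variable, supplied via `?var-log_id=XXX`. The panel shape is abbreviated and the table/column names in the query are illustrative assumptions, not a real schema.

```json
{
  "templating": {
    "list": [
      { "name": "log_id", "type": "constant", "query": "" }
    ]
  },
  "panels": [
    {
      "title": "Roll Angle",
      "type": "timeseries",
      "targets": [
        {
          "rawSql": "SELECT time, roll, roll_setpoint FROM attitude WHERE log_id = '$log_id' ORDER BY time"
        }
      ]
    }
  ]
}
```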

What's missing (would need custom panels)

| Visualization | Grafana Status | Custom Panel Effort |
|---|---|---|
| FFT with filter cutoff markers | No panel exists | Medium (2-3 weeks). TypeScript panel, FFT data pre-computed server-side |
| PSD Spectrogram | No panel (heatmap is for histograms) | Medium-Hard (3-4 weeks). WebGL heatmap with freq/time axes |
| PID step response | Nothing close | Hard (4-6 weeks). Wiener deconvolution results, Bode plots |
| 3D flight trajectory | One limited community plugin | Medium (3-4 weeks). Three.js panel with vehicle replay |
| Key facts dashboard | Stat panels exist but clunky | Easy (1 week). Custom panel with cards layout |

Total custom panel development: ~13-18 weeks (3-4 months) for the missing visualizations.

Option A: Grafana as-is (rejected)

Using only built-in panels means losing FFT, spectrograms, PID analysis, and 3D trajectory -- the features that differentiate Flight Review. Rejected.

Option B: Grafana + Custom Panels (viable alternative)

Build 4-5 custom Grafana panel plugins and use Grafana as the entire visualization layer.

Architecture:

┌──────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Custom App  │     │     Grafana      │     │   Backend API   │
│  (Browse,    │────►│  (All plotting)  │────►│  (Rust/Axum)    │
│   Upload,    │     │                  │     │                 │
│   Key Facts) │     │  Built-in panels │     │  - ULog parse   │
│              │     │  + Custom panels │     │  - TSDB ingest  │
│  React SPA   │     │  - FFT panel     │     │  - S3 storage   │
└──────────────┘     │  - Spectrogram   │     └────────┬────────┘
                     │  - PID panel     │              │
                     │  - 3D replay     │     ┌────────▼────────┐
                     └────────┬─────────┘     │  TimescaleDB    │
                              └──────────────►│  (time-series)  │
                                              └─────────────────┘

Pros:

  1. Massive head start on time-series. ~25 time-series plots work out of the box. Cursor sync, zoom, pan, legend, annotations -- all free.
  2. Dashboard-as-code. One JSON template serves all logs. No React component tree for the plotting layer.
  3. Auth is solved. Grafana's built-in OAuth2, LDAP, and org-based multi-tenancy cover both the public instance and private deployments.
  4. Familiar to operations teams. Many orgs already run Grafana. Flight log dashboards are a natural extension.
  5. Panel plugin SDK is mature. TypeScript + React, well-documented, hot reload.
  6. Community contribution model. People can contribute Grafana panel plugins without touching the core backend.

Cons:

  1. Requires a TSDB. Grafana queries a datasource, not JSON files. Parsed ULog data must be ingested into TimescaleDB. On-demand ingestion adds 3-10 seconds cold-start per log. Pre-ingesting 350k logs: ~16TB compressed (impractical). For a 15-hour log, on-demand ingest means writing ~173M data points before the dashboard renders.
  2. Two+ services always. Grafana + TSDB + custom app. Kills the "single binary on a Raspberry Pi" deployment tier.
  3. Embedding UX friction. Browse app links to Grafana dashboards. Looks like two different apps. Grafana Cloud does NOT support embedding; only self-hosted OSS.
  4. 35+ panels = heavy. Each panel fires independent queries. 120+ queries to TimescaleDB on page load for a 15-hour log.
  5. Plugin maintenance burden. Grafana's plugin API changes between major versions (~2/year). Custom panels need ongoing testing.
  6. No offline/static export. Pre-computed JSON can generate static HTML reports. Grafana requires a live server.

Decision Framework

| Factor | Custom Frontend (React + ECharts) | Grafana + Custom Panels |
|---|---|---|
| Time to MVP | 8-10 weeks (build everything) | 6-8 weeks (time-series free, build custom panels + ingest) |
| Time-series quality | Good (ECharts is solid) | Excellent (Grafana is best-in-class) |
| FFT/Spectrogram | Build in ECharts (~2 weeks) | Build as Grafana plugin (~3-4 weeks, more boilerplate) |
| Deployment simplicity | Single binary possible | Always needs Grafana + TSDB (min 3 services) |
| Small team / Pi / air-gap | Works everywhere | Impractical |
| Large deployment (Dronecode) | More custom code to maintain | Leverages Grafana's maturity |
| Auth | Must build | Free |
| 15-hour log handling | Pre-computed JSON, instant load | TSDB ingest of 173M points, cold-start latency |
| Contributor model | Fork + PR | Separate plugin repos |
| UX cohesion | Fully cohesive | Two-app feel |

Recommendation: Build Both Paths, Share the Backend

Rather than choosing one, the Rust backend API can serve data two ways:

  1. REST/JSON endpoint (GET /api/logs/{id}/plots) → consumed by the custom React frontend (default)
  2. TimescaleDB ingestion (on-demand) → consumed by Grafana's PostgreSQL/TimescaleDB datasource (optional)

The custom React frontend is the default for all deployment tiers. Grafana dashboards are an optional, documented alternative for organizations that already run Grafana. Same backend, same processing pipeline, same data -- just different consumers.

Custom Grafana panels (FFT, spectrogram, PID, 3D) can be developed as community contributions since they are standalone plugins with no coupling to the core app. This is a natural contribution path for organizations already invested in Grafana.

Where Grafana is definitely used: As the monitoring dashboard for Flight Review's own infrastructure (API latency, queue depth, error rates, S3 metrics).


Resource Estimates

Compute (Dronecode production instance)

| Component | CPU | RAM | Instances |
|---|---|---|---|
| API service | 1 vCPU | 512MB | 2 |
| Processing worker | 2 vCPU | 2GB | 2-4 |
| PostgreSQL | 2 vCPU | 4GB | 1 (RDS) |
| Total | 8-12 vCPU | 9-13GB | - |

Comparable to current single-instance (15GB RAM) but with much better utilization.

Storage (350k logs)

| Type | Size | Cost/month |
|---|---|---|
| Raw ULog files (existing) | ~5TB | ~$115 (S3) |
| Processed JSON cache | ~100GB | ~$2.30 (S3) |
| Thumbnails | ~10GB | ~$0.23 (S3) |
| PostgreSQL | ~5GB | ~$15 (RDS db.t3.medium) |
| Total | ~5.1TB | ~$133/month |

Small team deployment

A team with 100 logs needs: 1 container (~512MB RAM), embedded PostgreSQL or SQLite-mode, MinIO or local disk. Runs on a $5/month VPS or a Raspberry Pi.


Open Questions for Community Input

  1. Backwards compatibility: Should the new system maintain URL compatibility with review.px4.io/plot_app/s/... paths? (Recommend: yes, via nginx redirects)
  2. API stability: Should we publish an API spec that third-party tools (QGroundControl, MAVSDK) can depend on? (Recommend: yes, OpenAPI 3.0)
  3. Real-time streaming: Should the 3D replay support live MAVLink streaming in addition to log replay? (mavsim-viewer already supports this pattern via the data source abstraction)
  4. Multi-log comparison: Should the UI support overlaying multiple flights for comparison? (PlotJuggler supports this natively)
  5. Community analysis plugins: Should we provide a plugin marketplace or registry? (Recommend: start with a plugins/ directory in the repo, evolve later)
  6. Retention policy: Should old logs be auto-archived to S3 Glacier after N months? (Recommend: yes, configurable)

Risks and Mitigations

| Risk | Impact | Mitigation |
|---|---|---|
| Rust ULog parser doesn't match pyulog feature parity | Processing gaps | Use pyulog as reference test suite; validate against 1000+ real logs |
| ECharts can't handle spectrogram data well | Visual quality | Fallback to custom WebGL renderer for spectrograms |
| 3D replay performance in browser | Poor mobile experience | Make 3D replay opt-in, lazy-loaded |
| Migration disrupts 350k existing users | Lost links, broken bookmarks | Maintain old URLs via redirects for 1 year |
| Community doesn't adopt plugin system | Low extensibility | Build all current features as core plugins; system works without external plugins |
| PostgreSQL is overkill for small deployments | Complex setup | Support embedded SQLite mode via feature flag for single-user deployments |

Success Metrics

  1. Page load time: <1 second for log detail page (vs 3-10s today)
  2. Upload-to-viewable: <10 seconds of processing (today logs are viewable immediately after upload, but every view takes 3-10 seconds to render)
  3. Deployment ease: docker compose up for a working instance
  4. Plugin count: 5+ community-contributed plugins within first year
  5. Feature parity: All 35+ current plot types available at launch
  6. Mobile usability: Fully functional on tablet, viewable on phone

Appendix A: Current Flight Review Plot Inventory

All of these must be ported to the new system:

| # | Plot | Source Topics | Type |
|---|---|---|---|
| 1 | 2D Position (XY) | vehicle_local_position | Scatter |
| 2 | GPS Map | vehicle_gps_position | Map (Leaflet) |
| 3 | Altitude | vehicle_gps_position, vehicle_air_data, vehicle_local_position | Time-series |
| 4 | Roll Angle | vehicle_attitude, vehicle_attitude_setpoint | Time-series |
| 5 | Pitch Angle | vehicle_attitude, vehicle_attitude_setpoint | Time-series |
| 6 | Yaw Angle | vehicle_attitude, vehicle_attitude_setpoint | Time-series |
| 7-9 | Roll/Pitch/Yaw Rate | vehicle_angular_velocity, vehicle_rates_setpoint | Time-series |
| 10-12 | Local Position X/Y/Z | vehicle_local_position, vehicle_local_position_setpoint | Time-series |
| 13 | Velocity | vehicle_local_position | Time-series |
| 14-18 | Visual Odometry (5) | vehicle_visual_odometry | Time-series |
| 19 | Airspeed | airspeed, airspeed_validated | Time-series |
| 20 | TECS | tecs_status | Time-series |
| 21 | Manual Control | manual_control_setpoint, manual_control_switches | Time-series |
| 22 | Actuator Controls | actuator_controls_0, vehicle_thrust_setpoint | Time-series |
| 23-25 | FFT (3 types) | Derived from actuator_controls, angular_velocity | Spectrogram |
| 26 | Actuator Controls 1 | actuator_controls_1 | Time-series |
| 27 | Motor/Servo Outputs | actuator_motors, actuator_servos | Time-series |
| 28 | ESC RPM | esc_status | Time-series |
| 29 | Raw Acceleration | sensor_combined | Time-series |
| 30 | Vibration Metrics | vehicle_imu_status | Time-series |
| 31-33 | PSD Spectrograms (3) | Derived | Spectrogram |
| 34 | Raw Gyroscope | sensor_combined | Time-series |
| 35-36 | FIFO Accel/Gyro (per IMU) | sensor_accel_fifo, sensor_gyro_fifo | Time-series + Spectrogram |
| 37 | Raw Magnetometer | vehicle_magnetometer | Time-series |
| 38 | Distance Sensor | distance_sensor | Time-series |
| 39-40 | GPS Quality (2) | vehicle_gps_position | Time-series |
| 41 | Thrust-Mag Correlation | battery_status, vehicle_magnetometer | Time-series |
| 42 | Power | battery_status, system_power | Time-series |
| 43 | Temperature | Various (baro, accel, battery, ESC) | Time-series |
| 44 | Estimator Flags | estimator_status | Time-series (binary) |
| 45 | Failsafe Flags | failsafe_flags | Time-series (binary) |
| 46 | CPU & RAM | cpuload | Time-series |
| 47 | Sampling Regularity | sensor_combined, estimator_status | Time-series |

Plus: Non-default parameters table, logged messages table, hardfault card, corrupt log warning, perf counters, boot console, PID analysis page, 3D trajectory view.

Appendix B: Technology Summary

| Component | Choice | Rationale |
|---|---|---|
| Backend language | Rust | 10x faster ULog parsing, single binary, memory safety |
| Web framework | Axum | Async, tower middleware, strong ecosystem |
| Database | PostgreSQL | Concurrent access, JSONB, full-text search, proven at scale |
| Object storage | S3 API (aws-sdk-s3) | Direct API, no FUSE. MinIO for self-hosted |
| Frontend framework | React + TypeScript | Largest ecosystem, best for plugin system |
| Charting | Apache ECharts | WebGL, millions of points, spectrograms, synchronized axes |
| 3D visualization | Three.js | Most mature WebGL library, ported from mavsim-viewer |
| Maps | Leaflet or Mapbox GL JS | Flight track with mode coloring |
| Auth | JWT + OIDC | Simple for small, scalable for large |
| Deployment | Docker Compose (small), K8s/ECS (large) | Single docker compose up to full cloud |
| ULog parser | Custom Rust (reference: mavsim-viewer C + pyulog) | Native performance, streaming support |
| Job queue | PostgreSQL LISTEN/NOTIFY (simple) or SQS (scale) | No extra infrastructure for small deployments |
| CDN | CloudFront | Already in use, serves static assets + pre-signed URLs |
| FFT | rustfft (backend), custom (frontend) | High-performance spectral analysis |
| Compression | zstd | Best ratio/speed tradeoff for processed JSON |

Appendix C: Review Feedback & Plan Adjustments

This plan was reviewed from three perspectives: an open-source maintainer, a small-team private deployer, and a DevOps engineer running the production instance. Their feedback surfaced critical gaps and led to the adjustments below.

Review 1: Open-Source Maintainer Perspective

Key concerns raised:

  1. Rust vs Python for contributor accessibility. The PX4 ecosystem is primarily C++ and Python. The domain logic in configured_plots.py (1,165 lines of vibration thresholds, FFT cutoff markers, PID heuristics) was written by flight controller engineers who know Python, not Rust. Rewriting this in Rust risks losing contributors who maintain the analysis logic that is Flight Review's actual value.

  2. Scope is unrealistic at 8 months. A ground-up rewrite (new language, new DB, new frontend, new charting, 3D replay, plugin system, auth, migration) is 12-18 months minimum for a small OSS team. The history of open-source v2 rewrites is littered with projects that never shipped.

  3. Plugin system is over-engineered. Flight Review has had very few external contributors adding new plot types. A formal plugin API adds abstraction overhead, versioning, and API stability commitments without demonstrated demand. Clean code structure is sufficient.

  4. Migration risk is understated. QGroundControl's upload endpoint is a hard API contract not addressed in the plan. URL compatibility for existing links is critical. No rollback plan exists.

  5. Incremental migration recommended. Add process-once caching to the current Python app first, then build a new React frontend, then backfill 350k logs. This delivers instant page loads in 1-2 months with near-zero migration risk.

Response and adjustments:

  • Rust stays as the primary language. This is a deliberate choice by the project stakeholders who want to move away from Python and invest in Rust. The "process once" architecture means the parsing speed advantage still matters for upload processing and bulk migration. More importantly, Rust's type system, memory safety, and single-binary deployment are long-term wins. The PX4 ecosystem is increasingly multilingual (Rust UAVCAN, Rust MAVLink libraries, Auterion's px4-ulog-rs). The analysis domain logic will be ported methodically with test coverage against real logs.

  • Scope is reduced for v1. The following are cut from the initial release:

    • Plugin system → Internal module pattern only, no public plugin API
    • 3D flight replay → Phase 2 feature, current Cesium.js 3D view maintained
    • OIDC authentication → Simple token/password auth only in v1
    • PID analysis page → Phase 2
    • Dark mode → Phase 2
    • Multi-log comparison → Phase 2
    • Real-time MAVLink streaming → Not in scope
  • QGroundControl upload API compatibility is mandatory. The upload endpoint must accept the same multipart POST format QGC uses today. Document this as a hard requirement in Phase 1.

  • Incremental strategy adopted partially. The React frontend can be developed and deployed alongside the old Bokeh frontend during transition. New uploads get processed; old logs get a "legacy view" link until re-processed.

Review 2: Small Team / Private Deployment Perspective

Key concerns raised:

  1. Three containers is too many. PostgreSQL + MinIO + app triples the operational surface for a team with a few hundred logs. Named Docker volumes are not portable.

  2. PostgreSQL is overkill. SQLite with WAL handles the current 350k-log production instance. A team with hundreds of logs will never stress SQLite.

  3. MinIO is unnecessary. For 10GB of logs, local disk with direct file serving is simpler and sufficient. MinIO recommends 4GB RAM minimum.

  4. Auth is harder than shown. OIDC requires registering OAuth apps, stable domains, HTTPS, and debugging opaque token errors. Teams just want a password.

  5. Raspberry Pi / small VPS not viable. PostgreSQL eats 200-400MB idle, MinIO needs 4GB, processing workers need 2GB. A 1GB VPS can't run this.

  6. Air-gapped deployments not addressed. Many commercial/defense drone teams operate without internet. Map tiles, frontend assets, and auth all assume connectivity.

  7. Missing features for private use: Log organization (folders/tags), batch upload, flight comparison, export/reporting, storage quotas, authorization model (who sees what).

Response and adjustments:

  • Single-container mode is the default deployment. The architecture now explicitly supports three deployment tiers:

    | Tier | Components | Storage | Database | Auth | Target |
    |---|---|---|---|---|---|
    | Minimal | Single binary | Local disk | Embedded SQLite | Password list | Teams, Pi, VPS |
    | Standard | Docker Compose (2 containers) | Local disk or MinIO | PostgreSQL | Password or OIDC | Growing teams |
    | Production | ECS/K8s (N containers) | AWS S3 | RDS PostgreSQL | OIDC | Dronecode scale |
  • SQLite is first-class, not a fallback. The data access layer abstracts both SQLite and PostgreSQL equally. SQLite is the default; PostgreSQL is the documented upgrade path when concurrent write throughput becomes a measured problem.

  • Local disk storage is the default. STORAGE_BACKEND=local stores ULog files in ./data/logs/ and the app serves them directly. S3 backend is opt-in for cloud deployments. No MinIO required for simple setups.

  • Simple auth added. auth.provider: "password" with a static list of username:bcrypt pairs in the config file. Zero external dependencies. OIDC is documented as an upgrade, not the starting point.

  • Bind mounts, not named volumes. Docker Compose uses ./data:/app/data so backup is tar czf backup.tar.gz ./data/.

  • Air-gap mode added to requirements. All frontend assets bundled in the Docker image. Map tile URL configurable (defaults to OSM, can point to self-hosted tile server). Docker images published as .tar artifacts alongside registry images. Multi-arch builds (amd64 + arm64).

  • Batch upload and log tagging added to v1 scope. These are essential for real field workflows.
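The Standard-tier adjustments above (bind mounts, local disk storage, password auth) could look like the following docker-compose.yml. This is a hypothetical sketch: the image name, environment variable names, and ports are illustrative assumptions, not a published configuration.

```yaml
# Hypothetical Standard-tier compose file (all names illustrative).
services:
  app:
    image: px4/flight-review-next:latest   # assumed image name
    ports:
      - "8080:8080"
    environment:
      STORAGE_BACKEND: local               # local disk, no MinIO needed
      DATABASE_URL: postgres://fr:fr@db:5432/flightreview
      AUTH_PROVIDER: password              # static username:bcrypt list in config
    volumes:
      - ./data:/app/data                   # bind mount: backup = tar czf backup.tar.gz ./data
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: fr
      POSTGRES_PASSWORD: fr
      POSTGRES_DB: flightreview
    volumes:
      - ./data/postgres:/var/lib/postgresql/data
```

The bind mounts keep everything under `./data/`, so the whole deployment can be backed up or moved as a single directory.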

Review 3: DevOps / Production Operations Perspective

Key concerns raised:

  1. Actual S3 data is 617k files / 14.8TB, not 350k / 5TB. The plan's estimates are off by nearly 3x. Bulk migration is ~3-5 days, not 48 hours, and costs ~$1,300+ in S3 transfer.

  2. Cost is 10-40% higher than current setup ($530-650/month vs ~$470/month), not comparable. Stakeholders should know this upfront.

  3. No observability story. Monitoring should be Phase 1, not Phase 5. The plan has zero detail on metrics, alerting, or structured logging.

  4. Job queue needs persistence. PostgreSQL LISTEN/NOTIFY loses messages if no worker is listening. Need a table-backed queue with SELECT ... FOR UPDATE SKIP LOCKED.

  5. No disaster recovery plan. No RTO/RPO targets, no restore testing, no secrets management.

  6. No SSL/TLS mentioned. Currently Let's Encrypt + nginx.

  7. Pre-signed URLs expire. If a user opens a page and comes back 2 hours later, download links are dead.

  8. The Rust ULog parser doesn't exist yet. The entire plan depends on it. Build and validate it first.

  9. Proof-of-concept with 1,000 real logs needed in Phase 2, not Phase 5. Measure actual parse times, failure rates, and processed JSON sizes before committing to full migration.

Response and adjustments:

  • Data inventory corrected. The plan now uses 617k files / 14.8TB as the baseline. Migration estimates updated to 3-5 days with 10 workers, ~$1,500 in S3 costs.

  • Cost transparency added. Estimated steady-state cost is $530-650/month, roughly 15-35% higher than current. The trade-off is dramatically better performance, reliability, and operability. The current system's cost will increase anyway as data grows and S3 FUSE becomes more painful.

  • Observability is Phase 1. Minimum from day one:

    • Structured JSON logging via tracing + tracing-subscriber
    • Health endpoint checking PostgreSQL, S3 connectivity, and queue depth
    • Prometheus metrics: request latency (p50/p95/p99), error rates, queue length, processing time
    • Alerting on queue backlog >100, error rate >1%, API p99 >5s
    • CloudWatch Logs integration for ECS deployment
  • Job queue redesigned. PostgreSQL-backed with a processing_jobs table, SELECT ... FOR UPDATE SKIP LOCKED for reliable dequeue, LISTEN/NOTIFY as wake-up signal only. Dead letter handling for failed jobs. Configurable retry count.

  • Disaster recovery defined:

    • RDS: Automated daily snapshots, 35-day retention, point-in-time recovery. Test restore quarterly.
    • S3: Versioning enabled for raw ULog files. Processed JSON is regenerable.
    • RTO: 4 hours. RPO: 24 hours.
    • Secrets in AWS Secrets Manager.
  • SSL/TLS: ACM certificate on ALB for production. Caddy with automatic HTTPS for Docker Compose deployments.

  • Pre-signed URLs: Generate fresh on each API call with 1-hour expiry. Do not cache server-side.

  • ULog parser is the critical path. Phase 1 now explicitly starts with building and validating the Rust ULog parser against a corpus of 1,000+ real logs before any other work begins. This is the go/no-go gate for the project.

  • Proof-of-concept migration in Phase 2. Process 1,000 representative logs, measure parse times, failure rates, memory usage, and processed JSON sizes. Use results to refine bulk migration plan.
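The job-queue redesign above hinges on one query: a worker atomically claims the oldest queued job with `FOR UPDATE SKIP LOCKED`, so concurrent workers never grab the same row and `LISTEN/NOTIFY` is only a wake-up signal. The sketch below embeds that SQL as a constant; the `processing_jobs` table and column names are illustrative assumptions, not a final schema.

```typescript
// Dequeue statement a worker would run against PostgreSQL (schema hypothetical).
// LISTEN/NOTIFY only wakes the worker; this query is the source of truth.
const DEQUEUE_SQL = `
  UPDATE processing_jobs
     SET status = 'running', started_at = now(), attempts = attempts + 1
   WHERE id = (
         SELECT id
           FROM processing_jobs
          WHERE status = 'queued'
          ORDER BY created_at
          LIMIT 1
          FOR UPDATE SKIP LOCKED
   )
  RETURNING id, log_id;
`;

// Jobs whose attempts exceed the configured retry count would be moved to a
// dead-letter status by a similar UPDATE, per the dead-letter handling above.
```

If no worker is listening when a job is inserted, the row simply waits in the table; a periodic poll plus the NOTIFY wake-up means nothing is lost, unlike raw LISTEN/NOTIFY.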

Revised Timeline

| Phase | Duration | Deliverable |
|---|---|---|
| 0: ULog Parser | 6-8 weeks | Rust ULog parser validated against 1,000+ real logs. Go/no-go gate. |
| 1: Core Backend | 8-10 weeks | Axum API, PostgreSQL, S3 integration, processing pipeline, observability, QGC-compatible upload |
| 2: Frontend MVP | 8-10 weeks | React SPA with all 35+ plot types, browse/search, GPS map, batch upload, log tagging |
| 3: Migration | 4-6 weeks | Proof-of-concept with 1,000 logs, then bulk migration, dual-running with old system |
| 4: Cutover | 2-4 weeks | DNS cutover, URL redirects, monitoring stabilization |
| 5: Phase 2 Features | Ongoing | 3D replay, PID analysis, OIDC, flight comparison, dark mode |

Total to production: ~8-10 months (vs original 8 months). More realistic, with an explicit go/no-go gate at week 6-8.

Revised Deployment Tiers

┌─────────────────────────────────────────────────────────────────┐
│ MINIMAL: Single binary, SQLite, local disk, password auth       │
│                                                                 │
│   $ ./flight-review-next --data-dir ./data                      │
│                                                                 │
│   Perfect for: Raspberry Pi, laptop, small VPS, air-gapped      │
│   Requirements: 512MB RAM, 1 CPU, Linux/macOS/Windows           │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ STANDARD: Docker Compose, PostgreSQL, local disk or S3          │
│                                                                 │
│   $ docker compose up                                           │
│                                                                 │
│   Perfect for: Teams of 5-50, office server, cloud VPS          │
│   Requirements: 2GB RAM, 2 CPU                                  │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ PRODUCTION: ECS/K8s, RDS, S3, CloudFront, multiple workers      │
│                                                                 │
│   Perfect for: Dronecode (350k+ logs), large organizations      │
│   Requirements: See resource estimates                          │
└─────────────────────────────────────────────────────────────────┘