Flight Review (logs.px4.io) has served the PX4 community for nearly 10 years since its first commit in October 2016. Built on Tornado/Bokeh/SQLite with an S3 FUSE mount, it processes 350k+ ULog flight logs but suffers from significant infrastructure pain: every page view re-parses the raw ULog file to regenerate all 35+ Bokeh plots, the S3 FUSE mount is fragile, the SQLite database locks under concurrent access, and the Python/Bokeh stack makes the frontend difficult to extend. This document proposes a ground-up replacement designed for performance, extensibility, and operational simplicity at any scale.
- Age: 9.5 years, 664 commits, actively maintained but receiving only reliability fixes
- Stack: Python 3.11, Tornado (async web server), Bokeh 3.8.2 (plotting), SQLite (WAL mode), Bootstrap 5, Leaflet (maps), Cesium.js (3D)
- Infrastructure: Single EC2 instance (15GB RAM, 19GB disk), nginx reverse proxy, S3 FUSE mount at /data_s3, 2 Tornado worker processes
- Database: SQLite at ~219MB storing only scalar metadata (14 fields per log in the LogsGenerated table). No time-series data stored.
- Processing model: Every page view triggers a full ULog parse (~1-5s for typical logs) and regeneration of all plots. An in-memory LRU cache of parsed ULog objects provides some relief.
- ULog topics loaded: ~45 of 100+ available topics per log file
- Plots generated: 35+ Bokeh plots including time-series, FFT spectrograms, PSD analysis, GPS maps, parameter tables, and system diagnostics
- S3 storage: ~350k ULog files in s3://px4-flight-review/flight_review/log_files/
- Performance: Re-parsing ULog files on every page view is the #1 bottleneck. A 50MB log takes 1-5 seconds to parse in Python before any plots render.
- S3 FUSE mount: s3fs is fragile, adds latency, creates kernel-level failure modes, and doesn't support concurrent access patterns well.
- SQLite concurrency: Single-writer limitation causes lock contention with 2 worker processes (mitigated by WAL mode but not eliminated).
- Bokeh server-side rendering: Plots are generated server-side in Python, creating CPU-heavy page loads and making the frontend hard to extend.
- Monolithic architecture: Upload, processing, storage, and visualization are tightly coupled in a single process.
- No caching layer: Computed plot data is discarded after each request (aside from in-memory ULog cache).
- Single-instance deployment: No horizontal scaling, no container support, manual deployment via shell script.
- No authentication: The public instance is fully open; no mechanism for private deployments.
- ULog processing pipeline: The analysis logic (35+ plot types, PID analysis, FFT/PSD spectrograms, vibration analysis) represents years of domain expertise
- S3 as primary storage: Object storage for raw logs is the right pattern
- Browse/search/statistics pages: The metadata-driven browse experience is useful
- CloudFront CDN: Recently added for the /dbinfo endpoint, works well
- Overview image generation: Static PNG map thumbnails for browse page
- Process once, serve many: Parse and analyze each ULog file exactly once at upload time. Store results. Never re-parse for viewing.
- S3 API, not FUSE: Use the S3 SDK directly for all object storage operations. Pre-signed URLs for client-side downloads.
- Client-side rendering: Ship processed data to the browser; let the client render plots. Server serves JSON, not HTML.
- Plugin-friendly visualization: Make it trivial to add new plot types, key facts, and analysis modules without touching core code.
- Scale-agnostic deployment: Same codebase runs as a single Docker container for a team of 5 or as a distributed service for 350k+ logs.
- Offline-capable processing: The ULog processing engine should work as a standalone CLI tool, not just as part of the web service.
┌─────────────────────┐
│ CloudFront CDN │
│ (static assets + │
│ pre-signed S3) │
└─────────┬───────────┘
│
┌──────────────┐ ┌──────────▼──────────┐
│ Browser │◄──────────────────│ API Gateway / │
│ │ REST + WS │ Reverse Proxy │
│ SPA Client │──────────────────►│ (nginx) │
└──────────────┘ └──────────┬───────────┘
│
┌───────────────────┼───────────────────┐
│ │ │
┌─────────▼────────┐ ┌───────▼────────┐ ┌───────▼────────┐
│ API Service │ │ Upload Worker │ │ Processing │
│ (Rust / Axum) │ │ (async upload │ │ Worker(s) │
│ │ │ + S3 put) │ │ (ULog parse │
│ - Auth │ │ │ │ + analysis) │
│ - Browse/Search │ └───────┬─────────┘ └───────┬────────┘
│ - Serve plot │ │ │
│ data (JSON) │ │ │
│ - Pre-signed │ ┌────▼───────────────────▼────┐
│ S3 URLs │ │ S3 Bucket │
└─────────┬────────┘ │ │
│ │ /raw/{id}.ulg │
│ │ /processed/{id}.json.zst │
│ │ /thumbnails/{id}.png │
┌────▼─────┐ │ /cache/{id}/plots.json │
│PostgreSQL│ └────────────────────────────┘
│ │
│ - Logs │
│ - Metadata│
│ - Summary│
│ stats │
│ - Users │
│ - Tokens │
└──────────┘
Why Rust: ULog parsing is CPU-bound binary processing -- exactly where Rust excels. A Rust backend can parse a 50MB ULog file in ~100-500ms vs 1-5s in Python (10x improvement). The single-binary deployment model eliminates Python dependency management headaches. Axum provides async HTTP with tower middleware for auth, rate limiting, and observability.
Why not other options:
- Python (FastAPI): Would perpetuate the parsing performance problem. Fine for the API layer but not for processing.
- Go: Good alternative, but lacks the zero-cost abstractions that make binary parsing ergonomic. No existing ULog parser ecosystem.
- Node.js: Wrong tool for CPU-bound binary processing.
Key crates:
- axum -- HTTP framework
- aws-sdk-s3 -- S3 API (native, not FUSE)
- sqlx -- PostgreSQL async driver
- serde/serde_json -- serialization
- tokio -- async runtime
- ULog parser: Either port pyulog to Rust, extend ulog-rs/yule_log, or write a new one using mavsim-viewer's C parser as reference (439 LOC, clean architecture, already handles all message types)
API surface:
POST /api/upload -- Upload ULog file (multipart)
GET /api/logs -- Browse/search/filter logs
GET /api/logs/{id} -- Log metadata + summary stats
GET /api/logs/{id}/plots -- Pre-computed plot data (JSON)
GET /api/logs/{id}/download -- Pre-signed S3 URL for raw .ulg
GET /api/logs/{id}/pid -- PID analysis data
GET /api/logs/{id}/3d -- 3D trajectory data
GET /api/stats -- Aggregate statistics
POST /api/auth/login -- Authentication (optional)
GET /api/health -- Health check
Why PostgreSQL over alternatives:
- Not TimescaleDB/InfluxDB/ClickHouse: We are NOT storing time-series data in the database. The raw ULog stays in S3, and processed plot data goes to S3 as compressed JSON. The database stores only scalar metadata, summary statistics, and user/auth data -- a perfect fit for vanilla PostgreSQL.
- Not SQLite: Concurrent write access from multiple workers, proper connection pooling, full-text search, JSONB for flexible metadata, and proven horizontal scaling path.
- Not DuckDB: Interesting for analytical queries but overkill for metadata storage. Could be used as an embedded engine in the processing worker for on-the-fly analysis, but PostgreSQL covers the primary need.
Schema (simplified):
CREATE TABLE logs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
-- Upload metadata
title TEXT,
description TEXT,
original_filename TEXT,
uploaded_at TIMESTAMPTZ DEFAULT NOW(),
source TEXT, -- "webui", "qgc", "cli"
email TEXT,
public BOOLEAN DEFAULT TRUE,
allow_analysis BOOLEAN DEFAULT TRUE,
-- User-provided context
wind_speed SMALLINT,
rating TEXT,
feedback TEXT,
video_url TEXT,
error_labels TEXT,
-- Processing status
status TEXT DEFAULT 'pending', -- pending, processing, ready, failed
processed_at TIMESTAMPTZ,
-- S3 references
s3_raw_key TEXT NOT NULL,
s3_processed_key TEXT,
s3_thumbnail_key TEXT,
-- Access control
token TEXT UNIQUE,
owner_id UUID REFERENCES users(id)
);
CREATE TABLE log_metadata (
log_id UUID PRIMARY KEY REFERENCES logs(id),
-- Extracted from ULog (computed once at processing time)
duration_s INTEGER,
mav_type TEXT,
estimator TEXT,
autostart_id INTEGER,
hardware TEXT,
software_version TEXT,
software_hash TEXT,
vehicle_uuid TEXT,
start_time_utc TIMESTAMPTZ,
-- Error/warning counts
num_errors INTEGER DEFAULT 0,
num_warnings INTEGER DEFAULT 0,
has_hardfault BOOLEAN DEFAULT FALSE,
file_corrupted BOOLEAN DEFAULT FALSE,
-- Flight modes
flight_modes JSONB, -- [{mode: "POSCTL", duration_s: 120}, ...]
-- Summary statistics (previously computed on every page view)
total_distance_m REAL,
max_altitude_diff_m REAL,
avg_speed_ms REAL,
max_speed_ms REAL,
max_speed_horiz_ms REAL,
max_speed_up_ms REAL,
max_speed_down_ms REAL,
max_tilt_deg REAL,
max_rotation_dps REAL,
avg_current_a REAL,
max_current_a REAL,
-- Vibration summary
max_vibe_level REAL,
vibe_status TEXT, -- "good", "warning", "critical"
-- GPS quality summary
avg_satellites REAL,
min_fix_type SMALLINT,
-- Dropout summary
dropout_count INTEGER,
dropout_total_ms INTEGER,
-- Searchable metadata (JSONB for flexibility)
parameters JSONB, -- Non-default parameters
info_messages JSONB, -- Key info messages
-- Full-text search
search_vector TSVECTOR
);
CREATE TABLE vehicles (
uuid TEXT PRIMARY KEY,
name TEXT,
total_flight_time_s BIGINT DEFAULT 0,
latest_log_id UUID REFERENCES logs(id)
);
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email TEXT UNIQUE,
password_hash TEXT,
role TEXT DEFAULT 'user', -- user, admin
created_at TIMESTAMPTZ DEFAULT NOW()
);
Upload flow:
- Client requests a pre-signed upload URL from the API
- Client uploads directly to S3 (bypasses backend for large files)
- Client notifies API that upload is complete
- API enqueues processing job
For small deployments: MinIO provides S3-compatible object storage that runs alongside the app in a single Docker Compose setup.
Storage layout:
s3://bucket/
├── raw/
│ └── {log_id}.ulg # Original ULog file (immutable)
├── processed/
│ └── {log_id}.json.zst # All plot data, compressed (~200-500KB)
├── thumbnails/
│ └── {log_id}.png # Overview map image
└── exports/
└── {log_id}.kml # Optional KML export
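The key layout above is worth centralizing so the API service, upload worker, and processing worker all derive object keys identically. A minimal sketch; `LogKeys` and `log_keys` are illustrative names, not an existing API:

```rust
/// Illustrative helper encoding the S3 storage layout; one place to change
/// if the prefix scheme ever evolves.
pub struct LogKeys {
    pub raw: String,        // original ULog file (immutable)
    pub processed: String,  // compressed plot data
    pub thumbnail: String,  // overview map image
    pub kml_export: String, // optional KML export
}

pub fn log_keys(log_id: &str) -> LogKeys {
    LogKeys {
        raw: format!("raw/{log_id}.ulg"),
        processed: format!("processed/{log_id}.json.zst"),
        thumbnail: format!("thumbnails/{log_id}.png"),
        kml_export: format!("exports/{log_id}.kml"),
    }
}
```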
Processed data format (the key innovation -- compute once, serve forever):
{
"version": 1,
"log_id": "abc-123",
"computed_at": "2026-03-11T22:00:00Z",
"plots": {
"attitude_roll": {
"type": "timeseries",
"title": "Roll Angle",
"unit": "deg",
"series": [
{"label": "Estimated", "timestamps": [...], "values": [...], "color": "#1f77b4"},
{"label": "Setpoint", "timestamps": [...], "values": [...], "color": "#ff7f0e"}
],
"flight_modes": [{"start": 0.0, "end": 12.5, "mode": "MANUAL"}, ...],
"annotations": [{"time": 5.2, "text": "Param change: MC_ROLL_P=6.5"}]
},
"fft_actuator_controls": {
"type": "spectrogram",
"title": "Actuator Controls FFT",
"frequencies": [...],
"magnitudes": [...],
"markers": [{"freq": 80, "label": "MC_DTERM_CUTOFF"}]
},
"gps_track": {
"type": "map",
"coordinates": [[lat, lon, alt], ...],
"flight_mode_segments": [...]
},
"trajectory_3d": {
"type": "trajectory",
"positions": [[x, y, z], ...],
"quaternions": [[w, x, y, z], ...],
"timestamps": [...],
"vehicle_type": "quadrotor"
}
},
"key_facts": {
"vibration": {"status": "good", "max_level": 3.2, "unit": "m/s^2"},
"gps_quality": {"avg_sats": 14, "min_fix": 3},
"battery": {"voltage_start": 16.2, "voltage_end": 14.8, "mah_used": 1200},
"flight_modes": [{"mode": "POSCTL", "duration_s": 120, "pct": 80}],
"errors": [],
"warnings": ["High vibration on IMU 2"]
},
"tables": {
"parameters": [...],
"messages": [...],
"perf_counters": [...]
}
}
This pre-computed JSON eliminates the need to ever re-parse the ULog for viewing. At ~200-500KB compressed (vs 2-90MB raw ULog), it's fast to fetch and cheap to store. For 350k logs, total cache size would be ~70-175GB -- trivial for S3.
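In the processing worker, the versioned envelope could be modeled as plain Rust types. A sketch with illustrative names, covering only the timeseries variant and with serde derives omitted to stay self-contained:

```rust
use std::collections::HashMap;

// Hypothetical types mirroring the processed-JSON schema; in practice these
// would carry #[derive(Serialize, Deserialize)] and an enum for plot types.
pub struct TimeSeries {
    pub label: String,
    pub timestamps: Vec<f64>, // seconds since boot
    pub values: Vec<f64>,
    pub color: String, // hex color, e.g. "#1f77b4"
}

pub struct TimeseriesPlot {
    pub title: String,
    pub unit: String,
    pub series: Vec<TimeSeries>,
}

pub struct ProcessedLog {
    pub version: u32,          // schema version, bumped on breaking changes
    pub log_id: String,
    pub computed_at: String,   // RFC 3339 timestamp
    pub plots: HashMap<String, TimeseriesPlot>, // keyed by plot id
}
```

The explicit `version` field matters: when the schema changes, old cached JSON can be detected and the log re-queued for processing instead of breaking the client.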
Why React: Largest ecosystem for data visualization components, strong TypeScript support, and the most contributors will be familiar with it. Vue or Svelte are viable alternatives but React maximizes contributor pool for an open-source project.
Charting: Apache ECharts
After evaluating options:
| Library | Bundle Size | Max Points | GPU Accel | Extensibility | Community |
|---|---|---|---|---|---|
| uPlot | 45KB | 10M+ | No (Canvas2D) | Low | Small |
| Apache ECharts | 300KB (tree-shakeable) | 10M+ | Yes (WebGL) | High | Very large |
| Plotly.js | 3.5MB | 100K | Limited | Medium | Large |
| D3 | 90KB | Varies | Manual | Very high | Very large |
Apache ECharts wins because:
- WebGL renderer handles millions of points smoothly (critical for high-rate IMU/FIFO data)
- Built-in support for linked/synchronized time axes across multiple plots
- Native support for spectrograms, heatmaps, and scatter plots
- Large-array optimization with sampling and progressive rendering
- Extensive theming and customization
- Tree-shakeable: only import the chart types you use
- Active development with strong community (Apache Foundation)
uPlot is faster and lighter but lacks spectrogram support and has limited extensibility. Plotly.js is too heavy and struggles with >100K points.
3D Flight Replay: Three.js Web Component
Drawing from mavsim-viewer's architecture (which cleanly separates data processing from rendering):
- Port mavsim-viewer's ULog replay engine logic (~470 LOC) to TypeScript
- Use Three.js for 3D rendering (most mature WebGL library)
- Port the vehicle model registry (8 models across 6 types) as glTF assets
- Implement the dead-reckoning interpolation for smooth 60fps playback
- Trail rendering with speed-based coloring (ring buffer, adaptive sampling)
- Chase camera + FPV camera modes
- Expose as a <flight-replay> web component or React component
The mavsim-viewer C codebase provides exact specifications for:
- Coordinate transforms (NED to rendering frame)
- Quaternion handling and interpolation
- Flight mode transition tracking (up to 256 changes)
- Playback controls (0.25x to 16x speed, seek, loop)
- Trail sampling parameters (1800 points, 16ms interval, 1cm distance threshold)
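The trail sampling parameters translate directly into code. A minimal sketch (shown in Rust for consistency with the backend examples; the actual port would be TypeScript), with illustrative types and the source's constants: 1800-point ring buffer, 16ms minimum interval, 1cm minimum movement:

```rust
/// Adaptive trail sampler sketch: fixed-capacity ring buffer that only
/// records a point if enough time has passed AND the vehicle actually moved.
pub struct Trail {
    points: Vec<[f32; 3]>,
    capacity: usize,
    head: usize, // overwrite position once full
    last_t: f64,
    last_pos: [f32; 3],
}

impl Trail {
    pub fn new(capacity: usize) -> Self {
        Trail {
            points: Vec::new(),
            capacity,
            head: 0,
            last_t: f64::NEG_INFINITY,
            last_pos: [f32::NAN; 3], // NaN => first sample always accepted
        }
    }

    pub fn push(&mut self, t: f64, pos: [f32; 3]) {
        // Sample at most every 16 ms...
        if t - self.last_t < 0.016 {
            return;
        }
        // ...and only if the vehicle moved at least 1 cm.
        let d2: f32 = pos
            .iter()
            .zip(self.last_pos.iter())
            .map(|(a, b)| (a - b) * (a - b))
            .sum();
        if !d2.is_nan() && d2 < 0.01_f32 * 0.01 {
            return;
        }
        if self.points.len() < self.capacity {
            self.points.push(pos);
        } else {
            self.points[self.head] = pos; // overwrite oldest
            self.head = (self.head + 1) % self.capacity;
        }
        self.last_t = t;
        self.last_pos = pos;
    }

    pub fn len(&self) -> usize {
        self.points.len()
    }
}
```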
Borrowing from PlotJuggler's architecture (which supports 20+ plugins across data loaders, transforms, and visualizers), the frontend should support a simple plugin registry:
// Plugin definition
interface FlightReviewPlugin {
id: string;
name: string;
version: string;
// What data this plugin needs from the processed JSON
requiredPlots?: string[];
requiredKeyFacts?: string[];
// Components
panels?: PanelPlugin[]; // Full panel in the plot area
keyFacts?: KeyFactPlugin[]; // Cards in the summary section
transforms?: TransformPlugin[]; // Client-side data transforms
}
interface PanelPlugin {
id: string;
title: string;
category: string; // "attitude", "position", "power", "sensors", "custom"
component: React.ComponentType<{data: PlotData, config: any}>;
// Optional: server-side processing hint
processorId?: string;
}
interface KeyFactPlugin {
id: string;
title: string;
component: React.ComponentType<{facts: KeyFacts}>;
priority: number; // Display order
}
// Registration
registerPlugin({
id: "vibration-analysis",
name: "Vibration Analysis",
panels: [{
id: "vibe-spectrum",
title: "Vibration Spectrum",
category: "sensors",
component: VibrationSpectrumPanel,
}],
keyFacts: [{
id: "vibe-summary",
title: "Vibration Health",
component: VibrationSummaryCard,
priority: 10,
}],
});
For the backend processing pipeline, a similar plugin system allows adding new analysis modules:
// Backend processing plugin trait
trait AnalysisPlugin: Send + Sync {
fn id(&self) -> &str;
fn name(&self) -> &str;
/// Which ULog topics this plugin needs
fn required_topics(&self) -> &[&str];
/// Process ULog data and return plot data + key facts
fn process(&self, ulog: &ParsedULog) -> Result<PluginOutput>;
}
struct PluginOutput {
plots: HashMap<String, PlotData>,
key_facts: HashMap<String, serde_json::Value>,
tables: HashMap<String, TableData>,
}
Built-in plugins would cover all current Flight Review functionality (attitude, position, power, sensors, FFT, PID analysis, etc.), and the community could add new ones without modifying core code.
For the public Dronecode instance: Anonymous uploads continue as today, with optional user accounts for managing your own logs.
For private deployments: Simple auth with configurable backends:
# config.yaml
auth:
enabled: true
provider: "local" # local, oidc, ldap
require_login_to_view: true
require_login_to_upload: true
# For OIDC (Google, GitHub, Okta, etc.)
oidc:
issuer: "https://accounts.google.com"
client_id: "..."
    client_secret: "..."
Implementation: JWT-based session tokens. The users table is optional -- when auth is disabled, the system behaves exactly like today's public instance.
Single-container deployment (small teams):
# docker-compose.yml
services:
flight-review:
image: ghcr.io/px4/flight-review-next:latest
ports:
- "8080:8080"
environment:
DATABASE_URL: "postgres://fr:fr@db/flight_review"
S3_ENDPOINT: "http://minio:9000"
S3_BUCKET: "flight-review"
S3_ACCESS_KEY: "minioadmin"
S3_SECRET_KEY: "minioadmin"
depends_on:
- db
- minio
db:
image: postgres:16-alpine
volumes:
- pgdata:/var/lib/postgresql/data
environment:
POSTGRES_DB: flight_review
POSTGRES_USER: fr
POSTGRES_PASSWORD: fr
minio:
image: minio/minio
command: server /data
volumes:
- s3data:/data
volumes:
pgdata:
s3data:
One docker compose up and you have a fully functional private instance. No nginx, no FUSE mounts, no shell scripts.
Production deployment (Dronecode scale):
- Same containers, but PostgreSQL on RDS, S3 on AWS, and multiple API/worker replicas behind an ALB
- Horizontal scaling: add more processing workers for upload bursts
- CloudFront for static assets and pre-signed S3 URLs
- Optional: Redis/SQS for job queue (or use PostgreSQL LISTEN/NOTIFY for simplicity)
Client API S3 Worker
│ │ │ │
│── POST /api/upload ───►│ │ │
│◄── presigned URL ──────│ │ │
│ │ │ │
│── PUT (direct S3) ────────────────────────────►│ │
│ │ │ │
│── POST /api/upload/complete ──────────────────►│ │
│ {s3_key, metadata} │ │ │
│ │── INSERT log ────────►│ (PostgreSQL) │
│ │── enqueue job ───────────────────────────►│
│◄── 202 {log_id} ──────│ │ │
│ │ │ │
│ │ │ ┌───────────┐ │
│ │ │ │ Download │ │
│ │ │◄───│ raw .ulg │ │
│ │ │ │ │ │
│ │ │ │ Parse ULog │ │
│ │ │ │ │ │
│ │ │ │ Run all │ │
│ │ │ │ analysis │ │
│ │ │ │ plugins │ │
│ │ │ │ │ │
│ │ │◄───│ Upload │ │
│ │ │ │ processed │ │
│ │ │ │ JSON + PNG │ │
│ │ │ └───────────┘ │
│ │◄── UPDATE status='ready' ────────────────│
│ │ │ │
- Download raw ULog from S3 (~2-90MB, up to ~2.7GB for extreme cases like 15-hour flights)
- Parse ULog header, definitions, subscriptions (streaming parser for large files)
- Extract metadata: vehicle type, hardware, software, parameters, info messages
- Compute summary statistics: distance, speed, altitude, tilt, current, vibration levels
- Generate time-series plot data: For each of the ~35 plot types, extract the relevant topic data, apply transforms (unit conversion, filtering, FFT), and produce downsampled series at multiple resolution tiers
- Generate spectrogram data: FFT/PSD for actuator controls, angular velocity, angular acceleration
- Generate map data: GPS coordinates with flight mode segments
- Generate 3D trajectory data: Positions, quaternions, timestamps for the flight replay component
- Generate overview thumbnail: Static map image (can use server-side rendering or delegate to a headless browser)
- Compress and upload processed JSON (zstd compression) + thumbnail to S3
- Update database with metadata, summary stats, and status='ready'
Total processing time target: <5 seconds for a typical 10-minute flight log. For very long logs (1h+), <30 seconds. For extreme logs (15h), <2 minutes.
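As an illustration of the summary-statistics step, a minimal sketch deriving distance, peak speed, and altitude span from local NED position samples. The types and function are hypothetical, not the actual pipeline code, and real logs would use velocity topics rather than differentiated positions:

```rust
/// Position sample in the local NED frame (z points down).
pub struct PosSample {
    pub t: f64, // seconds since boot
    pub x: f64, // north, m
    pub y: f64, // east, m
    pub z: f64, // down, m
}

pub struct SummaryStats {
    pub total_distance_m: f64,
    pub max_speed_ms: f64,
    pub max_altitude_diff_m: f64,
}

pub fn summary_stats(samples: &[PosSample]) -> SummaryStats {
    let mut dist = 0.0;
    let mut max_speed: f64 = 0.0;
    let (mut min_alt, mut max_alt) = (f64::INFINITY, f64::NEG_INFINITY);
    // Accumulate path length and instantaneous speed between samples.
    for w in samples.windows(2) {
        let (a, b) = (&w[0], &w[1]);
        let d = ((b.x - a.x).powi(2) + (b.y - a.y).powi(2) + (b.z - a.z).powi(2)).sqrt();
        dist += d;
        let dt = b.t - a.t;
        if dt > 0.0 {
            max_speed = max_speed.max(d / dt);
        }
    }
    // NED convention: altitude above origin is -z.
    for s in samples {
        min_alt = min_alt.min(-s.z);
        max_alt = max_alt.max(-s.z);
    }
    SummaryStats {
        total_distance_m: dist,
        max_speed_ms: max_speed,
        max_altitude_diff_m: if samples.is_empty() { 0.0 } else { max_alt - min_alt },
    }
}
```

Because these values are computed exactly once at upload and stored in `log_metadata`, the browse page can sort and filter on them with plain SQL.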
The largest known log in the current dataset is a 15-hour flight. At ~50 KB/s default logging rate, this produces a ~2.7 GB ULog file with ~173 million data points across ~45 topics (13.5M points per 250Hz topic). This has significant implications:
Processing worker requirements:
- The Rust ULog parser MUST use streaming/mmap parsing -- loading 2.7 GB entirely into RAM is unacceptable
- Processing worker memory budget: 4 GB max, enforced via configurable limit
- Configurable max file size (default 5 GB) to reject pathological inputs
- Large logs should be priority-queued separately to avoid blocking the worker for short uploads
- Processing time scales roughly linearly: ~10-20s in Rust for a 15h log (vs 2-5 minutes in Python)
Current system has zero guards for large logs:
- Nginx limit: 100 MB (client_max_body_size), Tornado buffer: 300 MB -- both would reject even a 1-hour log
- No memory guards in pyulog parsing (loads everything into RAM)
- LRU cache of 8 parsed ULog objects has no size-in-bytes awareness -- eight 15h logs would OOM the server
- Downsampling uses naive every-Nth-sample decimation, not LTTB
Important context: The majority of users upload through the web form on the website, not via QGroundControl auto-upload. The upload flow must prioritize the web UI experience:
- Web form upload (primary): Multipart POST directly to the backend API. For files under ~100 MB (the vast majority of uploads), this is simple and fast. For large files (>100 MB), use chunked upload with progress indication.
- QGroundControl auto-upload (secondary): Must maintain API compatibility with the current QGC upload endpoint format.
- Pre-signed S3 upload (for very large files only): For files >500 MB, the API can optionally provide a pre-signed S3 URL for direct upload, bypassing the backend. This is an optimization, not the default path.
The web upload form should show:
- Upload progress bar with speed and ETA
- File validation (is it a valid ULog?) as soon as the header bytes arrive
- Processing status ("Uploading... → Processing... → Ready") with live updates via SSE or polling
- Link to the log page as soon as processing completes
Raw ULog data can have millions of points (e.g., sensor_combined at 250Hz for 15 hours = 13.5M points per axis). The processed JSON should contain intelligently downsampled data at multiple resolution tiers:
- LTTB (Largest Triangle Three Buckets): Preserves visual shape while aggressively reducing point count. The point budget scales with log duration:
- Logs up to 10 minutes: 4,000 points per series
- Longer logs: min(max(4000, duration_minutes * 35), 30000) points
- 15-hour log: capped at 30,000 points per series (~5 MB per series, ~75 MB total uncompressed, ~12 MB compressed)
- Hierarchical tiers (pre-computed, stored in S3):
- Tier 1 (overview): LTTB-downsampled as above -- used for initial page load
- Tier 2 (medium zoom): 10x the overview point count, capped at 200K per series
- Tier 3 (full resolution): on-demand endpoint that reads the raw ULog for a specific time range
- Full resolution on demand: For zoomed-in views, the client requests a specific time range at full resolution from a secondary endpoint. This avoids storing full-res data (which would be ~400-600 MB compressed for a 15h log) while still supporting deep inspection.
- FFT/PSD data: Store at native resolution (frequency domain is already compact).
- Map coordinates: Downsample to ~1000 points using Ramer-Douglas-Peucker.
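The point-budget rule and LTTB step above can be sketched concretely. A std-only Rust sketch with illustrative function names; a production version would operate on typed arrays per series and be benchmarked against the tier sizes:

```rust
/// Per-series point budget: min(max(4000, duration_minutes * 35), 30000).
pub fn point_budget(duration_minutes: f64) -> usize {
    if duration_minutes <= 10.0 {
        4000
    } else {
        (duration_minutes * 35.0).max(4000.0).min(30000.0) as usize
    }
}

/// Largest Triangle Three Buckets: keep first/last points, then pick one
/// point per bucket that maximizes the triangle area formed with the
/// previously kept point and the average of the next bucket.
pub fn lttb(points: &[(f64, f64)], threshold: usize) -> Vec<(f64, f64)> {
    let n = points.len();
    if threshold >= n || threshold < 3 {
        return points.to_vec();
    }
    let mut out = Vec::with_capacity(threshold);
    out.push(points[0]);
    let bucket = (n - 2) as f64 / (threshold - 2) as f64;
    let mut a = 0usize; // index of the last selected point
    for i in 0..threshold - 2 {
        let start = (i as f64 * bucket) as usize + 1;
        let end = ((i + 1) as f64 * bucket) as usize + 1;
        // Average of the next bucket (falls back to the final point).
        let next_start = end.min(n - 1);
        let mut next_end = (((i + 2) as f64 * bucket) as usize + 1).min(n);
        if next_end <= next_start {
            next_end = next_start + 1;
        }
        let span = (next_end - next_start) as f64;
        let (mut avg_x, mut avg_y) = (0.0, 0.0);
        for p in &points[next_start..next_end] {
            avg_x += p.0;
            avg_y += p.1;
        }
        avg_x /= span;
        avg_y /= span;
        // Largest triangle within the current bucket.
        let (ax, ay) = points[a];
        let mut best = start;
        let mut best_area = -1.0;
        for j in start..end.min(n - 1) {
            let area = ((ax - avg_x) * (points[j].1 - ay)
                - (ax - points[j].0) * (avg_y - ay))
                .abs();
            if area > best_area {
                best_area = area;
                best = j;
            }
        }
        out.push(points[best]);
        a = best;
    }
    out.push(points[n - 1]);
    out
}
```

Unlike every-Nth decimation (the current system's approach), LTTB preserves spikes and oscillations, which is exactly what matters when inspecting vibration or actuator data.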
- Rust backend with Axum: health check, upload, S3 integration (API-based)
- PostgreSQL schema and migrations
- ULog parser in Rust (port from pyulog/mavsim-viewer reference)
- Basic processing pipeline: parse ULog, extract metadata, store to DB
- Docker Compose setup with MinIO
- CI/CD pipeline
- Port all 35+ plot types from configured_plots.py to Rust analysis plugins
- Implement FFT/PSD analysis (use the rustfft crate)
- Implement summary statistics computation
- Pre-computed JSON generation with LTTB downsampling
- Overview thumbnail generation
- Processing worker with job queue
- React SPA with TypeScript
- Browse/search page with filtering
- Log detail page with all core plots (ECharts)
- Synchronized time axes across plots
- Flight mode background coloring
- GPS map view (Leaflet or Mapbox GL)
- Parameter table, logged messages
- Responsive design
- 3D flight replay component (Three.js, ported from mavsim-viewer)
- PID analysis page
- Plugin system for frontend panels and key facts
- Authentication system (local + OIDC)
- Full-resolution zoom endpoint
- KML export
- Statistics/analytics page
- Dark mode
- Bulk re-process existing 350k logs (parallel workers on AWS)
- Data migration from SQLite to PostgreSQL
- DNS cutover with nginx redirect for old URLs
- Monitoring and alerting setup
- Documentation and contributor guide
The 350k existing logs can be re-processed in parallel. At 5 seconds per log with 10 workers, this takes ~48 hours. The migration can run alongside the old system, with a read-only bridge serving old logs until re-processing completes.
PlotJuggler's architecture (C++/Qt, 20+ plugins, handles millions of points at 60fps) provides several patterns worth adopting:
- Lazy range computation: Don't compute min/max for all data upfront. Cache ranges and invalidate on data change. Critical for responsive zoom/pan.
- Deque-based storage with dirty flags: PlotJuggler uses std::deque with lazy range caching. The web equivalent: typed arrays with cached bounds, recomputed only when the visible window changes.
- Plugin architecture: PlotJuggler's DataLoader, DataStreamer, TransformFunction, and StatePublisher interfaces cleanly separate concerns. Our AnalysisPlugin (backend) and PanelPlugin (frontend) follow the same pattern.
- Transform composition: PlotJuggler supports chaining transforms (derivative -> moving average -> outlier removal). ECharts supports client-side transforms, and the backend can pre-compute common ones.
- Group-based organization: PlotJuggler groups related series (e.g., all IMU measurements) with shared visibility controls. The frontend should do the same.
- WASM plugin potential: PlotJuggler is experimenting with WASM plugins. A future version of Flight Review Next could support user-provided WASM analysis modules that run in the browser.
mavsim-viewer's clean C architecture (~5,560 LOC total) provides exact specifications for the 3D replay component:
- Data source abstraction: Polymorphic data_source_t with vtable. Port directly to a TypeScript abstract class with ReplayDataSource and potential LiveDataSource implementations.
- Dead-reckoning interpolation: Essential for smooth 60fps playback from 5-10Hz position data. Linear interpolation: pos = pos_last + vel * dt.
- Speed ribbon coloring: Trail colored by speed (blue=slow, green=medium, red=fast). Normalized against running max speed.
- Seek index: Sparse timestamp index (1 entry per second) enables O(log n) seeking in large logs. Build during initial parse.
- Vehicle model registry: 8 models across 6 types with per-model scale and orientation offsets. Ship as glTF assets for the web version.
- Camera modes: Chase (orbit around vehicle) and FPV (vehicle-mounted gimbal). Both transfer directly to Three.js camera controls.
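The dead-reckoning and seek-index specs above are small enough to sketch directly (in Rust here for consistency with the backend examples; the port itself would be TypeScript). `StateSample` and the function names are illustrative:

```rust
/// One replay sample: position and velocity in NED, in meters / m/s.
pub struct StateSample {
    pub t: f64,        // seconds since boot
    pub pos: [f64; 3],
    pub vel: [f64; 3],
}

/// Dead-reckoning: extrapolate from the last sample at or before `t`
/// using pos = pos_last + vel * dt, for smooth 60fps playback from
/// 5-10Hz position data.
pub fn dead_reckon(last: &StateSample, t: f64) -> [f64; 3] {
    let dt = t - last.t;
    [
        last.pos[0] + last.vel[0] * dt,
        last.pos[1] + last.vel[1] * dt,
        last.pos[2] + last.vel[2] * dt,
    ]
}

/// Sparse seek index: one entry per whole second, pointing at the first
/// sample at or after that second. Built once during the initial parse.
pub fn build_seek_index(samples: &[StateSample]) -> Vec<usize> {
    let mut index = Vec::new();
    let mut next_sec = 0.0;
    for (i, s) in samples.iter().enumerate() {
        while s.t >= next_sec {
            index.push(i);
            next_sec += 1.0;
        }
    }
    index
}

/// Seek: jump to the indexed second, then scan forward to the last
/// sample with timestamp <= t. Cheap even for multi-hour logs.
pub fn seek(samples: &[StateSample], index: &[usize], t: f64) -> usize {
    let sec = (t.max(0.0) as usize).min(index.len().saturating_sub(1));
    let mut i = index.get(sec).copied().unwrap_or(0);
    while i + 1 < samples.len() && samples[i + 1].t <= t {
        i += 1;
    }
    i
}
```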
<App>
├── <Header>
│ ├── <SearchBar>
│ └── <UserMenu>
├── <Routes>
│ ├── <BrowsePage>
│ │ ├── <FilterSidebar>
│ │ ├── <LogGrid>
│ │ │ └── <LogCard> (thumbnail, key facts, duration, vehicle type)
│ │ └── <Pagination>
│ ├── <LogDetailPage>
│ │ ├── <KeyFactsBar>
│ │ │ ├── <VibrationCard>
│ │ │ ├── <GPSQualityCard>
│ │ │ ├── <BatteryCard>
│ │ │ ├── <FlightModesCard>
│ │ │ └── <PluginKeyFactCards...>
│ │ ├── <InfoTable>
│ │ ├── <PlotContainer>
│ │ │ ├── <TimeSeriesPlot> (ECharts, synchronized axes)
│ │ │ ├── <SpectrogramPlot> (ECharts heatmap)
│ │ │ ├── <MapPanel> (Leaflet/Mapbox)
│ │ │ ├── <FlightReplay3D> (Three.js)
│ │ │ └── <PluginPanels...>
│ │ ├── <ParameterTable>
│ │ ├── <MessagesTable>
│ │ └── <CollapsibleSections>
│ │ ├── <PerfCounters>
│ │ └── <BootConsole>
│ ├── <PIDAnalysisPage>
│ ├── <StatisticsPage>
│ └── <UploadPage>
└── <Footer>
- Instant page loads: Pre-computed data loads in <500ms vs 3-10 seconds today
- Synchronized cursors: Hover on one plot, see the corresponding time on all plots and the 3D view
- Key facts dashboard: At-a-glance vibration health, GPS quality, battery status, flight modes -- visible immediately without scrolling through 35 plots
- Collapsible plot categories: Users see what they care about first (attitude, position, power) and can expand advanced sections (FFT, PSD, estimator flags)
- 3D flight replay: Interactive replay with playback controls, not just a static 3D trajectory view
- Deep linking: Every plot section has a URL hash for sharing specific views
- Mobile-responsive: Card-based layout that works on tablets and phones
- Dark mode: Because developers love dark mode
Foxglove is a commercial robotics visualization platform that supports ULog. It's excellent for interactive exploration but:
- Commercial product (free tier has limits)
- Not self-hostable (cloud-only for team features)
- General-purpose (not PX4-specific key facts and analysis)
- No community-driven analysis logic (FFT cutoff markers, vibration thresholds, PID analysis)
Flight Review Next would complement Foxglove: users who want deep PX4-specific analysis use Flight Review; users who want general-purpose exploration can export to Foxglove.
Excellent desktop tool but:
- Desktop-only (no web sharing)
- No persistent storage or team collaboration
- No PX4-specific key facts or summary statistics
- No automated analysis pipeline
Flight Review Next would serve a different need: cloud-first, shareable, with automated PX4-specific analysis.
A Grafana-based solution was evaluated as an alternative to building a custom frontend. Two variants were considered: (A) using Grafana as-is with existing panels, and (B) building custom Grafana panels for the missing visualization types.
- Built-in time-series panels are polished and performant (~60% of Flight Review's plots)
- Synchronized crosshairs across all panels work natively (Single/All tooltip modes)
- Dashboard JSON model + provisioning API: one JSON template serves all logs via
?var-log_id=XXX - Geomap panel handles GPS tracks with route layers
- Annotation system can represent flight mode changes as colored regions
- Table panel handles parameter tables and logged messages
- Built-in auth with OAuth2, LDAP, SAML, org-based multi-tenancy, role-based permissions
- Dashboard sharing, snapshot export, alerting
- Battle-tested: 67.5k GitHub stars, 25M+ users
| Visualization | Grafana Status | Custom Panel Effort |
|---|---|---|
| FFT with filter cutoff markers | No panel exists | Medium (2-3 weeks). TypeScript panel, FFT data pre-computed server-side |
| PSD Spectrogram | No panel (heatmap is for histograms) | Medium-Hard (3-4 weeks). WebGL heatmap with freq/time axes |
| PID step response | Nothing close | Hard (4-6 weeks). Wiener deconvolution results, Bode plots |
| 3D flight trajectory | One limited community plugin | Medium (3-4 weeks). Three.js panel with vehicle replay |
| Key facts dashboard | Stat panels exist but clunky | Easy (1 week). Custom panel with cards layout |
Total custom panel development: ~13-18 weeks (3-4 months) for the missing visualizations.
Using only built-in panels means losing FFT, spectrograms, PID analysis, and 3D trajectory -- the features that differentiate Flight Review. Rejected.
Build 4-5 custom Grafana panel plugins and use Grafana as the entire visualization layer.
Architecture:
┌──────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Custom App │ │ Grafana │ │ Backend API │
│ (Browse, │────►│ (All plotting) │────►│ (Rust/Axum) │
│ Upload, │ │ │ │ │
│ Key Facts) │ │ Built-in panels │ │ - ULog parse │
│ │ │ + Custom panels │ │ - TSDB ingest │
│ React SPA │ │ - FFT panel │ │ - S3 storage │
└──────────────┘ │ - Spectrogram │ └────────┬────────┘
│ - PID panel │ │
│ - 3D replay │ ┌────────▼────────┐
└────────┬─────────┘ │ TimescaleDB │
└──────────────►│ (time-series) │
└─────────────────┘
Pros:
- Massive head start on time-series. ~25 time-series plots work out of the box. Cursor sync, zoom, pan, legend, annotations -- all free.
- Dashboard-as-code. One JSON template serves all logs. No React component tree for the plotting layer.
- Auth is solved. Grafana's built-in OAuth2, LDAP, and org-based multi-tenancy cover both the public instance and private deployments.
- Familiar to operations teams. Many orgs already run Grafana. Flight log dashboards are a natural extension.
- Panel plugin SDK is mature. TypeScript + React, well-documented, hot reload.
- Community contribution model. People can contribute Grafana panel plugins without touching the core backend.
Cons:
- Requires a TSDB. Grafana queries a datasource, not JSON files. Parsed ULog data must be ingested into TimescaleDB. On-demand ingestion adds 3-10 seconds cold-start per log. Pre-ingesting 350k logs: ~16TB compressed (impractical). For a 15-hour log, on-demand ingest means writing ~173M data points before the dashboard renders.
- Two+ services always. Grafana + TSDB + custom app. Kills the "single binary on a Raspberry Pi" deployment tier.
- Embedding UX friction. Browse app links to Grafana dashboards. Looks like two different apps. Grafana Cloud does NOT support embedding; only self-hosted OSS.
- 35+ panels = heavy. Each panel fires independent queries. 120+ queries to TimescaleDB on page load for a 15-hour log.
- Plugin maintenance burden. Grafana's plugin API changes between major versions (~2/year). Custom panels need ongoing testing.
- No offline/static export. With pre-computed JSON the custom frontend can generate static HTML reports; Grafana requires a live server.
| Factor | Custom Frontend (React + ECharts) | Grafana + Custom Panels |
|---|---|---|
| Time to MVP | 8-10 weeks (build everything) | 6-8 weeks (time-series free, build custom panels + ingest) |
| Time-series quality | Good (ECharts is solid) | Excellent (Grafana is best-in-class) |
| FFT/Spectrogram | Build in ECharts (~2 weeks) | Build as Grafana plugin (~3-4 weeks, more boilerplate) |
| Deployment simplicity | Single binary possible | Always needs Grafana + TSDB (min 3 services) |
| Small team / Pi / air-gap | Works everywhere | Impractical |
| Large deployment (Dronecode) | More custom code to maintain | Leverages Grafana's maturity |
| Auth | Must build | Free |
| 15-hour log handling | Pre-computed JSON, instant load | TSDB ingest of 173M points, cold-start latency |
| Contributor model | Fork + PR | Separate plugin repos |
| UX cohesion | Fully cohesive | Two-app feel |
Rather than choosing one, the Rust backend API can serve data two ways:
- REST/JSON endpoint (`GET /api/logs/{id}/plots`) → consumed by the custom React frontend (default)
- TimescaleDB ingestion (on-demand) → consumed by Grafana's PostgreSQL/TimescaleDB datasource (optional)
The custom React frontend is the default for all deployment tiers. Grafana dashboards are an optional, documented alternative for organizations that already run Grafana. Same backend, same processing pipeline, same data -- just different consumers.
Custom Grafana panels (FFT, spectrogram, PID, 3D) can be developed as community contributions since they are standalone plugins with no coupling to the core app. This is a natural contribution path for organizations already invested in Grafana.
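For concreteness, a response from the `GET /api/logs/{id}/plots` endpoint could look like the sketch below. Every field name here is an illustrative assumption, not a committed schema:

```json
{
  "log_id": "example-log-id",
  "plots": [
    {
      "id": "roll_angle",
      "title": "Roll Angle",
      "type": "time-series",
      "series": [
        {
          "name": "vehicle_attitude.roll",
          "unit": "deg",
          "t": [0.0, 0.1, 0.2],
          "values": [1.2, 1.4, 1.1]
        }
      ]
    }
  ]
}
```

The same processed data, ingested into TimescaleDB, would back the optional Grafana dashboards.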
Where Grafana is definitely used: As the monitoring dashboard for Flight Review's own infrastructure (API latency, queue depth, error rates, S3 metrics).
| Component | CPU | RAM | Instances |
|---|---|---|---|
| API service | 1 vCPU | 512MB | 2 |
| Processing worker | 2 vCPU | 2GB | 2-4 |
| PostgreSQL | 2 vCPU | 4GB | 1 (RDS) |
| Total | 8-12 vCPU | 9-13GB | - |
Comparable to current single-instance (15GB RAM) but with much better utilization.
| Type | Size | Cost/month |
|---|---|---|
| Raw ULog files (existing) | ~5TB | ~$115 (S3) |
| Processed JSON cache | ~100GB | ~$2.30 (S3) |
| Thumbnails | ~10GB | ~$0.23 (S3) |
| PostgreSQL | ~5GB | ~$15 (RDS db.t3.medium) |
| Total | ~5.1TB | ~$133/month |
A team with 100 logs needs: 1 container (~512MB RAM), a bundled PostgreSQL or embedded SQLite mode, MinIO or local disk. Runs on a $5/month VPS or a Raspberry Pi.
- Backwards compatibility: Should the new system maintain URL compatibility with `review.px4.io/plot_app/s/...` paths? (Recommend: yes, via nginx redirects)
- API stability: Should we publish an API spec that third-party tools (QGroundControl, MAVSDK) can depend on? (Recommend: yes, OpenAPI 3.0)
- Real-time streaming: Should the 3D replay support live MAVLink streaming in addition to log replay? (mavsim-viewer already supports this pattern via the data source abstraction)
- Multi-log comparison: Should the UI support overlaying multiple flights for comparison? (PlotJuggler supports this natively)
- Community analysis plugins: Should we provide a plugin marketplace or registry? (Recommend: start with a `plugins/` directory in the repo, evolve later)
- Retention policy: Should old logs be auto-archived to S3 Glacier after N months? (Recommend: yes, configurable)
| Risk | Impact | Mitigation |
|---|---|---|
| Rust ULog parser doesn't match pyulog feature parity | Processing gaps | Use pyulog as reference test suite; validate against 1000+ real logs |
| ECharts can't handle spectrogram data well | Visual quality | Fallback to custom WebGL renderer for spectrograms |
| 3D replay performance in browser | Poor mobile experience | Make 3D replay opt-in, lazy-loaded |
| Migration disrupts 350k existing users | Lost links, broken bookmarks | Maintain old URLs via redirects for 1 year |
| Community doesn't adopt plugin system | Low extensibility | Build all current features as core plugins; system works without external plugins |
| PostgreSQL is overkill for small deployments | Complex setup | Support embedded SQLite mode via feature flag for single-user deployments |
- Page load time: <1 second for log detail page (vs 3-10s today)
- Upload-to-viewable: <10 seconds (vs instant but slow viewing today)
- Deployment ease: `docker compose up` for a working instance
- Plugin count: 5+ community-contributed plugins within first year
- Feature parity: All 35+ current plot types available at launch
- Mobile usability: Fully functional on tablet, viewable on phone
All of these must be ported to the new system:
| # | Plot | Source Topics | Type |
|---|---|---|---|
| 1 | 2D Position (XY) | vehicle_local_position | Scatter |
| 2 | GPS Map | vehicle_gps_position | Map (Leaflet) |
| 3 | Altitude | vehicle_gps_position, vehicle_air_data, vehicle_local_position | Time-series |
| 4 | Roll Angle | vehicle_attitude, vehicle_attitude_setpoint | Time-series |
| 5 | Pitch Angle | vehicle_attitude, vehicle_attitude_setpoint | Time-series |
| 6 | Yaw Angle | vehicle_attitude, vehicle_attitude_setpoint | Time-series |
| 7-9 | Roll/Pitch/Yaw Rate | vehicle_angular_velocity, vehicle_rates_setpoint | Time-series |
| 10-12 | Local Position X/Y/Z | vehicle_local_position, vehicle_local_position_setpoint | Time-series |
| 13 | Velocity | vehicle_local_position | Time-series |
| 14-18 | Visual Odometry (5) | vehicle_visual_odometry | Time-series |
| 19 | Airspeed | airspeed, airspeed_validated | Time-series |
| 20 | TECS | tecs_status | Time-series |
| 21 | Manual Control | manual_control_setpoint, manual_control_switches | Time-series |
| 22 | Actuator Controls | actuator_controls_0, vehicle_thrust_setpoint | Time-series |
| 23-25 | FFT (3 types) | Derived from actuator_controls, angular_velocity | Spectrogram |
| 26 | Actuator Controls 1 | actuator_controls_1 | Time-series |
| 27 | Motor/Servo Outputs | actuator_motors, actuator_servos | Time-series |
| 28 | ESC RPM | esc_status | Time-series |
| 29 | Raw Acceleration | sensor_combined | Time-series |
| 30 | Vibration Metrics | vehicle_imu_status | Time-series |
| 31-33 | PSD Spectrograms (3) | Derived | Spectrogram |
| 34 | Raw Gyroscope | sensor_combined | Time-series |
| 35-36 | FIFO Accel/Gyro (per IMU) | sensor_accel_fifo, sensor_gyro_fifo | Time-series + Spectrogram |
| 37 | Raw Magnetometer | vehicle_magnetometer | Time-series |
| 38 | Distance Sensor | distance_sensor | Time-series |
| 39-40 | GPS Quality (2) | vehicle_gps_position | Time-series |
| 41 | Thrust-Mag Correlation | battery_status, vehicle_magnetometer | Time-series |
| 42 | Power | battery_status, system_power | Time-series |
| 43 | Temperature | Various (baro, accel, battery, ESC) | Time-series |
| 44 | Estimator Flags | estimator_status | Time-series (binary) |
| 45 | Failsafe Flags | failsafe_flags | Time-series (binary) |
| 46 | CPU & RAM | cpuload | Time-series |
| 47 | Sampling Regularity | sensor_combined, estimator_status | Time-series |
Plus: Non-default parameters table, logged messages table, hardfault card, corrupt log warning, perf counters, boot console, PID analysis page, 3D trajectory view.
| Component | Choice | Rationale |
|---|---|---|
| Backend language | Rust | 10x faster ULog parsing, single binary, memory safety |
| Web framework | Axum | Async, tower middleware, strong ecosystem |
| Database | PostgreSQL | Concurrent access, JSONB, full-text search, proven at scale |
| Object storage | S3 API (aws-sdk-s3) | Direct API, no FUSE. MinIO for self-hosted |
| Frontend framework | React + TypeScript | Largest ecosystem, best for plugin system |
| Charting | Apache ECharts | WebGL, millions of points, spectrograms, synchronized axes |
| 3D visualization | Three.js | Most mature WebGL library, ported from mavsim-viewer |
| Maps | Leaflet or Mapbox GL JS | Flight track with mode coloring |
| Auth | JWT + OIDC | Simple for small, scalable for large |
| Deployment | Docker Compose (small), K8s/ECS (large) | Single docker compose up to full cloud |
| ULog parser | Custom Rust (reference: mavsim-viewer C + pyulog) | Native performance, streaming support |
| Job queue | PostgreSQL LISTEN/NOTIFY (simple) or SQS (scale) | No extra infrastructure for small deployments |
| CDN | CloudFront | Already in use, serves static assets + pre-signed URLs |
| FFT | rustfft (backend), custom (frontend) | High-performance spectral analysis |
| Compression | zstd | Best ratio/speed tradeoff for processed JSON |
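The stack table names `rustfft` for backend spectral analysis. As a self-contained illustration of the kind of computation involved, here is a naive O(n²) DFT standing in for the real FFT; the function name and shape are hypothetical, not part of any actual API:

```rust
// Sketch only: a naive DFT illustrating the spectral analysis the backend
// would perform. The production code would use the `rustfft` crate instead.

/// Magnitude spectrum (first n/2 bins) of a real-valued signal.
fn dft_magnitudes(signal: &[f64]) -> Vec<f64> {
    let n = signal.len();
    (0..n / 2)
        .map(|k| {
            let (mut re, mut im) = (0.0_f64, 0.0_f64);
            for (t, &x) in signal.iter().enumerate() {
                // Correlate the signal with the k-th complex exponential.
                let angle = -2.0 * std::f64::consts::PI * (k * t) as f64 / n as f64;
                re += x * angle.cos();
                im += x * angle.sin();
            }
            (re * re + im * im).sqrt()
        })
        .collect()
}

fn main() {
    // 64 samples of a pure tone at bin 8: the spectrum should peak there.
    let n = 64usize;
    let signal: Vec<f64> = (0..n)
        .map(|t| (2.0 * std::f64::consts::PI * 8.0 * t as f64 / n as f64).sin())
        .collect();
    let mags = dft_magnitudes(&signal);
    let peak = mags
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .map(|(k, _)| k)
        .unwrap();
    println!("peak bin = {}", peak); // → peak bin = 8
}
```

In production the same pipeline would apply windowing and Welch averaging before plotting, which is where `rustfft`'s O(n log n) transforms matter for long logs.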
This plan was reviewed from three perspectives: an open-source maintainer, a small-team private deployer, and a DevOps engineer running the production instance. Their feedback surfaced critical gaps and led to the adjustments below.
Key concerns raised:
- **Rust vs Python for contributor accessibility.** The PX4 ecosystem is primarily C++ and Python. The domain logic in `configured_plots.py` (1,165 lines of vibration thresholds, FFT cutoff markers, PID heuristics) was written by flight controller engineers who know Python, not Rust. Rewriting this in Rust risks losing contributors who maintain the analysis logic that is Flight Review's actual value.
- **Scope is unrealistic at 8 months.** A ground-up rewrite (new language, new DB, new frontend, new charting, 3D replay, plugin system, auth, migration) is 12-18 months minimum for a small OSS team. The history of open-source v2 rewrites is littered with projects that never shipped.
- **Plugin system is over-engineered.** Flight Review has had very few external contributors adding new plot types. A formal plugin API adds abstraction overhead, versioning, and API stability commitments without demonstrated demand. Clean code structure is sufficient.
- **Migration risk is understated.** QGroundControl's upload endpoint is a hard API contract not addressed in the plan. URL compatibility for existing links is critical. No rollback plan exists.
- **Incremental migration recommended.** Add process-once caching to the current Python app first, then build a new React frontend, then backfill 350k logs. This delivers instant page loads in 1-2 months with near-zero migration risk.
Response and adjustments:
- **Rust stays as the primary language.** This is a deliberate choice by the project stakeholders who want to move away from Python and invest in Rust. The "process once" architecture means the parsing speed advantage still matters for upload processing and bulk migration. More importantly, Rust's type system, memory safety, and single-binary deployment are long-term wins. The PX4 ecosystem is increasingly multilingual (Rust UAVCAN, Rust MAVLink libraries, Auterion's px4-ulog-rs). The analysis domain logic will be ported methodically with test coverage against real logs.
- **Scope is reduced for v1.** The following are cut from the initial release:
  - Plugin system → internal module pattern only, no public plugin API
  - 3D flight replay → Phase 2 feature; current Cesium.js 3D view maintained
  - OIDC authentication → simple token/password auth only in v1
  - PID analysis page → Phase 2
  - Dark mode → Phase 2
  - Multi-log comparison → Phase 2
  - Real-time MAVLink streaming → not in scope
- **QGroundControl upload API compatibility is mandatory.** The upload endpoint must accept the same multipart POST format QGC uses today. Document this as a hard requirement in Phase 1.
- **Incremental strategy adopted partially.** The React frontend can be developed and deployed alongside the old Bokeh frontend during transition. New uploads get processed; old logs get a "legacy view" link until re-processed.
Key concerns raised:
- **Three containers is too many.** PostgreSQL + MinIO + app triples the operational surface for a team with a few hundred logs. Named Docker volumes are not portable.
- **PostgreSQL is overkill.** SQLite with WAL handles the current 350k-log production instance. A team with hundreds of logs will never stress SQLite.
- **MinIO is unnecessary.** For 10GB of logs, local disk with direct file serving is simpler and sufficient. MinIO recommends 4GB RAM minimum.
- **Auth is harder than shown.** OIDC requires registering OAuth apps, stable domains, HTTPS, and debugging opaque token errors. Teams just want a password.
- **Raspberry Pi / small VPS not viable.** PostgreSQL eats 200-400MB idle, MinIO needs 4GB, processing workers need 2GB. A 1GB VPS can't run this.
- **Air-gapped deployments not addressed.** Many commercial/defense drone teams operate without internet. Map tiles, frontend assets, and auth all assume connectivity.
- **Missing features for private use:** log organization (folders/tags), batch upload, flight comparison, export/reporting, storage quotas, and an authorization model (who sees what).
Response and adjustments:
- **Single-container mode is the default deployment.** The architecture now explicitly supports three deployment tiers:

| Tier | Components | Storage | Database | Auth | Target |
|---|---|---|---|---|---|
| Minimal | Single binary | Local disk | Embedded SQLite | Password list | Teams, Pi, VPS |
| Standard | Docker Compose (2 containers) | Local disk or MinIO | PostgreSQL | Password or OIDC | Growing teams |
| Production | ECS/K8s (N containers) | AWS S3 | RDS PostgreSQL | OIDC | Dronecode scale |

- **SQLite is first-class, not a fallback.** The data access layer abstracts both SQLite and PostgreSQL equally. SQLite is the default; PostgreSQL is the documented upgrade path when concurrent write throughput becomes a measured problem.
- **Local disk storage is the default.** `STORAGE_BACKEND=local` stores ULog files in `./data/logs/` and the app serves them directly. S3 backend is opt-in for cloud deployments. No MinIO required for simple setups.
- **Simple auth added.** `auth.provider: "password"` with a static list of username:bcrypt pairs in the config file. Zero external dependencies. OIDC is documented as an upgrade, not the starting point.
- **Bind mounts, not named volumes.** Docker Compose uses `./data:/app/data`, so backup is `tar czf backup.tar.gz ./data/`.
- **Air-gap mode added to requirements.** All frontend assets bundled in the Docker image. Map tile URL configurable (defaults to OSM, can point to a self-hosted tile server). Docker images published as `.tar` artifacts alongside registry images. Multi-arch builds (amd64 + arm64).
- **Batch upload and log tagging added to v1 scope.** These are essential for real field workflows.
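Pulling these defaults together, a Minimal-tier config file might look like the following sketch. All key names are hypothetical, not a committed format:

```yaml
# Hypothetical Minimal-tier config -- key names are illustrative only.
storage:
  backend: local            # equivalent to STORAGE_BACKEND=local
  path: ./data              # ULog files land in ./data/logs/
database:
  driver: sqlite            # embedded SQLite; "postgres" is the upgrade path
  path: ./data/flight_review.db
auth:
  provider: password        # static username:bcrypt pairs, no external IdP
  users:
    - "alice:$2b$12$exampleBcryptHashPlaceholder"
map:
  # Point at a self-hosted tile server for air-gapped deployments.
  tile_url: "https://tile.openstreetmap.org/{z}/{x}/{y}.png"
```

The same file, with `driver: postgres` and `backend: s3`, would carry a deployment up through the Standard and Production tiers.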
Key concerns raised:
- **Actual S3 data is 617k files / 14.8TB, not 350k / 5TB.** The plan's estimates are off by nearly 3x. Bulk migration is ~3-5 days, not 48 hours, and costs ~$1,300+ in S3 transfer.
- **Cost is 10-40% higher than the current setup** ($530-650/month vs ~$470/month), not comparable. Stakeholders should know this upfront.
- **No observability story.** Monitoring should be Phase 1, not Phase 5. The plan has zero detail on metrics, alerting, or structured logging.
- **Job queue needs persistence.** PostgreSQL LISTEN/NOTIFY loses messages if no worker is listening. Need a table-backed queue with `SELECT ... FOR UPDATE SKIP LOCKED`.
- **No disaster recovery plan.** No RTO/RPO targets, no restore testing, no secrets management.
- **No SSL/TLS mentioned.** Currently Let's Encrypt + nginx.
- **Pre-signed URLs expire.** If a user opens a page and comes back 2 hours later, download links are dead.
- **The Rust ULog parser doesn't exist yet.** The entire plan depends on it. Build and validate it first.
- **Proof-of-concept with 1,000 real logs needed in Phase 2, not Phase 5.** Measure actual parse times, failure rates, and processed JSON sizes before committing to full migration.
Response and adjustments:
- **Data inventory corrected.** The plan now uses 617k files / 14.8TB as the baseline. Migration estimates updated to 3-5 days with 10 workers, ~$1,500 in S3 costs.
- **Cost transparency added.** Estimated steady-state cost is $530-650/month, roughly 15-35% higher than current. The trade-off is dramatically better performance, reliability, and operability. The current system's cost will increase anyway as data grows and S3 FUSE becomes more painful.
- **Observability is Phase 1.** Minimum from day one:
  - Structured JSON logging via `tracing` + `tracing-subscriber`
  - Health endpoint checking PostgreSQL, S3 connectivity, and queue depth
  - Prometheus metrics: request latency (p50/p95/p99), error rates, queue length, processing time
  - Alerting on queue backlog >100, error rate >1%, API p99 >5s
  - CloudWatch Logs integration for ECS deployment
- **Job queue redesigned.** PostgreSQL-backed with a `processing_jobs` table, `SELECT ... FOR UPDATE SKIP LOCKED` for reliable dequeue, and LISTEN/NOTIFY as a wake-up signal only. Dead-letter handling for failed jobs. Configurable retry count.
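The `SELECT ... FOR UPDATE SKIP LOCKED` dequeue can be sketched as a single statement; the `processing_jobs` column names shown here are assumptions for illustration:

```sql
-- Claim one pending job atomically. SKIP LOCKED lets concurrent workers
-- dequeue without blocking on each other's row locks, so no job is
-- processed twice and no worker stalls.
UPDATE processing_jobs
SET status = 'running', started_at = now()
WHERE id = (
    SELECT id
    FROM processing_jobs
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, log_id;
```

Workers block on LISTEN between polls; NOTIFY from the upload path is purely a wake-up signal, so a missed notification costs one poll interval, not a lost job.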
- **Disaster recovery defined:**
  - RDS: automated daily snapshots, 35-day retention, point-in-time recovery. Test restore quarterly.
  - S3: versioning enabled for raw ULog files. Processed JSON is regenerable.
  - RTO: 4 hours. RPO: 24 hours.
  - Secrets in AWS Secrets Manager.
- **SSL/TLS:** ACM certificate on ALB for production. Caddy with automatic HTTPS for Docker Compose deployments.
- **Pre-signed URLs:** generate fresh on each API call with 1-hour expiry. Do not cache server-side.
- **ULog parser is the critical path.** The plan now explicitly starts with building and validating the Rust ULog parser against a corpus of 1,000+ real logs before any other work begins. This is the go/no-go gate for the project.
- **Proof-of-concept migration in Phase 2.** Process 1,000 representative logs; measure parse times, failure rates, memory usage, and processed JSON sizes. Use the results to refine the bulk migration plan.
| Phase | Duration | Deliverable |
|---|---|---|
| 0: ULog Parser | 6-8 weeks | Rust ULog parser validated against 1,000+ real logs. Go/no-go gate. |
| 1: Core Backend | 8-10 weeks | Axum API, PostgreSQL, S3 integration, processing pipeline, observability, QGC-compatible upload |
| 2: Frontend MVP | 8-10 weeks | React SPA with all 35+ plot types, browse/search, GPS map, batch upload, log tagging |
| 3: Migration | 4-6 weeks | Proof-of-concept with 1,000 logs, then bulk migration, dual-running with old system |
| 4: Cutover | 2-4 weeks | DNS cutover, URL redirects, monitoring stabilization |
| 5: Phase 2 Features | Ongoing | 3D replay, PID analysis, OIDC, flight comparison, dark mode |
Total to production: ~8-10 months (vs original 8 months). More realistic, with an explicit go/no-go gate at week 6-8.
┌─────────────────────────────────────────────────────────────────┐
│ MINIMAL: Single binary, SQLite, local disk, password auth │
│ │
│ $ ./flight-review-next --data-dir ./data │
│ │
│ Perfect for: Raspberry Pi, laptop, small VPS, air-gapped │
│ Requirements: 512MB RAM, 1 CPU, Linux/macOS/Windows │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ STANDARD: Docker Compose, PostgreSQL, local disk or S3 │
│ │
│ $ docker compose up │
│ │
│ Perfect for: Teams of 5-50, office server, cloud VPS │
│ Requirements: 2GB RAM, 2 CPU │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ PRODUCTION: ECS/K8s, RDS, S3, CloudFront, multiple workers │
│ │
│ Perfect for: Dronecode (350k+ logs), large organizations │
│ Requirements: See resource estimates │
└─────────────────────────────────────────────────────────────────┘