
@mrpollo
Created March 11, 2026 03:44

Flight Review /browse Search Hang — Analysis & Recommendations

Root Cause

browse.py lines 88-95 fetch all 353k public logs from SQLite into Python memory on every request. With no search term this is fast, because only the paginated slice (10-25 rows) is processed further. With a search term, the handler calls get_columns_from_tuple() on every single row (string formatting, airframe lookups, datetime conversions, HTML generation) and then does Python string matching: 353k iterations of expensive Python work per keystroke. The database has zero indexes.
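The hot path can be sketched as a toy example. The function and column names below are illustrative stand-ins, not the actual browse.py code:

```python
def format_columns(row):
    # Stand-in for get_columns_from_tuple(): per-row formatting work
    log_id, description = row
    return [f"#{log_id}", description.strip()]

def slow_search(rows, term):
    """O(n): formats EVERY public log, then string-matches in Python."""
    term = term.lower()
    matches = []
    for row in rows:                 # 353k iterations on the real database
        cols = format_columns(row)   # expensive formatting happens BEFORE filtering
        if any(term in c.lower() for c in cols):
            matches.append(cols)
    return matches
```

The formatting work is wasted for every row the filter then discards, which is almost all of them.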


Perspective 1: AWS Principal Engineer

Immediate (no code changes, 30 minutes)

Add SQLite indexes on the server:

sqlite3 /home/ubuntu/logs.sqlite
CREATE INDEX IF NOT EXISTS idx_logs_public_source ON Logs(Public, Source);
CREATE INDEX IF NOT EXISTS idx_logs_date ON Logs(Date DESC);
ANALYZE;

This speeds up the SQL query but does not fix the hang — the bottleneck is Python iterating 353k rows, not SQLite.
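To confirm the planner actually picks the new index up, check the query plan. This is a sketch against a throwaway in-memory copy of the schema, with column names assumed from the queries above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Logs (Id INTEGER PRIMARY KEY, Public INT, Source TEXT, Date TEXT)")
con.execute("CREATE INDEX IF NOT EXISTS idx_logs_public_source ON Logs(Public, Source)")

# EXPLAIN QUERY PLAN reports SEARCH ... USING INDEX when the index is used,
# versus SCAN Logs for a full table scan
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT Id FROM Logs WHERE Public = 1 AND Source = 'webui'"
).fetchall()
plan_text = " ".join(str(row) for row in plan)
```

Run the same EXPLAIN QUERY PLAN against the production queries on the server to make sure none of them still say SCAN.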

Nginx response caching (the real win):

proxy_cache_path /var/cache/nginx/browse levels=1:2 keys_zone=browse_cache:10m max_size=500m inactive=5m;

location /browse_data_retrieval {
    proxy_cache browse_cache;
    proxy_cache_valid 200 2m;
    proxy_cache_key "$request_uri";
    proxy_cache_use_stale error timeout updating http_500 http_502 http_503;
    proxy_cache_lock on;              # Only ONE request hits Tornado, others wait for cache
    proxy_cache_lock_timeout 30s;
    add_header X-Cache-Status $upstream_cache_status;
    proxy_pass http://127.0.0.1:5100;
}

proxy_cache_lock on is the critical directive — when 5 users search for "pixhawk", only one request hits Tornado. Repeat searches within 2 minutes are instant. Zero cost.

Add rate limiting:

limit_req_zone $binary_remote_addr zone=browse_search:10m rate=2r/s;

location /browse_data_retrieval {
    limit_req zone=browse_search burst=5 nodelay;
    # ... cache config above ...
}

Prevents rapid keystrokes from queuing up multiple 353k-row scans.

Short-term bridge (1-2 days, buys months)

Pre-computed browse JSON via cron — same pattern as the existing /dbinfo fix:

  1. Cron job dumps browse data to a static JSON file (adapt from tools/generate_dbinfo_json.py)
  2. Upload to S3, serve via CloudFront
  3. Switch DataTables to serverSide: false — client-side search/pagination handles 353k rows in milliseconds in the browser

Estimated JSON size: ~5-8MB gzipped. The browse data changes only on upload, not on every request.
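A minimal sketch of such a cron script. Table and column names here are assumptions; the real version should be adapted from tools/generate_dbinfo_json.py:

```python
import json
import sqlite3

def dump_browse_json(db_path: str, out_path: str) -> int:
    """Dump all public logs to a static JSON file DataTables can consume."""
    con = sqlite3.connect(db_path)
    rows = con.execute(
        "SELECT Id, Date, Description FROM Logs "
        "WHERE Public = 1 ORDER BY Date DESC"
    ).fetchall()
    con.close()
    payload = {"data": [list(r) for r in rows]}  # DataTables' default ajax shape
    with open(out_path, "w") as f:
        json.dump(payload, f, separators=(",", ":"))  # compact; gzip at nginx/S3
    return len(rows)
```

The cron job would call this, gzip the output, and push it to S3; the browse page then loads it once and lets DataTables filter client-side.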

Long-term replacement architecture

CloudFront
    ├── /api/*  → API Gateway → Lambda (Python) → DynamoDB (metadata)
    │                                             → S3 (log files)
    └── /*      → S3 (static React/Vue SPA)
  • DynamoDB over RDS: flat table of log metadata, simple key-value access. On-demand pricing at 353k items ≈ $1-3/month.
  • Lambda over ECS: stateless, short-lived API calls. Effectively free at this traffic level.
  • Static SPA: client-side plotting (Plotly.js), client-side search/filtering. No server to manage.
  • Total new cost: ~$5/month vs current EC2 at $50-100/month.

Perspective 2: Dagar Review

Can we fix without code changes? No. Indexes don't help because the bottleneck is Python iterating 353k rows, not SQLite. Nginx caching is a band-aid — first search still hangs and blocks a Tornado worker for 30+ seconds.

If we must change code, smallest diff? Push search into SQL with LIKE + LIMIT/OFFSET. ~30 lines changed. But every line written for this app is a line thrown away.
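The core of that diff can be sketched as follows. The schema and searchable columns are assumptions, and a real patch should also escape LIKE wildcards in the user's term:

```python
import sqlite3

def search_logs(con: sqlite3.Connection, term: str, page: int, per_page: int = 25):
    """Push filtering and pagination into SQLite instead of Python."""
    like = f"%{term}%"
    return con.execute(
        "SELECT Id, Date, Description FROM Logs "
        "WHERE Public = 1 AND (Description LIKE ? OR Source LIKE ?) "
        "ORDER BY Date DESC LIMIT ? OFFSET ?",
        (like, like, per_page, page * per_page),
    ).fetchall()
```

Parameterized queries keep it injection-safe, and SQLite's LIKE is ASCII case-insensitive by default, matching the current Python behavior for most terms.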

What NOT to do:

  • FTS5 — over-engineering for a dying app
  • Redis/caching layers — nginx cache is sufficient
  • Refactoring handlers into proper classes
  • Adding new search features
  • "Just in case" indexes on columns nobody queries

The actual recommendation: Remove the search box. One line of HTML. The browse page without search works fine with pagination. If someone needs search, point them at the dbinfo JSON. If the team insists on keeping search, the SQL LIKE change is ~30 lines, but why invest in throwaway code?

Option 3: signal.alarm(5) to kill searches that take too long. Ugly but prevents hangs from killing the workers. No functional change, just a timeout.
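A sketch of that guard. Caveats: SIGALRM is Unix-only and only works on the main thread, and the Tornado handler would need to catch the exception and return an error response:

```python
import signal

class SearchTimeout(Exception):
    pass

def _raise_timeout(signum, frame):
    raise SearchTimeout()

def run_with_timeout(seconds, fn, *args, **kwargs):
    """Abort fn() with SearchTimeout if it runs longer than `seconds`."""
    old_handler = signal.signal(signal.SIGALRM, _raise_timeout)
    signal.setitimer(signal.ITIMER_REAL, seconds)  # like alarm(), but accepts floats
    try:
        return fn(*args, **kwargs)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)    # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
```

Usage would be roughly `run_with_timeout(5.0, do_search, term)` inside the handler, mapping SearchTimeout to an HTTP error.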


Perspective 3: PX4 Open Source Maintainer

Community impact

Search is not a niche feature:

  • Developers debugging regressions need to search by vehicle UUID, git SHA, software version (dagar filed issue #121 for this)
  • Airframe maintainers search by hardware/airframe name
  • Self-hosters with fleets (issue #257) reported this exact hang causing server-wide denial of service — had to hard-reboot multiple times
  • CI/testing workflows — a blocked search request ties up Tornado's event loop and affects plot viewing for all users

Existing community history

  • Issue #257 (2023): "Site-wide denial of service in search box" — self-hoster reported this. Partial fix (5ced4b0) removed per-row os.path.exists() but didn't address the core O(n) Python iteration
  • Issue #121: dagar requested search by UUID, hardware, software — acknowledged but the performance issue was never fixed
  • PRs #316, #317 (2025): Added hash/tag search modes, improved what you can search for but not how
  • No formal RFC or repo exists for a Flight Review replacement. The "eventually replace" sentiment has not materialized

Recommendation

Fix the current app AND plan the replacement. These are not mutually exclusive.

The fix is a focused PR: push search into SQL with LIKE + LIMIT/OFFSET, add CREATE INDEX IF NOT EXISTS to setup_db.py. ~100 lines total. It:

  • Fixes a bug reported twice by the community (#257, #121)
  • Benefits every deployment from single-user to 353k logs
  • Requires zero new dependencies
  • Self-hosters get indexes automatically via setup_db.py migration
  • Buys time to plan a proper replacement
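The migration piece can be sketched as below. The actual setup_db.py structure may differ; the point is that CREATE INDEX IF NOT EXISTS makes re-runs safe, so existing self-hosted databases pick the indexes up on their next start:

```python
import sqlite3

BROWSE_INDEXES = (
    "CREATE INDEX IF NOT EXISTS idx_logs_public_source ON Logs(Public, Source)",
    "CREATE INDEX IF NOT EXISTS idx_logs_date ON Logs(Date DESC)",
)

def ensure_browse_indexes(con: sqlite3.Connection) -> None:
    """Safe to run on every startup; no-op when the indexes already exist."""
    for stmt in BROWSE_INDEXES:
        con.execute(stmt)
    con.commit()
```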

For the replacement, preserve:

  • Upload API (QGroundControl, upload_log.py, CI all depend on it)
  • /plot_app?log=ID URL scheme (thousands of links in GitHub issues, Discuss, Slack)
  • Self-hosting with Docker

For the replacement, change:

  • Separate API from UI (REST backend + static SPA frontend)
  • Proper database with indexes/FTS (or DynamoDB for serverless)
  • Storage abstraction (local disk, S3, GCS) configurable per deployment

Summary of Options

| Option | Code Change | Time | Risk | Fixes Search |
|---|---|---|---|---|
| Nginx cache + rate limit | None (infra only) | 30 min | Low | Partially (first hit still hangs) |
| Add SQLite indexes on server | None (DB only) | 10 min | Low | No (bottleneck is Python) |
| Remove search box | 1 line HTML | 5 min | None | N/A (removes feature) |
| SQL LIKE + LIMIT/OFFSET | ~30-100 lines | 1-2 days | Low | Yes |
| Pre-computed JSON + client-side DataTables | Template change + cron script | 1-2 days | Low | Yes |
| Full replacement (DynamoDB + Lambda + SPA) | Full rewrite | Months | Medium | Yes |

Recommended Path

  1. Today: Nginx caching + rate limiting on /browse_data_retrieval (stops the bleeding)
  2. This week: Decide between SQL fix (minimal code, benefits self-hosters) or pre-computed JSON (zero server load, production-only)
  3. This quarter: Start the replacement architecture conversation with the community — there's no RFC yet, and there should be