browse.py lines 88-95 fetch all 353k public logs from SQLite into Python memory on every request. When there's no search term, it's fast — only the paginated slice (10-25 rows) gets processed. When searching, it calls get_columns_from_tuple() on every single row (string formatting, airframe lookups, datetime conversions, HTML generation), then does Python string matching. That's 353k iterations of expensive Python work per keystroke. The database has zero indexes.
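A minimal sketch of the two code paths (function names are stand-ins, not the actual browse.py code): the fast path formats only the visible page, while the search path formats every row before matching.

```python
def format_row(row):
    # Stand-in for get_columns_from_tuple(): per-row formatting,
    # airframe lookups, datetime conversion, HTML generation.
    return f"<td>{row}</td>"

def browse_no_search(rows, offset, limit):
    # Fast path: format only the paginated slice (10-25 rows).
    return [format_row(r) for r in rows[offset:offset + limit]]

def browse_with_search(rows, term):
    # Slow path: format ALL rows, then string-match in Python.
    # With 353k rows this is 353k expensive iterations per keystroke.
    return [html for html in (format_row(r) for r in rows) if term in html]
```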
Add SQLite indexes on the server:

```
sqlite3 /home/ubuntu/logs.sqlite
CREATE INDEX IF NOT EXISTS idx_logs_public_source ON Logs(Public, Source);
CREATE INDEX IF NOT EXISTS idx_logs_date ON Logs(Date DESC);
ANALYZE;
```

This speeds up the SQL query but does not fix the hang — the bottleneck is Python iterating 353k rows, not SQLite.
Nginx response caching (the real win):

```nginx
proxy_cache_path /var/cache/nginx/browse levels=1:2 keys_zone=browse_cache:10m max_size=500m inactive=5m;

location /browse_data_retrieval {
    proxy_cache browse_cache;
    proxy_cache_valid 200 2m;
    proxy_cache_key "$request_uri";
    proxy_cache_use_stale error timeout updating http_500 http_502 http_503;
    proxy_cache_lock on;    # Only ONE request hits Tornado, others wait for cache
    proxy_cache_lock_timeout 30s;
    add_header X-Cache-Status $upstream_cache_status;
    proxy_pass http://127.0.0.1:5100;
}
```

`proxy_cache_lock on` is the critical directive — when 5 users search for "pixhawk", only one request hits Tornado. Repeat searches within 2 minutes are instant. Zero cost.
Add rate limiting:

```nginx
limit_req_zone $binary_remote_addr zone=browse_search:10m rate=2r/s;

location /browse_data_retrieval {
    limit_req zone=browse_search burst=5 nodelay;
    # ... cache config above ...
}
```

This prevents rapid keystrokes from queuing up multiple 353k-row scans.
Pre-computed browse JSON via cron — same pattern as the existing /dbinfo fix:
- A cron job dumps the browse data to a static JSON file (adapt from `tools/generate_dbinfo_json.py`)
- Upload to S3, serve via CloudFront
- Switch DataTables to `serverSide: false` — client-side search/pagination handles 353k rows in milliseconds in the browser
Estimated JSON size: ~5-8MB gzipped. The browse data changes only on upload, not on every request.
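A sketch of the cron script, assuming the ideas above (the `Logs` table name, column names, and file paths are assumptions, not the actual schema; adapt from `tools/generate_dbinfo_json.py` as noted):

```python
#!/usr/bin/env python3
"""Hypothetical cron sketch: dump public browse rows to a gzipped
JSON file that DataTables can load client-side. Paths and the
Logs(Id, Date, Description, Public) schema are assumptions."""
import gzip
import json
import sqlite3

DB_PATH = "/home/ubuntu/logs.sqlite"         # assumed server path
OUT_PATH = "/var/www/static/browse.json.gz"  # assumed output path

def dump_browse_json(db_path=DB_PATH, out_path=OUT_PATH):
    con = sqlite3.connect(db_path)
    cur = con.execute(
        "SELECT Id, Date, Description FROM Logs "
        "WHERE Public = 1 ORDER BY Date DESC")
    data = [list(row) for row in cur]
    con.close()
    # DataTables accepts {"data": [...]} directly as its ajax source.
    with gzip.open(out_path, "wt", encoding="utf-8") as f:
        json.dump({"data": data}, f, separators=(",", ":"))

if __name__ == "__main__":
    dump_browse_json()
```

Run it from cron (e.g. every 10 minutes) and point DataTables' `ajax` option at the uploaded file with `serverSide: false`.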
```
CloudFront
├── /api/* → API Gateway → Lambda (Python) → DynamoDB (metadata)
│                                          → S3 (log files)
└── /*     → S3 (static React/Vue SPA)
```
- DynamoDB over RDS: flat table of log metadata, simple key-value access. On-demand pricing at 353k items ≈ $1-3/month.
- Lambda over ECS: stateless, short-lived API calls. Effectively free at this traffic level.
- Static SPA: client-side plotting (Plotly.js), client-side search/filtering. No server to manage.
- Total new cost: ~$5/month vs current EC2 at $50-100/month.
Can we fix without code changes? No. Indexes don't help because the bottleneck is Python iterating 353k rows, not SQLite. Nginx caching is a band-aid — first search still hangs and blocks a Tornado worker for 30+ seconds.
If we must change code, smallest diff? Push search into SQL with LIKE + LIMIT/OFFSET. ~30 lines changed. But every line written for this app is a line thrown away.
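A sketch of what that diff would look like, assuming a `Logs(Id, Date, Description, Public)` schema (the actual column names in browse.py may differ):

```python
import sqlite3

def search_logs(db_path, term, page=0, page_size=25):
    """Hypothetical sketch: push search into SQLite with LIKE +
    LIMIT/OFFSET instead of formatting all 353k rows in Python.
    Table and column names are assumptions, not the real schema."""
    con = sqlite3.connect(db_path)
    cur = con.execute(
        "SELECT Id, Date, Description FROM Logs "
        "WHERE Public = 1 AND Description LIKE ? "
        "ORDER BY Date DESC LIMIT ? OFFSET ?",
        (f"%{term}%", page_size, page * page_size))
    rows = cur.fetchall()  # at most page_size rows ever reach Python
    con.close()
    return rows
```

Only the visible page is formatted afterwards, so the per-keystroke cost drops from O(n) Python work to one indexed-ish SQL scan returning 10-25 rows.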
What NOT to do:
- FTS5 — over-engineering for a dying app
- Redis/caching layers — nginx cache is sufficient
- Refactoring handlers into proper classes
- Adding new search features
- "Just in case" indexes on columns nobody queries
The actual recommendation: Remove the search box. One line of HTML. The browse page without search works fine with pagination. If someone needs search, point them at the dbinfo JSON. If the team insists on keeping search, the SQL LIKE change is ~30 lines but why invest in throwaway code?
Option 3: signal.alarm(5) to kill searches that take too long. Ugly but prevents hangs from killing the workers. No functional change, just a timeout.
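A sketch of that timeout, assuming the search loop runs synchronously on the main thread (SIGALRM only works there, and only on Unix; `matcher` is a hypothetical stand-in for the per-row check):

```python
import signal

class SearchTimeout(Exception):
    """Raised by the SIGALRM handler when a search runs too long."""

def _on_alarm(signum, frame):
    raise SearchTimeout()

def search_with_timeout(rows, matcher, seconds=5):
    # SIGALRM interrupts the CPU-bound loop after `seconds`.
    # Main-thread, Unix-only; ugly, but it stops a runaway search
    # from blocking the Tornado event loop indefinitely.
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        return [r for r in rows if matcher(r)]
    except SearchTimeout:
        return []  # or return partial results / an HTTP 503
    finally:
        signal.alarm(0)                    # cancel the pending alarm
        signal.signal(signal.SIGALRM, old_handler)
```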
Search is not a niche feature:
- Developers debugging regressions need to search by vehicle UUID, git SHA, software version (dagar filed issue #121 for this)
- Airframe maintainers search by hardware/airframe name
- Self-hosters with fleets (issue #257) reported this exact hang causing server-wide denial of service — had to hard-reboot multiple times
- CI/testing workflows — a blocked search request ties up Tornado's event loop and affects plot viewing for all users
- Issue #257 (2023): "Site-wide denial of service in search box" — self-hoster reported this. A partial fix (5ced4b0) removed the per-row `os.path.exists()` call but didn't address the core O(n) Python iteration
- Issue #121: dagar requested search by UUID, hardware, software — acknowledged, but the performance issue was never fixed
- PRs #316, #317 (2025): Added hash/tag search modes, improved what you can search for but not how
- No formal RFC or repo exists for a Flight Review replacement. The "eventually replace" sentiment has not materialized
Fix the current app AND plan the replacement. These are not mutually exclusive.
The fix is a focused PR: push search into SQL with LIKE + LIMIT/OFFSET, add CREATE INDEX IF NOT EXISTS to setup_db.py. ~100 lines total. It:
- Fixes a bug reported twice by the community (#257, #121)
- Benefits every deployment from single-user to 353k logs
- Requires zero new dependencies
- Self-hosters get indexes automatically via a `setup_db.py` migration
- Buys time to plan a proper replacement
For the replacement, preserve:
- Upload API (QGroundControl, `upload_log.py`, and CI all depend on it)
- The `/plot_app?log=ID` URL scheme (thousands of links in GitHub issues, Discuss, Slack)
- Self-hosting with Docker
For the replacement, change:
- Separate API from UI (REST backend + static SPA frontend)
- Proper database with indexes/FTS (or DynamoDB for serverless)
- Storage abstraction (local disk, S3, GCS) configurable per deployment
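That storage abstraction could be as small as this sketch (class and method names are hypothetical, not an existing API):

```python
from abc import ABC, abstractmethod
import os

class LogStorage(ABC):
    """Hypothetical per-deployment storage interface for the
    replacement: same API over local disk, S3, or GCS."""

    @abstractmethod
    def put(self, log_id: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, log_id: str) -> bytes: ...

class LocalDiskStorage(LogStorage):
    """Default backend for self-hosters: plain files under a root dir."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, log_id: str, data: bytes) -> None:
        with open(os.path.join(self.root, log_id), "wb") as f:
            f.write(data)

    def get(self, log_id: str) -> bytes:
        with open(os.path.join(self.root, log_id), "rb") as f:
            return f.read()

# An S3Storage / GCSStorage would implement the same two methods
# with boto3 / google-cloud-storage; deployments pick one via config.
```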
| Option | Code Change | Time | Risk | Fixes Search |
|---|---|---|---|---|
| Nginx cache + rate limit | None (infra only) | 30 min | Low | Partially (first hit still hangs) |
| Add SQLite indexes on server | None (DB only) | 10 min | Low | No (bottleneck is Python) |
| Remove search box | 1 line HTML | 5 min | None | N/A (removes feature) |
| SQL LIKE + LIMIT/OFFSET | ~30-100 lines | 1-2 days | Low | Yes |
| Pre-computed JSON + client-side DataTables | Template change + cron script | 1-2 days | Low | Yes |
| Full replacement (DynamoDB + Lambda + SPA) | Full rewrite | Months | Medium | Yes |
- Today: Nginx caching + rate limiting on `/browse_data_retrieval` (stops the bleeding)
- This week: decide between the SQL fix (minimal code, benefits self-hosters) or pre-computed JSON (zero server load, production-only)
- This quarter: start the replacement architecture conversation with the community — there's no RFC yet, and there should be