Version: v1.4-selfuse
Date: 2026-03-11
Principle: Build for yourself first. When you forget it's your own tool, it's ready for user #2.
We design from failure backward. V1 fails if any of these happen:
- User cannot get first translated result quickly after one click.
- Text frequently overflows bubble and becomes hard to read.
- Translation loses conversational context between bubbles.
- Product creates ongoing server cost for us.
Everything not directly reducing those failures is removed from V1.
- No hosted inference backend from us.
- One-click translation for all comic images in current page.
- Progressive rendering (visible images first).
- Bubble text fit with strict overflow guard.
- Conversation context passed between bubbles/pages for coherent translation.
- Abstract provider interfaces for OCR and Translation (pluggable from day 1).
- Primary: English manhwa (webtoon) → Vietnamese.
- Also supported: Japanese manga → Vietnamese (via manga-ocr).
- Self-use focus: optimize for what you read most.
```
Browser Extension
├─ Content Script
│  ├─ Find reading images
│  ├─ Overlay translated text (white bg + rendered text)
│  └─ Track viewport/lazy-load
└─ Background Service Worker
   ├─ Queue image jobs
   └─ Call localhost app

Local App (127.0.0.1 only)
├─ /health
├─ /translate-image
├─ Text detection ─── comic-text-detector ONNX (~95 MB)
├─ OCR provider ─── abstract interface
│  ├─ PP-OCRv5 English (~10 MB)
│  └─ manga-ocr Japanese (~140 MB)
├─ Translation provider ─── abstract interface
│  ├─ V1: OpenAICompatibleAdapter
│  └─ V2: DeepL / Google / local LLM / ...
├─ Deterministic fit engine
└─ Disk cache
```
No cloud from us. User provides endpoint/key if needed.
```
OCRProvider.recognize(image: Image, lang: str) -> OCRResult

OCRResult {
  text: str,
  confidence: float
}
```
V1 adapters:
- `PPOCRAdapter` — PP-OCRv5 mobile ONNX for English (~10 MB).
- `MangaOCRAdapter` — manga-ocr ONNX for Japanese (~140 MB, l0wgear/manga-ocr-2025-onnx).
Auto-select by detected source language.
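A minimal Python sketch of this interface and the language-based auto-selection. Class names mirror the spec; the `recognize` bodies are stubs, not real model calls, and the `bytes` image type is a simplification:

```python
# Stub sketch of the OCR abstraction; adapter bodies are placeholders.
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class OCRResult:
    text: str
    confidence: float

class OCRProvider(ABC):
    @abstractmethod
    def recognize(self, image: bytes, lang: str) -> OCRResult: ...

class PPOCRAdapter(OCRProvider):
    # Would run PP-OCRv5 mobile ONNX (English) here.
    def recognize(self, image: bytes, lang: str) -> OCRResult:
        return OCRResult(text="", confidence=0.0)

class MangaOCRAdapter(OCRProvider):
    # Would run manga-ocr ONNX (Japanese) here.
    def recognize(self, image: bytes, lang: str) -> OCRResult:
        return OCRResult(text="", confidence=0.0)

def select_ocr(lang: str) -> OCRProvider:
    # Auto-select by detected source language.
    return MangaOCRAdapter() if lang == "ja" else PPOCRAdapter()
```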
```
TranslationProvider.translate(req: TranslateRequest) -> TranslateResult

TranslateRequest {
  bubbles: [{ id, source_text }],
  target_lang: str,
  context: [{ source_text, translated_text }]  // previous bubbles
}

TranslateResult {
  bubbles: [{ id, translated_text }]
}
```
V1 adapters:
- `OpenAICompatibleAdapter` — works with OpenAI, Ollama, and any compatible endpoint.
Future adapters:
- `DeepLAdapter`
- `GoogleTranslateAdapter`
- `LocalLLMAdapter` (GGUF via llama.cpp)
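As an illustration of how an OpenAI-compatible adapter might batch all bubbles of one image plus prior context into a single call, a hedged sketch (prompt wording and helper names are assumptions, not the adapter's real implementation):

```python
# Builds one batched translation prompt; previous bubble pairs are
# prepended so pronouns and tone stay coherent across bubbles/pages.
from dataclasses import dataclass, field

@dataclass
class TranslateRequest:
    bubbles: list              # [{"id": ..., "source_text": ...}]
    target_lang: str
    context: list = field(default_factory=list)  # prior bubble pairs

def build_prompt(req: TranslateRequest) -> str:
    lines = [f"Translate these speech bubbles to {req.target_lang}."]
    if req.context:
        lines.append("Previous dialogue for context:")
        lines += [f"- {c['source_text']} -> {c['translated_text']}"
                  for c in req.context]
    lines.append("Bubbles:")
    lines += [f"{b['id']}: {b['source_text']}" for b in req.bubbles]
    return "\n".join(lines)
```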
| Model | Source | Size | Purpose |
|---|---|---|---|
| comic-text-detector | mayocream/comic-text-detector-onnx | ~95 MB | Text region detection + polygon mask |
| PP-OCRv5 mobile (English) | PaddlePaddle | ~10 MB | English OCR |
| manga-ocr | l0wgear/manga-ocr-2025-onnx | ~140 MB | Japanese OCR (vertical, furigana, multi-line) |
Total local model footprint: ~245 MB.
- User clicks `Translate This Page`.
- Extension queues all detected comic images in DOM order.
- Queue priority:
  - visible images,
  - the next screen ahead,
  - remaining images.
- Local app processes each image:
- detect text regions (comic-text-detector ONNX),
- crop bubble regions,
- OCR via provider (PP-OCRv5 English or manga-ocr Japanese),
- batch translate all bubbles in one LLM call (with context),
- fit translated text into bubble polygon.
- Extension renders overlays as each image completes.
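The per-image steps above can be sketched as one function with injected stages; every callable here is a hypothetical stand-in, chosen so detection, OCR, translation, and fitting stay pluggable:

```python
# Glue for the per-image pipeline described above.
def translate_image(image, detect, ocr, translate, fit,
                    target_lang="vi", context=None):
    regions = detect(image)  # text polygons from the detector
    bubbles = [{"id": i, "source_text": ocr(image, r)}
               for i, r in enumerate(regions)]
    # One batched translation call per image, with prior context.
    translated = translate(bubbles, target_lang, context or [])
    return [fit(r, t["translated_text"])
            for r, t in zip(regions, translated)]
```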
Base URL: http://127.0.0.1:4319
Returns:
- app ready/not ready,
- detection model loaded/not loaded,
- provider configured/not configured.
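An illustrative `/health` response body; the field names are assumptions mirroring the three checks above:

```json
{
  "ready": true,
  "detection_model_loaded": true,
  "provider_configured": false
}
```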
Request:

```json
{
  "image_id": "sha256",
  "image_blob_b64": "...",
  "target_lang": "vi",
  "ocr_provider": "llm_vision",
  "translation_provider": "openai_compatible",
  "provider_config": {
    "endpoint": "https://api.openai.com/v1",
    "model": "gpt-4.1-mini"
  },
  "context_hint": {
    "chapter_id": "optional",
    "image_index": 12,
    "previous_translations": [
      {
        "image_index": 11,
        "bubbles": [
          { "source_text": "...", "translated_text": "..." }
        ]
      }
    ]
  }
}
```

Response:
```json
{
  "image_id": "sha256",
  "status": "ok",
  "bubbles": [
    {
      "bubble_id": "...",
      "polygon": [[1, 2], [3, 4]],
      "source_text": "...",
      "translated_text": "...",
      "font_size_px": 17,
      "line_height": 1.2,
      "overflow": false
    }
  ]
}
```

Fit engine:

- Erode the bubble mask to a safe area (`8px` default).
- Wrap by punctuation-aware tokenization.
- Font-size binary search.
- Render-bounds check against safe area mask.
- If the text still overflows:
  - run one short rewrite prompt with a max-length hint,
  - if it still overflows, show a compact expandable bubble.
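The font-size binary search could look like this; the `measure` callback, which returns the wrapped text's rendered size at a given font size, is an assumed helper:

```python
# Binary-search the largest font size whose wrapped text fits the
# eroded safe area. Returns None when even the minimum size overflows,
# which triggers the rewrite/compact-bubble fallback.
def fit_font_size(text, safe_w, safe_h, measure, lo=8, hi=48):
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        w, h = measure(text, mid, safe_w)
        if w <= safe_w and h <= safe_h:
            best = mid
            lo = mid + 1   # fits: try a larger size
        else:
            hi = mid - 1   # overflows: shrink
    return best
```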
V1 target is robust readability, not typography perfection.
Key:
hash(image_hash + target_lang + provider.model)
Store:
- Local app disk cache for OCR + translation output.
- Extension memory cache for rendered payload.
TTL:
- Translation: 30 days.
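A sketch of the key derivation above; the `:` delimiter and SHA-256 are assumptions, any stable hash works:

```python
# Cache key: hash(image_hash + target_lang + provider.model).
import hashlib

def cache_key(image_hash: str, target_lang: str, model: str) -> str:
    raw = f"{image_hash}:{target_lang}:{model}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```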
V1 is self-use, no installer needed:
- Clone repo, run local app (`cargo run` / `python main.py`).
- Load unpacked extension in browser.
- Set API key / Ollama endpoint in config file.
- Run `Test Connection`.
- Done.
Polish installer only when preparing for user #2.
- Local API listens on loopback only.
- Extension origin allowlist enforced.
- API key stored in config file (self-use); migrate to OS secure store for user #2.
- Default mode does not send images to our servers (we have none).
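A minimal sketch of the loopback binding and origin check using only the standard library; the extension ID is a placeholder and a real handler would also need CORS preflight handling:

```python
# Loopback-only HTTP server with an extension-origin allowlist.
from http.server import BaseHTTPRequestHandler, HTTPServer

ALLOWED_ORIGINS = {"chrome-extension://<your-extension-id>"}  # placeholder

def origin_allowed(origin: str) -> bool:
    # Requests whose Origin header is not allowlisted are rejected.
    return origin in ALLOWED_ORIGINS

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if not origin_allowed(self.headers.get("Origin", "")):
            self.send_error(403)
            return
        self.send_response(200)
        self.end_headers()

def serve(port: int = 4319) -> HTTPServer:
    # Binding to 127.0.0.1 keeps the API unreachable from the network.
    return HTTPServer(("127.0.0.1", port), Handler)
```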
- First visible image translated: `<= 3s` on mainstream laptop.
- Bubble overflow rate: `<= 5%`.
- Hard fail page rate: `< 1%`.
- User readability score: `>= 4/5` (self QA).
- Day 1-3: Local app + comic-text-detector + LLM Vision OCR+translate pipeline end-to-end.
- Day 4-6: Fit engine + context passing between bubbles/pages.
- Day 7-8: Extension overlay + progressive queue.
- Day 9-10: Cache + edge case fixes from real reading sessions.
- DeepL / Google Translate adapters
- Installer (MSI/DMG)
- Licensing / device binding
- Setup wizard
- OS secure store for API keys
Ready for user #2 when:
- You read 10+ chapters across different series without manual intervention.
- Bubble overflow rate is consistently under 5%.
- You forget you're using your own tool.