@nghyane · Last active March 11, 2026
Manga/Webtoon Translator No-Server V1 Spec

Manhwa Translator - No-Server V1 (Self-Use First)

Version: v1.4-selfuse · Date: 2026-03-11 · Principle: Build for yourself first. When you forget it's your own tool, it's ready for user #2.


1) Reverse Constraints

We design from failure backward. V1 fails if any of these happen:

  1. The user cannot get a first translated result quickly after one click.
  2. Text frequently overflows its bubble and becomes hard to read.
  3. Translation loses conversational context between bubbles.
  4. The product creates ongoing server costs for us.

Everything not directly reducing those failures is removed from V1.


2) V1 Non-Negotiables

  1. No hosted inference backend from us.
  2. One-click translation for all comic images on the current page.
  3. Progressive rendering (visible images first).
  4. Bubble text fit with strict overflow guard.
  5. Conversation context passed between bubbles/pages for coherent translation.
  6. Abstract provider interfaces for OCR and Translation (pluggable from day 1).

3) V1 Scope

  • Primary: English manhwa (webtoon) → Vietnamese.
  • Also supported: Japanese manga → Vietnamese (via manga-ocr).
  • Self-use focus: optimize for what you read most.

4) Minimal Architecture

Browser Extension
  ├─ Content Script
  │   ├─ Find reading images
  │   ├─ Overlay translated text (white bg + rendered text)
  │   └─ Track viewport/lazy-load
  └─ Background Service Worker
      ├─ Queue image jobs
      └─ Call localhost app

Local App (127.0.0.1 only)
  ├─ /health
  ├─ /translate-image
  ├─ Text detection    ─── comic-text-detector ONNX (~95 MB)
  ├─ OCR provider      ─── abstract interface
  │   ├─ PP-OCRv5 English (~10 MB)
  │   └─ manga-ocr Japanese (~140 MB)
  ├─ Translation provider ─── abstract interface
  │   ├─ V1: OpenAICompatibleAdapter
  │   └─ V2: DeepL / Google / local LLM / ...
  ├─ Deterministic fit engine
  └─ Disk cache

No cloud from us. User provides endpoint/key if needed.


5) Abstract Provider Interfaces

OCR Provider

OCRProvider.recognize(image: Image, lang: str) -> OCRResult

OCRResult {
  text: str,
  confidence: float
}

V1 adapters:

  • PPOCRAdapter — PP-OCRv5 mobile ONNX for English (~10 MB).
  • MangaOCRAdapter — manga-ocr ONNX for Japanese (~140 MB, l0wgear/manga-ocr-2025-onnx).

Auto-select by detected source language.
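A minimal sketch of the OCR provider interface with language-based auto-selection. The registry decorator and `select_provider` helper are illustrative assumptions, not part of the spec; the adapters here are stubs where the ONNX inference would go.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class OCRResult:
    text: str
    confidence: float


class OCRProvider(ABC):
    @abstractmethod
    def recognize(self, image: bytes, lang: str) -> OCRResult:
        ...


# Hypothetical adapter registry keyed by detected source language.
_ADAPTERS: dict[str, type[OCRProvider]] = {}


def register(lang: str):
    def wrap(cls):
        _ADAPTERS[lang] = cls
        return cls
    return wrap


@register("en")
class PPOCRAdapter(OCRProvider):
    def recognize(self, image: bytes, lang: str) -> OCRResult:
        # Would run the PP-OCRv5 mobile ONNX model here.
        raise NotImplementedError


@register("ja")
class MangaOCRAdapter(OCRProvider):
    def recognize(self, image: bytes, lang: str) -> OCRResult:
        # Would run the manga-ocr ONNX model here.
        raise NotImplementedError


def select_provider(detected_lang: str) -> OCRProvider:
    """Auto-select the adapter by detected source language."""
    return _ADAPTERS[detected_lang]()
```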

Translation Provider

TranslationProvider.translate(req: TranslateRequest) -> TranslateResult

TranslateRequest {
  bubbles: [{ id, source_text }],
  target_lang: str,
  context: [{ source_text, translated_text }]  // previous bubbles
}

TranslateResult {
  bubbles: [{ id, translated_text }]
}

V1 adapters:

  • OpenAICompatibleAdapter — works with OpenAI, Ollama, any compatible endpoint.

Future adapters:

  • DeepLAdapter
  • GoogleTranslateAdapter
  • LocalLLMAdapter (GGUF via llama.cpp)
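The translation interface and the OpenAI-compatible adapter could be sketched as below. The prompt format and `build_prompt` helper are assumptions for illustration; only the wire format (`/chat/completions`, `choices[0].message.content`) follows the OpenAI-compatible convention, which is why the same adapter also works against Ollama. Parsing the model's reply back into per-bubble text is elided.

```python
import json
import urllib.request
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class TranslateRequest:
    bubbles: list[dict]        # [{"id": ..., "source_text": ...}]
    target_lang: str
    context: list[dict] = field(default_factory=list)  # previous bubble pairs


@dataclass
class TranslateResult:
    bubbles: list[dict]        # [{"id": ..., "translated_text": ...}]


class TranslationProvider(ABC):
    @abstractmethod
    def translate(self, req: TranslateRequest) -> TranslateResult:
        ...


class OpenAICompatibleAdapter(TranslationProvider):
    def __init__(self, endpoint: str, model: str, api_key: str = ""):
        self.endpoint, self.model, self.api_key = endpoint, model, api_key

    def build_prompt(self, req: TranslateRequest) -> str:
        # One batched call per image: all bubbles plus rolling context.
        ctx = "\n".join(f'{c["source_text"]} -> {c["translated_text"]}'
                        for c in req.context)
        lines = "\n".join(f'{b["id"]}: {b["source_text"]}' for b in req.bubbles)
        return (f"Translate these comic bubbles to {req.target_lang}.\n"
                f"Previous context:\n{ctx}\n\nBubbles:\n{lines}")

    def translate(self, req: TranslateRequest) -> TranslateResult:
        body = json.dumps({
            "model": self.model,
            "messages": [{"role": "user", "content": self.build_prompt(req)}],
        }).encode()
        http_req = urllib.request.Request(
            f"{self.endpoint}/chat/completions", data=body,
            headers={"Content-Type": "application/json",
                     "Authorization": f"Bearer {self.api_key}"})
        with urllib.request.urlopen(http_req) as resp:
            content = json.load(resp)["choices"][0]["message"]["content"]
        return TranslateResult(bubbles=[])  # parsing of `content` elided
```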

6) Model Choices (V1)

Model                      | Source                             | Size    | Purpose
comic-text-detector        | mayocream/comic-text-detector-onnx | ~95 MB  | Text region detection + polygon mask
PP-OCRv5 mobile (English)  | PaddlePaddle                       | ~10 MB  | English OCR
manga-ocr                  | l0wgear/manga-ocr-2025-onnx        | ~140 MB | Japanese OCR (vertical, furigana, multi-line)

Total local model footprint: ~245 MB.


7) Core Flow

  1. User clicks Translate This Page.
  2. Extension queues all detected comic images in DOM order.
  3. Queue priority:
    • visible images first,
    • then images within the next screen,
    • then all remaining images.
  4. Local app processes each image:
    • detect text regions (comic-text-detector ONNX),
    • crop bubble regions,
    • OCR via provider (PP-OCRv5 English or manga-ocr Japanese),
    • batch translate all bubbles in one LLM call (with context),
    • fit translated text into bubble polygon.
  5. Extension renders overlays as each image completes.
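The priority buckets in step 3 can be sketched deterministically. The `(index, top_px, bottom_px)` tuple shape and the `prioritize` helper are assumptions for illustration; ties within a bucket fall back to DOM order.

```python
def prioritize(images, viewport_top, viewport_bottom):
    """Order image jobs: visible first, then the next screen, then the rest.

    `images` is a list of (dom_index, top_px, bottom_px) tuples.
    """
    screen = viewport_bottom - viewport_top

    def bucket(img):
        _, top, bottom = img
        if bottom >= viewport_top and top <= viewport_bottom:
            return 0                                    # visible now
        if viewport_bottom < top <= viewport_bottom + screen:
            return 1                                    # next screen
        return 2                                        # everything else

    return sorted(images, key=lambda img: (bucket(img), img[0]))
```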

8) Local API (Minimal)

Base URL: http://127.0.0.1:4319

GET /health

Returns:

  1. app ready/not ready,
  2. detection model loaded/not loaded,
  3. provider configured/not configured.
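A loopback-only /health handler could look like the stdlib sketch below; the `STATE` flags mirror the three readiness checks above and are assumed names, as is the handler structure (the real app may use any HTTP framework).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed global readiness flags, updated elsewhere in the app.
STATE = {"app_ready": True,
         "detection_model_loaded": False,
         "provider_configured": False}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            body = json.dumps(STATE).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()


def serve():
    # Bind to 127.0.0.1 only (see Security Baseline: loopback only).
    HTTPServer(("127.0.0.1", 4319), Handler).serve_forever()
```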

POST /translate-image

Request:

{
  "image_id": "sha256",
  "image_blob_b64": "...",
  "target_lang": "vi",
  "ocr_provider": "auto",
  "translation_provider": "openai_compatible",
  "provider_config": {
    "endpoint": "https://api.openai.com/v1",
    "model": "gpt-4.1-mini"
  },
  "context_hint": {
    "chapter_id": "optional",
    "image_index": 12,
    "previous_translations": [
      {
        "image_index": 11,
        "bubbles": [
          { "source_text": "...", "translated_text": "..." }
        ]
      }
    ]
  }
}

Response:

{
  "image_id": "sha256",
  "status": "ok",
  "bubbles": [
    {
      "bubble_id": "...",
      "polygon": [[1, 2], [3, 4]],
      "source_text": "...",
      "translated_text": "...",
      "font_size_px": 17,
      "line_height": 1.2,
      "overflow": false
    }
  ]
}
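For illustration, a client could build and send this request as follows. Field names follow the schema above; `build_payload` and `post_translate` are hypothetical helper names, and provider/context fields are omitted for brevity.

```python
import base64
import hashlib
import json
import urllib.request


def build_payload(image_bytes: bytes, target_lang: str = "vi") -> dict:
    # image_id is the sha256 of the raw image bytes, per the request schema.
    return {
        "image_id": hashlib.sha256(image_bytes).hexdigest(),
        "image_blob_b64": base64.b64encode(image_bytes).decode(),
        "target_lang": target_lang,
        "translation_provider": "openai_compatible",
    }


def post_translate(image_bytes: bytes) -> dict:
    req = urllib.request.Request(
        "http://127.0.0.1:4319/translate-image",
        data=json.dumps(build_payload(image_bytes)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```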

9) Fit Engine (Keep Deterministic)

  1. Erode bubble mask to safe area (8px default).
  2. Wrap by punctuation-aware tokenization.
  3. Font-size binary search.
  4. Render-bounds check against safe area mask.
  5. If still overflow:
    • run one short rewrite prompt with max-length hint,
    • if still overflow, show compact expandable bubble.

V1 target is robust readability, not typography perfection.
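Step 3's binary search can be shown in isolation. The `fits` callback standing in for the render-bounds check (step 4) is an assumption; if even the floor size overflows, the function returns that floor and step 5's fallbacks take over.

```python
def fit_font_size(text, fits, lo=8, hi=40):
    """Binary-search the largest integer font size for which
    fits(text, size) reports the rendered bounds stay inside the
    eroded safe area. Returns `lo` as a floor even if it overflows."""
    best = lo
    while lo <= hi:
        mid = (lo + hi) // 2
        if fits(text, mid):
            best = mid          # fits: try larger
            lo = mid + 1
        else:
            hi = mid - 1        # overflows: try smaller
    return best
```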


10) Cache (Only Essential)

Key:

hash(image_hash + target_lang + provider.model)

Store:

  1. Local app disk cache for OCR + translation output.
  2. Extension memory cache for rendered payload.

TTL:

  1. Translation: 30 days.
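The cache key above can be computed deterministically; the separator byte is an added assumption to avoid ambiguous concatenations of the three parts.

```python
import hashlib


def cache_key(image_hash: str, target_lang: str, provider_model: str) -> str:
    """Disk-cache key: hash(image_hash + target_lang + provider.model)."""
    raw = "\x1f".join((image_hash, target_lang, provider_model))
    return hashlib.sha256(raw.encode()).hexdigest()
```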

11) Setup (Dev-Friendly)

V1 is self-use, no installer needed:

  1. Clone repo, run local app (cargo run / python main.py).
  2. Load unpacked extension in browser.
  3. Set API key / Ollama endpoint in config file.
  4. Run Test Connection.
  5. Done.

Polish installer only when preparing for user #2.


12) Security Baseline

  1. Local API listens on loopback only.
  2. Extension origin allowlist enforced.
  3. API key stored in config file (self-use); migrate to OS secure store for user #2.
  4. Default mode does not send images to our servers (we have none).

13) V1 KPIs (Lean)

  1. First visible image translated: <= 3s on mainstream laptop.
  2. Bubble overflow rate: <= 5%.
  3. Hard fail page rate: < 1%.
  4. User readability score: >= 4/5 (self QA).

14) Build Plan (~10 Days)

  1. Day 1-3: Local app + comic-text-detector + OCR and translate pipeline end-to-end (PP-OCRv5 / manga-ocr providers + OpenAI-compatible translation).
  2. Day 4-6: Fit engine + context passing between bubbles/pages.
  3. Day 7-8: Extension overlay + progressive queue.
  4. Day 9-10: Cache + edge case fixes from real reading sessions.

Deferred to V2 (User #2):

  • DeepL / Google Translate adapters
  • Installer (MSI/DMG)
  • Licensing / device binding
  • Setup wizard
  • OS secure store for API keys

15) Graduation Criteria (Self → User #2)

Ready for user #2 when:

  1. You read 10+ chapters across different series without manual intervention.
  2. Bubble overflow rate is consistently under 5%.
  3. You forget you're using your own tool.
  3. You forget you're using your own tool.