@nghyane · Last active March 11, 2026
Manga/Webtoon Translator No-Server V1 Spec

Manhwa Translator - No-Server V1 (Self-Use First)

Version: v1.4-selfuse · Date: 2026-03-11 · Principle: Build for yourself first. When you forget it's your own tool, it's ready for user #2.


1) Reverse Constraints

We design from failure backward. V1 fails if any of these happen:

  1. The user cannot get a first translated result quickly after one click.
  2. Text frequently overflows its bubble and becomes hard to read.
  3. Translation loses conversational context between bubbles.
  4. The product creates ongoing server costs for us.

Everything not directly reducing those failures is removed from V1.


2) V1 Non-Negotiables

  1. No hosted inference backend from us.
  2. One-click translation for all comic images on the current page.
  3. Progressive rendering (visible images first).
  4. Bubble text fit with strict overflow guard.
  5. Conversation context passed between bubbles/pages for coherent translation.
  6. Abstract provider interfaces for OCR and Translation (pluggable from day 1).

3) V1 Scope

  • Primary: English manhwa (webtoon) → Vietnamese.
  • Also supported: Japanese manga → Vietnamese (via manga-ocr).
  • Self-use focus: optimize for what you read most.

4) Minimal Architecture

Browser Extension
  ├─ Content Script
  │   ├─ Find reading images
  │   ├─ Overlay translated text (white bg + rendered text)
  │   └─ Track viewport/lazy-load
  └─ Background Service Worker
      ├─ Queue image jobs
      └─ Call localhost app

Local App (127.0.0.1 only)
  ├─ /health
  ├─ /translate-image
  ├─ Text detection    ─── comic-text-detector ONNX (~95 MB)
  ├─ OCR provider      ─── abstract interface
  │   ├─ PP-OCRv5 English (~10 MB)
  │   └─ manga-ocr Japanese (~140 MB)
  ├─ Translation provider ─── abstract interface
  │   ├─ V1: OpenAICompatibleAdapter
  │   └─ V2: DeepL / Google / local LLM / ...
  ├─ Deterministic fit engine
  └─ Disk cache

No cloud from us. User provides endpoint/key if needed.


5) Abstract Provider Interfaces

OCR Provider

OCRProvider.recognize(image: Image, lang: str) -> OCRResult

OCRResult {
  text: str,
  confidence: float
}

V1 adapters:

  • PPOCRAdapter — PP-OCRv5 mobile ONNX for English (~10 MB).
  • MangaOCRAdapter — manga-ocr ONNX for Japanese (~140 MB, l0wgear/manga-ocr-2025-onnx).

Auto-select by detected source language.
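A minimal sketch of the OCR provider interface with language-based auto-selection. The registry decorator and `select_provider` helper are illustrative assumptions, not part of the spec; the adapters here are stubs where the ONNX inference would go.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class OCRResult:
    text: str
    confidence: float


class OCRProvider(ABC):
    @abstractmethod
    def recognize(self, image: bytes, lang: str) -> OCRResult:
        ...


# Hypothetical adapter registry keyed by detected source language.
_ADAPTERS: dict[str, type[OCRProvider]] = {}


def register(lang: str):
    def wrap(cls):
        _ADAPTERS[lang] = cls
        return cls
    return wrap


@register("en")
class PPOCRAdapter(OCRProvider):
    def recognize(self, image: bytes, lang: str) -> OCRResult:
        # Would run the PP-OCRv5 mobile ONNX model here.
        raise NotImplementedError


@register("ja")
class MangaOCRAdapter(OCRProvider):
    def recognize(self, image: bytes, lang: str) -> OCRResult:
        # Would run the manga-ocr ONNX model here.
        raise NotImplementedError


def select_provider(detected_lang: str) -> OCRProvider:
    """Auto-select the adapter by detected source language."""
    return _ADAPTERS[detected_lang]()
```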

Translation Provider

TranslationProvider.translate(req: TranslateRequest) -> TranslateResult

TranslateRequest {
  bubbles: [{ id, source_text }],
  target_lang: str,
  context: [{ source_text, translated_text }]  // previous bubbles
}

TranslateResult {
  bubbles: [{ id, translated_text }]
}

V1 adapters:

  • OpenAICompatibleAdapter — works with OpenAI, Ollama, any compatible endpoint.

Future adapters:

  • DeepLAdapter
  • GoogleTranslateAdapter
  • LocalLLMAdapter (GGUF via llama.cpp)
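The translation interface and the OpenAI-compatible adapter could be sketched as below. The prompt format and `build_prompt` helper are assumptions for illustration; only the wire format (`/chat/completions`, `choices[0].message.content`) follows the OpenAI-compatible convention, which is why the same adapter also works against Ollama. Parsing the model's reply back into per-bubble text is elided.

```python
import json
import urllib.request
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class TranslateRequest:
    bubbles: list[dict]        # [{"id": ..., "source_text": ...}]
    target_lang: str
    context: list[dict] = field(default_factory=list)  # previous bubble pairs


@dataclass
class TranslateResult:
    bubbles: list[dict]        # [{"id": ..., "translated_text": ...}]


class TranslationProvider(ABC):
    @abstractmethod
    def translate(self, req: TranslateRequest) -> TranslateResult:
        ...


class OpenAICompatibleAdapter(TranslationProvider):
    def __init__(self, endpoint: str, model: str, api_key: str = ""):
        self.endpoint, self.model, self.api_key = endpoint, model, api_key

    def build_prompt(self, req: TranslateRequest) -> str:
        # One batched call per image: all bubbles plus rolling context.
        ctx = "\n".join(f'{c["source_text"]} -> {c["translated_text"]}'
                        for c in req.context)
        lines = "\n".join(f'{b["id"]}: {b["source_text"]}' for b in req.bubbles)
        return (f"Translate these comic bubbles to {req.target_lang}.\n"
                f"Previous context:\n{ctx}\n\nBubbles:\n{lines}")

    def translate(self, req: TranslateRequest) -> TranslateResult:
        body = json.dumps({
            "model": self.model,
            "messages": [{"role": "user", "content": self.build_prompt(req)}],
        }).encode()
        http_req = urllib.request.Request(
            f"{self.endpoint}/chat/completions", data=body,
            headers={"Content-Type": "application/json",
                     "Authorization": f"Bearer {self.api_key}"})
        with urllib.request.urlopen(http_req) as resp:
            content = json.load(resp)["choices"][0]["message"]["content"]
        return TranslateResult(bubbles=[])  # parsing of `content` elided
```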

6) Model Choices (V1)

Model                      | Source                             | Size    | Purpose
comic-text-detector        | mayocream/comic-text-detector-onnx | ~95 MB  | Text region detection + polygon mask
PP-OCRv5 mobile (English)  | PaddlePaddle                       | ~10 MB  | English OCR
manga-ocr                  | l0wgear/manga-ocr-2025-onnx        | ~140 MB | Japanese OCR (vertical, furigana, multi-line)

Total local model footprint: ~245 MB.


7) Core Flow

  1. User clicks Translate This Page.
  2. Extension queues all detected comic images in DOM order.
  3. Queue priority:
    • visible images first,
    • then images within the next screen,
    • then all remaining images.
  4. Local app processes each image:
    • detect text regions (comic-text-detector ONNX),
    • crop bubble regions,
    • OCR via provider (PP-OCRv5 English or manga-ocr Japanese),
    • batch translate all bubbles in one LLM call (with context),
    • fit translated text into bubble polygon.
  5. Extension renders overlays as each image completes.
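The priority buckets in step 3 can be sketched deterministically. The `(index, top_px, bottom_px)` tuple shape and the `prioritize` helper are assumptions for illustration; ties within a bucket fall back to DOM order.

```python
def prioritize(images, viewport_top, viewport_bottom):
    """Order image jobs: visible first, then the next screen, then the rest.

    `images` is a list of (dom_index, top_px, bottom_px) tuples.
    """
    screen = viewport_bottom - viewport_top

    def bucket(img):
        _, top, bottom = img
        if bottom >= viewport_top and top <= viewport_bottom:
            return 0                                    # visible now
        if viewport_bottom < top <= viewport_bottom + screen:
            return 1                                    # next screen
        return 2                                        # everything else

    return sorted(images, key=lambda img: (bucket(img), img[0]))
```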

8) Local API (Minimal)

Base URL: http://127.0.0.1:4319

GET /health

Returns:

  1. app ready/not ready,
  2. detection model loaded/not loaded,
  3. provider configured/not configured.
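A loopback-only /health handler could look like the stdlib sketch below; the `STATE` flags mirror the three readiness checks above and are assumed names, as is the handler structure (the real app may use any HTTP framework).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed global readiness flags, updated elsewhere in the app.
STATE = {"app_ready": True,
         "detection_model_loaded": False,
         "provider_configured": False}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            body = json.dumps(STATE).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()


def serve():
    # Bind to 127.0.0.1 only (see Security Baseline: loopback only).
    HTTPServer(("127.0.0.1", 4319), Handler).serve_forever()
```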

POST /translate-image

Request:

{
  "image_id": "sha256",
  "image_blob_b64": "...",
  "target_lang": "vi",
  "ocr_provider": "auto",
  "translation_provider": "openai_compatible",
  "provider_config": {
    "endpoint": "https://api.openai.com/v1",
    "model": "gpt-4.1-mini"
  },
  "context_hint": {
    "chapter_id": "optional",
    "image_index": 12,
    "previous_translations": [
      {
        "image_index": 11,
        "bubbles": [
          { "source_text": "...", "translated_text": "..." }
        ]
      }
    ]
  }
}

Response:

{
  "image_id": "sha256",
  "status": "ok",
  "bubbles": [
    {
      "bubble_id": "...",
      "polygon": [[1, 2], [3, 4]],
      "source_text": "...",
      "translated_text": "...",
      "font_size_px": 17,
      "line_height": 1.2,
      "overflow": false
    }
  ]
}
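For illustration, a client could build and send this request as follows. Field names follow the schema above; `build_payload` and `post_translate` are hypothetical helper names, and provider/context fields are omitted for brevity.

```python
import base64
import hashlib
import json
import urllib.request


def build_payload(image_bytes: bytes, target_lang: str = "vi") -> dict:
    # image_id is the sha256 of the raw image bytes, per the request schema.
    return {
        "image_id": hashlib.sha256(image_bytes).hexdigest(),
        "image_blob_b64": base64.b64encode(image_bytes).decode(),
        "target_lang": target_lang,
        "translation_provider": "openai_compatible",
    }


def post_translate(image_bytes: bytes) -> dict:
    req = urllib.request.Request(
        "http://127.0.0.1:4319/translate-image",
        data=json.dumps(build_payload(image_bytes)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```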

9) Fit Engine (Keep Deterministic)

  1. Erode bubble mask to safe area (8px default).
  2. Wrap by punctuation-aware tokenization.
  3. Font-size binary search.
  4. Render-bounds check against safe area mask.
  5. If still overflow:
    • run one short rewrite prompt with max-length hint,
    • if still overflow, show compact expandable bubble.

V1 target is robust readability, not typography perfection.
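Step 3's binary search can be shown in isolation. The `fits` callback standing in for the render-bounds check (step 4) is an assumption; if even the floor size overflows, the function returns that floor and step 5's fallbacks take over.

```python
def fit_font_size(text, fits, lo=8, hi=40):
    """Binary-search the largest integer font size for which
    fits(text, size) reports the rendered bounds stay inside the
    eroded safe area. Returns `lo` as a floor even if it overflows."""
    best = lo
    while lo <= hi:
        mid = (lo + hi) // 2
        if fits(text, mid):
            best = mid          # fits: try larger
            lo = mid + 1
        else:
            hi = mid - 1        # overflows: try smaller
    return best
```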


10) Cache (Only Essential)

Key:

hash(image_hash + target_lang + provider.model)

Store:

  1. Local app disk cache for OCR + translation output.
  2. Extension memory cache for rendered payload.

TTL:

  1. Translation: 30 days.
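The cache key above can be computed deterministically; the separator byte is an added assumption to avoid ambiguous concatenations of the three parts.

```python
import hashlib


def cache_key(image_hash: str, target_lang: str, provider_model: str) -> str:
    """Disk-cache key: hash(image_hash + target_lang + provider.model)."""
    raw = "\x1f".join((image_hash, target_lang, provider_model))
    return hashlib.sha256(raw.encode()).hexdigest()
```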

11) Setup (Dev-Friendly)

V1 is self-use, no installer needed:

  1. Clone repo, run local app (cargo run / python main.py).
  2. Load unpacked extension in browser.
  3. Set API key / Ollama endpoint in config file.
  4. Run Test Connection.
  5. Done.

Polish installer only when preparing for user #2.


12) Security Baseline

  1. Local API listens on loopback only.
  2. Extension origin allowlist enforced.
  3. API key stored in config file (self-use); migrate to OS secure store for user #2.
  4. Default mode does not send images to our servers (we have none).

13) V1 KPIs (Lean)

  1. First visible image translated: <= 3s on mainstream laptop.
  2. Bubble overflow rate: <= 5%.
  3. Hard fail page rate: < 1%.
  4. User readability score: >= 4/5 (self QA).

14) Build Plan (~10 Days)

  1. Day 1-3: Local app + comic-text-detector + OCR and translate pipeline end-to-end (PP-OCRv5 / manga-ocr providers + OpenAI-compatible translation).
  2. Day 4-6: Fit engine + context passing between bubbles/pages.
  3. Day 7-8: Extension overlay + progressive queue.
  4. Day 9-10: Cache + edge case fixes from real reading sessions.

Deferred to V2 (User #2):

  • DeepL / Google Translate adapters
  • Installer (MSI/DMG)
  • Licensing / device binding
  • Setup wizard
  • OS secure store for API keys

15) Graduation Criteria (Self → User #2)

Ready for user #2 when:

  1. You read 10+ chapters across different series without manual intervention.
  2. Bubble overflow rate is consistently under 5%.
  3. You forget you're using your own tool.
  3. You forget you're using your own tool.