@esafwan
Created January 22, 2026 14:49
Get YT Shorts in Frappe

YouTube Shorts API – Architecture & File Layout (Frappe)

Goal

  • Avoid slow/blocking HTTP requests
  • Move heavy work to background jobs
  • Add safe parallelism
  • Enable caching + polling
  • Keep frontend API fast (<100 ms)

Current Problems (Summary)

  • Heavy network I/O inside HTTP request
  • Multiple Gemini calls per request
  • Repeated yt-dlp initialization
  • No caching
  • Web workers get blocked → timeouts

Target Architecture

Frontend
  |
  |-- POST /api/method/google_services.places.api.get_shorts
  |        → returns job_id (instant)
  |
  |-- GET  /api/method/google_services.places.api.get_shorts_result
           → polls cached result
           
Background Worker (queue=long)
  |
  |-- ytshorts_job(place_name)
       - Gemini (phrases)
       - yt-dlp search
       - yt-dlp metadata (parallel)
       - Gemini rewrite (batched)
       - cache result
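From the frontend's side, the architecture above is an enqueue-then-poll loop: call `get_shorts` once, then poll `get_shorts_result` until the cached result appears. A minimal sketch of that loop (the `fetch` callable, interval, and attempt limit are illustrative, not part of the API):

```python
import time

def poll_for_shorts(fetch, interval=2.0, max_attempts=30):
    """Call `fetch()` until the API reports the job is done.

    `fetch` is any callable returning the response shape of
    get_shorts / get_shorts_result: {"status": ..., "result": ...}.
    Returns the result list, or None if still processing after
    max_attempts polls.
    """
    for attempt in range(max_attempts):
        response = fetch()
        if response.get("status") == "ready":
            return response.get("result")
        if attempt < max_attempts - 1:
            time.sleep(interval)
    return None
```

In the browser this is the same loop with `setInterval` + `fetch()` against `/api/method/google_services.places.api.get_shorts_result`.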

Directory Structure

google_services/
└── google_services/
    ├── places/
    │   └── api.py                # HTTP APIs
    │
    └── ytshorts/
        ├── __init__.py
        ├── jobs.py               # background job entrypoint
        ├── service.py            # main orchestration
        ├── youtube.py            # yt-dlp logic
        ├── gemini.py             # Gemini logic
        └── utils.py              # helpers (cleaning, constants)

File-by-File Breakdown


places/api.py (HTTP Layer – FAST)

# google_services/places/api.py

import frappe

CACHE_TTL = 3600

@frappe.whitelist()
def get_shorts(place_name):
    if not place_name:
        frappe.throw("place_name is required")

    cache_key = f"ytshorts:{place_name}"

    cached = frappe.cache().get_value(cache_key)
    if cached:
        return {
            "status": "ready",
            "result": cached
        }

    # Note: concurrent requests for the same place can enqueue duplicate
    # jobs; they are harmless (same cache key) but cost extra worker time.
    job = frappe.enqueue(
        "google_services.ytshorts.jobs.run_ytshorts_job",
        queue="long",
        place_name=place_name,
        job_name=cache_key,
        timeout=900,
    )

    return {
        "status": "processing",
        "job_id": job.id
    }


@frappe.whitelist()
def get_shorts_result(place_name):
    cache_key = f"ytshorts:{place_name}"
    result = frappe.cache().get_value(cache_key)

    return {
        "status": "ready" if result else "processing",
        "result": result
    }

Why

  • No heavy logic in web request
  • Safe under concurrency
  • Instant response

ytshorts/jobs.py (Background Job Entry)

# google_services/ytshorts/jobs.py

import frappe
from google_services.ytshorts.service import run_ytshorts

CACHE_TTL = 3600

def run_ytshorts_job(place_name: str):
    cache_key = f"ytshorts:{place_name}"

    try:
        result = run_ytshorts(place_name)

        frappe.cache().set_value(
            cache_key,
            result,
            expires_in_sec=CACHE_TTL
        )

    except Exception:
        frappe.log_error(
            frappe.get_traceback(),
            "YT Shorts Background Job Failed"
        )
        # Cache an empty result with a short TTL so the frontend stops
        # polling, but a retry becomes possible once it expires.
        frappe.cache().set_value(cache_key, [], expires_in_sec=300)

ytshorts/service.py (Orchestration Layer)

# google_services/ytshorts/service.py

from concurrent.futures import ThreadPoolExecutor, as_completed

from google_services.ytshorts.youtube import (
    search_youtube,
    fetch_metadata,
    YDL_OPTS,
)
from google_services.ytshorts.gemini import (
    generate_search_phrases,
    rewrite_titles_batch,
)
from google_services.ytshorts.utils import clean_title

from yt_dlp import YoutubeDL

MAX_LINKS = 8
MAX_WORKERS = 4

def run_ytshorts(place_name):
    phrases = generate_search_phrases(place_name)

    links = set()
    # Flat search is cheap; one sequential YoutubeDL instance is fine here.
    with YoutubeDL(YDL_OPTS) as ydl:
        for q in phrases:
            links.update(search_youtube(ydl, q))

    links = list(links)[:MAX_LINKS]

    # Full (non-flat) extraction is needed so duration is available, and
    # YoutubeDL instances are not thread-safe: each task gets its own.
    meta_opts = {**YDL_OPTS, "extract_flat": False}

    def _fetch(link):
        try:
            with YoutubeDL(meta_opts) as ydl:
                return fetch_metadata(ydl, link)
        except Exception:
            return None  # skip videos that fail to extract

    videos = []
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        futures = [executor.submit(_fetch, link) for link in links]

        for f in as_completed(futures):
            meta = f.result()
            if meta:
                videos.append(meta)

    # Keep exactly one cleaned title per video so zip() below stays
    # aligned; fall back to the raw title when cleaning strips everything.
    cleaned_titles = [clean_title(v["title"]) or v["title"] for v in videos]

    summaries = rewrite_titles_batch(cleaned_titles)

    final = []
    for v, s in zip(videos, summaries):
        final.append({
            "title": v["title"],
            "url": v["url"],
            "summary": s,
        })

    return final

ytshorts/youtube.py (yt-dlp – Parallel Safe)

# google_services/ytshorts/youtube.py

from yt_dlp import YoutubeDL

YDL_OPTS = {
    "quiet": True,
    "skip_download": True,
    "extract_flat": True,   # flat listing: search results only, no per-video fetch
    "no_warnings": True,
}

def search_youtube(ydl, query: str):
    # "ytsearch2:" takes the top two results per query.
    info = ydl.extract_info(f"ytsearch2:{query}", download=False)
    return [
        e.get("url")
        for e in (info.get("entries") or [])
        if e.get("url")
    ]


def fetch_metadata(ydl, link: str):
    # Duration is only present with full (non-flat) extraction, so callers
    # must pass a YoutubeDL built without extract_flat. Instances are not
    # thread-safe: never share one ydl across worker threads.
    info = ydl.extract_info(link, download=False)

    # Shorts are at most 60 seconds; drop anything longer.
    if (info.get("duration") or 0) > 60:
        return None

    return {
        "title": info.get("title"),
        "url": info.get("webpage_url"),
    }

ytshorts/gemini.py (Batch Gemini Calls)

# google_services/ytshorts/gemini.py

import frappe
from google import genai

client = genai.Client(api_key=frappe.conf.get("GEMINI_API_KEY"))
MODEL_NAME = frappe.conf.get("GEMINI_MODEL")

def generate_search_phrases(place_name):
    prompt = f"""
Generate 5 YouTube Shorts search phrases for {place_name}.
Rules:
- 3–5 words
- Include "shorts" or "POV"
- Use one keyword from: walk, POV, street food, food, nightlife, landmark
- No numbering
"""
    res = client.models.generate_content(
        model=MODEL_NAME,
        contents=prompt
    )
    return [l.strip() for l in (res.text or "").splitlines() if l.strip()]


def rewrite_titles_batch(titles):
    joined = "\n".join(f"- {t}" for t in titles)

    prompt = f"""
Rewrite each line into ONE simple English sentence.
Same order. One line per title.

Titles:
{joined}
"""

    res = client.models.generate_content(
        model=MODEL_NAME,
        contents=prompt
    )
    return [l.strip() for l in (res.text or "").splitlines() if l.strip()]
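One failure mode worth guarding against: the model may return more or fewer lines than the number of titles sent, which would misalign summaries with videos downstream. A small defensive helper (hypothetical, not part of the files above) that pads or truncates to force 1:1 alignment:

```python
def align_summaries(titles, summaries, fallback=""):
    """Force `summaries` to line up 1:1 with `titles`.

    Truncates extra lines; pads missing ones with `fallback`,
    or with the title itself when no fallback is given.
    """
    aligned = list(summaries[: len(titles)])
    while len(aligned) < len(titles):
        missing_title = titles[len(aligned)]
        aligned.append(fallback or missing_title)
    return aligned
```

Calling this between `rewrite_titles_batch` and the final `zip` keeps the response well-formed even when the model drifts from the "one line per title" instruction.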

ytshorts/utils.py (Pure Helpers)

# google_services/ytshorts/utils.py

import re

def clean_title(title: str) -> str:
    title = re.sub(r"#\w+", "", title)           # strip hashtags
    title = re.sub(r"[^\x00-\x7F]+", "", title)  # strip emoji / non-ASCII
    title = re.sub(r"\s+", " ", title).strip()   # collapse whitespace
    return title
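Worth noting: cleaning can strip a title down to nothing (e.g. an all-emoji title), which is why the orchestration layer needs a fallback. A quick check of the behavior (function repeated here so the snippet runs standalone):

```python
import re

def clean_title(title: str) -> str:
    title = re.sub(r"#\w+", "", title)           # strip hashtags
    title = re.sub(r"[^\x00-\x7F]+", "", title)  # strip emoji / non-ASCII
    return re.sub(r"\s+", " ", title).strip()    # collapse whitespace

print(clean_title("Tokyo Street Food 🍜 #shorts #travel"))  # → Tokyo Street Food
print(clean_title("😍🔥✨"))  # → empty string
```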

Parallelism: What We Did (and Why)

| Area            | Strategy                         |
|-----------------|----------------------------------|
| yt-dlp metadata | ThreadPoolExecutor (I/O bound)   |
| Gemini rewrite  | Batched (1 call)                 |
| Web request     | No threads (async job only)      |
| ORM / frappe    | ❌ never parallelized            |
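One subtlety the table glosses over: `as_completed` yields futures in completion order, not submission order, which is why service.py collects all metadata first and only then builds the title list it batches to Gemini. A minimal demonstration of the pattern (pure Python, no network; the sleep times simulate I/O latency):

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def slow_square(n):
    # Simulate I/O latency: larger inputs finish later.
    time.sleep(0.01 * n)
    return n * n

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(slow_square, n) for n in (3, 1, 2)]
    results = [f.result() for f in as_completed(futures)]

# Completion order is not submission order, so sort (or keep a
# future -> input mapping) before pairing results with inputs.
print(sorted(results))  # → [1, 4, 9]
```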
