You are the VIDEO ANALYZER agent. Run the full YouTube→TwelveLabs→Claude pipeline. Don't ask questions. Process the video end-to-end.
Video URL or context: $ARGUMENTS
- Ensure yt-dlp is installed:
which yt-dlp || pip install yt-dlp - Ensure TWELVE_LABS_API_KEY is set in environment or .env
- Ensure ANTHROPIC_API_KEY is set in environment or .env
- Create
.claude/logs/if it doesn't exist
- Extract video metadata first (no download):
yt-dlp --print title --print duration --print id --no-download "$URL"
- Download the video:
yt-dlp -f "bestvideo[height<=1080]+bestaudio/best[height<=1080]" --merge-output-format mp4 -o "/tmp/skillsbuilder/videos/%(id)s.%(ext)s" "$URL"- If download fails, try without format selection:
yt-dlp -o "/tmp/skillsbuilder/videos/%(id)s.%(ext)s" "$URL" - Record: filename, duration, title, video ID
- Verify the file exists and note its size
Base URL: https://api.twelvelabs.io/v1.3
Auth header: x-api-key: <TWELVE_LABS_API_KEY>
- Check if a SkillsBuilder index already exists:
GET /indexes— look for an index named "skillsbuilder"- If not found, create one:
POST /indexes { "index_name": "skillsbuilder", "models": [ {"model_name": "marengo3.0", "model_options": ["visual", "audio"]}, {"model_name": "pegasus1.2", "model_options": ["visual", "audio"]} ] } - Note: v1.3 uses
models/model_name/model_options, NOTengines/engine_name/engine_options - Note: marengo3.0 and pegasus1.2 only support
["visual", "audio"]options
- Upload the video:
POST /taskswith multipart form: index_id, video file, language="en"- Record the task_id
- Poll for completion:
GET /tasks/{task_id}every 30 seconds- Log progress percentage if available
- Timeout after 30 minutes (long videos take time)
- Record the video_id once status == "ready"
Use POST /analyze for all open-ended text generation from video (Pegasus model).
Important: The /generate endpoint does NOT exist in v1.3. Use /analyze instead.
The /analyze endpoint returns a streaming response (newline-delimited JSON). Parse it by:
- Reading each line as a JSON object
- Collecting lines where
event_type == "text_generation"and concatenating theirtextfields - The stream ends with
event_type == "stream_end"
Request format:
POST /analyze
{
"video_id": "<VIDEO_ID>",
"prompt": "<YOUR_PROMPT>"
}Run these /analyze calls against the indexed video. Each extracts a different dimension:
-
Overview: "Provide a comprehensive summary of this entire video. What is the main topic? What is the presenter trying to teach or demonstrate?"
-
Tools & Software: "List every tool, software application, website, API, library, framework, or service that is shown, mentioned, or used in this video. For each one, note when it appears and how it's used."
-
Step-by-Step Actions: "Break down every action the presenter takes chronologically. Include: what they click, what they type, what they configure, what screens they navigate to. Be extremely detailed — this needs to be replicable."
-
Commands & Code: "List every command typed in a terminal, every code snippet shown, every configuration file edited, and every API call made. Include the exact text where possible."
-
Configurations & Settings: "What settings, configurations, environment variables, API keys, model selections, or parameters are shown or discussed? Document every config choice and its value."
-
Architecture & Workflow: "Describe the overall system architecture or workflow being built/demonstrated. How do the components connect? What's the data flow?"
-
Tips & Insights: "What tips, warnings, best practices, gotchas, or insights does the presenter share? What mistakes do they make and correct?"
Collect all 7 responses and structure them into a single analysis document.
- Combine all TwelveLabs analysis into a structured prompt
- Send to Claude with the skill-breakdown system prompt:
You are a technical skill extraction specialist. You receive multimodal video analysis and produce a comprehensive, actionable replication guide. Your output must include: 1. TITLE — What skill/workflow this video teaches 2. OVERVIEW — 2-3 paragraph summary of what's demonstrated 3. PREREQUISITES — Tools, accounts, API keys, software needed before starting 4. ARCHITECTURE — How the system/workflow is structured (diagram if helpful) 5. STEP-BY-STEP GUIDE — Numbered steps to replicate exactly what's shown - Each step should include: what to do, where to do it, expected result - Include exact commands, code snippets, and config values where shown - Note timestamps for reference back to the video 6. KEY CONFIGURATIONS — Every setting, env var, model choice, parameter 7. TROUBLESHOOTING — Common issues based on what the presenter encountered 8. ADAPTATIONS — How to adapt this for your own use case / different tools - Store the breakdown result
Write results to .claude/logs/analysis-report.md with:
- Video metadata (title, duration, URL, video_id)
- Raw TwelveLabs analysis for each dimension
- Final Claude-generated skill breakdown
- Processing times for each phase
- Any errors encountered
If the backend is running, also save to the database via API.