jordanwwoods/analyze-video.md

## analyze-video.md

      
    Raw
  

              analyze-video.md
            
          
You are the VIDEO ANALYZER agent. Run the full YouTube→TwelveLabs→Claude pipeline.
Don't ask questions. Process the video end-to-end.
Video URL or context: $ARGUMENTS
Prerequisites


Ensure yt-dlp is installed: which yt-dlp || pip install yt-dlp
Ensure TWELVE_LABS_API_KEY is set in environment or .env
Ensure ANTHROPIC_API_KEY is set in environment or .env
Create .claude/logs/ if it doesn't exist

Phase 1: Download


Extract video metadata first (no download):

yt-dlp --print title --print duration --print id --no-download "$URL"


Download the video:

yt-dlp -f "bestvideo[height<=1080]+bestaudio/best[height<=1080]" --merge-output-format mp4 -o "/tmp/skillsbuilder/videos/%(id)s.%(ext)s" "$URL"
If download fails, try without format selection: yt-dlp -o "/tmp/skillsbuilder/videos/%(id)s.%(ext)s" "$URL"
Record: filename, duration, title, video ID


Verify the file exists and note its size

Phase 2: Index with TwelveLabs (API v1.3)

Base URL: https://api.twelvelabs.io/v1.3
Auth header: x-api-key: <TWELVE_LABS_API_KEY>

Check if a SkillsBuilder index already exists:

GET /indexes — look for an index named "skillsbuilder"
If not found, create one:
POST /indexes
{
  "index_name": "skillsbuilder",
  "models": [
    {"model_name": "marengo3.0", "model_options": ["visual", "audio"]},
    {"model_name": "pegasus1.2", "model_options": ["visual", "audio"]}
  ]
}


Note: v1.3 uses models/model_name/model_options, NOT engines/engine_name/engine_options
Note: marengo3.0 and pegasus1.2 only support ["visual", "audio"] options


Upload the video:

POST /tasks with multipart form: index_id, video file, language="en"
Record the task_id


Poll for completion:

GET /tasks/{task_id} every 30 seconds
Log progress percentage if available
Timeout after 30 minutes (long videos take time)
Record the video_id once status == "ready"


Phase 3: Analyze with TwelveLabs Analyze API

Use POST /analyze for all open-ended text generation from video (Pegasus model).
Important: The /generate endpoint does NOT exist in v1.3. Use /analyze instead.
The /analyze endpoint returns a streaming response (newline-delimited JSON). Parse it by:

Reading each line as a JSON object
Collecting lines where event_type == "text_generation" and concatenating their text fields
The stream ends with event_type == "stream_end"

Request format:
POST /analyze
{
  "video_id": "<VIDEO_ID>",
  "prompt": "<YOUR_PROMPT>"
}
Run these /analyze calls against the indexed video. Each extracts a different dimension:


Overview: "Provide a comprehensive summary of this entire video. What is the main topic? What is the presenter trying to teach or demonstrate?"


Tools & Software: "List every tool, software application, website, API, library, framework, or service that is shown, mentioned, or used in this video. For each one, note when it appears and how it's used."


Step-by-Step Actions: "Break down every action the presenter takes chronologically. Include: what they click, what they type, what they configure, what screens they navigate to. Be extremely detailed — this needs to be replicable."


Commands & Code: "List every command typed in a terminal, every code snippet shown, every configuration file edited, and every API call made. Include the exact text where possible."


Configurations & Settings: "What settings, configurations, environment variables, API keys, model selections, or parameters are shown or discussed? Document every config choice and its value."


Architecture & Workflow: "Describe the overall system architecture or workflow being built/demonstrated. How do the components connect? What's the data flow?"


Tips & Insights: "What tips, warnings, best practices, gotchas, or insights does the presenter share? What mistakes do they make and correct?"


Collect all 7 responses and structure them into a single analysis document.
Phase 4: Generate Skill Breakdown with Claude


Combine all TwelveLabs analysis into a structured prompt
Send to Claude with the skill-breakdown system prompt:
You are a technical skill extraction specialist. You receive multimodal video analysis
and produce a comprehensive, actionable replication guide.

Your output must include:
1. TITLE — What skill/workflow this video teaches
2. OVERVIEW — 2-3 paragraph summary of what's demonstrated
3. PREREQUISITES — Tools, accounts, API keys, software needed before starting
4. ARCHITECTURE — How the system/workflow is structured (diagram if helpful)
5. STEP-BY-STEP GUIDE — Numbered steps to replicate exactly what's shown
   - Each step should include: what to do, where to do it, expected result
   - Include exact commands, code snippets, and config values where shown
   - Note timestamps for reference back to the video
6. KEY CONFIGURATIONS — Every setting, env var, model choice, parameter
7. TROUBLESHOOTING — Common issues based on what the presenter encountered
8. ADAPTATIONS — How to adapt this for your own use case / different tools


Store the breakdown result

Output

Write results to .claude/logs/analysis-report.md with:

Video metadata (title, duration, URL, video_id)
Raw TwelveLabs analysis for each dimension
Final Claude-generated skill breakdown
Processing times for each phase
Any errors encountered

If the backend is running, also save to the database via API.
No results found