Generative AI Model Research Plan

Priority: TOP — This informs schema design, skill prompts, and render pipeline.

Last Updated: 2025-12-28

Status: ✅ REFERENCES VALIDATED (cloud + on-device) — synthesis still pending


✅ Reference Library Re-Validation (Status)

Problem Identified (resolved for references/): Early drafts were created from web searches and aggregator sources and required systematic validation against official vendor documentation.

What exists:

  • 50 cloud model reference docs
  • 3 on-device model compilation docs (55+ models)
  • Synthesis documents in planning/synthesis/* (not yet revalidated)
  • Model ID audit (complete for models already documented; new models may appear over time)

What is now done (2025-12-28):

  • All existing references/** docs have been validated against official vendor documentation (cloud) or primary upstream sources on HuggingFace/GitHub (on-device).
  • Gaps (models that should exist in the library but do not yet have dedicated docs) are tracked in:
    • references/GAPS.md
    • references/MODEL-INVENTORY.md

Remaining risk: Any earlier inaccuracies may still exist in planning/synthesis/* until those documents are revalidated against the now-canonical reference docs.


Re-Validation Plan

Phase 1: Cloud Models (existing docs) — DONE (2025-12-28)

Each document requires an agent to complete the following steps (a scripted dispatch sketch follows the list):

  1. Fetch official vendor documentation
  2. Compare EVERY claim in the reference doc
  3. Verify prompting vocabulary matches official guidance
  4. Verify capabilities (resolution, duration, formats)
  5. Verify pricing (cross-check with MODEL-AUDIT.md)
  6. Verify API parameters and endpoints
  7. Update the reference doc with corrections
  8. Note any new features/capabilities not captured
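
A minimal sketch of how this dispatch loop could be scripted (all names and paths here are hypothetical, not an existing tool in this repo; the comparison work itself is done by the agents per the templates later in this doc):

```python
# Hypothetical Phase 1 dispatch sketch: builds one work order per reference doc.
from pathlib import Path

REFERENCE_ROOT = Path("references")

VALIDATION_STEPS = [
    "fetch official vendor documentation",
    "compare every claim in the reference doc",
    "verify prompting vocabulary against official guidance",
    "verify capabilities (resolution, duration, formats)",
    "verify pricing (cross-check MODEL-AUDIT.md)",
    "verify API parameters and endpoints",
    "update the reference doc with corrections",
    "note new features/capabilities not yet captured",
]

def work_order(doc_path: Path, primary_source: str) -> dict:
    """Assemble the brief one agent receives for one cloud reference doc."""
    return {
        "reference_doc": str(doc_path),
        "primary_source": primary_source,
        "steps": VALIDATION_STEPS,
    }

if __name__ == "__main__":
    for doc in sorted(REFERENCE_ROOT.glob("*/*.md")):
        order = work_order(doc, primary_source="<official docs URL>")
        print(order["reference_doc"], "->", len(order["steps"]), "steps")
```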

Phase 2: On-Device Models (3 compilation docs) — DONE (2025-12-28)

Similar validation against HuggingFace model cards and GitHub repos.

Phase 3: Synthesis Documents (7 documents) — TODO

After reference docs are validated, verify synthesis docs reflect corrected information.


Cloud Models in the Reference Library (as of 2025-12-28)

The canonical “what’s covered vs missing” list lives in:

  • references/MODEL-INVENTORY.md
  • references/GAPS.md

The tables below are a convenience snapshot for this plan doc.

Video Generation (15 documents)

| Document | Model | Provider | Primary Source | Status |
| --- | --- | --- | --- | --- |
| references/video/veo-3.md | Veo 3.1 | Google | cloud.google.com/vertex-ai/generative-ai/docs | COVERED |
| references/video/sora-2.md | Sora 2 | OpenAI | platform.openai.com/docs | COVERED |
| references/video/runway-gen4.5.md | Gen-4/4.5 | Runway | docs.dev.runwayml.com | COVERED |
| references/video/kling-2.1.md | Kling 2.1 | Kuaishou | klingai.com/global/dev | COVERED |
| references/video/luma-ray3.md | Ray2/Ray3 | Luma AI | docs.lumalabs.ai | COVERED |
| references/video/hailuo-02.md | Hailuo 02 | MiniMax | platform.minimaxi.com/docs/api-reference/video-generation-intro | COVERED |
| references/video/midjourney-video.md | Midjourney Video | Midjourney | docs.midjourney.com/docs/video | COVERED |
| references/video/seedance-1.5-pro.md | Seedance 1.5 Pro / 1.0 family | ByteDance (Volcengine Ark) | volcengine.com/docs/82379 | COVERED |
| references/video/pika-2.md | Pika 2.2 (via fal.ai) | Pika | fal.ai/models | COVERED |
| references/video/pixverse.md | PixVerse (v5.5) | PixVerse | docs.platform.pixverse.ai | COVERED |
| references/video/haiper-2.x.md | Haiper Video 2.x | Haiper | docs.haiper.ai/api-reference | COVERED |
| references/video/vidu.md | Vidu (viduq1 / 2.0 / 1.5) | Vidu | docs.platform.vidu.com | COVERED |
| references/video/firefly-video.md | Firefly Video (Generate Video API) | Adobe | developer.adobe.com/firefly-services/docs | COVERED |
| references/video/nova-reel.md | Nova Reel | AWS (Amazon Bedrock) | docs.aws.amazon.com/nova/latest/userguide | COVERED |
| references/video/alibaba-wan.md | Wan (Wan2.x / Wanx2.1 + VACE editing) | Alibaba Cloud (Model Studio / DashScope) | alibabacloud.com/help | COVERED |

Image Generation (17 documents)

| Document | Model | Provider | Primary Source | Status |
| --- | --- | --- | --- | --- |
| references/image/nano-banana-pro.md | Nano Banana / Nano Banana Pro | Google | ai.google.dev/gemini-api/docs/image-generation | COVERED |
| references/image/imagen-4.md | Imagen 4 | Google | ai.google.dev/gemini-api/docs/imagen | COVERED |
| references/image/flux-2.md | FLUX.2 | Black Forest Labs | docs.bfl.ai | COVERED |
| references/image/flux-kontext.md | FLUX.1 Kontext | Black Forest Labs | docs.bfl.ai/kontext | COVERED |
| references/image/gpt-image.md | GPT Image 1.5 | OpenAI | platform.openai.com/docs/guides/image-generation | COVERED |
| references/image/midjourney.md | Midjourney V7 | Midjourney | docs.midjourney.com | COVERED |
| references/image/ideogram-3.md | Ideogram 3.0 | Ideogram | developer.ideogram.ai | COVERED |
| references/image/seedream-4.md | Seedream 4.5 | ByteDance | docs.byteplus.com | COVERED |
| references/image/firefly-image.md | Firefly Image (API) | Adobe | developer.adobe.com/firefly-services/docs | COVERED |
| references/image/stability-image.md | Stable Image + SD 3.5 (API) | Stability AI | api.stability.ai/v2alpha/openapi | COVERED |
| references/image/nova-canvas.md | Nova Canvas | AWS (Amazon Bedrock) | docs.aws.amazon.com/nova/latest/userguide | COVERED |
| references/image/minimax-image.md | MiniMax Image Generation (image-01, image-01-live) | MiniMax | platform.minimaxi.com/docs/api-reference/image-generation-intro | COVERED |
| references/image/recraft.md | Recraft (Recraft API) | Recraft | recraft.ai/docs/api-reference | COVERED |
| references/image/leonardo.md | Leonardo (Image API) | Leonardo AI | docs.leonardo.ai/reference | COVERED |
| references/image/reve-image.md | Reve Image API (Create/Edit/Remix) | Reve | api.reve.com | COVERED |
| references/image/krea.md | Krea (Image/Video API) | Krea | docs.krea.ai/api-reference | COVERED |
| references/image/freepik-mystic.md | Freepik Mystic | Freepik | docs.freepik.com/api-reference | COVERED |

Audio Generation (18 documents)

| Document | Model | Provider | Primary Source | Status |
| --- | --- | --- | --- | --- |
| references/audio/elevenlabs.md | ElevenLabs TTS | ElevenLabs | elevenlabs.io/docs | COVERED |
| references/audio/eleven-music.md | Eleven Music | ElevenLabs | elevenlabs.io/docs | COVERED |
| references/audio/minimax-music.md | MiniMax Music 2.0 (music-2.0) | MiniMax | platform.minimaxi.com/docs/api-reference/music-intro | COVERED |
| references/audio/suno-v5.md | Suno v5 | Suno | help.suno.com | COVERED |
| references/audio/udio.md | Udio v1.5 | Udio | help.udio.com | COVERED |
| references/audio/openai-tts.md | OpenAI TTS | OpenAI | platform.openai.com/docs/guides/text-to-speech | COVERED |
| references/audio/fish-audio-openaudio-s1.md | OpenAudio S1 | Fish Audio | docs.fish.audio | COVERED |
| references/audio/cartesia-sonic.md | Sonic 3 | Cartesia | docs.cartesia.ai | COVERED |
| references/audio/playht.md | PlayHT | PlayHT | docs.play.ht | COVERED |
| references/audio/gemini-tts.md | Gemini Preview TTS | Google (Gemini API) | ai.google.dev/gemini-api/docs/speech-generation | COVERED |
| references/audio/minimax-speech.md | MiniMax Speech (T2A + Async + Voice Design/Cloning) | MiniMax | platform.minimaxi.com/docs/api-reference/speech-t2a-intro | COVERED |
| references/audio/google-cloud-tts.md | Google Cloud TTS | Google Cloud | cloud.google.com/text-to-speech | COVERED |
| references/audio/azure-tts.md | Azure TTS | Microsoft | learn.microsoft.com/azure/ai-services/speech-service | COVERED |
| references/audio/amazon-polly.md | Amazon Polly | AWS | docs.aws.amazon.com/polly | COVERED |
| references/audio/respeecher.md | Respeecher | Respeecher | docs.respeecher.com | COVERED |
| references/audio/stable-audio.md | Stable Audio 2 / 2.5 | Stability AI | api.stability.ai/v2alpha/openapi | COVERED |
| references/audio/lyria-2.md | Lyria 2 | Google | docs.cloud.google.com/vertex-ai/generative-ai/docs | COVERED |
| references/audio/lyria-realtime.md | Lyria RealTime | Google (Gemini API) | ai.google.dev/gemini-api/docs/music-generation | COVERED |

On-Device Models to Validate

| Document | Models | Primary Sources | Status |
| --- | --- | --- | --- |
| references/video/on-device-models.md | compilation doc | HuggingFace model cards, GitHub | COVERED |
| references/image/on-device-models.md | compilation doc | HuggingFace model cards, GitHub | COVERED |
| references/audio/on-device-models.md | compilation doc | HuggingFace model cards | COVERED |

Agent Prompts for Validation

Template: Cloud Model Validation Agent

```
**Task**: Validate the reference document for [MODEL] against official [VENDOR] documentation.

**Reference Document**: `references/[category]/[file].md`
**Primary Source**: [OFFICIAL_DOCS_URL]
**Secondary Sources**: [AGGREGATOR_URLS]

**Validation Checklist**:

1. **Model Identity**
   - [ ] Correct model name/version
   - [ ] Correct API model_id (cross-check MODEL-AUDIT.md)
   - [ ] Correct provider attribution

2. **Capabilities**
   - [ ] Resolution limits verified
   - [ ] Duration limits verified
   - [ ] Supported formats verified
   - [ ] Feature claims verified (audio support, text rendering, etc.)

3. **Pricing**
   - [ ] Current pricing verified
   - [ ] Pricing tiers/variants verified
   - [ ] Credit system (if applicable) verified

4. **API Documentation**
   - [ ] Endpoint format verified
   - [ ] Authentication method verified
   - [ ] Required parameters verified
   - [ ] Optional parameters verified
   - [ ] Response format verified

5. **Prompting Guide**
   - [ ] Camera movement vocabulary verified (video)
   - [ ] Style/aesthetic terminology verified (image)
   - [ ] Voice/emotion controls verified (audio)
   - [ ] Best practices match official guidance
   - [ ] Example prompts verified

6. **Limitations**
   - [ ] Known limitations documented
   - [ ] Rate limits documented
   - [ ] Content restrictions documented

**Output**:
- List of CONFIRMED items (with evidence links)
- List of CORRECTIONS needed (with correct information and evidence)
- List of ADDITIONS (new features/capabilities not in current doc)
- Updated reference document content

**Quality Bar**:
- Every claim must have evidence from official source
- No "seems" or "probably" - use UNKNOWN if unverifiable
- Preserve document structure, only update content
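
To keep that Output contract machine-checkable when many agents report back, the findings could be collected as structured records. A minimal sketch (field names are assumptions, not an existing schema in this repo):

```python
from dataclasses import dataclass, field
from typing import Literal

@dataclass
class Finding:
    claim: str                      # the exact claim from the reference doc
    status: Literal["CONFIRMED", "CORRECTION", "ADDITION", "UNKNOWN"]
    evidence_url: str = ""          # official source; required unless UNKNOWN
    corrected_value: str = ""       # filled for CORRECTION/ADDITION only

@dataclass
class ValidationReport:
    reference_doc: str
    findings: list[Finding] = field(default_factory=list)

    def quality_bar_violations(self) -> list[Finding]:
        # Quality bar: every non-UNKNOWN claim must carry evidence.
        return [f for f in self.findings
                if f.status != "UNKNOWN" and not f.evidence_url]
```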

Specific Agent Prompts

Video: Veo 3.1

Validate `references/video/veo-3.md` against:
- https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/veo-video-generation
- https://cloud.google.com/vertex-ai/generative-ai/docs/models/veo/3-1-generate
- https://ai.google.dev/gemini-api/docs

Focus areas:
- Timestamp prompting format (is [00:00-00:03] correct? a checker sketch follows the list)
- Audio generation capabilities
- Camera movement vocabulary (what terms does Google recommend?)
- Resolution/duration limits
- Pricing per second
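
For the timestamp question, a small self-contained check the agent could run over the doc's example prompts; the bracketed [MM:SS-MM:SS] pattern is the format under review here, not confirmed Veo syntax:

```python
import re

# Matches bracketed segments like [00:00-00:03]; whether Veo 3.1 actually
# accepts this format is exactly what the agent must confirm in the docs.
TIMESTAMP_SEGMENT = re.compile(r"\[(\d{2}):(\d{2})-(\d{2}):(\d{2})\]")

def extract_segments(prompt: str) -> list[tuple[int, int]]:
    """Return (start_s, end_s) pairs for each bracketed segment in a prompt."""
    segments = []
    for m in TIMESTAMP_SEGMENT.finditer(prompt):
        start = int(m.group(1)) * 60 + int(m.group(2))
        end = int(m.group(3)) * 60 + int(m.group(4))
        segments.append((start, end))
    return segments

assert extract_segments("[00:00-00:03] aerial shot, [00:03-00:08] slow dolly-in") \
    == [(0, 3), (3, 8)]
```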

Video: Sora 2

Validate `references/video/sora-2.md` against:
- https://platform.openai.com/docs/models/sora-2
- https://platform.openai.com/docs/models/sora-2-pro
- https://platform.openai.com/docs/api-reference/videos

Focus areas:
- Multi-scene capabilities
- Duration limits (sora-2 vs sora-2-pro)
- Resolution options
- Prompt structure recommendations
- Credit/pricing system

Video: Runway Gen-4

Validate `references/video/runway-gen4.5.md` against:
- https://docs.dev.runwayml.com/guides/models/
- https://docs.dev.runwayml.com/guides/pricing/

Focus areas:
- Gen-4 vs Gen-4.5 availability (Gen-4.5 API not yet available per audit)
- Motion Brush documentation
- Camera control parameters
- Credit system

Video: Kling 2.1

Validate `references/video/kling-2.1.md` against:
- https://klingai.com/global/dev
- https://app.klingai.com/global/dev/document-api

Focus areas:
- Model tiers (standard/pro/master)
- Lip-sync capabilities
- Camera movement vocabulary
- Duration limits per tier
- Pricing structure

Video: Luma Ray

Validate `references/video/luma-ray3.md` against:
- https://docs.lumalabs.ai/docs/api
- https://lumalabs.ai/learning-hub

Focus areas:
- Ray2 vs Ray3 availability (Ray3 API not yet available per audit)
- HDR capabilities
- Draft mode documentation
- Credit system

Video: Hailuo 02

Validate `references/video/hailuo-02.md` against:
- https://platform.minimaxi.com/docs/api-reference/video-generation-intro

Focus areas:
- Model variants (02 vs 2.3 vs 2.3-Fast)
- Resolution/duration options
- Pricing per resolution tier

Image: Nano Banana / Nano Banana Pro (Gemini native image generation)

Validate `references/image/nano-banana-pro.md` against:
- https://ai.google.dev/gemini-api/docs/nanobanana
- https://ai.google.dev/gemini-api/docs/image-generation
- https://ai.google.dev/gemini-api/docs/pricing

Focus areas:
- Correct model IDs (`gemini-2.5-flash-image`, `gemini-3-pro-image-preview`); a call sketch follows the list
- Token/pricing tables and image-size token costs
- 4K output + “Thinking” + thought signatures behavior (Pro)
- Prompting vocabulary + official prompt templates
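
If a live smoke test helps confirm those model IDs, a minimal sketch, assuming the google-genai Python SDK; the exact call shape should itself be verified against the image-generation page above:

```python
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash-image",
    contents="A photorealistic banana-shaped spaceship over a city at dusk",
)

# Image bytes come back as inline_data parts alongside any text parts.
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data is not None:
        with open(f"nano-banana-{i}.png", "wb") as f:
            f.write(part.inline_data.data)
```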

Image: Imagen 4

Validate `references/image/imagen-4.md` against:
- https://ai.google.dev/gemini-api/docs/imagen
- https://cloud.google.com/vertex-ai/generative-ai/docs/models/imagen/4-0-generate
- https://cloud.google.com/vertex-ai/generative-ai/pricing

Focus areas:
- Model variants (fast/standard/ultra) and IDs
- Pricing (Gemini API vs Vertex AI pricing surfaces)
- Aspect ratio + output size constraints
- Prompting guidance (official)

Image: FLUX.2

Validate `references/image/flux-2.md` against:
- https://docs.bfl.ai/quick_start/generating_images
- https://docs.bfl.ai/flux_2/flux2_overview
- https://bfl.ai/pricing

Focus areas:
- All FLUX.2 variants (pro/max/flex/dev)
- Endpoint-based API (not model_id based)
- Text rendering capabilities
- Pricing per megapixel

Image: GPT Image 1.5

Validate `references/image/gpt-image.md` against:
- https://platform.openai.com/docs/models/gpt-image-1.5
- https://platform.openai.com/docs/guides/image-generation

Focus areas:
- Model versions (1.5 vs 1 vs 1-mini)
- Token-based pricing
- Quality tiers
- Text rendering accuracy

Image: Midjourney V7

Validate `references/image/midjourney.md` against:
- https://docs.midjourney.com

Focus areas:
- V7 capabilities
- API availability (still no public API?)
- Parameter syntax (--ar, --stylize, etc.)
- Style reference system

Image: Ideogram 3.0

Validate `references/image/ideogram-3.md` against:
- https://developer.ideogram.ai/api-reference/api-reference/generate-v3
- https://ideogram.ai/features/3.0

Focus areas:
- Version 3.0 features
- Text rendering accuracy claims
- Style Codes feature
- API endpoint format

Image: Seedream 4.5

Validate `references/image/seedream-4.md` against:
- https://docs.byteplus.com/en/docs/ModelArk
- https://seed.bytedance.com/en/seedream4_5

Focus areas:
- API availability (via BytePlus ModelArk)
- Multi-reference fusion capabilities
- Speed benchmarks
- Pricing

Audio: ElevenLabs

Validate `references/audio/elevenlabs.md` against:
- https://elevenlabs.io/docs/overview/models
- https://elevenlabs.io/docs/api-reference

Focus areas:
- Model IDs (eleven_v3, eleven_multilingual_v2, etc.; underscores, not hyphens; a lint sketch follows the list)
- Voice cloning requirements
- Stability/similarity controls
- Pricing per character
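
Since the underscore convention is an easy thing to regress on, a tiny lint the agent could run over the doc text; the hyphenated pattern is illustrative, not an observed ElevenLabs ID format:

```python
import re

# ElevenLabs model IDs use underscores (eleven_v3, eleven_multilingual_v2);
# flag any hyphenated lookalikes that slipped into the reference doc.
HYPHENATED_ID = re.compile(r"\beleven-[a-z0-9-]+\b")

def lint_model_ids(doc_text: str) -> list[str]:
    return HYPHENATED_ID.findall(doc_text)

assert lint_model_ids("use eleven_multilingual_v2, not eleven-multilingual-v2") \
    == ["eleven-multilingual-v2"]
```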

Audio: Suno v5

Validate `references/audio/suno-v5.md` against:
- https://help.suno.com
- https://suno.com

Focus areas:
- v5 capabilities vs v4
- NO official API (only third-party wrappers)
- Song duration limits
- Lyric formatting

Audio: Udio v1.5

Validate `references/audio/udio.md` against:
- https://help.udio.com
- https://www.udio.com/blog

Focus areas:
- v1.5 and v1.5 Allegro differences
- NO official API (Udio explicitly states this)
- Stem separation features
- Key control

Audio: OpenAI TTS

Validate `references/audio/openai-tts.md` against:
- https://platform.openai.com/docs/guides/text-to-speech
- https://platform.openai.com/docs/api-reference/audio

Focus areas:
- Model IDs (tts-1, tts-1-hd, gpt-4o-mini-tts)
- Voice options
- Instructions support (gpt-4o-mini-tts only; a call sketch follows the list)
- Pricing structure
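
A minimal call sketch for spot-checking the instructions claim, assuming the official openai Python SDK; parameters should still be verified against the API reference above:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Per the focus areas, `instructions` is claimed to work only with
# gpt-4o-mini-tts; the agent should confirm how tts-1/tts-1-hd handle it.
speech = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="alloy",
    input="Reference library validation complete.",
    instructions="Speak in a calm, neutral tone.",
)
speech.write_to_file("validation-check.mp3")
```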

Audio: Fish Audio S1

Validate `references/audio/fish-audio-openaudio-s1.md` against:
- https://docs.fish.audio/api-reference/endpoint/openapi-v1/text-to-speech
- https://docs.fish.audio/developer-guide/models-pricing

Focus areas:
- Model ID is just "s1" in API
- Pricing per UTF-8 bytes
- Emotion control capabilities
- Voice cloning

Audio: Cartesia Sonic

Validate `references/audio/cartesia-sonic.md` against:
- https://docs.cartesia.ai/build-with-cartesia/tts-models
- https://cartesia.ai/pricing

Focus areas:
- Sonic-3 vs Sonic-2 vs Sonic-turbo
- Date-stamped version snapshots
- State Space Models claims
- Latency benchmarks

On-Device Model Validation (55 Agents, One Per Model)

On-Device Agent Template

```
**Task**: Validate on-device model [MODEL] against HuggingFace/GitHub.

**Sources**:
- HuggingFace model card: [HF_URL]
- GitHub repo: [GITHUB_URL]

**MANDATORY Validation Checklist**:

1. **Hardware Requirements**
   - [ ] Minimum VRAM verified
   - [ ] Recommended VRAM verified
   - [ ] RAM requirements verified

2. **Mac Compatibility** (CRITICAL: the target user works on a MacBook)
   - [ ] MPS (Metal) support: YES/NO/PARTIAL
   - [ ] Apple Silicon (M1/M2/M3/M4) tested: YES/NO/UNKNOWN
   - [ ] Mac-specific installation steps documented
   - [ ] Mac performance benchmarks if available
   - [ ] Known Mac limitations or issues

3. **License**
   - [ ] License type verified
   - [ ] Commercial use allowed: YES/NO/CONDITIONAL
   - [ ] Revenue limits (if any)

4. **Model Specs**
   - [ ] Parameter count verified
   - [ ] Current version/release date
   - [ ] Output specs (resolution, duration, quality)

5. **Quality Claims**
   - [ ] Benchmark scores verified with source
   - [ ] Comparison claims verified

**Output**: Corrections + Mac compatibility assessment
```
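
So the 55 Mac assessments aggregate cleanly, each agent's verdict could be captured as a structured record. A hedged sketch (field names are assumptions, not an existing schema):

```python
from dataclasses import dataclass
from typing import Literal

TriState = Literal["YES", "NO", "PARTIAL", "UNKNOWN"]

@dataclass
class MacAssessment:
    model: str
    mps_support: TriState            # Metal (MPS) backend support
    apple_silicon_tested: TriState   # M1/M2/M3/M4 evidence on HF/GitHub
    install_steps_documented: bool
    known_limitations: list[str]
    evidence_urls: list[str]

def usable_on_mac(a: MacAssessment) -> bool:
    # Conservative gate: PARTIAL counts only with documented install steps.
    if a.mps_support == "YES":
        return True
    return a.mps_support == "PARTIAL" and a.install_steps_documented
```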

Video On-Device (13 agents)

| # | Model | HuggingFace/GitHub | Focus |
| --- | --- | --- | --- |
| 1 | HunyuanVideo 1.5 | tencent/HunyuanVideo-1.5 | GGUF options, VRAM, SSTA claims |
| 2 | Wan2.1/2.2 | Wan-AI/Wan2.1-T2V-14B, Wan-AI/Wan2.2-TI2V-5B | MoE architecture, Apache 2.0 |
| 3 | LTX-Video | Lightricks/LTX-Video | MPS support, speed claims |
| 4 | CogVideoX | THUDM/CogVideoX-5b, THUDM/CogVideoX-2b | Quantization, Mac support |
| 5 | Mochi 1 | genmo/mochi-1-preview | VRAM requirements, ComfyUI |
| 6 | Stable Video Diffusion | stabilityai/stable-video-diffusion-img2vid-xt | License, optimizations |
| 7 | Open-Sora 2.0 | hpcaitech/Open-Sora | VRAM, output specs |
| 8 | Open-Sora Plan | PKU-YuanGroup/Open-Sora-Plan | v1.5 capabilities |
| 9 | AnimateDiff | guoyww/AnimateDiff | VRAM by config, SDXL support |
| 10 | SkyReels V1 | SkyworkAI/SkyReels-V1 | Human-centric features, VBench |
| 11 | Pyramid Flow | rain1011/pyramid-flow-sd3 | MIT license, Mac support |
| 12 | Kandinsky 5.0 | kandinskylab/Kandinsky-5.0-T2V-Lite | 10s video, attention engines |
| 13 | Step-Video | stepfun-ai/Step-Video-T2V | 30B params, multi-GPU |

Image On-Device (18 agents)

| # | Model | HuggingFace | Focus |
| --- | --- | --- | --- |
| 14 | SD 1.5 | runwayml/stable-diffusion-v1-5 | License, ecosystem |
| 15 | SDXL | stabilityai/stable-diffusion-xl-base-1.0 | License terms, refiner |
| 16 | SDXL Turbo | stabilityai/sdxl-turbo | Steps, resolution limits |
| 17 | SDXL Lightning | ByteDance | 2-8 step quality |
| 18 | SD 3.5 Medium | stabilityai/stable-diffusion-3.5-medium | License (<$1M), VRAM |
| 19 | SD 3.5 Large | stabilityai/stable-diffusion-3.5-large | Quantization options |
| 20 | FLUX.1 Schnell | black-forest-labs/FLUX.1-schnell | Apache 2.0, NF4 options |
| 21 | FLUX.1 Dev | black-forest-labs/FLUX.1-dev | Non-commercial terms |
| 22 | FLUX.2 Dev | black-forest-labs/FLUX.2-dev | 32B params, consumer viability |
| 23 | Stable Cascade | stabilityai/stable-cascade | 3-stage architecture |
| 24 | PixArt-Sigma | PixArt-alpha/PixArt-Sigma-XL-2-1024-MS | DiT architecture, 4K |
| 25 | HiDream-I1 | HiDream.ai | 17B params, GGUF variants |
| 26 | Z-Image Turbo | Tongyi-MAI/Z-Image-Turbo | #1 leaderboard, bilingual |
| 27 | Kolors | Kwai-Kolors/Kolors | Commercial registration |
| 28 | Playground v2.5 | playgroundai/playground-v2.5-1024px-aesthetic | Open vs v3 closed |
| 29 | HunyuanDiT | Tencent | OpenVINO, Chinese |
| 30 | DeepFloyd IF | DeepFloyd/IF-I-XL-v1.0 | Text rendering, VRAM |
| 31 | Kandinsky 5.0 Lite | kandinskylab/kandinsky-5.0-image-lite | Multi-modal family |

Audio TTS On-Device (17 agents)

| # | Model | HuggingFace/GitHub | Focus |
| --- | --- | --- | --- |
| 32 | Chatterbox | ResembleAI/chatterbox | MIT, emotion control, 63.8% pref |
| 33 | Fish Speech/OpenAudio S1 | fishaudio/fish-speech | CC-BY-NC, #1 TTS-Arena |
| 34 | CosyVoice2 | FunAudioLLM/CosyVoice2-0.5B | Apache 2.0, streaming |
| 35 | Kokoro-82M | hexgrad/Kokoro-82M | Apache 2.0, 82M params |
| 36 | F5-TTS | SWivid/F5-TTS | CC-BY-NC weights |
| 37 | IndexTTS-2 | index-tts/index-tts | Duration control |
| 38 | XTTS v2 | coqui/XTTS-v2 | Coqui license, 17 langs |
| 39 | StyleTTS2 | yl4579/StyleTTS2 | MIT, human-level |
| 40 | GPT-SoVITS | RVC-Boss/GPT-SoVITS | MIT, singing support |
| 41 | Bark | suno/bark | MIT, sound effects |
| 42 | OpenVoice v2 | myshell-ai/OpenVoiceV2 | MIT, lightweight |
| 43 | Piper | rhasspy/piper | MIT, CPU-only |
| 44 | Tortoise TTS | neonbjb/tortoise-tts | Apache 2.0, slow |
| 45 | WhisperSpeech | WhisperSpeech/WhisperSpeech | Apache 2.0/MIT |
| 46 | MaskGCT | Amphion | ICLR 2025, 6 langs |
| 47 | OuteTTS | edwko/OuteTTS | MIT, llama.cpp |
| 48 | Spark-TTS | SparkAudio/Spark-TTS-0.5B | CC-BY-NC-SA |

Audio Music On-Device (7 agents)

| # | Model | HuggingFace/GitHub | Focus |
| --- | --- | --- | --- |
| 49 | ACE-Step | ACE-Step/ACE-Step-v1-3.5B | Apache 2.0, 4min songs |
| 50 | YuE | multimodal-art-projection/YuE | Apache 2.0, 5min |
| 51 | DiffRhythm | ASLP-lab/DiffRhythm | Apache 2.0, 4m45s |
| 52 | MusicGen | facebook/musicgen-large | CC-BY-NC, variants |
| 53 | Stable Audio Open | stabilityai/stable-audio-open-1.0 | <$1M license |
| 54 | Riffusion | riffusion/riffusion-model-v1 | MIT, spectrograms |
| 55 | Magenta RT | Google | Open weights, real-time |

Execution Plan

Session: Full Library Re-Validation (75 Agents Total)

Phase 1: Cloud Models (20 Opus agents, parallel)

  • 7 video model agents
  • 7 image model agents
  • 6 audio model agents (4 TTS + 2 music)
  • Each validates against official vendor docs
  • Returns: corrections, updated content, evidence links

Phase 2: On-Device Models (55 agents, parallel batches)

  • 13 video model agents
  • 18 image model agents
  • 17 TTS model agents
  • 7 music model agents
  • Each validates against HuggingFace + GitHub
  • CRITICAL: Mac compatibility verification for each model

Phase 3: Merge & Update

  • Merge all corrections into reference docs
  • Update 3 on-device compilation docs with per-model corrections
  • Cross-check against MODEL-AUDIT.md

Phase 4: Synthesis Update

  • Update PROMPT-VOCABULARY.md with verified terminology
  • Update comparison docs with verified capabilities
  • Update COST-OPTIMIZATION.md with verified pricing

Phase 5: Finalize

  • Mark all documents as validated
  • Update CONTINUITY.md
  • Layer 2 truly complete

Agent Summary

| Category | Cloud Agents | On-Device Agents | Total |
| --- | --- | --- | --- |
| Video | 7 | 13 | 20 |
| Image | 7 | 18 | 25 |
| Audio (TTS) | 4 | 17 | 21 |
| Audio (Music) | 2 | 7 | 9 |
| Total | 20 | 55 | 75 |

Success Criteria

  • All 20 cloud model docs validated against official sources
  • All 55 on-device models validated against HuggingFace/GitHub
  • Mac compatibility verified for every on-device model
  • Every prompting guide verified against vendor recommendations
  • Every capability claim has evidence link
  • MODEL-AUDIT.md corrections applied to reference docs
  • Synthesis docs updated to reflect corrected information
  • CONTINUITY.md updated with completion status

Files Structure

```
references/
├── README.md              # Library index (needs status update)
├── GLOSSARY.md            # Terms and conventions
├── GAPS.md                # Known gaps
├── VALIDATION-REPORT.md   # Accuracy verification (needs update)
├── video/
│   ├── README.md
│   ├── veo-3.md           # NEEDS REVIEW
│   ├── sora-2.md          # NEEDS REVIEW
│   ├── runway-gen4.5.md   # NEEDS REVIEW
│   ├── kling-2.1.md       # NEEDS REVIEW
│   ├── luma-ray3.md       # NEEDS REVIEW
│   ├── hailuo-02.md       # NEEDS REVIEW
│   ├── midjourney-video.md # NEEDS REVIEW
│   └── on-device-models.md # NEEDS REVIEW
├── image/
│   ├── README.md
│   ├── nano-banana-pro.md # NEEDS REVIEW
│   ├── imagen-4.md        # NEEDS REVIEW
│   ├── flux-2.md          # NEEDS REVIEW
│   ├── gpt-image.md       # NEEDS REVIEW
│   ├── midjourney.md      # NEEDS REVIEW
│   ├── ideogram-3.md      # NEEDS REVIEW
│   ├── seedream-4.md      # NEEDS REVIEW
│   └── on-device-models.md # NEEDS REVIEW
└── audio/
    ├── README.md
    ├── elevenlabs.md      # NEEDS REVIEW
    ├── suno-v5.md         # NEEDS REVIEW
    ├── udio.md            # NEEDS REVIEW
    ├── openai-tts.md      # NEEDS REVIEW
    ├── fish-audio-openaudio-s1.md # NEEDS REVIEW
    ├── cartesia-sonic.md  # NEEDS REVIEW
    └── on-device-models.md # NEEDS REVIEW

planning/synthesis/
├── MODEL-AUDIT.md         # COMPLETE (model IDs verified)
├── VIDEO-COMPARISON.md    # NEEDS UPDATE after validation
├── IMAGE-COMPARISON.md    # NEEDS UPDATE after validation
├── AUDIO-COMPARISON.md    # NEEDS UPDATE after validation
├── PROMPT-VOCABULARY.md   # NEEDS UPDATE after validation
├── COST-OPTIMIZATION.md   # NEEDS UPDATE after validation
├── SCHEMA-RECOMMENDATIONS.md # NEEDS UPDATE after validation
└── INTEGRATION-PATTERNS.md # NEEDS UPDATE after validation
```
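
For Phase 5's "mark all documents as validated" step, a small sketch that walks this tree and flags anything still carrying a review marker; the marker strings are assumptions about how status gets recorded inside the files:

```python
from pathlib import Path

REVIEW_MARKERS = ("NEEDS REVIEW", "NEEDS UPDATE")  # assumed status strings

def unvalidated_docs(root: str = "references") -> list[Path]:
    """Return markdown docs whose contents still carry a review marker."""
    flagged = []
    for doc in Path(root).rglob("*.md"):
        text = doc.read_text(encoding="utf-8", errors="ignore")
        if any(marker in text for marker in REVIEW_MARKERS):
            flagged.append(doc)
    return sorted(flagged)

if __name__ == "__main__":
    for doc in unvalidated_docs():
        print("still pending:", doc)
```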

This research plan was updated 2025-12-27 to require full library re-validation before Layer 2 can be considered complete.
