Dear Authors of Submission #45242,
Thanks to the efforts of all reviewers, emergency reviewers, ACs, and SACs, preliminary reviews are now available. Below you will find the preliminary reviews for your CVPR 2026 submission, MotionV2V: Editing Motion in a Video (#45242). Authors have the opportunity to submit a rebuttal by January 29, 2026 11:59 PM AoE. Please review the rest of this email, and the Author Guidelines for additional details on the rebuttal process.
We refer to the current reviews as preliminary because reviewers may still revise them for clarity until January 23, 11:59 PM AoE.
Reviews
You can access the reviews for your submission by clicking on the corresponding submission in OpenReview: https://openreview.net/group?id=thecvf.com/CVPR/2026/Conference/Authors
Full Research Report: LTX-2 Model Architecture Comparison
Executive Summary
Your key question answered: The IC (In-Context) models and I2V (Image-to-Video) models use fundamentally different conditioning approaches, but critically, neither uses channel concatenation to expand the input channels. LTX-2 maintains 128 latent channels for ALL model variants.
---
1. Input Channel Dimensions (The Core Answer)
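To make the distinction concrete, here is a minimal, purely illustrative PyTorch sketch (not LTX-2 code; the tensor shapes are assumptions) contrasting channel-concatenation conditioning, which would change the input channel count, with in-context token concatenation, which keeps every token at 128 channels:

import torch

B, C, T, H, W = 1, 128, 8, 32, 32            # assumed latent shape (128 channels)
video_latents = torch.randn(B, C, T, H, W)
cond_latents  = torch.randn(B, C, 1, H, W)   # e.g. an encoded conditioning image

# (a) Channel concatenation: conditioning is stacked along the channel axis,
# so the model's input projection would need 2*C = 256 channels.
channel_concat = torch.cat(
    [video_latents, cond_latents.expand(-1, -1, T, -1, -1)], dim=1
)
print(channel_concat.shape)   # torch.Size([1, 256, 8, 32, 32])

# (b) In-context conditioning: conditioning latents are appended as extra
# tokens along the sequence axis, so each token keeps C = 128 channels and the
# input projection is unchanged.
def to_tokens(x):
    b, c, t, h, w = x.shape
    return x.permute(0, 2, 3, 4, 1).reshape(b, t * h * w, c)

in_context = torch.cat([to_tokens(video_latents), to_tokens(cond_latents)], dim=1)
print(in_context.shape)       # torch.Size([1, 9216, 128])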
from rp import *
from rp.git.CommonSource.projects.mask_generator.random_video_mask_generator import get_random_video_mask

def megamask():
    # Tile an S x S grid of random mask videos into one large mask video,
    # then run RIFE frame interpolation four times for smoother motion.
    S = 10
    minimasks = [get_random_video_mask() for _ in range(S * S)]
    video = tiled_videos(minimasks)
    for _ in range(4):
        video = slowmo_video_via_rife(video)
    return video
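# Hedged usage of the helper above -- display_video / save_video_mp4 are rp
# helpers, and this invocation is an assumption, not part of the original snippet.
mask_video = megamask()
display_video(mask_video)
save_video_mp4(mask_video, 'megamask.mp4')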
import rp
from google import genai
from google.genai.types import (
    GenerateContentConfig,
    Modality,
    ThinkingConfig,
    ImageConfig,
)
import os
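# Hedged example of how these imports are typically wired together with the
# google-genai SDK. The model name, prompt, and config values below are
# assumptions, not taken from this document.
client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Describe this repository in one sentence.",
    config=GenerateContentConfig(
        response_modalities=[Modality.TEXT],
        thinking_config=ThinkingConfig(thinking_budget=0),
    ),
)
print(response.text)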
url = "https://huggingface.co/datasets/OneOverZero/Calvin__task_ABC_D_h5__training_s224_h"
dataset_parent = "/"
clone_command = f"""
cd {dataset_parent}
git lfs install
git clone {url}
""" # sh
dataset_folder = path_join(dataset_parent, get_folder_name(url))
if not folder_exists(dataset_folder):
r._run_sys_command(clone_command)
def film_strip(video, length=None, height=None, width=None, vertical=False):
    """
    Create a film strip effect from a video sequence.

    Args:
        video: List of video frames/images
        length: Optional number of frames to use (defaults to all frames)
        height: Height to resize frames to (default: 480)
        width: Width to resize frames to (default: 720)
        vertical: If True, lay the frames out vertically instead of horizontally (default: False)
    """
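# Hedged usage sketch for the signature above -- the input video, sizes, and
# the save step are assumptions, not part of the original file.
demo_frames = load_video('example.mp4', use_cache=True)   # rp helper
demo_strip = film_strip(demo_frames, length=5, height=480, width=720)
save_image(demo_strip, 'film_strip_demo.png')             # rp helper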
# CYCLESORT!!! -- scratch setup for a cycle sort experiment.
def swap(a, i, j):
    a[i], a[j] = a[j], a[i]

class Counted:
    # Sentinel object that prints as 'X' (useful for visualizing writes).
    def __repr__(self):
        return 'X'

X = Counted()
a = shuffled([3, 1, 7, 8, 8, 3, 3, 3, 2, 4, 5, 5, 0])   # shuffled() comes from rp
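# The scratch above stops before the sort itself. For reference, a standard
# cycle sort (which minimizes array writes) looks roughly like this -- an
# assumption about where the snippet was headed, not the author's code.
def cycle_sort(arr):
    writes = 0
    for start in range(len(arr) - 1):
        item = arr[start]
        # Final position = start + number of smaller elements to the right.
        pos = start + sum(1 for x in arr[start + 1:] if x < item)
        if pos == start:
            continue                    # already in place
        while item == arr[pos]:
            pos += 1                    # skip past duplicates
        arr[pos], item = item, arr[pos]
        writes += 1
        # Rotate the rest of the cycle until we return to `start`.
        while pos != start:
            pos = start + sum(1 for x in arr[start + 1:] if x < item)
            while item == arr[pos]:
                pos += 1
            arr[pos], item = item, arr[pos]
            writes += 1
    return writes

print(cycle_sort(a), a)   # write count, then the sorted list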
from rp.git.Figures.film_strip.film_strip import film_strip
video = "/Users/ryan/CleanCode/Projects/Google2025_Paper/OverleafGit/figures_src/windmill_chase.mp4"
video = load_video(video, use_cache=True)
video = as_float_images(video)
# video=mean(video[:-3],video[1:-2],video[2:-1],video[3:])
before, after = split_tensor_into_regions(video, 1, 1, 2)
frames = [0, 5, 10]
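# Hedged continuation -- the original snippet stops before calling film_strip;
# the exact arguments below are assumptions.
strip_before = film_strip([before[i] for i in frames])
strip_after = film_strip([after[i] for i in frames])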
from rp.git.Figures.film_strip.film_strip import film_strip

video = '/Users/ryan/CleanCode/Projects/Google2025_Paper/OverleafGit/figures_src/CatFish.mp4'
video = load_video(video, use_cache=True)

for vert in [True, False]:
    # Crop off the top 80 pixels, then take the left and right 720-pixel-wide windows.
    before = video[:, 80:, :720]
    after = video[:, 80:, -720:]
    # display_video(after)
    frames = [0, 25, 46]
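    # Hedged continuation (not in the original snippet): presumably the loop
    # ends by building a strip with the `vertical` flag and saving it; the
    # filenames and save_image call are assumptions.
    strip = film_strip([before[i] for i in frames], vertical=vert)
    save_image(strip, 'catfish_strip_vertical.png' if vert else 'catfish_strip_horizontal.png')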