ThursdAI Show Notes - January 22, 2026

Prepared by Wolfred 🐺


Inworld AI releases TTS-1.5, the fastest production-grade text-to-speech model that's 4x faster and 25x cheaper than competitors (X, Blog, Press Release)

  • Executive summary:

  • Inworld AI just dropped TTS-1.5 today, and this is a pretty significant release for anyone building voice-enabled applications. They're already sitting at number one on the Artificial Analysis TTS leaderboard based on blind user tests, and with this update they've achieved sub-250 millisecond latency for their Max model and under 130 milliseconds for Mini - that's 4 times faster than their previous generation. The quality improvements are substantial too: 30% more expressive speech and 40% lower word error rates, meaning fewer hallucinations and audio artifacts. But here's the kicker on pricing - at half a cent per minute for Mini and one cent for Max, they're claiming to be 25 times cheaper than the next best alternative. They've also expanded to 15 languages including Hindi, added enhanced voice cloning via API, and now offer on-premise deployment for enterprises with data residency requirements.

  • 10 factoids:

  • TTS-1.5 Max achieves P90 latency under 250ms, while Mini hits under 130ms - both representing a 4x speed improvement over previous generations

  • Ranked #1 on Artificial Analysis TTS Leaderboard based on blind comparisons by thousands of real users

  • 30% more expressive speech output compared to prior versions

  • 40% reduction in word error rate, minimizing hallucinations, cutoffs, and audio artifacts

  • Pricing is $0.005/minute (Mini) and $0.01/minute (Max) - equivalent to $5-10 per million characters, claimed to be 25x cheaper than competitors (see the quick cost arithmetic after this list)

  • Supports 15 languages including English, Spanish, French, Korean, Chinese, Hindi, Japanese, German, and more

  • Voice cloning now available via API with 5-15 seconds of audio for instant cloning, or professional fine-tuning for maximum fidelity

  • On-premise deployment now available for enterprises with compliance/data residency requirements

  • Already integrated with Layercode, LiveKit, NLX, Pipecat, Stream Vision Agents, Ultravox, Vapi, and Voximplant

  • Founded by a team from Google and DeepMind, backed by Lightspeed, Kleiner Perkins, and Stanford University
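
Quick sanity check on that pricing claim: the per-minute and per-million-characters figures line up if you assume a typical output rate of roughly 1,000 characters of text per minute of generated speech. That rate is our assumption for illustration, not a figure from Inworld's announcement.

```python
# Back-of-the-envelope check of the pricing equivalence in the factoid above.
# Assumption (not from the announcement): ~1,000 characters of text per
# minute of generated speech.
CHARS_PER_MINUTE = 1_000

def cost_per_million_chars(price_per_minute: float) -> float:
    """Convert a per-minute TTS price into a per-million-characters price."""
    minutes_per_million_chars = 1_000_000 / CHARS_PER_MINUTE
    return price_per_minute * minutes_per_million_chars

print(f"Mini: ${cost_per_million_chars(0.005):.2f} per 1M characters")  # ~$5
print(f"Max:  ${cost_per_million_chars(0.01):.2f} per 1M characters")   # ~$10
```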

Additional links: TTS Product Page, TTS Playground, Documentation, Artificial Analysis Leaderboard



Flashlabs releases Chroma 1.0, the world's first open-source real-time speech-to-speech dialogue model with voice cloning (X)

  • Executive summary:

  • Flashlabs just dropped Chroma 1.0, and this is a big one for the open source voice AI community. It's being billed as the world's first open-source, end-to-end, real-time speech-to-speech dialogue model that includes personalized voice cloning. We're talking sub-150 millisecond latency here, which means it can actually hold a natural conversation without those awkward pauses. The voice cloning is remarkably efficient, needing only a few seconds of audio to replicate someone's voice, and they're claiming a speaker similarity score of 0.817, which is actually about 11% better than human baseline. What's really impressive is that they packed all this into just 4 billion parameters; on top of that, the full weights and code are open source, and it's been optimized with SGLang for faster inference. This is going to be huge for anyone building voice assistants, real-time translation, or conversational AI applications.

  • 10 factoids:

  • First open-source end-to-end real-time speech-to-speech dialogue model with voice cloning

  • Sub-150ms end-to-end latency enables natural conversational flow

  • High-fidelity voice cloning requires only seconds of reference audio

  • Speaker similarity (SIM) score of 0.817 — 10.96% higher than human baseline (see the similarity sketch after this list)

  • Compact architecture with only 4 billion parameters

  • Fully open weights and code released

  • Optimized with SGLang for faster inference performance

  • Developed by Flashlabs

  • End-to-end architecture means no separate ASR/TTS pipeline needed

  • Enables personalized voice cloning for custom voice assistants and applications
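
For context on that 0.817 SIM number: speaker-similarity scores like this are usually computed as the cosine similarity between speaker embeddings of the reference audio and the cloned output. Below is a minimal, generic sketch of that metric; the embedding models named in the comment are an assumption about common practice, and this is not Flashlabs' evaluation code.

```python
import numpy as np

def speaker_similarity(ref_embedding: np.ndarray, gen_embedding: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings (higher = more similar)."""
    ref = ref_embedding / np.linalg.norm(ref_embedding)
    gen = gen_embedding / np.linalg.norm(gen_embedding)
    return float(np.dot(ref, gen))

# Toy example with random vectors standing in for real speaker embeddings.
# In practice these would come from a speaker-verification encoder (for
# example a WavLM- or ECAPA-based model), which is our assumption, not
# Flashlabs' stated evaluation setup.
rng = np.random.default_rng(0)
ref, gen = rng.normal(size=256), rng.normal(size=256)
print(f"SIM = {speaker_similarity(ref, gen):.3f}")
```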

Additional links: Paper, HuggingFace Model, GitHub Code (links from the announcement tweet)



Liquid AI releases LFM2.5-1.2B-Thinking, a reasoning model that runs entirely on-device in under 900MB of memory (X, Blog, Hugging Face)

  • Executive summary:

  • Liquid AI just dropped something pretty wild - a reasoning model called LFM2.5-1.2B-Thinking that can run entirely on your phone with less than 900 megabytes of memory. This is a 1.2 billion parameter model that generates internal thinking traces before producing answers, essentially doing the chain-of-thought reasoning we've seen in much larger models, but at edge-scale latency. The kicker? What required a data center two years ago now runs offline in your pocket. They've trained this thing on 28 trillion tokens and used multi-stage reinforcement learning to make it particularly good at tool use, math, and instruction following. On benchmarks, it's matching or beating Qwen3-1.7B despite being roughly 30% smaller, and it scores 87.96 on MATH-500 compared to the non-thinking version's 63.2. Launch partners include Qualcomm, AMD, Ollama, and others, so this is ready for deployment across phones, laptops, vehicles, and IoT devices on day one.

  • 10 factoids:

  • Runs in under 900MB of memory - fits on any modern smartphone (see the memory back-of-the-envelope after this list)

  • 1.2 billion parameters with 16 layers (10 LIV convolution blocks + 6 GQA blocks)

  • Trained on 28 trillion tokens (up from 10T in LFM2)

  • 32,768 token context length with robust long-context scaling

  • MATH-500 benchmark: 87.96% (jumped from 63.2% in the non-thinking Instruct version)

  • Decodes at 70 tok/s on Samsung Galaxy S25 Ultra CPU, up to 235 tok/s on AMD Ryzen AI Max 395

  • Uses GRPO-style reinforcement learning with curriculum training across math, reasoning, and tool use domains

  • Solved the "doom loop" problem (repetitive text patterns) - reduced from 15.74% to 0.36% through preference alignment

  • Open-weight with day-one support for llama.cpp, MLX, vLLM, ONNX, Ollama, and LM Studio

  • LFM2 family has crossed 6 million downloads on Hugging Face
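
Here's a back-of-the-envelope check on the sub-900MB figure. The quantization bit-widths below are our assumptions for illustration, not numbers from Liquid AI's post.

```python
# Rough sanity check of the <900MB on-device figure. The bit-widths are
# assumptions for illustration, not figures from Liquid AI's announcement.
PARAMS = 1.2e9

for name, bits_per_weight in [("fp16", 16), ("8-bit", 8), ("~4.5-bit (Q4_K_M-style)", 4.5)]:
    mb = PARAMS * bits_per_weight / 8 / 1e6
    print(f"{name:>24}: ~{mb:,.0f} MB of weights")

# fp16 would need ~2,400 MB and 8-bit ~1,200 MB, so a ~4- to 5-bit
# quantization (~700 MB of weights, plus runtime overhead and KV cache)
# is what makes the sub-900MB envelope plausible on a phone.
```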

Additional links:



Vercel launches skills.sh, an "npm for AI agents" that hit 20K installs within hours (X, Vercel Changelog, GitHub)

Date: January 20, 2026

  • Executive summary:

    • Vercel just dropped skills.sh, which is essentially a package manager for AI coding agents—think npm but for teaching your AI assistant best practices. You install a skill with one command like npx skills add vercel-labs/agent-skills, and suddenly your Claude Code, Cursor, Codex, Windsurf, or any of the 17+ supported agents knows 10 years' worth of React and Next.js optimization patterns, web design guidelines, and can even deploy to Vercel for you. The ecosystem exploded on launch—their top React skill hit over 26,000 installs, the announcement tweet got 125,000+ views, and major players like Stripe, Expo, and Remotion shipped their own skills within hours. This is Vercel doing what Vercel does: entering a market and immediately becoming the default.
  • 10 factoids:

    • The skills.sh leaderboard shows 200+ skills already live, with Vercel's React Best Practices skill leading at 26,000+ installs
    • Skills follow the Agent Skills specification, an open format originally developed by Anthropic
    • Supports 17+ AI coding agents: Claude Code, Cursor, Codex, GitHub Copilot, Windsurf, Clawdbot, Amp, Antigravity, Gemini, Goose, Kilo, Kiro, OpenCode, Roo, Trae, and more
    • The react-best-practices skill contains 40+ rules across 8 categories covering waterfalls, bundle size, server-side performance, and more
    • The web-design-guidelines skill includes 100+ rules covering accessibility, ARIA, forms, animation, typography, dark mode, and i18n
    • The vercel-deploy-claimable skill auto-detects 40+ frameworks and returns a preview URL plus a claimable URL for ownership transfer
    • Remotion's Jonny Burger created a whole video just by prompting Claude Code with Remotion's skill—the announcement video hit 147,000 views
    • Stripe shipped their own best practices skill within hours of the ecosystem launch
    • Skills are structured as folders with SKILL.md for instructions, a scripts directory for automation, and optional references (see the scaffold sketch after this list)
    • The ecosystem already includes skills from Expo (8 skills for React Native), Anthropic, Callstack, Trail of Bits (security), and many community contributors
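
To make the folder layout concrete, here's a minimal scaffold for the structure described above (SKILL.md plus a scripts directory and optional references). The SKILL.md front-matter fields are illustrative assumptions; check the Agent Skills spec linked below for the authoritative format.

```python
from pathlib import Path

# Minimal scaffold for the skill layout described in the factoid above.
# The front-matter fields in SKILL.md are illustrative assumptions, not
# the full Agent Skills specification.
SKILL_MD = """\
---
name: my-team-conventions
description: House rules the agent should follow when editing this repo.
---

# My team conventions

- Prefer server components; avoid client-side data waterfalls.
- Run the formatter before proposing a diff.
"""

def scaffold_skill(root: str = "my-team-conventions") -> None:
    skill = Path(root)
    (skill / "scripts").mkdir(parents=True, exist_ok=True)   # automation scripts
    (skill / "references").mkdir(exist_ok=True)              # optional references
    (skill / "SKILL.md").write_text(SKILL_MD)                # agent-facing instructions

if __name__ == "__main__":
    scaffold_skill()
```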

Additional links: Agent Skills Spec, skills.sh Directory, MarkTechPost Coverage, Medium: 20K installs in hours



Anthropic's Claude Code VS Code Extension Hits General Availability, Bringing Full Agentic Coding to the IDE (X, VS Code Marketplace, Docs)

  • Executive summary:

  • Anthropic has officially launched the general availability of their Claude Code VS Code extension, bringing the full power of their CLI-based agentic coding tool directly into the world's most popular code editor. The extension provides a native graphical interface that lets developers chat with Claude to build features, debug code, and navigate codebases - all without leaving VS Code. What makes this release significant is the feature parity with the CLI: you can now @-mention files for context (with fuzzy matching!), use familiar slash commands like /model, /mcp, and /context, and Claude can autonomously explore your codebase, write code, and run terminal commands with your permission. The extension supports powerful agentic features including subagents, custom slash commands, and MCP (Model Context Protocol) for connecting to external tools like GitHub, Jira, and Google Drive. It's available for Pro, Max, Team, and Enterprise subscribers, or via pay-as-you-go pricing.

  • 10 factoids:

  • The extension requires VS Code 1.98.0 or higher and works with Cursor as well

  • Three permission modes available: normal (asks each time), Plan mode (describes then waits for approval), and auto-accept (makes edits without asking)

  • @-mentions support fuzzy matching - type @auth to find auth.js, AuthService.ts, etc., and add trailing slash for folders like @src/components/

  • Claude can see your highlighted code automatically when you select text, and you can toggle this visibility on/off

  • Multiple conversations can run in parallel using "Open in New Tab" or "Open in New Window" - each with isolated context

  • Extension and CLI share conversation history - use claude --resume in terminal to continue any extension conversation

  • Supports third-party AI providers including Amazon Bedrock, Google Vertex AI, and Microsoft Foundry

  • MCP (Model Context Protocol) servers must be configured via CLI first, then become available in the extension

  • @terminal:name syntax lets you reference terminal output in prompts without copy-pasting error messages

  • A terminal-mode option exists for developers who prefer the CLI-style interface - toggle via settings

Additional links:



OpenAI announces ads coming to ChatGPT free and Go tiers in the US (X, OpenAI Blog)

  • Executive summary:

  • OpenAI is taking its first big step into the advertising business—in the coming weeks, they'll start testing ads in ChatGPT for logged-in adult users in the US on the free tier and the new $8/month ChatGPT Go plan. The company is being very careful to emphasize that ads won't influence ChatGPT's actual responses—they'll appear in clearly labeled boxes at the bottom of answers, completely separate from the conversation. They're promising not to sell user data to advertisers, not to show ads on sensitive topics like health, mental health, or politics, and not to serve ads to users under 18. Paid subscribers on Plus, Pro, Business, and Enterprise tiers won't see any ads at all. This move reflects the massive financial pressure OpenAI is under—burning through about $9 billion this year while only 5% of their 800 million weekly users actually pay for subscriptions.

  • 10 factoids:

  • Ads will only appear on free tier and the new $8/month ChatGPT Go subscription—Plus ($20/mo), Pro, Business, and Enterprise remain ad-free

  • Ads will be placed at the bottom of ChatGPT's answers in clearly labeled, separate boxes—never within the actual response text

  • OpenAI promises conversations are kept private from advertisers and user data is never sold to advertisers

  • Users can turn off ad personalization and clear data used for ads at any time without affecting other ChatGPT personalization features

  • Ads won't appear near sensitive or regulated topics including health, mental health, and politics

  • No ads will be shown to users under 18 (either self-reported or predicted by an age-prediction model OpenAI is rolling out)

  • OpenAI expects to burn through ~$9 billion this year while generating $13 billion in revenue

  • Only about 5% of ChatGPT's 800 million weekly active users currently pay for subscriptions (see the quick arithmetic after this list)

  • OpenAI has committed to spending about $1.4 trillion on massive data centers and chips for AI

  • CEO Sam Altman previously said in 2024 that he found the combination of ads and AI "uniquely unsettling"—this marks a notable shift from that position
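
Quick arithmetic on the figures above; everything here is derived from the numbers quoted in this item, not additional reporting.

```python
# Derived from the figures quoted in the factoids above.
weekly_users = 800e6
paying_share = 0.05
revenue = 13e9          # USD, expected this year
burn = 9e9              # USD, expected net loss this year

paying_users = weekly_users * paying_share          # ~40M paying users
implied_spend = revenue + burn                      # ~$22B total spend implied
revenue_per_payer = revenue / paying_users          # upper bound: ignores API revenue

print(f"paying users          ~{paying_users/1e6:.0f}M")
print(f"implied total spend   ~${implied_spend/1e9:.0f}B")
print(f"revenue per payer     ~${revenue_per_payer:.0f}/year (if all revenue were subscriptions)")
```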

Additional links:



Z.ai releases GLM-4.7-Flash, a 30B parameter MoE model that sets a new standard for lightweight local AI assistants (X, Technical Blog, HuggingFace)

  • Executive summary:

  • Zhipu AI, the Chinese AI company behind the ChatGLM family, just dropped GLM-4.7-Flash, which is a 30B parameter MoE (mixture of experts) model with only 3 billion active parameters - making it incredibly efficient for local deployment. This model is specifically designed to be your local coding and agentic assistant, and it's crushing benchmarks in its weight class. It scores 91.6% on AIME 2025, hits 59.2% on SWE-bench Verified (compared to just 22% for Qwen3-30B), and delivers 79.5% on τ²-Bench for agent tasks. The model supports vLLM and SGLang for local inference, offers a free API tier with 1 concurrency, and is particularly recommended for coding, creative writing, translation, long-context tasks, and roleplay. Weights are fully open on HuggingFace.

  • 10 factoids:

  • GLM-4.7-Flash is a 30B-A3B MoE model (30 billion total parameters, only 3 billion active at any time)

  • Achieves 91.6% on AIME 2025, essentially matching GPT-OSS-20B (91.7%) and crushing Qwen3-30B-A3B-Thinking (85%)

  • Scores 59.2% on SWE-bench Verified vs only 22% for comparable Qwen3-30B model - nearly 3x better at coding agents

  • Supports up to 131,072 max new tokens for generation

  • Free API available with 1 concurrency, plus FlashX tier for high-speed inference

  • Open weights available on HuggingFace and ModelScope for local deployment

  • Supports vLLM and SGLang inference frameworks with speculative decoding (see the vLLM sketch after this list)

  • Features "Preserved Thinking" mode that retains reasoning across multi-turn conversations for complex agentic tasks

  • Part of the GLM-4.7 family from Zhipu AI (Z.ai), which also includes full GLM-4.7 with 73.8% on SWE-bench Verified

  • Built by THUDM (Tsinghua University Data Mining group) and Zhipu AI, the team behind ChatGLM, CogView, and CogVideo
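
If you want to try the open weights locally, here's a minimal vLLM sketch (vLLM is one of the supported backends listed above). The Hugging Face model ID is an assumption based on Z.ai's usual naming, so check the actual model card before running.

```python
# Minimal local-inference sketch using vLLM. The model ID below is an
# assumption based on Z.ai's usual Hugging Face naming, not a confirmed repo.
from vllm import LLM, SamplingParams

llm = LLM(model="zai-org/GLM-4.7-Flash", trust_remote_code=True)
params = SamplingParams(temperature=0.6, max_tokens=1024)

outputs = llm.chat(
    [{"role": "user", "content": "Write a Python function that merges two sorted lists."}],
    sampling_params=params,
)
print(outputs[0].outputs[0].text)
```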

Additional links:



Overworld releases a real-time local-first diffusion world model that runs at 60fps on consumer hardware (X, Press Release)

  • Executive summary:

  • Overworld, formerly known as Wayfarer Labs, just dropped a research preview of their real-time diffusion world model that's all about creating playable, interactive AI-generated worlds that run entirely on your local machine. This is huge because unlike cloud-based solutions that need round-trips to data centers, their model runs at 60 frames per second with sub-20ms latency on consumer-grade GPUs - we're talking Chromebooks, gaming PCs, even console-class hardware. The company is positioning this as a new kind of interactive AI where you can shape adaptive worlds directly through human imagination - they describe it as "living worlds that behave more like lucid dreams than software." It's backed by a $4.5 million pre-seed round led by Kindred Ventures with angels like Logan Kilpatrick on board, and they're making the code open source on GitHub.

  • 10 factoids:

  • Runs at 60fps with sub-20ms latency entirely on local consumer GPUs - no cloud required (see the frame-budget sketch after this list)

  • Works on everything from Chromebooks to gaming PCs to console-class hardware

  • The company rebranded from Wayfarer Labs to Overworld for this release

  • Backed by $4.5 million pre-seed round led by Kindred Ventures

  • Notable angel investors include Logan Kilpatrick and senior leaders from Snowflake and Roblox

  • Founded by Louis Castricato (CEO, Brown University) and Shahbuland Matiana (Chief Science Officer)

  • Uses diffusion-based world models structured as continuous real-time systems that incorporate user input into every frame

  • Fully open source with code on GitHub including world_engine inference library and Biome desktop client

  • Supports full keyboard and mouse input, not just basic WASD controls

  • The model uses a DiT architecture with autoencoder, text encoder, and KV cache with optimized backends for Nvidia, AMD, and Apple Silicon
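
To put the 60fps / sub-20ms numbers in perspective: at 60 frames per second, each frame has roughly 16.7 milliseconds to ingest the latest user input, run the model, and present a frame. Here's a generic real-time loop sketch that illustrates that budget; it does not use Overworld's world_engine API.

```python
import time

# Generic real-time loop sketch (not Overworld's world_engine API): at 60 fps
# the model has a ~16.7 ms budget per frame to read input, run a denoising
# step (or a few), and hand a frame to the display.
TARGET_FPS = 60
FRAME_BUDGET = 1.0 / TARGET_FPS        # ~16.7 ms

def read_input():
    return {}                          # placeholder for keyboard/mouse state

def generate_frame(state, user_input):
    time.sleep(0.005)                  # stand-in for the diffusion step(s)
    return state

state = {}
for _ in range(120):                   # ~2 seconds of simulated frames
    start = time.perf_counter()
    state = generate_frame(state, read_input())
    elapsed = time.perf_counter() - start
    if elapsed > FRAME_BUDGET:
        print(f"frame overran budget by {(elapsed - FRAME_BUDGET) * 1000:.1f} ms")
    else:
        time.sleep(FRAME_BUDGET - elapsed)   # pace the loop to 60 fps
```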

Additional links:



Sakana AI introduces RePo, a new way for language models to dynamically reorganize their context for better attention (X, Paper, Website)

  • Executive summary:

  • Sakana AI, the Tokyo-based research lab, just dropped a pretty clever innovation called RePo - short for Context Re-Positioning. The core insight here is that current LLMs are stuck processing everything as one flat, rigid sequence of tokens, which makes it hard for them to handle noisy contexts, structured data like tables, or long documents where important info is far apart. RePo adds a lightweight learned module that assigns each token a real-valued position based on its meaning, not just its place in the sequence. This lets the model dynamically pull semantically related tokens closer together in "attention space" even if they're far apart in the actual text, and push irrelevant noise away. They trained it on OLMo-2 1B for 50 billion tokens and saw consistent gains - 11 points better on noisy context tasks, almost 2 points better on structured data, and strong improvements on long-context benchmarks up to 16K tokens even though it was only trained on 4K. The overhead is less than 1% compute, so it's basically free performance.

  • 10 factoids:

  • RePo stands for "Context Re-Positioning" - it lets models assign learned, real-valued positions to tokens based on semantics rather than fixed integer indices (see the code sketch after this list)

  • The technique is inspired by Cognitive Load Theory from human psychology - the idea that poor organization creates "extraneous load" that wastes our working memory

  • RePo improved noisy context performance by 11.04 points over standard RoPE positional encoding on the RULER benchmark

  • On structured data tasks (graphs and tables linearized into text), RePo beat RoPE by 1.94 exact match points

  • The model was trained on only 4K context length but extrapolated well to 8K and 16K tokens, beating baselines by at least 5.48 points on LongBench

  • RePo adds less than 1% compute overhead - it's essentially free performance

  • Analysis shows RePo allocates 15% more attention to "needle" tokens (distant but relevant) compared to standard RoPE

  • Different attention heads learn different position ranges - some specialize in local reorganization, others in global structure

  • The learned positions show non-linear patterns with plateaus and jumps, indicating the model discovers meaningful structure

  • The code, models, and an interactive demo are all open-source - you can visualize how tokens get repositioned in real-time
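
Here's a minimal PyTorch sketch of the core idea as described above: a small learned head maps each token's hidden state to a real-valued position, and those continuous positions drive the rotary embedding instead of fixed integer indices. This is a reconstruction from the summary, not Sakana AI's released code.

```python
import torch
import torch.nn as nn

class ContextRepositioner(nn.Module):
    """Sketch of the RePo idea from the summary above: learn a real-valued
    position per token from its hidden state, then feed those continuous
    positions into rotary embeddings. Not Sakana AI's implementation."""

    def __init__(self, d_model: int, head_dim: int, base: float = 10_000.0):
        super().__init__()
        self.pos_head = nn.Sequential(nn.Linear(d_model, d_model // 4),
                                      nn.GELU(),
                                      nn.Linear(d_model // 4, 1))
        inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2) / head_dim))
        self.register_buffer("inv_freq", inv_freq)

    def forward(self, hidden: torch.Tensor):
        # hidden: (batch, seq, d_model)
        seq_len = hidden.shape[1]
        integer_pos = torch.arange(seq_len, device=hidden.device, dtype=hidden.dtype)
        # A learned real-valued offset lets semantically related tokens move
        # closer together in attention space, and noise move further away.
        offset = self.pos_head(hidden).squeeze(-1)       # (batch, seq)
        positions = integer_pos + offset                 # continuous positions
        angles = positions[..., None] * self.inv_freq    # (batch, seq, head_dim/2)
        return angles.cos(), angles.sin()

def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq, head_dim) for a single attention head
    x1, x2 = x[..., ::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., ::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Toy usage: rotate one head's queries with the learned continuous positions.
batch, seq, d_model, head_dim = 2, 8, 64, 32
hidden = torch.randn(batch, seq, d_model)
repo = ContextRepositioner(d_model, head_dim)
cos, sin = repo(hidden)
q = apply_rope(torch.randn(batch, seq, head_dim), cos, sin)
```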

Additional links:



Runway Gen-4.5

TL;DR

Runway released Gen-4.5, introducing a novel tiled input method that generates multi-panel videos from image grids. Also launched two new apps: Sketch Motion (draw animation paths on images) and Character Swap (insert yourself into scenes).

Executive Summary

Runway's Gen-4.5 represents a significant evolution in video generation, most notably through its innovative tiled input system. Users can submit 2x2, 3x3, or 4x4 image grids, and the model generates coherent multi-panel animations—a technique the community has been exploring for VFX-like motion effects.

Creative techniques emerging include "abusing whip pans" for controlled visual exploration and using panels individually for smoother animations, though consistency varies across generations. The model scores highly (92/100) for visuals and coherence in early reviews.

Alongside the model, Runway introduced Sketch Motion (draw animation paths directly on images) and Character Swap (insert users into scenes), with more apps promised next week. The platform is evolving toward an all-in-one node-based UI for tool integration.

Key Details

  • Tiled Input Method: Submit 2x2, 3x3, or 4x4 image grids for multi-panel video generation (see the grid-compositing sketch after this list)
  • Sketch Motion: New app for drawing animation paths on images
  • Character Swap: Insert yourself into generated scenes
  • No native audio: Community using MMAudio and external tools
  • VFX potential: Professional-grade motion from still images
  • Whip pan technique: Discovered for controlled exploration
  • Node-based UI: Platform positioning as all-in-one suite
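
Here's a generic Pillow helper for building the 2x2, 3x3, or 4x4 grids mentioned above; it just composites panels into a single tiled input image and does not call Runway's API.

```python
from PIL import Image

def make_tile_grid(panel_paths: list[str], grid: int = 2, tile_size: int = 512) -> Image.Image:
    """Composite grid*grid panel images into one square grid image (2x2, 3x3,
    or 4x4) for use as a tiled input. Generic Pillow helper, not Runway's API."""
    assert len(panel_paths) == grid * grid, "need exactly grid*grid panels"
    canvas = Image.new("RGB", (grid * tile_size, grid * tile_size))
    for i, path in enumerate(panel_paths):
        panel = Image.open(path).convert("RGB").resize((tile_size, tile_size))
        row, col = divmod(i, grid)
        canvas.paste(panel, (col * tile_size, row * tile_size))
    return canvas

# e.g. make_tile_grid(["a.png", "b.png", "c.png", "d.png"], grid=2).save("grid_2x2.png")
```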

Limitations

  • Occasional over-animation of grids
  • Consistency varies across generations
  • No native sound generation

10+ Factoids

  • Gen-4.5 accepts tiled image grids (2x2, 3x3, 4x4) as input
  • Multi-panel animations can be used for VFX-like motion effects
  • Sketch Motion lets you literally draw where things should move
  • Character Swap inserts your likeness into generated scenes
  • Community scoring it 92/100 for visuals and coherence
  • "Whip pan abuse" is becoming a creative technique
  • More apps coming next week per Runway
  • Detail retention from source images described as "robust"
  • Works well for professional-grade motion from stills
  • Stress tests include feathers, spaghetti physics, and complex scenarios
  • MMAudio being used for sound since native audio is absent
  • Node-based UI improvements positioning it as evolving platform

Community Links


End of show notes — 10 items
