Cainew - Curated AI news for developers

TL;DR

Model Releases

9 demos of Gemini Omni and Gemini 3.5 in action

Tools & Products

Research Papers

UniSteer: Text-Guided Flow Matching in Activation Space for Versatile LLM Steering
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments
YoCausal: How Far is Video Generation from World Model? A Causality Perspective

Industry News

Discussion

Model Releases

9 demos of Gemini Omni and Gemini 3.5 in action

Watch 9 videos showing the capabilities of Gemini Omni and Gemini 3.5, announced at Google I/O 2026.

RSS

Tools & Products

nexu-io/html-anything

✨ The agentic HTML editor — your local AI agent writes the HTML, you ship it. 🚀 75 Skills × 9 Surfaces (magazine · deck · poster · XHS / tweet · prototype · data report · Hyperframes) 🛡️ Sandboxed preview · 📤 1-click to WeChat / X / Zhihu / HTML / PNG 🔑 Zero API key — Claude Code / Cursor / Codex / Gemini / Copilot / OpenCode / Qwen / Aider.

GitHub

zarazhangrui/feishu-claude-code-bridge

Bot that bridges Feishu/Lark messenger with a local Claude Code CLI — streaming cards, per-chat sessions, multiple workspaces

GitHub

tolibear/goalbuddy

A better /goal for Codex and Claude Code

GitHub

Real-time LLM Inference on Standard GPUs: 3k tokens/s per request

A new approach enables real-time LLM inference on standard GPUs, achieving throughput of 3,000 tokens per second per request.

RSS
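For a sense of scale, here is a back-of-the-envelope conversion of 3,000 tokens/s per request into latency. It assumes a constant decode rate and ignores prefill, time-to-first-token, and network overhead, none of which the summary specifies. The 50 tokens/s comparison point is an arbitrary illustrative figure, not a number from the source.

```python
def per_token_latency_ms(tokens_per_second: float) -> float:
    """Average gap between consecutive generated tokens, in milliseconds."""
    return 1000.0 / tokens_per_second


def decode_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Time to decode num_tokens at a constant rate (prefill and TTFT ignored)."""
    return num_tokens / tokens_per_second


# 3,000 tok/s is the figure from the summary; 50 tok/s is an arbitrary comparison point.
for rate in (3000, 50):
    print(
        f"{rate:>5} tok/s: {per_token_latency_ms(rate):.3f} ms/token, "
        f"{decode_time_s(1000, rate):.2f} s for a 1,000-token answer"
    )
```

At 3,000 tokens/s, a 1,000-token answer takes about a third of a second, versus 20 seconds at 50 tokens/s.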

Ava 2.0: Your AI BDR that runs outbound sales autonomously

Ava is an AI BDR that runs your entire outbound on autopilot. She sources leads from 250M+ professionals, runs multi-channel outreach, and books qualified meetings, fully autonomously.

ProductHunt

/monitor by Firecrawl: Notify your AI agent when the web changes

/monitor notifies your agent via webhook the moment pages or sites change. Use up to 90% fewer LLM tokens by only ingesting what changes on a page.

ProductHunt
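The token savings come from handing an agent only the delta between two snapshots of a page rather than the whole page. Firecrawl's implementation is not described here; the sketch below is a generic illustration using Python's standard-library `difflib`, and `page_delta` is a hypothetical helper, not part of any Firecrawl API.

```python
import difflib


def page_delta(old: str, new: str) -> dict[str, list[str]]:
    """Return the lines added and removed between two text snapshots of a page."""
    old_lines, new_lines = old.splitlines(), new.splitlines()
    # autojunk=False disables difflib's heuristic that treats very frequent
    # lines (nav bars, footers) in long pages as junk.
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    delta: dict[str, list[str]] = {"added": [], "removed": []}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            delta["removed"].extend(old_lines[i1:i2])
        if tag in ("replace", "insert"):
            delta["added"].extend(new_lines[j1:j2])
    return delta


before = "Widget Pro\nPrice: $10\nIn stock\nShips in 2 days"
after = "Widget Pro\nPrice: $12\nIn stock\nShips in 2 days"
print(page_delta(before, after))
# {'added': ['Price: $12'], 'removed': ['Price: $10']}
```

In this sketch, a webhook payload would carry only the delta, so on pages that change little the agent ingests a few lines instead of the full page.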

Ava Studio: Your AI creative team for video ads

Ava Studio researches your product, develops hooks and creative angles, then generates 50+ editable short-form ad variants ready for TikTok, Reels, Meta, and any platform you want to ship on.

ProductHunt

MCP Bridge by Appfactor: Connect any API to any AI agent

Point MCP Bridge at any REST, GraphQL, SOAP, or gRPC API. It auto-generates MCP tool definitions with typed schemas, auth, rate limiting, and response processing. Your LLM agents call enterprise APIs through one standard interface.

ProductHunt
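In the Model Context Protocol, each tool an agent can call is advertised with a `name`, a `description`, and an `inputSchema` written as JSON Schema. The sketch below shows roughly what generating one such definition from a REST endpoint could look like. The `RestEndpoint` type and `to_mcp_tool` function are hypothetical illustrations, not Appfactor's API, and the sketch leaves out the auth, rate limiting, and response processing the product advertises.

```python
import json
from dataclasses import dataclass, field

# Map simple Python parameter types to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class RestEndpoint:
    """Hypothetical minimal description of one REST operation."""
    method: str
    path: str
    summary: str
    params: dict[str, type] = field(default_factory=dict)
    required: list[str] = field(default_factory=list)


def to_mcp_tool(ep: RestEndpoint) -> dict:
    """Build an MCP-style tool definition (name, description, inputSchema) for one endpoint."""
    # Derive a tool name such as "get_orders_order_id" from the method and path.
    slug = ep.path.strip("/").replace("/", "_").replace("{", "").replace("}", "")
    return {
        "name": f"{ep.method.lower()}_{slug}",
        "description": ep.summary,
        "inputSchema": {
            "type": "object",
            "properties": {name: {"type": _JSON_TYPES[t]} for name, t in ep.params.items()},
            "required": list(ep.required),
        },
    }


ep = RestEndpoint("GET", "/orders/{order_id}", "Fetch one order by ID.",
                  params={"order_id": str}, required=["order_id"])
print(json.dumps(to_mcp_tool(ep), indent=2))
```

For `GET /orders/{order_id}` this yields a tool named `get_orders_order_id` whose schema requires a string `order_id`.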

Firecoach AI: AI roleplays that turn reps into top performers

FireCoach.ai is the fastest way to clone your sales methodology and coach every rep on your team — at scale, without adding headcount. Build custom AI sales bots trained on your playbook, run rep roleplays, get scored feedback, and identify coaching gaps before they show up on a lost deal.

ProductHunt

Integuru: Generate fast, reliable APIs for any platform. No browsers

Integuru generates fast, reliable APIs for any platform, without browsers or RPA. API calls complete in ~3 seconds with a 99.9%+ success rate. Most agents today use browser automation to control web apps that lack official APIs, but this is slow and brittle. Integuru replaces the browser entirely, connects directly to the backend, and handles authentication and edge cases. Integrations come with auto-healing, API docs, and a 24/7 on-call maintenance team, and each API is generated end-to-end in minutes.

ProductHunt

tomfunk/fungible

Terminal UI for personal finance — Plaid sync, CSV import, AI assistant, and MCP server

GitHub

How Braintrust turns customer requests into code with Codex

How Braintrust engineers use Codex with GPT-5.5 to run experiments and code faster.

OpenAI

Show HN: AISlop, a CLI for catching AI generated code smells

AISlop is a new command-line tool that detects code smells commonly found in AI-generated code.

GitHub

Strengthening societal resilience with Rosalind Biodefense

OpenAI launches Rosalind Biodefense, expanding trusted access to GPT-Rosalind for vetted developers and U.S. government partners advancing biodefense, public health, and pandemic preparedness through frontier AI.

OpenAI

Research Papers

UniSteer: Text-Guided Flow Matching in Activation Space for Versatile LLM Steering

Activation-based control steers large language models (LLMs) by intervening on their internal representations during inference, and has emerged as an effective paradigm for controlling behaviors such as persona and style. However, existing methods often rely on fixed steering directions or task-specific intervention modules, making them difficult to adapt to fine-grained concepts and compositional constraints. We propose UniSteer, a text-guided activation flow matching model that learns a condit...

HuggingFace
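For context, the fixed-direction approach the abstract contrasts UniSteer against can be sketched in a few lines: derive one steering vector, often as the difference between mean activations on contrasting prompts, then add a scaled, normalized copy of it to a layer's hidden state at inference time. This is the baseline, not UniSteer's text-guided flow matching. Plain Python lists stand in for real model activations, which in practice would be captured and modified with framework hooks on a chosen layer; the toy numbers are made up.

```python
import math


def mean_difference_direction(pos: list[list[float]], neg: list[list[float]]) -> list[float]:
    """Steering direction = mean(activations on target prompts) - mean(activations on contrasting prompts)."""
    dim = len(pos[0])
    mean_pos = [sum(v[i] for v in pos) / len(pos) for i in range(dim)]
    mean_neg = [sum(v[i] for v in neg) / len(neg) for i in range(dim)]
    return [p - n for p, n in zip(mean_pos, mean_neg)]


def steer(hidden: list[float], direction: list[float], alpha: float) -> list[float]:
    """Fixed-direction activation steering: h' = h + alpha * v / ||v||."""
    norm = math.sqrt(sum(x * x for x in direction))
    if norm == 0:
        raise ValueError("steering direction must be non-zero")
    return [h + alpha * d / norm for h, d in zip(hidden, direction)]


pos_acts = [[1.0, 0.0], [3.0, 0.0]]  # toy activations on prompts showing the target style
neg_acts = [[0.0, 0.0], [0.0, 0.0]]  # toy activations on contrasting prompts
v = mean_difference_direction(pos_acts, neg_acts)
print(v)  # [2.0, 0.0]
print(steer([0.5, 0.5], v, alpha=1.0))  # [1.5, 0.5]
```

Because the direction is fixed, one vector encodes one concept at one strength, which is why, per the abstract, such methods adapt poorly to fine-grained concepts and compositional constraints.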

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

Embodied intelligence is often studied through specialized models for individual tasks such as manipulation or navigation, resulting in fragmented capabilities and limited generalization across tasks, environments, and robot embodiments. In this work, we study whether heterogeneous embodied decision-making problems can be unified within a single vision-language-action model. We present Qwen-VLA, a unified embodied foundation model that extends Qwen's vision-language modeling stack from perceptio...

HuggingFace

YoCausal: How Far is Video Generation from World Model? A Causality Perspective

As video diffusion models (VDMs) advance toward world models, a key question arises: do they truly understand causality, or merely overfit to statistical temporal patterns? Existing benchmarks mostly rely on synthetic data, limiting real-world generalization due to the sim-to-real gap. We present YoCausal, a two-level benchmark inspired by the Violation of Expectation (VoE) paradigm from cognitive science. By temporally reversing real-world videos at zero cost as natural counterfactual samples, ...

HuggingFace

Verifiable Rewards Beyond Math and Code: Lightweight Corpus-Grounded Process Supervision for Factual Question Answering

Applying reinforcement learning to improve factual accuracy in knowledge-intensive question answering faces a reward design dilemma. Response-level rewards provide only coarse supervision and cannot distinguish correct from incorrect statements within a reasoning trace. Sentence-level alternatives offer finer-grained feedback, but typically rely on NLI verifiers, LLM judges, or knowledge-verification pipelines that are expensive to deploy at RL scale and often unreliable for rare-entity facts, w...

HuggingFace

How LoRA Remembers? A Parametric Memory Law for LLM Finetuning

Large Language Models (LLMs) must continuously learn and update knowledge to remain effective in dynamic real-world environments. While Low-Rank Adaptation (LoRA) is widely used for such memory updates, existing studies mainly rely on qualitative downstream evaluations, leaving the quantitative capacity limits and underlying dynamics of exact parametric memory largely unexplored. To bridge this gap, we employ LoRA as a controlled memory capacity probe within the latent space to systematically qu...

HuggingFace
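As background on the capacity being probed: LoRA freezes a pretrained weight matrix W of shape d × k and learns a low-rank update ΔW = BA, with B of shape d × r and A of shape r × k, so each adapted matrix gets r(d + k) trainable parameters. The arithmetic below shows how small that budget is relative to the full matrix. The 4096 × 4096 shape and the ranks are illustrative choices, and this does not reproduce the paper's memory law.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Parameters in the low-rank update B @ A: B is d x r, A is r x k."""
    return d * r + r * k


def full_matrix_params(d: int, k: int) -> int:
    """Parameters in the frozen d x k weight matrix W."""
    return d * k


d = k = 4096  # illustrative square projection size
for r in (8, 16, 64):
    extra = lora_trainable_params(d, k, r)
    share = extra / full_matrix_params(d, k)
    print(f"r={r:>2}: {extra:>9,} trainable params ({share:.2%} of W)")
```

At r = 8, the update has 65,536 trainable parameters against 16,777,216 in W, about 0.39%.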

Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection

Recent advances in Vision-Language Models (VLMs) have achieved impressive performance across many tasks, yet prior studies report unsatisfactory performance when applying large language or multimodal models to finding abnormal patterns in sequential data. Public anomaly detection benchmarks typically provide interval annotations but not natural-language rationales, making it difficult to fine-tune VLMs to produce grounded, interpretable decisions. To address this gap, we construct VisAnomBench, ...

HuggingFace

ChildVox: A Speech, Audio, and Large Audio-Language Model Benchmark in Understanding and Characterizing Sound across Childhood

We present ChildVox, a novel benchmark for characterizing the diverse acoustic signals through which children communicate. Specifically, ChildVox follows the full developmental trajectory from birth through school age, covering physiological sounds, non-linguistic vocalizations, canonical syllables, and spoken language. ChildVox integrates more than 20 sub-tasks across 17 child-centered audio and speech datasets, enabling systematic cross-corpus and cross-domain comparison. We evaluate a represe...

HuggingFace

Towards Consistent Video Geometry Estimation

This work presents ViGeo, a feed-forward foundation model for recovering spatially dense and temporally consistent geometry from video sequences. Built upon a plain transformer architecture without task-specific architectural modifications, ViGeo supports streaming, full-sequence, and long-video inference within a unified model. The key design is dynamic chunking attention, which exposes the model to both bidirectional and causal temporal contexts during training and allows it to adapt its atten...

HuggingFace
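One plausible reading of how a single model can see both bidirectional and causal temporal context is a chunk-causal attention mask: frames attend freely within their own chunk and only to earlier chunks outside it, with the chunk size varied during training. This is an assumption for illustration; the abstract is truncated before it describes ViGeo's actual mechanism, and `chunk_causal_mask` and `sample_training_mask` are hypothetical helpers.

```python
import random


def chunk_causal_mask(num_frames: int, chunk_size: int) -> list[list[bool]]:
    """mask[i][j] is True when frame i may attend to frame j.

    Frames attend bidirectionally within their own chunk and causally to earlier chunks.
    """
    return [
        [j // chunk_size <= i // chunk_size for j in range(num_frames)]
        for i in range(num_frames)
    ]


def sample_training_mask(num_frames: int, rng: random.Random) -> list[list[bool]]:
    """Hypothetical 'dynamic' variant: draw a fresh chunk size for each training step."""
    chunk_size = rng.randint(1, num_frames)  # inclusive on both ends
    return chunk_causal_mask(num_frames, chunk_size)


for row in chunk_causal_mask(4, 2):
    print("".join("x" if allowed else "." for allowed in row))
# xx..
# xx..
# xxxx
# xxxx
```

Setting `chunk_size` to the sequence length recovers full-sequence (bidirectional) attention, while `chunk_size=1` recovers ordinary causal attention, which suits streaming.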

When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems

The design space of agentic AI inference spans two extremes: frontier large language models (LLMs), typically hosted in the cloud and offering strong performance across a wide range of tasks at substantially high cost, and more cost-efficient small language models (SLMs), which are amenable to on-device inference. Hybrid multi-agent systems (MASs) combining on-device and cloud models offer a promising middle ground, but they also introduce a complex and poorly understood design space in which ta...

HuggingFace

PhyGenHOI: Physically-Aware 4D Generation of Dynamic Human-Object Interactions

We address the task of generating physically accurate and visually faithful 4D Human-Object Interaction (HOI). Given a static 3D human and target object represented as 3D Gaussian Splats (3DGS), our goal is to synthesize dynamic scenes where the human actively engages with the object through actions, such as punching or kicking, in accordance with a given input text. To this end, we introduce PhyGenHOI, a novel framework that couples generative human motion with an explicit physical object simul...

HuggingFace

WorldMemArena: Evaluating Multimodal Agent Memory Through Action-World Interaction

Multimodal large language models are increasingly deployed as long-horizon agents, where memory must do more than recall: it must track an evolving world, revise what has gone stale, and surface the right evidence at decision time. Existing benchmarks measure recall over static dialogue, collapse memory into a single end-of-task accuracy, and reduce visual observations to captions, leaving us unable to localize failures to writing, maintenance, retrieval, or use. The rise of agent harnesses that...

HuggingFace

AdaState: Self-Evolving Anchors for Streaming Video Generation

Autoregressive video diffusion models generate streaming video by producing frames sequentially, conditioning each chunk on previously generated content. These models are structurally anchored to the first frame: its key-value representation occupies a privileged position in the attention cache and serves as the primary scene reference throughout generation. As the cleanest and most error-free position in the cache, this anchor draws disproportionate attention, suppressing video dynamics, and lo...

HuggingFace

LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents

Mastering terminal environments requires language agents capable of multi-step planning, feedback-grounded execution, and dynamic state adaptation. However, training such agents is currently bottlenecked by a reliance on scraped external repositories, which limits domain diversity, environment controllability, and the targeting of specific capability deficits. We introduce LiteCoder-Terminal-Gen, a zero-dependency synthesis pipeline that autonomously generates executable and verifiable terminal ...

HuggingFace

RePoT: Recoverable Program-of-Thought via Checkpoint Repair

One-shot Program-of-Thought (PoT) emits a Python program that prints a primitive-action plan; a single invalid action silently invalidates the trajectory. We introduce RePoT (Recoverable PoT): a deterministic verified replay that walks the plan through the environment to its first invalid transition, then one LLM call that resumes from the verified prefix. RePoT costs at most one extra LLM call on the ~14% of problems where PoT fails. RePoT beats PoT by +3 to +11pp across four closed-model confi...

HuggingFace
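The repair loop described in the abstract is simple enough to sketch. Below, `replay` walks a plan through the environment and stops at the first invalid transition, and `repot` makes at most one extra call to resume from the verified state. The toy corridor environment, the `resume` callback standing in for the LLM call, and both function names are hypothetical; the paper's environments and prompts are not shown in the summary.

```python
from typing import Callable, Optional, Sequence

# step(state, action) returns the next state, or None if the action is invalid.
Step = Callable[[int, str], Optional[int]]
# resume(state, verified_prefix) stands in for the single LLM repair call.
Resume = Callable[[int, list[str]], list[str]]


def replay(start: int, plan: Sequence[str], step: Step) -> tuple[list[str], int, Optional[int]]:
    """Deterministically replay a plan; return (verified prefix, state after it, index of first invalid action or None)."""
    state = start
    for i, action in enumerate(plan):
        nxt = step(state, action)
        if nxt is None:
            return list(plan[:i]), state, i
        state = nxt
    return list(plan), state, None


def repot(start: int, plan: Sequence[str], step: Step, resume: Resume) -> list[str]:
    """Keep the verified prefix; only if the plan breaks, call resume() exactly once."""
    prefix, state, bad_index = replay(start, plan, step)
    if bad_index is None:
        return prefix  # valid plan: no extra LLM call
    return prefix + resume(state, prefix)


def corridor_step(pos: int, action: str) -> Optional[int]:
    """Toy environment: positions 0..4, actions 'L' and 'R'; walking off either end is invalid."""
    nxt = pos + {"L": -1, "R": 1}[action]
    return nxt if 0 <= nxt <= 4 else None


plan = ["R", "R", "R", "R", "R"]  # the fifth step would leave the corridor
print(replay(0, plan, corridor_step))  # (['R', 'R', 'R', 'R'], 4, 4)
print(repot(0, plan, corridor_step, lambda state, prefix: ["L"]))  # ['R', 'R', 'R', 'R', 'L']
```

In this sketch the continuation returned by `resume` is not re-verified; whether RePoT checks it again is not stated in the summary.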

LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training

Reinforcement learning (RL) post-training has been shown to improve reasoning in large language models (LLMs). However, there has been little exploration of the problem of data contamination in RL post-training, which can undermine the generalization and evaluation reliability of the training process itself. Existing detection methods primarily rely on output-level signals such as likelihood or entropy, which become unreliable for RL-trained models since RL shapes behavior through trajectory-level re...

HuggingFace

Industry News

Sam Altman and Dario Amodei are both walking back AI jobs apocalypse predictions

Sam Altman and Dario Amodei have recently retreated from earlier catastrophic predictions about AI eliminating jobs in the near term. Both leaders are now adopting a more cautious stance on the timeline and severity of AI-driven employment disruption.

RSS

Notes from the Mistral AI Now Summit in Paris

Key announcements and insights were shared at the Mistral AI Now Summit held in Paris, showcasing the latest developments from the Mistral AI team.

RSS

Amazon scraps AI leaderboard to stop workers chasing usage scores

Amazon discontinued its AI leaderboard to prevent workers from becoming overly focused on chasing usage metrics rather than genuine productivity.

RSS

Microsoft data suggests using AI is more expensive than hiring people

Microsoft's internal data suggests that deploying AI tools is often more costly than hiring additional human workers for the same tasks.

RSS

Boston Children’s uses AI to unlock new diagnoses

Boston Children’s Hospital uses OpenAI technology to improve patient care, reduce operational burden, and help diagnose more than 40 rare disease cases.

OpenAI

Discussion

The mysterious Hy3 LLM is topping OpenRouter Model Rankings by a large margin

A mysterious LLM named Hy3 has unexpectedly topped OpenRouter's model rankings by a significant margin, raising questions about its capabilities and origins.

RSS

Various LLM Smells

The article explores various code smells and anti-patterns commonly found in LLM-generated and LLM-influenced code.

RSS

Is AI causing a repeat of Front end's Lost Decade?

An analysis examines whether AI is pushing frontend development into a period of stagnation similar to the industry's previous lost decade.

RSS

Protestware for Coding Agents

Protestware is emerging as a concept for coding agents, potentially incorporating protest or resistance mechanisms into AI-driven development tools.

RSS

Check out real-life AI prototypes from the Futures Lab.

University of Waterloo students develop AI prototypes like sign language tutors to reshape the future of education and work.

RSS