Cainew - Curated AI news for developers

TL;DR

Model Releases

Gemma 4 12B: A unified, encoder-free multimodal model

Tools & Products

Research Papers

Industry News

Discussion

Now AI agents need what RSS does

Model Releases

Gemma 4 12B: A unified, encoder-free multimodal model

Google introduces Gemma 4 12B, a compact multimodal AI model that handles text and image understanding without separate encoders, a design intended to improve efficiency. This unified architecture aims to make advanced multimodal capabilities easier to deploy.

RSS

Tools & Products

opensquilla/opensquilla

OpenSquilla: a token-efficient AI agent that aims for higher intelligence density on the same budget.

GitHub

Kaelio/ktx

ktx is an executable context layer for data and analytics agents. It lets Claude Code, Codex, and any other AI agent query data accurately through MCP, using skills, memory, and a semantic layer.

GitHub

sandiiarov/skill-creator

Turn any MCP server, OpenAPI spec, or GraphQL endpoint into a CLI at runtime.

GitHub
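skill-creator's description does not say how it builds the CLI. As a rough illustration of the OpenAPI case only, the sketch below walks a hypothetical, hand-written spec fragment and registers one `argparse` subcommand per operation, with one flag per parameter. `SPEC`, `build_cli`, and `to_request` are illustrative names, not the project's API, and the sketch only assembles the HTTP method, path, and query rather than sending a request.

```python
import argparse
from typing import Any

# Hypothetical OpenAPI fragment; a real tool would load the spec from a file or URL.
SPEC: dict[str, Any] = {
    "paths": {
        "/users/{id}": {
            "get": {
                "operationId": "getUser",
                "parameters": [{"name": "id", "in": "path", "required": True}],
            }
        },
        "/users": {
            "get": {
                "operationId": "listUsers",
                "parameters": [{"name": "limit", "in": "query", "required": False}],
            }
        },
    }
}


def build_cli(spec: dict[str, Any]) -> argparse.ArgumentParser:
    """Create one subcommand per OpenAPI operation, with one flag per parameter."""
    parser = argparse.ArgumentParser(prog="api")
    subparsers = parser.add_subparsers(dest="operation", required=True)
    for path, methods in spec["paths"].items():
        for method, op in methods.items():
            sub = subparsers.add_parser(op["operationId"], help=f"{method.upper()} {path}")
            for param in op.get("parameters", []):
                sub.add_argument(f"--{param['name']}", required=param.get("required", False))
            # Remember which HTTP call this subcommand maps to.
            sub.set_defaults(method=method.upper(), path=path)
    return parser


def to_request(args: argparse.Namespace) -> tuple[str, str, dict[str, str]]:
    """Turn parsed arguments into (method, path, query) without sending anything."""
    path = args.path
    query: dict[str, str] = {}
    for key, value in vars(args).items():
        if key in {"operation", "method", "path"} or value is None:
            continue
        placeholder = "{" + key + "}"
        if placeholder in path:
            path = path.replace(placeholder, value)  # path parameter
        else:
            query[key] = value  # everything else becomes a query parameter
    return args.method, path, query
```

For example, `to_request(build_cli(SPEC).parse_args(["getUser", "--id", "42"]))` yields `("GET", "/users/42", {})`. MCP servers and GraphQL endpoints would each need their own schema walker in place of the `paths` loop.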

bidah/skill-set

Collection of Claude Code skills

GitHub

SIXIANGGUO/cc-note-ops

Obsidian note operations panel powered by Claude Code

GitHub

Episkey-G/GrokSearch-rs

Rust MCP server for Grok web search and Tavily-backed source retrieval

GitHub

pedrofariasx/qwenproxy

OpenAI-compatible proxy API that uses Playwright automation to route requests to Qwen models, with support for multiple accounts, tools, and persistent sessions.

GitHub

Hermes Desktop: The agent that grows with you

Hermes Desktop, the open-source agent that grows with you, is now a native app for macOS, Windows, and Linux. By Nous Research.

ProductHunt

Spectron: Agent memory you can trust

Spectron is agent memory built on one ACID substrate. Graph, vectors, documents, and structured rows commit in one transaction. Every fact carries provenance. Corrections supersede, never overwrite. Hybrid retrieval fuses vectors, graph, BM25, and keywords. Traces feed back into ranking. Tri-temporal facts, multi-tenant scopes, and MCP support. No stitched stores. No sync pipelines.

ProductHunt
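Spectron's description says hybrid retrieval fuses vectors, graph, BM25, and keyword results, but not how the rankings are combined. One common, simple fusion method is Reciprocal Rank Fusion (RRF), sketched below as an assumption, not as Spectron's implementation: each document scores the sum of 1 / (k + rank) over every ranked list that contains it.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document's fused score is sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k = 60 is a common default.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

`reciprocal_rank_fusion([["a", "b", "c"], ["b", "c", "d"], ["b", "a"]])` returns `["b", "a", "c", "d"]`: "b" sits near the top of all three lists, so it wins even though the first list ranks "a" first. RRF uses only ranks, so it needs no calibration between retrievers whose raw scores are not comparable.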

Show HN: Paseo – Beautiful open-source coding agent interface

Paseo is an open-source, user-friendly interface designed to simplify interactions with AI coding agents. The tool aims to make powerful AI development assistants more accessible to developers through an intuitive visual interface.

GitHub

Composer: Multiplayer markdown for you, your team, and your agents.

Composer is a real-time, multiplayer markdown editor where people and agents can work side-by-side. Instantly share markdown generated by your agent with teammates, edit in real time, leave comments, suggestions, and share context. Your agents join as true collaborators, working directly alongside you and your team.

ProductHunt

Use your Nvidia GPU's VRAM as swap space on Linux

A new Linux technique allows users to leverage their Nvidia GPU's VRAM as additional swap space, effectively expanding available memory for system operations. This workaround can improve performance for memory-intensive tasks on systems with limited RAM.

GitHub

Introducing the Services Track and Partner Hub of the Claude Partner Network

Anthropic

Research Papers

U of T researchers demonstrate AI worm could target any online device

University of Toronto researchers have demonstrated a proof-of-concept AI worm capable of spreading across and compromising any internet-connected device. The demonstration raises critical concerns about AI-based malware and the need for stronger defenses.

RSS

BA-T: An Iterative Transformer for Two-View Bundle Adjustment

Feed-forward models for 3D reconstruction have achieved strong performance using deep cross-view attention to exchange information across images. However, these approaches often depend on heavy decoder stacks and lack a structured mechanism for geometry refinement, resulting in poor multi-view consistency. We address this by drawing inspiration from classical bundle adjustment (BA), which can be viewed as an iterative information propagation process between poses and local geometry. Inspired by ...

HuggingFace
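For reference, classical bundle adjustment, which BA-T draws inspiration from, jointly refines camera poses and 3D points by minimizing reprojection error. In the two-view case, using standard pinhole notation assumed here (intrinsics $K_i$, rotation $R_i$, and translation $\mathbf{t}_i$ for view $i$; 3D point $\mathbf{X}_j$ observed at pixel $\mathbf{x}_{ij}$; perspective division $\pi$), with the first camera fixed to remove the gauge freedom, the objective is:

```latex
\min_{R_2,\,\mathbf{t}_2,\,\{\mathbf{X}_j\}} \; \sum_{i=1}^{2} \sum_{j}
  \bigl\| \mathbf{x}_{ij} - \pi\bigl(K_i (R_i \mathbf{X}_j + \mathbf{t}_i)\bigr) \bigr\|^2,
\qquad \pi\bigl([u, v, w]^\top\bigr) = (u/w,\; v/w),
\qquad R_1 = I,\; \mathbf{t}_1 = \mathbf{0}
```

Solvers such as Gauss-Newton or Levenberg-Marquardt minimize this iteratively, updating poses and points together from linearized residuals at each step. BA-T's learned transformer update is not described in the truncated abstract and is not shown.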

World Models Meet Language Models: On the Complementarity of Concrete and Abstract Reasoning

World models and multimodal large language models (MLLMs) provide complementary capabilities for predicting future outcomes from static visual observations. World models can generate concrete visual rollouts of possible futures, while MLLMs can reason abstractly over questions, goals, and rules. However, generated rollouts are stochastic and may be visually plausible but task-incorrect, making it necessary to determine when visual simulation is useful, whether a rollout is credible, and how it s...

HuggingFace

Benchmarking Visual State Tracking in Multimodal Video Understanding

Understanding a video requires more than recognizing isolated moments, as humans continuously track entities, states, and events over time. This capacity for visual state tracking is fundamental to video understanding, yet remains underexplored in current evaluations of Multimodal Large Language Models (MLLMs). We introduce Visual STAte Tracking benchmark (VSTAT), a video-based benchmark designed to diagnose visual state tracking in MLLMs. VSTAT consists of 834 clips drawn from both synthetic an...

HuggingFace

Conditional Hypothesis Generation for LLM-Based Text Analysis with Researcher-Specified Covariates

A core goal of computational social science is to discover interpretable differences in how language varies across outcomes of interest, such as political affiliation or instructional quality. Recent LLM-based hypothesis generation methods describe such differences in natural language, but select for globally discriminative patterns without accounting for covariates that shape the data based on researchers' domain knowledge. When covariates are ignored, selected patterns can reflect confounds ra...

HuggingFace

KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks

Test-time scaling is a powerful approach to obtain better reasoning in large language models, but it becomes memory-bottlenecked during long-horizon decoding, as the KV-cache grows. KV-cache quantization can help improve this, but current methods are evaluated under prefill-like settings and errors behave differently under autoregressive decoding. We show that in the latter regime, quantization errors accumulate across timesteps, driven primarily by incorrect token scales. We introduce KVarN, a ...

HuggingFace
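To make "token scales" concrete: a common KV-cache quantization baseline stores one scale per token (one per row of the key or value matrix) and rounds each entry to a small signed integer. The sketch below shows that generic symmetric, per-token scheme in plain Python. It is not KVarN, whose variance-normalized method is cut off in the abstract, and the 4-bit default and function names are illustrative assumptions.

```python
def quantize_per_token(
    rows: list[list[float]], bits: int = 4
) -> tuple[list[list[int]], list[float]]:
    """Symmetric per-token quantization: one scale per row (token) of a K or V block."""
    qmax = 2 ** (bits - 1) - 1  # largest signed code, e.g. 7 for 4 bits
    codes: list[list[int]] = []
    scales: list[float] = []
    for row in rows:
        absmax = max(abs(v) for v in row)
        scale = absmax / qmax if absmax > 0 else 1.0  # all-zero row: any scale works
        # Round to the nearest code and clamp to the signed range [-qmax, qmax].
        codes.append([max(-qmax, min(qmax, round(v / scale))) for v in row])
        scales.append(scale)
    return codes, scales


def dequantize(codes: list[list[int]], scales: list[float]) -> list[list[float]]:
    """Reconstruct approximate values as code * scale, row by row."""
    return [[c * s for c in row] for row, s in zip(codes, scales)]
```

With a scale computed from its own row, every entry is reconstructed to within half a quantization step, so a row dominated by one outlier gets a coarse step for all its other entries. Once a scale no longer fits the data it is applied to, that bound breaks; the abstract identifies incorrect token scales as the main driver of error accumulation during autoregressive decoding.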

NVIDIA OmniDreams: Real-Time Generative World Model for Closed-Loop Autonomous Vehicle Simulation

As autonomous vehicle capabilities advance, the safe evaluation of driving policies in long-tail scenarios remains a critical bottleneck. In closed-loop simulation, the driving policy model actively interacts with the environment, where its actions dynamically update the simulator state and directly influence the next set of generated sensor observations. While recent reconstruction-based neural simulators offer photorealism, they are fundamentally constrained by their initial captured data and ...

HuggingFace

Bootstrap Your Generator: Unpaired Visual Editing with Flow Matching

Modern generative models possess a deep understanding of visual content, yet training them for image editing typically requires massive datasets of paired examples. This limits scalability, especially for video editing where collecting paired data is prohibitively expensive. We propose Bootstrap Your Generator (ByG), a general framework for unpaired training of flow matching editing models. It leverages the base model's knowledge without any external signal. Our approach pairs instruction-follow...

HuggingFace
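For readers new to flow matching: a common way to train the base generator, assuming the linear (rectified-flow style) interpolation path, is to regress a velocity field $v_\theta$ onto the straight-line velocity between a noise sample $x_0$ and a data sample $x_1$. Conditioning inputs such as an edit instruction or a source image would enter $v_\theta$ as extra arguments; ByG's unpaired bootstrapping procedure is truncated in the abstract and is not shown.

```latex
x_t = (1 - t)\,x_0 + t\,x_1, \qquad \frac{d x_t}{d t} = x_1 - x_0,
\qquad
\mathcal{L}_{\mathrm{FM}}(\theta) =
  \mathbb{E}_{t \sim \mathcal{U}[0,1],\; x_0 \sim \mathcal{N}(0, I),\; x_1 \sim p_{\mathrm{data}}}
  \bigl\| v_\theta(x_t, t) - (x_1 - x_0) \bigr\|^2
```

At sampling time, integrating $dx/dt = v_\theta(x, t)$ from $t = 0$ to $t = 1$ maps noise to a sample.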

Value-Aware Stochastic KV Cache Eviction for Reasoning Models

Reasoning models improve accuracy through extended chains of thought, but their long outputs create a memory and compute bottleneck. KV cache eviction methods reduce this cost by evicting unimportant key-value pairs from the cache, yet they often yield worse accuracy than selection-based sparse attention alternatives, which keep the full KV cache. We identify key factors crucial to KV cache eviction accuracy. First, a small fraction of value states have abnormally large magnitudes, and evicting ...

HuggingFace

Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

We introduce Humanoid-GPT, a GPT-style Transformer with causal attention trained on a billion-scale motion corpus for whole-body control. Unlike prior shallow MLP trackers constrained by scarce data and an agility-generalization trade-off, Humanoid-GPT is pre-trained on a 2B-frame retargeted corpus that unifies all major mocap datasets with large-scale in-house recordings. Scaling both data and model capacity yields a single generative Transformer that tracks highly dynamic behaviors while achie...

HuggingFace

Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories

The past few decades have witnessed significant advances in the design of machine learning algorithms, from early studies on task-specific shallow models to more general deep Large Language Models (LLMs). Despite showing promising results in tasks that require instant prediction or in-context learning, existing models lack the ability to continually learn and effectively transfer their temporal in-context knowledge to their long-term parameters. Inspired by the human learning process, we introduce a...

HuggingFace

PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training

We introduce PaddleOCR-VL-1.6, an upgraded compact document parsing model built upon PaddleOCR-VL-1.5. Although PaddleOCR-VL-1.5 establishes a strong 0.9B baseline, its remaining errors concentrate in under-optimized regions where model behavior is unstable, data coverage is sparse, or supervision is unreliable. Rather than expanding the training corpus indiscriminately, PaddleOCR-VL-1.6 introduces a region-aware data optimization framework that identifies weak regions from the previous model, a...

HuggingFace

Ultralytics YOLO26: Unified Real-Time End-to-End Vision Models

Real-time vision demands models that are accurate, efficient, and simple to deploy across diverse hardware. The YOLO family has become widely deployed for this reason, yet most YOLO detectors still rely on non-maximum suppression at inference, carry heavy detection heads due to Distribution Focal Loss, require long training schedules, and can leave the smallest objects without positive label assignments. We present Ultralytics YOLO26, a unified real-time vision model family that addresses these ...

HuggingFace
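For context on the post-processing step YOLO26 is described as removing: classic greedy non-maximum suppression (NMS) keeps the highest-scoring box, discards remaining boxes that overlap it by more than an IoU threshold, and repeats. The plain-Python sketch below is a generic illustration, with boxes as `(x1, y1, x2, y2)` tuples and a 0.5 default threshold chosen for the example; it is not Ultralytics code.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: list[Box], scores: list[float], iou_threshold: float = 0.5) -> list[int]:
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: list[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box more than the threshold.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

With boxes `[(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]` and scores `[0.9, 0.8, 0.7]`, `nms` returns `[0, 2]`: the second box overlaps the first with an IoU of about 0.68 and is suppressed. Because this data-dependent loop runs after the network, it adds latency and export complexity; end-to-end designs aim to emit de-duplicated boxes directly.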

Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling

Test-time scaling improves the reasoning performance of large language models but incurs substantial cost in both total computation and latency. Existing adaptive sampling methods partially mitigate this issue by dynamically deciding when to stop sampling, yet they typically rely on heuristic rules or distributional assumptions. In this work, we formulate adaptive sampling as a Markov decision process (MDP). We train a lightweight sampling controller with reinforcement learning (RL) to joi...

HuggingFace
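The abstract frames adaptive sampling as an MDP: after each sample, a controller decides whether to draw another answer or stop. The sketch below shows that decision loop under assumptions the truncated abstract does not confirm: answers are aggregated by majority vote, and a hand-written agreement threshold (`threshold_policy`, a hypothetical stand-in) plays the role the paper assigns to a learned RL controller. It is the kind of heuristic stopping rule the paper proposes to replace, not the paper's method.

```python
from collections import Counter
from typing import Callable


def adaptive_self_consistency(
    sample: Callable[[], str],
    policy: Callable[[Counter, int], bool],
    max_samples: int = 16,
) -> tuple[str, int]:
    """Draw answers until the policy says stop; return the majority answer and samples used.

    State: answer counts so far plus the number of samples drawn.
    Action: policy(counts, n) returns True to stop, False to draw another sample.
    """
    counts: Counter = Counter()
    n = 0
    while n < max_samples:
        counts[sample()] += 1
        n += 1
        if policy(counts, n):
            break
    answer, _ = counts.most_common(1)[0]
    return answer, n


def threshold_policy(counts: Counter, n: int, min_samples: int = 3, agreement: float = 0.8) -> bool:
    """Heuristic stand-in for a learned controller: stop once one answer dominates."""
    top = counts.most_common(1)[0][1]
    return n >= min_samples and top / n >= agreement
```

With a sampler that returns `"42"` three times in a row, `adaptive_self_consistency` stops after three samples and returns `("42", 3)` instead of spending the full budget of 16. A trained policy over the same state would replace `threshold_policy` without changing the loop.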

AI outperforms law professors in Stanford Law study

A Stanford Law study found that AI systems outperform law professors in certain legal tasks, suggesting AI has reached competitive capability levels in specialized knowledge domains. This research highlights AI's potential to transform professional services and education.

RSS

Industry News

A blueprint for democratic governance of frontier AI

OpenAI outlines a blueprint for U.S. governance of frontier AI, proposing a federal framework for safety, resilience, and national security.

OpenAI

32GB of DDR5 now costs $375 – AI shortage continues to squeeze PC building

32GB of DDR5 memory now costs $375, as persistent AI demand continues to drive up prices for PC components. The AI boom is creating ongoing supply constraints that keep hardware prices elevated compared to historical levels.

RSS

More than 6 out of 10 people turn to AI for psychological support

A survey reveals that over 60% of people are turning to AI systems for mental health and psychological support. This trend highlights growing reliance on AI for mental wellness, raising questions about effectiveness, safety, and the role of human professionals.

RSS

What we learned mapping a year’s worth of AI-enabled cyber threats

Anthropic

OpenAI public policy agenda

OpenAI outlines its public policy agenda for AI, including safety, youth protection, workforce transition, and global standards to ensure AI benefits society.

OpenAI

Discussion

Now AI agents need what RSS does

This piece argues that AI agents need a standardized protocol comparable to RSS, so that they can be discovered, shared, and integrated across platforms in an interoperable way.

RSS
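The RSS comparison rests on how little a client needs in order to consume a feed: RSS 2.0 is a small XML vocabulary (a `channel` containing `item` elements with `title`, `link`, and `description`) that any reader can poll and parse with a standard XML library. The sketch below parses such a feed with Python's built-in `xml.etree.ElementTree`; the feed content, including listing agents as items, is a hypothetical example and not something the piece specifies.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; listing agents as items is only for illustration.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example agent directory</title>
    <link>https://example.com/</link>
    <description>Illustrative feed</description>
    <item>
      <title>weather-agent</title>
      <link>https://example.com/agents/weather</link>
      <description>Answers weather questions</description>
    </item>
    <item>
      <title>calendar-agent</title>
      <link>https://example.com/agents/calendar</link>
      <description>Schedules meetings</description>
    </item>
  </channel>
</rss>"""


def parse_items(xml_text: str) -> list[dict[str, str]]:
    """Return title, link, and description for every <item> in an RSS 2.0 feed."""
    channel = ET.fromstring(xml_text).find("channel")
    if channel is None:
        return []
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in channel.findall("item")
    ]
```

`parse_items(FEED)` returns one dictionary per item. The same few lines work for any conforming feed, which is the kind of interoperability the piece wants for agents.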