Cainew - Curated AI news for developers

Model Releases

Gemini 3.5 Flash: frontier intelligence with action

Google's Gemini 3.5 Flash model combines frontier-level intelligence with the ability to take actions, offering faster performance while maintaining advanced reasoning capabilities.

RSS

I/O 2026: Welcome to the agentic Gemini era

The latest from Google I/O: See how we’re helping you get more done with Gemini.

RSS

Tools & Products

cosmicstack-labs/mercury-agent

Soul-driven AI agent with permission-hardened tools, token budgets, and multi-channel access. Runs 24/7 from CLI or Telegram.

GitHub

crafter-station/petdex

The public gallery of animated pets for Codex, Claude Code, OpenCode, and Gemini CLI

GitHub

raindrop-ai/workshop

Give your coding agent the power to write and run agent evals.

GitHub

KevRojo/Dulus

The only real free CLI agent. Harvests your Gemini (guest, no login) · Claude.ai · Claude Code · Kimi · Qwen · DeepSeek browser session and turns it into a tool-calling agent: reads & edits files, runs Bash, greps your repo, browses the web, ships commits, all from your terminal. Frontier AI driving real work at $0.

GitHub

sapientinc/HRM-Text

HRM-Text is a 1B text generation model based on the HRM architecture, strengthened by task completion and latent space reasoning.

GitHub

baidu-baige/LoongForge

A modular, scalable, high-performance training framework for LLMs, VLMs, diffusion, and embodied models.

GitHub

PollyReach: Give your agent a real number and voice to make calls.

Most AI phone tools are built for enterprises — APIs, workflows, sales automation. PollyReach is built for you. Give your AI a real phone number. Say "book me a table for 7pm" — it finds the number, makes the call, handles the conversation, and reports back with a summary, recording & transcript. It also answers your phone 24/7 and screens spam. Works in 50+ languages.

ProductHunt

Nutlope/hallmark

Anti-AI-slop design skill for Claude Code, Cursor, and Codex.

GitHub

Drizz: Mobile tests that write, run, and fix themselves

Drizz is an AI-powered mobile test automation platform built around intent-based testing. Simply describe what you want to test in plain English, and Drizz executes it on a real device using Vision AI and automatically authors a reusable test case. No scripting, no flaky selectors, no manual maintenance. It adapts to dynamic UIs, integrates with your CI/CD pipeline, and gives your team reliable end-to-end coverage without the overhead.

ProductHunt

Composer 2.5: Cursor’s most powerful model yet

A substantial improvement in intelligence and behavior over Composer 2, particularly on long-horizon agentic tasks.

ProductHunt

kruschdev/krusch-context-mcp

A unified Zero-Trust MCP server that gives IDE agents local semantic codebase search, isolated episodic project memory, and hallucination-free framework RAG.

GitHub

CtrlOps: Deploy, Debug & Manage Linux Servers with AI.

Most devs manage servers from a spreadsheet of IPs and commands nobody remembers. CtrlOps gives you AI-powered server management without DevOps expertise. AI terminal that generates commands with your approval. Scripts library. One-click deploys from any GitHub repo. Visual file manager. Real-time server monitoring. Zero agents on servers. Deployments that took 60 minutes now take 5. 100% local. Your credentials never leave your machine. Mac. Windows. Linux.

ProductHunt

Episkey-G/GrokSearch-rs

Rust MCP server for Grok web search and Tavily-backed source retrieval

GitHub

Chert: Build AI agents that text customers in iMessage

Build and deploy conversational iMessage agents for customer service, inbound lead capture, and more. Simply configure the system prompt and tone, and you can create your own conversational iMessage agent for inbound handling, outbound follow-up, or whatever workflow you want to test. You can also integrate with CRMs like HubSpot, Close, or GoHighLevel to write back conversation histories.

ProductHunt

Motion: A video agent for tasteful motion design

Motion is a frontier video agent for tasteful motion design. Give it a prompt with links, X threads, videos, assets, or references. Motion researches, storyboards, and creates explainers, launch videos, logo animations, or motion design for existing videos. Then edit everything directly: resize, drag and drop, modify elements, or iterate with chat.

ProductHunt

Research Papers

Gaussian Splat of a Strawberry

Gaussian splatting, a technique that represents a scene as a collection of 3D Gaussians, enables high-quality 3D reconstruction and visualization of objects such as a strawberry, with more efficient rendering than traditional methods.

RSS

What political censorship looks like inside an LLM's weights (Qwen 3.5)

Researchers found evidence that the Qwen 3.5 LLM contains political censorship embedded in its model weights, revealing how biases and content restrictions are baked into AI systems at a fundamental level.

RSS

Advancing content provenance for a safer, more transparent AI ecosystem

OpenAI advances AI content provenance with Content Credentials, SynthID, and a verification tool to help people identify and trust AI-generated media.

OpenAI

Post-Trained MoE Can Skip Half Experts via Self-Distillation

Mixture-of-Experts (MoE) scales language models efficiently through sparse expert activation, and its dynamic variant further reduces computation by adjusting the activated experts in an input-dependent manner. Existing dynamic MoE methods usually rely on pre-training from scratch or task-specific adaptation, leaving the practical conversion of fully trained MoE underexplored. Enabling such adaptation would directly alleviate the inference costs by allowing easy tokens to bypass unnecessary expe...

HuggingFace
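
To make the idea of input-dependent expert activation concrete, here is a minimal sketch in plain Python. The truncated abstract does not say how the paper decides which experts to skip, so the rule used here (stop adding experts once their cumulative router probability reaches a threshold) is an assumption chosen only for illustration. The names `select_experts`, `top_k`, and `mass` are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_experts(router_logits, top_k=4, mass=0.8):
    """Pick at most top_k experts for one token.

    Assumed rule (not from the paper): walk the experts in order of
    router probability and stop as soon as the selected experts cover
    `mass` of the total probability.
    """
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, covered = [], 0.0
    for idx in ranked[:top_k]:
        chosen.append(idx)
        covered += probs[idx]
        if covered >= mass:
            break
    return chosen

# Made-up router logits for two tokens over 8 experts.
easy_token = [6.0, 0.5, 0.2, 0.1, 0.0, -0.3, -0.5, -1.0]  # router is confident
hard_token = [1.0, 0.9, 0.8, 0.7, 0.1, 0.0, -0.1, -0.2]   # router is unsure

print(select_experts(easy_token))  # one expert is enough for the easy token
print(select_experts(hard_token))  # the hard token keeps the full top_k
```

Under this assumed rule, a token whose router distribution is sharply peaked activates a single expert, while a token with a flat distribution falls back to the usual top-k set.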

Lance: Unified Multimodal Modeling by Multi-Task Synergy

We present Lance, a lightweight native unified model supporting multimodal understanding, generation, and editing for both images and videos. Rather than relying on model capacity scaling or text-image-dominant designs, Lance explores a practical paradigm for unified multimodal modeling via collaborative multi-task training. It is grounded in two core principles: unified context modeling and decoupled capability pathways. Specifically, Lance is trained from scratch and employs a dual-stream mixt...

HuggingFace

DexHoldem: Playing Texas Hold'em with Dexterous Embodied System

Evaluating embodied systems on real dexterous hardware requires more than isolated primitive skills: an agent must perceive a changing tabletop scene, choose a context-appropriate action, execute it with a dexterous hand, and leave the scene usable for later decisions. We introduce DexHoldem, a real-world system-level benchmark built around Texas Hold'em dexterous manipulation with a ShadowHand. DexHoldem provides 1,470 teleoperated demonstrations across 14 Texas Hold'em manipulation primitives,...

HuggingFace

AtlasVA: Self-Evolving Visual Skill Memory for Teacher-Free VLM Agents

Vision-language model (VLM) agents increasingly rely on memory-augmented reinforcement learning to reuse experience across long-horizon tasks, yet most existing frameworks store memory as text and depend on proprietary teacher models to summarize or refine it. This design is poorly matched to spatial decision making: geometric priors are compressed into lossy language, and sparse interaction is often supervised through delayed textual feedback rather than dense visually grounded signals. We argu...

HuggingFace

MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents

Recent GUI agents have made substantial progress in visual grounding and action prediction, yet they remain brittle in long-horizon tasks that require maintaining task state across many interface transitions. Existing agents typically rely on raw history replay or text-only memory, which either overwhelms the model with redundant screenshots or discards localized visual evidence needed for future decisions. To address these limitations, we introduce MementoGUI, a plug-in agentic memory framework...

HuggingFace

Monitoring the Internal Monologue: Probe Trajectories Reveal Reasoning Dynamics

Large Reasoning Models (LRMs) introduce new opportunities for safety monitoring through their Chain of Thought (CoT) reasoning. However, CoT is not always faithful to the model's final output, undermining its reliability as a monitoring tool. To address this, we investigate the hidden representations of LRMs to determine whether future behavior can be predicted from prompt and CoT representations. By evaluating a probe at each generated token, we construct a probe trajectory, the continuous evol...

HuggingFace
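
The phrase "evaluating a probe at each generated token" can be sketched as follows. This is a toy illustration, not the paper's method: it assumes the probe is a logistic-regression classifier over a hidden-state vector, and the weights and hidden states are invented numbers. The function names `probe_score` and `probe_trajectory` are hypothetical.

```python
import math

def probe_score(hidden, weights, bias):
    """Logistic-regression probe: sigmoid(w . h + b)."""
    z = sum(w * h for w, h in zip(weights, hidden)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def probe_trajectory(hidden_states, weights, bias):
    """Apply the same probe to the hidden state at every generated token."""
    return [probe_score(h, weights, bias) for h in hidden_states]

# Invented probe parameters and per-token hidden states (3 dimensions).
weights, bias = [2.0, -1.0, 0.5], 0.0
hidden_states = [
    [0.0, 0.0, 0.0],   # token 1
    [0.5, 0.2, 0.0],   # token 2
    [1.0, 0.0, 1.0],   # token 3
    [2.0, -1.0, 1.0],  # token 4
]

trajectory = probe_trajectory(hidden_states, weights, bias)
for step, score in enumerate(trajectory, start=1):
    print(f"token {step}: probe score {score:.3f}")
```

The resulting list of scores is the trajectory: one probe output per generated token, which can then be inspected for how the predicted behavior evolves during the chain of thought.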

AI for Auto-Research: Roadmap & User Guide

AI-assisted research is crossing a threshold: fully automated systems can now generate research papers for as little as $15, while long-horizon agents can execute experiments, draft manuscripts, and simulate critique with minimal human input. Yet this productivity frontier exposes a deeper integrity problem: under scientific pressure, even frontier LLMs still fabricate results, miss hidden errors, and fail to judge novelty reliably. Studying developments through April 2026, we present an end-to-...

HuggingFace

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

We present LongLive-2.0, an NVFP4-based parallel infrastructure throughout the full training and inference workflow of long video generation, addressing speed and memory bottlenecks. For training, we introduce sequence-parallel autoregressive (AR) training, instantiated as Balanced SP, which co-designs the efficient teacher-forcing layout with SP execution by pairing clean-history and noisy-target temporal chunks on each rank, enabling a natural teacher-forcing mask with SP-aware chunked VAE enc...

HuggingFace

SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training

Diffusion models have been widely studied for removing unsafe content learned during pre-training. Existing methods require expensive supervised data, either unsafe-text paired with safe-image groundtruth or negative/positive image pairs, making them impractical to scale. Furthermore, offline reinforcement learning and supervised fine-tuning approaches that generate synthetic data offline suffer from catastrophic forgetting, degrading generation quality. We propose a novel online reinforcement l...

HuggingFace

Incantation: Natural Language as the Action Interface for Multi-Entity Video World Models

Modern interactive video world models have achieved impressive visual fidelity, yet lack fine-grained multi-entity control and cross-entity, cross-world generalization. We trace this gap to the action interface: standard control protocols (e.g. animation IDs, device inputs, scene-level captions) bind action semantics to specific entities or engines at design time. We propose natural language as the interface to unlock expressiveness that no prior interface can achieve, and we present Incantation...

HuggingFace

SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution

Long-horizon LLM agents leave traces that could become reusable experience, but raw trajectories are noisy and hard to govern. We treat Agent Skills as an experience schema that couples executable scripts with non-executable guidance on procedures. Yet open skill ecosystems contain redundant, uneven, environment-sensitive artifacts, and indiscriminate updates can pollute future context. We present SkillsVote, a lifecycle-governance framework for Agent Skills from collection and recommendation t...

HuggingFace

Code as Agent Harness

Recent large language models (LLMs) have demonstrated strong capabilities in understanding and generating code, from competitive programming to repository-level software engineering. In emerging agentic systems, code is no longer only a target output. It increasingly serves as an operational substrate for agent reasoning, acting, environment modeling, and execution-based verification. We frame this shift through the lens of agent harnesses and introduce code as agent harness: a unified view that...

HuggingFace

Industry News

Anthropic Is Preparing for IPO and We Should Be Worried

Anthropic is preparing for an IPO, raising concerns about the company's future direction and the risk that AI safety will be deprioritized in favor of investor returns.

RSS

KPMG integrates Claude across its core business and workforce of more than 276,000 in strategic alliance

Anthropic

Discussion

The last six months in LLMs in five minutes

This summary captures the major developments and breakthroughs in large language models over the past six months in a concise overview.

RSS

Alignment pretraining: AI discourse creates self-fulfilling (mis)alignment

This article examines how discourse about AI alignment, once it becomes part of pretraining, can inadvertently create self-fulfilling prophecies of misalignment within AI systems.

ArXiv

Apple Silicon costs less than OpenRouter

Running inference on Apple Silicon hardware is significantly more cost-effective than using OpenRouter's API services for certain use cases.

Twitter