Cainew

Curated AI news for developers

TL;DR

Model Releases

Tools & Products

Soul-driven AI agent with permission-hardened tools, token budgets, and multi-channel access. Runs 24/7 from CLI or Telegram.

GitHub

The public gallery of animated pets for Codex, Claude Code, OpenCode, and Gemini CLI.

GitHub

The only real free CLI agent. Harvests your Gemini (guest, no login) · Claude.ai · Claude Code · Kimi · Qwen · DeepSeek browser session and turns it into a tool-calling agent: it reads and edits files, runs Bash, greps your repo, browses the web, and ships commits, all from your terminal. Frontier AI driving real work at $0.

GitHub

HRM-Text is a 1B text generation model based on the HRM architecture, strengthened by task completion and latent space reasoning.

GitHub

A modular, scalable, high-performance training framework for LLMs, VLMs, diffusion, and embodied models.

GitHub

Most AI phone tools are built for enterprises — APIs, workflows, sales automation. PollyReach is built for you. Give your AI a real phone number. Say "book me a table for 7pm" — it finds the number, makes the call, handles the conversation, and reports back with a summary, recording & transcript. It also answers your phone 24/7 and screens spam. Works in 50+ languages.

ProductHunt

Anti-AI-slop design skill for Claude Code, Cursor, and Codex.

GitHub

Drizz is an AI-powered mobile test automation platform built around intent-based testing. Describe what you want to test in plain English, and Drizz executes it on a real device using Vision AI and automatically authors a reusable test case. No scripting, no flaky selectors, no manual maintenance. It adapts to dynamic UIs, integrates with your CI/CD pipeline, and gives your team reliable end-to-end coverage without the overhead.

ProductHunt

A unified Zero-Trust MCP server that gives IDE agents local semantic codebase search, isolated episodic project memory, and hallucination-free framework RAG.

GitHub

Most devs manage servers from a spreadsheet of IPs and commands nobody remembers. CtrlOps gives you AI-powered server management without DevOps expertise. AI terminal that generates commands with your approval. Scripts library. One-click deploys from any GitHub repo. Visual file manager. Real-time server monitoring. Zero agents on servers. Deployments that took 60 minutes now take 5. 100% local. Your credentials never leave your machine. Mac. Windows. Linux.

ProductHunt

Build and deploy conversational iMessage agents for customer service, inbound lead capture, and more. Simply configure the system prompt and tone, and you can create your own conversational iMessage agent for inbound handling, outbound follow-up, or whatever workflow you want to test. You can also integrate with CRMs like HubSpot, Close, or GoHighLevel to write back conversation histories.

ProductHunt

Motion is a frontier video agent for tasteful motion design. Give it a prompt with links, X threads, videos, assets, or references. Motion researches, storyboards, and creates explainers, launch videos, logo animations, or motion design for existing videos. Then edit everything directly: resize, drag and drop, modify elements, or iterate with chat.

ProductHunt

Research Papers

A new technique called Gaussian Splatting enables high-quality 3D reconstruction and visualization of objects like strawberries, with improved rendering efficiency compared to traditional methods.

RSS

Mixture-of-Experts (MoE) scales language models efficiently through sparse expert activation, and its dynamic variant further reduces computation by adjusting the activated experts in an input-dependent manner. Existing dynamic MoE methods usually rely on pre-training from scratch or task-specific adaptation, leaving the practical conversion of fully trained MoE underexplored. Enabling such adaptation would directly alleviate the inference costs by allowing easy tokens to bypass unnecessary expe...

HuggingFace
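
To make the idea of input-dependent expert activation in the MoE item above concrete, here is a minimal sketch of one common dynamic-routing scheme: cumulative-probability (top-p style) expert selection. This is an illustration only, not the method of the paper summarized above, whose abstract is truncated. The function names, the `threshold` value, and the example router logits are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_route(router_logits, threshold=0.6):
    """Pick the smallest set of experts whose router probabilities
    sum to at least `threshold` (hypothetical value), so confident
    tokens activate fewer experts than ambiguous ones.

    Returns a dict mapping expert index -> renormalized mixing weight.
    """
    probs = softmax(router_logits)
    # Visit experts from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, cumulative = [], 0.0
    for idx in order:
        chosen.append(idx)
        cumulative += probs[idx]
        if cumulative >= threshold:
            break
    # Renormalize so the selected experts' weights sum to 1.
    weight_sum = sum(probs[i] for i in chosen)
    return {i: probs[i] / weight_sum for i in chosen}


# An "easy" token: the router is confident, so one expert suffices.
easy = dynamic_route([4.0, 0.1, 0.0, -0.2])
# A "hard" token: the router is uncertain, so more experts are activated.
hard = dynamic_route([1.0, 0.9, 0.8, 0.7])
print(len(easy), len(hard))
```

With a fixed top-k router both tokens would use the same number of experts; under this sketch the number of activated experts depends on how peaked the router distribution is for each token.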

We present Lance, a lightweight native unified model supporting multimodal understanding, generation, and editing for both images and videos. Rather than relying on model capacity scaling or text-image-dominant designs, Lance explores a practical paradigm for unified multimodal modeling via collaborative multi-task training. It is grounded in two core principles: unified context modeling and decoupled capability pathways. Specifically, Lance is trained from scratch and employs a dual-stream mixt...

HuggingFace

Evaluating embodied systems on real dexterous hardware requires more than isolated primitive skills: an agent must perceive a changing tabletop scene, choose a context-appropriate action, execute it with a dexterous hand, and leave the scene usable for later decisions. We introduce DexHoldem, a real-world system-level benchmark built around Texas Hold'em dexterous manipulation with a ShadowHand. DexHoldem provides 1,470 teleoperated demonstrations across 14 Texas Hold'em manipulation primitives,...

HuggingFace

Vision-language model (VLM) agents increasingly rely on memory-augmented reinforcement learning to reuse experience across long-horizon tasks, yet most existing frameworks store memory as text and depend on proprietary teacher models to summarize or refine it. This design is poorly matched to spatial decision making: geometric priors are compressed into lossy language, and sparse interaction is often supervised through delayed textual feedback rather than dense visually grounded signals. We argu...

HuggingFace

Recent GUI agents have made substantial progress in visual grounding and action prediction, yet they remain brittle in long-horizon tasks that require maintaining task state across many interface transitions. Existing agents typically rely on raw history replay or text-only memory, which either overwhelms the model with redundant screenshots or discards localized visual evidence needed for future decisions. To address these limitations, we introduce MementoGUI, a plug-in agentic memory framework...

HuggingFace

Large Reasoning Models (LRMs) introduce new opportunities for safety monitoring through their Chain of Thought (CoT) reasoning. However, CoT is not always faithful to the model's final output, undermining its reliability as a monitoring tool. To address this, we investigate the hidden representations of LRMs to determine whether future behavior can be predicted from prompt and CoT representations. By evaluating a probe at each generated token, we construct a probe trajectory, the continuous evol...

HuggingFace

AI-assisted research is crossing a threshold: fully automated systems can now generate research papers for as little as $15, while long-horizon agents can execute experiments, draft manuscripts, and simulate critique with minimal human input. Yet this productivity frontier exposes a deeper integrity problem: under scientific pressure, even frontier LLMs still fabricate results, miss hidden errors, and fail to judge novelty reliably. Studying developments through April 2026, we present an end-to-...

HuggingFace

We present LongLive-2.0, an NVFP4-based parallel infrastructure throughout the full training and inference workflow of long video generation, addressing speed and memory bottlenecks. For training, we introduce sequence-parallel autoregressive (AR) training, instantiated as Balanced SP, which co-designs the efficient teacher-forcing layout with SP execution by pairing clean-history and noisy-target temporal chunks on each rank, enabling a natural teacher-forcing mask with SP-aware chunked VAE enc...

HuggingFace

Diffusion models have been widely studied for removing unsafe content learned during pre-training. Existing methods require expensive supervised data, either unsafe-text paired with safe-image groundtruth or negative/positive image pairs, making them impractical to scale. Furthermore, offline reinforcement learning and supervised fine-tuning approaches that generate synthetic data offline suffer from catastrophic forgetting, degrading generation quality. We propose a novel online reinforcement l...

HuggingFace

Modern interactive video world models have achieved impressive visual fidelity, yet lack fine-grained multi-entity control and cross-entity, cross-world generalization. We trace this gap to the action interface: standard control protocols (e.g. animation IDs, device inputs, scene-level captions) bind action semantics to specific entities or engines at design time. We propose natural language as the interface to unlock expressiveness that no prior interface can achieve, and we present Incantation...

HuggingFace

Long-horizon LLM agents leave traces that could become reusable experience, but raw trajectories are noisy and hard to govern. We treat Agent Skills as an experience schema that couples executable scripts with non-executable guidance on procedures. Yet open skill ecosystems contain redundant, uneven, environment-sensitive artifacts, and indiscriminate updates can pollute future context. We present SkillsVote, a lifecycle-governance framework for Agent Skills from collection and recommendation t...

HuggingFace

Recent large language models (LLMs) have demonstrated strong capabilities in understanding and generating code, from competitive programming to repository-level software engineering. In emerging agentic systems, code is no longer only a target output. It increasingly serves as an operational substrate for agent reasoning, acting, environment modeling, and execution-based verification. We frame this shift through the lens of agent harnesses and introduce code as agent harness: a unified view that...

HuggingFace

Industry News

Discussion