Cainew - Curated AI news for developers

Model Releases

GLM-5.2 is the new leading open weights model on Artificial Analysis

GLM-5.2 now ranks first among open-weight models on Artificial Analysis.

RSS

DeepSeek V4 Pro at 5% the cost of Claude – what it takes to close the gap

DeepSeek V4 Pro costs about 5% as much as Claude while remaining competitive on performance. The piece looks at what it takes to close the remaining gap and argues that efficiency gains make advanced AI more accessible.

RSS

Tools & Products

superloglabs/superlog

Open-source observability tool that uses AI agents to self-heal your software

GitHub

jianshuo/ccglass

See what your coding agent (Claude Code, Codex, Kimi) sends to the model — local proxy + web dashboard

GitHub

Liu-Ming-Yu/alpha-forge

Alpha Forge — an agentic AI operating system for systematic trading.

GitHub

agentic-in/inferoa

Inference-native Tokenmaxxing Agent Harness for Loop Engineering

GitHub

Nigh/show-me-the-story

Self-hosted AI novel generator: single Go binary + web UI. OpenAI-compatible API → outline → chapter-by-chapter writing with review, foreshadowing, fact-check, and full-book polish. Chinese & English.

GitHub

Sibyl-Labs/Sibyl-Memory

Sibyl Memory Plugin for Hermes enables persistent memory across long time horizons and relational context that was previously unavailable. Self-learning and automatic skill creation produce an agent that grows with you. Local SQLite, structured tiers, no vector DB. SDK, CLI, MCP server, Hermes plugin.

GitHub

Framer 3.0: With Agents, Branching, Community, and an all-new design

Agents bring AI to the canvas to help you design, write, analyze, and organize your sites. We’re also launching Branching, a new way for teams to explore ideas before they go live, and unveiling the new Framer Community, where creators can share and earn. Together, these launches change how teams create, maintain, and scale websites. We think you’ll love it.

ProductHunt

believer-oss/Claireon

MCP server for Unreal Editor

GitHub

ForestHubAI/edge-agents

The 30 MB open-source edge AI agent runtime. Run AI agents offline on Linux (Raspberry Pi, Jetson). GPIO, UART, MQTT as first-class nodes. Industrial protocols (OPC-UA, Modbus) on the roadmap.

GitHub

ChenJazzyBoss/superSpec

AI-native spec management for Claude Code

GitHub

Show HN: High-Res Neural Cellular Automata

High-Res Neural Cellular Automata is a novel neural network approach that generates high-resolution outputs using cellular automata principles for improved visual quality.

RSS

gammahazard/locate-anything

Sleek, mobile-friendly web UI for NVIDIA LocateAnything-3B — open-vocabulary object detection & grounding on your own GPU, via one docker compose up.

GitHub

Swytchcode CLI: Give agents reliable access to 2,000+ APIs w/ durable state

Write agent logic, and skip the plumbing. Give AI agents reliable access to 2,000+ APIs with retries, idempotency, policy enforcement, and durable state.

ProductHunt

llamastash/llamastash

A fast terminal native app (TUI) and CLI with init wizard for launching local LLMs with zero overhead

GitHub

paxlabs-inc/matrix-core

Matrix is the cognition and UX layer on top of Paxeer Network. It turns natural-language requests from non-developers into a typed, inspectable, correctable Intent IR

GitHub

Research Papers

Semiclassical Gravity Efficiently Solves NP-Complete Problems

The paper claims that semiclassical gravity would allow NP-complete problems to be solved efficiently.

ArXiv

New research shows how AMIE, our medical AI, could help manage health conditions.

Research in “Nature” shows our conversational AI system matches primary care physicians in complex disease management.

RSS

A near-autonomous AI chemist improves a challenging reaction in medicinal chemistry

OpenAI and Molecule.one show how a near-autonomous AI chemist using GPT-5.4 improved a key drug-making reaction, advancing medicinal chemistry research.

OpenAI

ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions

Large language models perform increasingly well on standardized logical reasoning benchmarks, but whether this ability remains robust beyond English is unclear. We introduce ChLogic, an English-Chinese aligned benchmark that tests whether models preserve logical reasoning performance when the same latent logical structure is expressed in English and diverse Chinese surface realizations. Built from formal logical templates, the benchmark contains three data sets: (i) the General aligned set, der...

HuggingFace

Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification

Unified Multimodal Modeling aims to integrate visual understanding and generation within a single system. However, existing approaches typically rely on two disparate visual tokenizers, which splits the representation space and hinders truly unified modeling. We propose UniAR, a unified autoregressive framework where a single discrete visual tokenizer serves as the key bridge between understanding and generation, enabling a shared context in which the model can directly interpret its own generat...

HuggingFace

Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion

Pixel-space diffusion models are trained on full-bandwidth noisy images, yet the useful signal available to the denoiser is strongly frequency dependent. Under rectified-flow diffusion and natural-image power-law spectra, the per-band data-to-noise contour k^{*}(t) = (1-t)^{-2/α} separates a signal-bearing low-frequency region from a noise-dominated high-frequency region at each time t. We show that this implicit coarse-to-fine structure is not merely descriptive: it induces a capacity-allocati...

HuggingFace
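
To make the contour quoted in the Spectral Forcing abstract concrete, here is a minimal sketch that simply evaluates k^{*}(t) = (1-t)^{-2/α} at a few times. The function name `cutoff_frequency`, the choice α = 2, and the frequency normalization (so that k^{*}(0) = 1) are assumptions for illustration; the abstract does not specify them, and this is not the paper's code.

```python
def cutoff_frequency(t: float, alpha: float) -> float:
    """Evaluate k*(t) = (1 - t) ** (-2 / alpha), the formula quoted in the abstract.

    Units are an assumption: frequencies are normalized so that k*(0) == 1.
    """
    if not 0.0 <= t < 1.0:
        raise ValueError("t must lie in [0, 1)")
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    return (1.0 - t) ** (-2.0 / alpha)


if __name__ == "__main__":
    # alpha = 2.0 is an illustrative spectral exponent, not a value from the paper.
    for t in (0.0, 0.5, 0.75, 0.9):
        print(f"t={t:.2f}  k*={cutoff_frequency(t, alpha=2.0):.2f}")
```

Under this formula the contour moves to higher frequencies as t approaches 1, so the band treated as signal-bearing widens over time.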

Looped World Models

Current world models face a fundamental tension: faithful long-horizon simulation demands deep computation, but deeper models are expensive to deploy and prone to compounding errors. We resolve this by introducing Looped World Models (LoopWM), which are the first looped architectures for world modelling. Our method iteratively refines latent environment states through a parameter-shared transformer block. This yields up to 100x parameter efficiency over conventional approaches with adaptive compu...

HuggingFace
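
The core idea in the LoopWM abstract, refining a latent state by reapplying one block with shared parameters, can be sketched in a few lines. This is a toy under stated assumptions: `shared_block` is a hypothetical tanh residual map standing in for a transformer block, and none of the names come from the paper.

```python
import math


def shared_block(state, weights):
    """One residual update: state + tanh(W @ state). W is reused on every loop."""
    update = [math.tanh(sum(w * s for w, s in zip(row, state))) for row in weights]
    return [s + u for s, u in zip(state, update)]


def looped_refine(state, weights, num_loops):
    """Apply the same block num_loops times; depth grows, parameters do not."""
    for _ in range(num_loops):
        state = shared_block(state, weights)
    return state


def parameter_count(weights):
    """Number of weights in the shared block, independent of the loop count."""
    return sum(len(row) for row in weights)


if __name__ == "__main__":
    weights = [[0.1, -0.2], [0.3, 0.05]]  # illustrative values only
    for loops in (1, 4, 16):
        refined = looped_refine([1.0, -1.0], weights, loops)
        print(loops, parameter_count(weights), refined)
```

The loop count changes how much computation is spent on a state while `parameter_count` stays fixed, which is the trade-off the abstract describes.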

ActWorld: From Explorable to Interactive World Model via Action-Aware Memory

Interactive world models aim to simulate environment dynamics under real-time user actions. However, their action vocabulary is largely confined to navigation: most actions correspond to motion (e.g., walk, turn, look around), while interaction with objects in the scene (e.g., pick up plates, open doors, or trigger physical responses) is either absent, restricted to game domains, or relegated to prompt-to-full-video scenarios. The resulting worlds are visually explorable but not truly actionable...

HuggingFace

Variable-Width Transformers

Scaling model size, specifically depth and width, has driven significant progress in transformer-based language models. However, most architectures maintain a constant width across all layers, allocating a fixed parameter and computation budget evenly despite different layers potentially playing distinct computational roles. In this work, we empirically investigate nonuniform capacity allocation across network depth by proposing a ×-shaped...

HuggingFace

EgoCS-400K: An Egocentric Gameplay Dataset for World Models

The shift from video generation to interactive world modeling places new demands on data: beyond captioned videos, world models require temporally aligned video-action-language trajectories grounded in the actions, camera motion, states, and events that drive future scene changes. However, such data is difficult to obtain at scale. Web video datasets offer broad visual coverage but lack executable actions and reliable states; robotic datasets provide action and state supervision but are costly a...

HuggingFace

OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation

Memory has become a standard substrate for self-evolving agents, yet retaining experience is not the same as learning how to evolve through it. Existing memory agents can store trajectories, retrieve reflections, or accumulate skills, but often lack the holistic competence to select useful experience, act on it, write reusable knowledge, and maintain a growing repository. We introduce OPD-Evolver, a slow-fast co-evolution framework that cultivates such an agent evolver through on-policy self-dis...

HuggingFace

Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients

Knowledge distillation transfers a teacher's competence to a small student but is brittle in the small-student regime: forcing the student to imitate logits from a much larger teacher concentrates it on the teacher's sharpest modes, hurting generalization on benchmark families beyond the training corpus. Reinforcement learning (RL) avoids logit imitation by training on the student's own rollouts. However, on questions where every rollout fails, yielding zero advantage and being silently discarded...

HuggingFace
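
The "zero advantage" problem mentioned in the Zone of Proximal Policy Optimization abstract is easy to see numerically. The sketch below assumes a group-relative advantage (reward minus the group mean, divided by the group standard deviation, in the style of GRPO); the abstract does not say which RL objective the paper uses, so treat that choice and the name `group_advantages` as assumptions.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one question's group of rollouts.

    Assumed GRPO-style formula: (r - mean) / (std + eps).
    """
    baseline = mean(rewards)
    scale = pstdev(rewards) + eps
    return [(r - baseline) / scale for r in rewards]


if __name__ == "__main__":
    # Every rollout fails: all advantages are zero, so there is no learning signal.
    print(group_advantages([0.0, 0.0, 0.0, 0.0]))
    # One rollout succeeds: it gets a positive advantage, the rest negative.
    print(group_advantages([1.0, 0.0, 0.0, 0.0]))
```

When all rewards in a group are identical, every advantage is exactly zero and the question contributes no gradient, which is the failure case the abstract describes.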

GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?

Game generation is an emerging application of coding agents, requiring models to transform natural-language specifications into playable interactive systems. Unlike traditional coding tasks, game generation takes place within a game engine, where scripts, scenes, assets, rendering, and runtime interactions must jointly produce coherent gameplay. We formalize end-to-end game generation as the problem of producing a complete game artifact that realizes a specification through observable player-gam...

HuggingFace

LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling

Looped Transformers scale latent computation by repeatedly applying shared blocks, but sequential looping increases latency and KV-cache memory with the loop count. Parallel loop Transformers (PLT) alleviate this cost through cross-loop position offsets (CLP) and shared-KV gated sliding-window attention, making loop count a practical design choice. We therefore study PLT loop-count selection through a gain-cost view: an extra loop may refine representations, but CLP also introduces a positional...

HuggingFace

Learning from the Self-future: On-policy Self-distillation for dLLMs

On-policy self-distillation (OPSD) has proven effective for post-training large language models (LLMs), yet its application to diffusion LLMs (dLLMs) remains unexplored. Existing OPSD methods are inherently autoregressive-centric. They inject privileged information via left-to-right prefix conditioning with token-level divergence supervision, a design that fundamentally conflicts with the arbitrary-order generation of dLLMs. We introduce d-OPSD, the first OPSD framework tailored for dLLMs. Our ap...

HuggingFace

Industry News

US holds off blacklisting DeepSeek, more than 100 firms deemed security risks

The US government has decided not to blacklist DeepSeek while identifying over 100 companies as potential security risks, reflecting ongoing concerns about AI technology regulation.

RSS

DOJ claims xAI's gas turbines are a matter of 'national and energy security'

The Department of Justice has filed claims arguing that xAI's gas turbine infrastructure is critical to national and energy security interests. This legal action underscores the growing intersection between AI infrastructure development and government regulatory concerns.

RSS

Pentagon boasts of using AI to write reports mandated by Congress (1.5mil users)

The Pentagon is using AI to generate reports required by Congress, now serving 1.5 million users and showcasing the adoption of AI in government operations.

RSS

Anthropic opens Seoul office and announces new partnerships across the Korean AI ecosystem

Anthropic

Discussion

AI demands more engineering discipline. Not less

The AI industry must prioritize engineering discipline and rigorous practices rather than moving faster with less oversight to build reliable and robust systems.

RSS

The founder's playbook: Building an AI-native startup

A guide outlining the key strategies and best practices for founders building startups that are natively designed around AI capabilities from the ground up.

RSS