Cainew - Curated AI news for developers

TL;DR

Industry News

Microsoft's open source tools were hacked to steal passwords of AI developers

Model Releases

Claude Fable 5

Anthropic

Apple reveals new AI architecture built around Google Gemini models

Apple has unveiled a new AI architecture built around Google's Gemini models.

RSS

silx-ai/Quasar-Preview

Quasar-Preview is a preview model release from silx-ai. The preview offers early access so that feedback can inform further development and refinement.

HuggingFace

Tools & Products

gi-dellav/zerostack

Minimal coding agent written in Rust, optimized for memory footprint and performance

GitHub

JimLiu/baoyu-design

Run Claude Design locally as an Agent Skill — Cursor, Claude Code & more. Produce polished UI mockups, prototypes, decks & wireframes as self-contained HTML, without claude.ai/design. Best with Opus 4.8.

GitHub

superloglabs/superlog

Open-source observability tool that uses AI agents to self-heal your software

GitHub

caezium/Burrow

A free, open-source mole.fit — a native macOS GUI for the Mole CLI (mo): clean, uninstall, optimize, analyze disk, and watch live status. Plus long-range history + an MCP server for AI agents

GitHub

xichan96/dinotty

Mobile-first web terminal for AI coding agents (Claude Code, Codex, OpenCode). Server-side virtual terminal enables session persistence & auto-reconnect — disconnect, sleep, refresh, and pick up exactly where you left off. With customizable shortcut keyboard, file workspace & live web preview.

GitHub

ferdinandobons/brand-docs

BrandDocs is a set of agent skills that learn your existing Word, PowerPoint and Excel templates and generate new on-brand documents from them. Unlike generic AI document generators, it preserves brand, structure, styles and formulas by construction. Built for Claude Code, Codex and compatible AI agents.

GitHub

VC Boom: Score your deck, meet investors who fit, raise more. Boom!

VC Boom scores your pitch deck in under 90 seconds and tells you the single fastest fix, matches you with the right investors from 47,000+ (each with a one-line reason they fit), then drafts personalized cold emails you send from your own inbox. Prep for each investor, then book the calls. Founders using VC Boom have already raised $95M. Built by an 8-year VC who raised hundreds of millions and deployed across 47 startups. Free to start, no subscription.

ProductHunt

Sibyl-Labs/Sibyl-Memory

Sibyl Memory Plugin for Hermes enables persistent memory across long time horizons and relational context that was previously unavailable. Self-learning and automatic skill creation produce an agent that grows with you. Local SQLite, structured tiers, no vector DB. SDK, CLI, MCP server, Hermes plugin.

GitHub

ForestHubAI/edge-agents

The 30 MB open-source AI agent runtime for edge devices. Offline by default — GPIO, UART, MQTT as first-class nodes. Industrial protocols (OPC-UA, Modbus) on the roadmap.

GitHub

ZeroGPU: The compute efficient layer for AI inference

The world can't build compute fast enough to keep up with AI demand. So we took a different path. ZeroGPU is AI infrastructure powered by small language models running on a hybrid edge network reusing compute that already exists. Not every task needs a frontier model. Our purpose-built, edge-optimized models run 10x faster, 50% cheaper and offload 70–80% of production tasks to small models with frontier-level accuracy.

ProductHunt

agentscope-ai/PawBench

A benchmark for evaluating LLM × harness performance.

GitHub

Kimi Work: The AI desktop for knowledge work

Kimi Work is a desktop agent for knowledge work. It connects to local files, uses WebBridge for browser automation, runs scheduled tasks, coordinates agent swarms, creates PPT/Excel/Word/PDF outputs, and includes native finance data tools.

ProductHunt

Uiverse Design: De-slop your AI generated websites

You can tell when an app was vibecoded. So can your users. That generic purple gradient, the pills and badges, emojis everywhere, it all screams: "an AI made this in 20 minutes." Uiverse Design is a library of AI-first design systems you can drop into any project. Each one defines real typography, spacing, color, images and component treatment. All of them ship with a DESIGN.md instructions file, so that your agent knows exactly how to use it. You just sit back, and watch your app transform.

ProductHunt

TravelMind: AI-powered city discovery built on taste, not reviews

You land in a new city. You open every app you know. Two hours later you're still scrolling, still unsure, still guessing. TravelMind was built for that moment. Swipe through places, tell us what you love — the AI does the rest. It learns your taste and finds the right spot before you even know to look. Your taste. Every city. Live now on iOS and Android.

ProductHunt

Whistle: A fitness coach with personalized plans

Most workout apps give you a generic plan and call it personalized. Whistle actually knows you. It reads your Apple Health data and builds a real training plan around your fitness level, recovery, and goals. Detailed workouts, smart progression, all on your iPhone and Apple Watch. Whether you're just getting started or pushing toward a new personal best, your AI coach figures out what you need and when you need it. Your data. Your plan. Your pace.

ProductHunt

Research Papers

Is Grep All You Need? How Agent Harnesses Reshape Agentic Search

This paper explores how agent harnesses and grep-based techniques are reshaping agentic search in AI systems.

ArXiv

Can LLMs Beat Classical Hyperparameter Optimization Algorithms?

This research investigates whether large language models can match or outperform classical hyperparameter optimization algorithms.

ArXiv

SwiftVR: Real-Time One-Step Generative Video Restoration

Real-time video restoration (VR) for live streams requires high-resolution outputs under strict per-frame latency constraints. Existing one-step diffusion-based VR models remain difficult to deploy on consumer-grade GPUs due to two main bottlenecks: quadratic spatial attention at high resolutions and the latency-memory overhead of large video autoencoders. We present SwiftVR, a streaming one-step generative VR framework that reduces both bottlenecks under a causal chunk-wise protocol. For attent...

HuggingFace

OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

Vision-language model (VLM) agents are increasingly deployed in interactive game environments. Yet game benchmarks for VLM agents typically report a single first-attempt score per (agent, game) pair, focus on single-agent Solo play, and lack unified protocols for evaluating heterogeneous agent classes (commercial VLMs, open-weight VLMs, and specialized game policies) on the same footing. We address these gaps with OmniGameArena, a real-time benchmark of twelve newly built Unreal Engine 5 games s...

HuggingFace

Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory

Medical agent systems are increasingly expected to support interactive clinical decision making rather than only static question answering. In such settings, effective agents must reuse prior experience across evolving cases, yet existing memory mechanisms often retain raw historical traces that are redundant, noisy, and difficult to govern. More importantly, they rarely distinguish which memories are truly useful for future reasoning. This limits their ability to accumulate compact and reliable...

HuggingFace

AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing

World-action models have emerged as a promising paradigm for robot manipulation, jointly modeling visual scene dynamics and actions to inject physical priors into policy learning. However, existing world-action models couple world prediction and action execution at the same temporal resolution, forcing the world branch to model near-term frame variations that are redundant and weakly informative. We posit that strictly binding world prediction and action execution to the same temporal rhythm may...

HuggingFace

PBSD: Privileged Bayesian Self-Distillation for Long-Horizon Credit Assignment

Long-horizon agentic tasks pose a fundamental credit assignment challenge for outcome-based reinforcement learning: trajectory-level rewards verify final correctness but provide limited guidance on which intermediate reasoning steps or tool interactions contribute to the outcome. The difficulty is especially pronounced in multi-turn search agents, where successful trajectories may contain misleading actions and failed trajectories may contain valuable evidence-gathering steps. We propose PBSD (Pr...

HuggingFace

Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops

Agent benchmarks score submissions with outcome verifiers that are typically hand-written and brittle, leaving them open to reward hacking. We audit 1,968 tasks across five terminal-agent benchmarks and find 323 (16%) hackable by frontier models given only the task description. This corrupts both leaderboard rankings and RL training signal, yet the standard response is manual and reactive. We introduce the hacker-fixer loop, a method for building exploit-resistant verifiers without per-task ma...

HuggingFace

End-to-End Context Compression at Scale

Long-context language model inference is bottlenecked by memory, as the KV cache grows with context length. Recent techniques to compress the KV cache fall short: they either degrade model quality substantially or require considerable time and compute to compress a single long prompt. Furthermore, many methods require the input to fit within the target model's context window, and are generally incompatible with modern production inference engines. Encoder-decoder compressors, which map a long to...

HuggingFace
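
For readers who want a feel for the memory bottleneck mentioned in the abstract above, here is a rough back-of-the-envelope sketch of how a standard KV cache scales linearly with context length. The function name and the model dimensions are hypothetical, picked only for illustration; they do not come from the paper, and the estimate assumes an uncompressed cache with 2-byte (fp16/bf16) values.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim,
                   bytes_per_value=2, batch_size=1):
    """Estimate the size of an uncompressed KV cache in bytes."""
    # Factor of 2: one key tensor and one value tensor per layer.
    return (2 * batch_size * seq_len * n_layers
            * n_kv_heads * head_dim * bytes_per_value)


# Hypothetical model configuration, not taken from the paper.
config = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for tokens in (8_192, 131_072, 1_048_576):
    gib = kv_cache_bytes(tokens, **config) / 2**30
    print(f"{tokens:>9} tokens -> {gib:.1f} GiB")
```

With these made-up dimensions each token costs 128 KiB of cache, so 8,192 tokens need 1 GiB while 1,048,576 tokens need 128 GiB for a single sequence.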

Optical Reasoning: Rethinking Images as an Expressive Reasoning Medium Beyond Text

Chain-of-Thought (CoT) improves the performance of Large Language Models (LLMs) and has been extended to Multimodal Large Language Models (MLLMs). More recent work further moves from text-based multimodal reasoning toward interleaved-modal reasoning, where intermediate steps can incorporate both textual rationales and visual evidence. In this work, we propose a bolder and more ambitious idea: could images alone serve as the reasoning medium for both language and multimodal tasks? To explore this...

HuggingFace

Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short

Reinforcement learning with verifiable rewards (RLVR) has become a leading paradigm for improving the reasoning ability of large language models through outcome-based supervision. However, verifiable rewards frequently become uninformative at the group level: when all sampled traces of a given prompt receive identical rewards, group-relative advantage estimation provides no gradient signal, even though the traces may differ substantially in reasoning quality. We propose Reasoning Arena, an adapt...

HuggingFace
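
The "no gradient signal" claim in the abstract above can be seen in a few lines. This is a minimal sketch that assumes the common mean/standard-deviation normalization used in GRPO-style training; the excerpt does not spell out the paper's exact estimator, and the function name here is hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps guards against division by zero when all rewards are equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group with mixed outcomes yields positive and negative advantages.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# Groups where every trace gets the same reward yield all-zero advantages,
# so a policy-gradient update weighted by them has nothing to push on.
all_correct = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
all_wrong = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

print(mixed)
print(all_correct)
print(all_wrong)
```

Under this assumption, the uniform groups contribute zero advantage for every trace regardless of how much the traces differ in reasoning quality, which is the gap the abstract describes.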

Echo-Memory: A Controlled Study of Memory in Action World Models

We present Echo-Memory, a controlled study of memory mechanisms in action-conditioned world models. These models generate multi-segment videos from a first frame, text prompt, and camera-action sequence, but their central failure is often memory rather than local image synthesis: after the camera leaves and returns, the scene or salient object may silently change. Existing memory designs are hard to compare because gains are entangled with backbone, training, retrieval, and evaluation difference...

HuggingFace

FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention

Conventional LLMs keep the full KV cache loaded during decoding, causing a severe GPU memory bottleneck for ultra-long context serving. In this report, we propose Lookahead Sparse Attention (LSA), a novel inference paradigm powered by a Neural Memory Indexer built upon the DeepSeek-V4 architecture. Rather than passively attending to all historical tokens, LSA proactively predicts future context demands and preserves only the query-critical KV chunks in the GPU memory. Crucially, we instantiate t...

HuggingFace

Latent Spatial Memory for Video World Models

Video world models that maintain 3D spatial consistency across generated frames typically rely on explicit point cloud memory constructed in RGB space. This design is both computationally expensive, requiring repeated rendering and VAE encoding, and inherently lossy, as the round trip through pixel space discards rich features of the learned latent representation. In this paper, we introduce latent spatial memory for video world models, a persistent 3D cache that stores scene information directl...

HuggingFace

Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting

AI evaluation results are produced at scale but reported inconsistently across leaderboards, model cards, benchmark papers, and company blogs. The cost is interpretive: readers cannot reliably compare results across sources, identify what a report omits, or trace an aggregate claim to its underlying evidence. Recent efforts address isolated components but leave three gaps: they cover only narrow slices of the evaluation lifecycle and do not compose into a single interpretable record; they specif...

HuggingFace

Industry News

Microsoft's open source tools were hacked to steal passwords of AI developers

Microsoft's open source tools were compromised in a security breach that allowed attackers to steal passwords from AI developers.

RSS