Cainew - Curated AI news for developers

Model Releases

yuxinlu1/gemma-4-12B-agentic-fable5-composer2.5-v2-3.5x-tau2-GGUF

A quantized GGUF build of the Gemma 4 12B model configured for agentic use, combining several specialized components. It targets efficient deployment while retaining reasoning and task-execution ability.

HuggingFace

datalab-to/lift

datalab-to/lift appears to be a dataset or model resource; the listing gives few details about its purpose or functionality. It may relate to data processing, model lifting, or transfer learning.

HuggingFace

Tools & Products

JimLiu/baoyu-design

Run Claude Design locally as an Agent Skill — Cursor, Claude Code & more. Produce polished UI mockups, prototypes, decks & wireframes as self-contained HTML, without claude.ai/design. Best with Opus 4.8.

GitHub

samarailly51-pixel/opencode-harness

Clean-room, model-agnostic harness for Claude Code-class coding agents

GitHub

microsoft/SwiftStreamingMarkdown

A performant markdown library for iOS that supports streaming

GitHub

kennss/SiliconScope

Sudoless Apple Silicon system monitor (native SwiftUI GUI) with ANE / Media Engine / memory-bandwidth tracking

GitHub

Goekdeniz-Guelmez/MLX-LoRA-Studio

A native Mac App for LLM fine-tuning on Apple Silicon — fully on-device, fully open source.

GitHub

Claude Code Artifacts: Preview and share your coding work live as it happens

Preview your in-progress work in Claude Code as a live, interactive artifact—built from your full session context and shareable with your team.

ProductHunt

Firecrawl Research Index: An index for agents pushing the frontier of AI/ML research

AI/ML research moves fast, and the work that matters is split between new papers and the code that implements them. Most search providers omit or misrank key papers, leaving you to review sources by hand without ever being sure you've caught everything. So we built an index for it. Firecrawl's index includes all 3M+ arXiv papers, as well as GitHub artifacts from top research repos, refreshed daily so agents always stay current.

ProductHunt

frontpage.sh: A perpetual auction for eight ad squares

A perpetual auction for eight ad squares. Pay a multiple of the last price to take one; when someone outbids you, you leave with up to 1.5× what you paid. 80% of every flip funds a pool that buys the page more attention. Agents do the buying — two HTTP calls, USDC on Tempo, no accounts.

ProductHunt

Unreal Engine 5.8: Build unreal games with AI agents

Unreal Engine 5.8 is the final major milestone of the UE5 lifecycle. It introduces experimental 3D Mesh Terrain to replace traditional heightfields, production-ready MegaLights for current-gen consoles, and a native MCP plugin for AI agent automation.

ProductHunt

Ask Ad Manager by Google Ads: Gemini-powered AI agent for insights & faster ad decisions

An AI agent built with Gemini that helps publishers get deeper insights, understand their performance, and make better decisions faster.

ProductHunt

Zero-Touch OAuth for MCP

Zero-Touch OAuth for MCP introduces a simplified authentication mechanism for Model Context Protocol, eliminating manual configuration steps and enabling seamless integration for AI applications.

RSS

Research Papers

ENPIRE: Agentic Robot Policy Self-Improvement in the Real World

Achieving dexterous robotic manipulation in the real world heavily relies on human supervision and algorithm engineering, which becomes a central bottleneck in the pursuit of general physical intelligence. Although emerging coding agents can generate code to automate algorithm search, their successes remain largely confined to digital environments. We conjecture that the missing abstraction to automate robotics research is a repeatable feedback loop for real-world policy improvement: reset the s...

HuggingFace

FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows

Conditional diffusion and flow models routinely fail to satisfy the very constraints that define their task. For instance, a depth-conditioned model often produces images whose re-extracted depth disagrees with the input, even though the forward operator (the depth predictor defining the constraint) is available during both training and inference. Existing approaches generally fall into two categories: supervised models that treat the conditioning signal as a static cue and ignore alignment info...

HuggingFace

S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence

Real-world spatial intelligence requires reasoning over a continuous and evolving 3D world, yet existing VLMs and tool-augmented agents largely remain tied to static, stateless inference from isolated visual observations. We introduce S-Agent, a spatial tool-use agentic paradigm for understanding and reasoning over continuous multi-view images and videos. By formulating spatial reasoning as spatio-temporal evidence accumulation rather than isolated frame-level prediction, S-Agent reshap...

HuggingFace

DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis

Advances in radiance fields have enabled photorealistic novel view synthesis. In several domains, large-scale real-world datasets have been developed to support comprehensive benchmarking and to facilitate progress beyond scene-specific reconstruction. However, for distractor-free radiance fields, a large-scale dataset with clean and cluttered images per scene remains lacking, limiting the development. To address this gap, we introduce DF3DV-1K, a large-scale real-world dataset comprising 1,048 ...

HuggingFace

The FID Lottery: Quantifying Hidden Randomness in Generative-Model Evaluation

The Fréchet Inception Distance (FID) is the de facto arbiter of image generation, yet most papers report just a single number from a single trained model using a single sampling seed. How reproducible is that number if we retrain the model, or merely resample from it? In this paper, we treat FID as a random variable on a two-axis panel of training and generation seeds, and measure its variance directly on several hundred SiT networks trained on class-conditional ImageNet 256x256. We report surpr...

HuggingFace
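
To make the "two-axis panel" idea in The FID Lottery concrete, here is a minimal sketch of how score variance on such a panel could be split into a training-seed part and a generation-seed part using the law of total variance. The function name, the panel layout, and the numbers are all hypothetical; they are not taken from the paper, and the paper's own estimator may differ.

```python
from statistics import mean, pvariance


def seed_variance_components(panel):
    """Split score variance into training-seed and generation-seed parts.

    panel[i][j] is the score for training seed i and generation seed j.
    Assumes a balanced panel: every row has the same number of columns.
    """
    row_means = [mean(row) for row in panel]
    # Variance of the per-training-seed averages (retraining noise).
    between_training = pvariance(row_means)
    # Average variance across generation seeds within one trained model.
    within_generation = mean(pvariance(row) for row in panel)
    # Variance over every cell; equals the sum of the two parts above.
    total = pvariance([score for row in panel for score in row])
    return between_training, within_generation, total


# Made-up scores, NOT results from the paper:
# 3 training seeds (rows) x 4 generation seeds (columns).
panel = [
    [2.10, 2.14, 2.08, 2.12],
    [2.31, 2.27, 2.33, 2.29],
    [2.02, 2.06, 2.00, 2.04],
]

between, within, total = seed_variance_components(panel)
print(f"between training seeds:  {between:.5f}")
print(f"within generation seeds: {within:.5f}")
print(f"total:                   {total:.5f}")
```

With a balanced panel the two components add up to the total, so the split shows directly how much of the spread comes from retraining versus resampling.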

LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents

Policy-adherent tool-calling agents in customer-service domains must maintain task states across turns while calling tools and obeying domain policies. Task states consist of relevant facts, identifiers, constraints, and conditions observed through user interaction and tool calls. In standard agents, task states are not represented separately. Observations, tool returns, and policy instructions are placed in the prompt, leaving agents to reconstruct the relevant states from the prompt each time ...

HuggingFace
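
As an illustration of what keeping task state separate from the prompt might look like for the LedgerAgent item, the sketch below stores identifiers, facts, and constraints in one small structure that is updated after each tool call. The class, its fields, the `_id` naming rule, and the sample tool are all hypothetical; the paper's actual ledger design is not described in the excerpt above.

```python
from dataclasses import dataclass, field


@dataclass
class TaskLedger:
    """Hypothetical structured task state kept outside the prompt."""

    identifiers: dict = field(default_factory=dict)
    facts: dict = field(default_factory=dict)
    constraints: list = field(default_factory=list)

    def record_tool_return(self, tool_name, payload):
        # Assumption: keys ending in "_id" are identifiers; all else is a fact.
        for key, value in payload.items():
            if key.endswith("_id"):
                self.identifiers[key] = value
            else:
                self.facts[f"{tool_name}.{key}"] = value

    def add_constraint(self, text):
        # Store each policy constraint once, in the order first seen.
        if text not in self.constraints:
            self.constraints.append(text)

    def render(self):
        # Compact text view that could be passed to the model each turn.
        lines = ["IDENTIFIERS:"]
        lines += [f"  {k} = {v}" for k, v in sorted(self.identifiers.items())]
        lines.append("FACTS:")
        lines += [f"  {k} = {v}" for k, v in sorted(self.facts.items())]
        lines.append("CONSTRAINTS:")
        lines += [f"  - {c}" for c in self.constraints]
        return "\n".join(lines)


ledger = TaskLedger()
# "get_order" and its payload are invented for this example.
ledger.record_tool_return("get_order", {"order_id": "A123", "status": "shipped"})
ledger.add_constraint("Refunds require a delivered order.")
ledger.add_constraint("Refunds require a delivered order.")  # duplicate, ignored
print(ledger.render())
```

The point of the design is that the agent reads one small, current summary each turn instead of re-deriving state from the whole transcript.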

Holo-World: Unified Camera, Object and Weather Control for Video World Model

Video world models are moving toward preserving an observed world under controllable camera and object motion while allowing its environmental state to change. Yet these controls remain isolated, and weather generation typically relies on a source video or reconstructed scene that already specifies future structure. We study a first-frame-anchored source-to-state setting, where the model starts from a single image and follows explicit camera and object controls and an optional weather instructio...

HuggingFace

Current World Models Lack a Persistent State Core

World models are increasingly regarded as a decisive step toward artificial general intelligence, yet modeling the physical world demands more than rendering convincing frames on demand: it requires an internal world state that keeps evolving over time, decoupled from observation, so that objects endure and events run to their conclusions whether or not a camera is watching, much as the moon holds to its orbit when no one is looking. This requirement is a blind spot of existing benchmarks, which...

HuggingFace

JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising

Creating 3D visual illusions, a single 3D mesh that reveals entirely different semantics from various viewing angles, is a fascinating but tough challenge. Existing optimization-based methods are slow and can produce oversaturated colors. In contrast, naive stitching approaches fail to produce geometrically coherent objects. This results in visible unnatural seams and semantic leaks. In this paper, we present a fast and training-free framework for generating text-driven 3D visual illusions. Our ...

HuggingFace

HumanScale: Egocentric Human Video Can Outperform Real-Robot Data for Embodied Pretraining

Embodied foundation models are expected to benefit from data scaling like large language models, but face a much tighter data bottleneck. Teleoperated real-robot trajectories remain the dominant pretraining source due to their precise action supervision and embodiment alignment, yet their scalability is limited by high collection cost, acquisition difficulty, and low behavioral and environmental diversity. These limitations have sparked interest in egocentric human video as a scalable, substanti...

HuggingFace

JAMER: Project-Level Code Framework Dataset and Benchmark on Professional Game Engines

Current AI-driven game development has made substantial progress in asset generation, gameplay design, and web-based game coding, yet project-level code engineering on professional game engines remains largely unexplored due to the absence of large-scale datasets and deterministic evaluation methods. We present JamSet and JamBench, the first project-level game code framework dataset and benchmark built on a professional game engine. Our key insight is that Game Jam competitions, community events...

HuggingFace

FreeStyle: Free Control of Style-Content Dual-Reference Generation from Community LoRA Mining

Style-content dual-reference generation aims to synthesize an image that preserves the structure and semantics of a content reference while adopting the style of a separate style reference. Despite recent progress, this setting remains challenging because models must balance content fidelity, style alignment, and instruction following while avoiding semantic leakage from the style reference. A key bottleneck is the lack of large-scale triplet data with clean content-style separation and broad long-tail ...

HuggingFace

Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents

Agent benchmarks are growing fast, but no single benchmark touches more than four or five of the dimensions that deployment exposes. This paper aggregates the largest coordinated deep-dive of one MCP-based industrial-agent benchmark to date: fourteen parallel implementation studies covering new asset classes (including a multi-modal visual extension), alternative orchestrations, retrieval strategies, reasoning modes, infrastructure optimizations, and evaluation-methodology probes. Consolidating ...

HuggingFace

Rethinking Shrinkage Bias in LLM FP4 Pretraining: Geometric Origin, Systemic Impact, and UFP4 Recipe

FP4 training promises substantial reductions in memory and computation cost for LLM pretraining, yet current FP4 hardware paths and recipes, including NVIDIA Blackwell/Rubin-class systems and AMD MI350-series GPUs, remain centered on E2M1 data elements. In this study, we identify a fundamental limitation of that choice: non-uniform formats such as E2M1 inherently suffer from Shrinkage Bias, a systematic negative rounding error caused by the geometric asymmetry of their representable bins. We sho...

HuggingFace
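
One way to see how a non-uniform grid such as E2M1 can produce a signed rounding error is to look at the rounding cell around each representable value. The sketch below lists the non-negative E2M1 magnitudes (0, 0.5, 1, 1.5, 2, 3, 4, 6) and, for each interior grid point, compares the point with the center of its round-to-nearest cell. It assumes inputs spread uniformly inside each cell and ignores scaling and clipping; it is an illustration only, not the paper's derivation or its UFP4 recipe.

```python
# Non-negative magnitudes representable in FP4 E2M1.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def cell_mean_error(grid):
    """Mean of (rounded - input) per interior grid point.

    Assumes round-to-nearest and inputs uniform within each rounding cell,
    so the mean input is the cell center. The two end points are skipped
    because their cells depend on how clipping is handled.
    """
    errors = {}
    for i in range(1, len(grid) - 1):
        lower_edge = (grid[i - 1] + grid[i]) / 2
        upper_edge = (grid[i] + grid[i + 1]) / 2
        cell_center = (lower_edge + upper_edge) / 2
        errors[grid[i]] = grid[i] - cell_center
    return errors


bias = cell_mean_error(E2M1_GRID)
for point, error in bias.items():
    print(f"grid point {point}: mean rounding error {error}")
```

Under these assumptions the error is zero wherever the neighboring gaps are equal, and negative at 2.0 (-0.125) and 4.0 (-0.25), where the step size doubles and the cell extends further above the grid point than below it.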

Think Again or Think Longer? Selective Verification for Budget-Aware Reasoning

Test-time reasoning is increasingly used as a serving-time control knob, but extra reasoning is not uniformly valuable: it can repair failed attempts, waste compute on already-correct answers, or introduce harmful answer changes. We study this as a deployment allocation problem rather than a new-verifier problem. We introduce Selective Verification for Reasoning Allocation, a serving-layer controller that decides whether to preserve a frozen solver's initial answer or invoke active verif...

HuggingFace
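
The keep-or-verify decision described in the selective verification item can be pictured as a small gate in front of the verifier. The sketch below is a generic threshold rule under a compute budget; the confidence score, the threshold, the cost units, and the function name are all hypothetical and are not the controller proposed in the paper.

```python
def choose_action(confidence, verify_cost, budget_left, threshold=0.8):
    """Return "keep" or "verify" for one request.

    confidence:  hypothetical score in [0, 1] for the solver's first answer.
    verify_cost: compute units one verification pass would consume.
    budget_left: compute units still available.
    """
    if verify_cost > budget_left:
        return "keep"  # cannot afford verification
    if confidence >= threshold:
        return "keep"  # likely correct already; avoid wasted compute
    return "verify"


# Invented requests: (confidence, verification cost).
requests = [(0.95, 3), (0.40, 3), (0.55, 3), (0.30, 3)]
budget = 6
decisions = []
for confidence, cost in requests:
    action = choose_action(confidence, cost, budget)
    if action == "verify":
        budget -= cost
    decisions.append(action)

print(decisions, "budget left:", budget)
```

A real controller would also have to weigh the risk that verification changes a correct answer, which this simple rule does not model.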

Industry News

Hyundai buys Boston Dynamics

Hyundai Motor Company acquired Boston Dynamics, a leading robotics company known for its advanced humanoid and quadruped robots, strengthening Hyundai's position in robotics and automation technology.

RSS

Generative AI Is Having Its Herbalife Moment

The article examines how generative AI's rapid hype cycle and inflated expectations mirror the characteristics of multi-level marketing schemes, warning of potential market disillusionment.

RSS

Amazon investigating engineers who criticized AI data center expansion

Amazon has launched an investigation into engineers who publicly criticized the company's AI data center expansion plans, raising concerns about employee free speech and corporate transparency.

RSS