Cainew - Curated AI news for developers

Tools & Products

✨ The agentic HTML editor — your local AI agent writes the HTML, you ship it. 🚀 75 Skills × 9 Surfaces (magazine · deck · poster · XHS / tweet · prototype · data report · Hyperframes) 🛡️ Sandboxed preview · 📤 1-click to WeChat / X / Zhihu / HTML / PNG 🔑 Zero API key — Claude Code / Cursor / Codex / Gemini / Copilot / OpenCode / Qwen / Aider.

GitHub

raiyanyahya/how-to-train-your-gpt

Build a modern LLM from scratch. Every line commented. Explained like we are five.

GitHub

HermannBjorgvin/Clawdmeter

ESP32 desk dashboard that shows Claude Code usage

GitHub

openlake-project/openlake

Hyper efficient storage for GPU workloads. Feed your GPUs at blazing fast speeds.

GitHub

seochecks-ai/slopless

Deterministic textlint rules and CLI for catching prose slop in Markdown

GitHub

varandrew/moor

Moor is a local MCP control plane for Mac. It gives every coding agent one safe, observable, configurable gateway to your MCP servers.

GitHub

tanviet12/vbsec

A Claude Code skill that performs in-depth security scans and detects 20+ of the most common security vulnerabilities in your source code.

GitHub

Tomotsugu-dev/Hindsight

Local-first desktop activity tracker — see where your hours go, with on-device AI daily summaries and optional multi-device sync

GitHub

Google Antigravity 2.0: Orchestrate multi-agent workflows from a desktop app

Google Antigravity 2.0 is a standalone desktop app for orchestrating multiple AI agents in parallel, with scheduled background tasks, subagent workflows, and native integrations with AI Studio, Firebase, and Android. Built for developers building production apps.

ProductHunt

WeWeb 3.0: Vibe-code apps with the safety net of a no-code editor

WeWeb is the only AI app builder that gives full editing control to non-coders. Prompt AI to generate your app, then refine every screen, workflow, and database in a powerful no-code editor where you always understand what’s happening under the hood. No more black box.

ProductHunt

Mintlify Workflows: Self-updating knowledge bases

Keep your docs moving as fast as your product. Mintlify Workflows lets teams turn on pre-built automations that update knowledge bases, generate changelogs, maintain translations, and handle repetitive documentation tasks whenever triggered. Instead of chasing every product change manually, teams can set up Workflows once and let Mintlify keep docs accurate, current, and ready for users.

ProductHunt

Mixpanel Headless: Programmatic access to product analytics for agents and devs

Mixpanel Headless is a Python SDK that makes the entire product surface programmable, so agents and devs can dig into data without leaving their IDE.

ProductHunt

Visual Usability Checker: Validate your design decisions instantly with AI insights

Get instant AI recommendations to improve your design. Detect cognitive load, see where users focus, catch issues early, and compare variations - so you can confidently make and defend design decisions with data-backed insights.

ProductHunt

CatchAll by NewsCatcher: Build any dataset from the web. Filtered to your criteria.

CatchAll is a web search API that builds structured datasets from the open web. Submit a query, and it scans thousands of web pages, validates every result, and returns clean, deduplicated records — not a ranked list of links, but a dataset of real-world events, ready for workflows and pipelines.

ProductHunt

Show HN: CPU-only transcription for YouTube, TikTok, X, Instagram videos

A CPU-only tool that transcribes videos from YouTube, TikTok, X, and Instagram. Because it needs no GPU, it can run on ordinary hardware at lower cost.

GitHub

Research Papers

An OpenAI model has disproved a central conjecture in discrete geometry

OpenAI reports that one of its models disproved a central conjecture in discrete geometry, an example of AI contributing to research-level mathematics.

OpenAI

You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories

Reinforcement learning with verifiable rewards (RLVR) has become a dominant paradigm for improving reasoning in large language models (LLMs), yet the underlying geometry of the resulting parameter trajectories remains underexplored. In this work, we demonstrate that RLVR weight trajectories are extremely low-rank and highly predictable. Specifically, we find that the majority of downstream performance gains are captured by a rank-1 approximation of the parameter deltas, where the magnitude of th...

HuggingFace
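
To make the rank-1 claim in the RLVR abstract above concrete, here is a minimal sketch of what a rank-1 approximation of a parameter delta looks like. It is not the paper's method: the function names (`rank1_approx`, `captured_energy`), the toy matrix, and the use of plain power iteration are all assumptions chosen for illustration, and the abstract is truncated before it describes how the authors scale or extrapolate the rank-1 term.

```python
import math

def rank1_approx(delta, iters=50):
    """Return (sigma, u, v) so that sigma * outer(u, v) approximates delta.

    Plain power iteration. Assumes delta is a non-zero list-of-lists matrix
    and that the starting vector is not orthogonal to the top singular vector.
    """
    rows, cols = len(delta), len(delta[0])
    v = [1.0 / math.sqrt(cols)] * cols
    sigma = 0.0
    for _ in range(iters):
        # u <- delta @ v, normalized to unit length
        u = [sum(delta[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        # v <- delta^T @ u; its length is the singular value estimate
        v = [sum(delta[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return sigma, u, v

def captured_energy(delta, sigma):
    """Fraction of the squared Frobenius norm explained by the rank-1 term."""
    total = sum(x * x for row in delta for x in row)
    return sigma * sigma / total

# Hypothetical toy "weight delta": one dominant direction plus a small perturbation.
delta = [[4.0, 5.0], [8.0, 10.1], [12.0, 14.9]]
sigma, u, v = rank1_approx(delta)
print(f"rank-1 term captures {captured_energy(delta, sigma):.4f} of the squared norm")
```

In a real model the delta would be the difference between fine-tuned and base weights for one layer, and a library SVD would replace the hand-written power iteration.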

Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning

Currently, enhancing Unified Multimodal Models (UMMs) with image understanding, generation, and editing capabilities mainly relies on mixed multi-task training. Due to inherent task conflicts, such a strategy requires complex multi-stage pipelines, massive data mixing, and balancing tricks, merely resulting in a performance trade-off rather than true mutual reinforcement. To break this paradigm, we propose Uni-Edit, an intelligent image editing task that serves as the first general task for UMM tu...

HuggingFace

IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools

Multimodal large language models (MLLMs) have shown remarkable capability in bridging visual perception and textual reasoning, enabling zero-shot understanding across diverse industrial scenarios. However, their performance in open-vocabulary industrial anomaly detection (IAD) is often limited by domain-misaligned reasoning and hallucinated structural inferences. To address these challenges, we propose IndusAgent, a tool-augmented agentic framework for open-vocabulary IAD. Specifically, we first...

HuggingFace

iTryOn: Mastering Interactive Video Virtual Try-On with Spatial-Semantic Guidance

Video Virtual Try-On (VVT) aims to seamlessly replace a garment on a person in a video with a new one. While existing methods have made significant strides in maintaining temporal consistency, they are predominantly confined to non-interactive scenarios where models merely showcase garments. This limitation overlooks a crucial aspect of real-world apparel presentation: active human-garment interaction. To bridge this gap, we introduce and formalize a new challenging task: Interactive Video Virtu...

HuggingFace

SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents

As long-horizon coding agents produce more code than any developer can review, oversight collapses onto a single surface: the automated test suite. Reward hacking naturally arises in this setup, as the agent optimizes for passing tests while deviating from the user's true goal. We study this reward hacking phenomenon by decomposing software engineering tasks into three parts: (i) a natural language description of the specification, (ii) visible validation tests that exercise specified features in is...

HuggingFace

UniT: Unified Geometry Learning with Group Autoregressive Transformer

Recent feed-forward models have significantly advanced geometry perception for inferring dense 3D structure from sensor observations. However, its essential capabilities remain fragmented across multiple incompatible paradigms, including online perception, offline reconstruction, multi-modal integration, long-horizon scalability, and metric-scale estimation. We present UniT, a unified model built upon a novel Group Autoregressive Transformer, which reformulates these seemingly disparate capabili...

HuggingFace

Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment

Direct Preference Optimization (DPO) has emerged as a popular alternative to Reinforcement Learning from Human Feedback (RLHF), offering theoretical equivalence with simpler implementation. We prove this equivalence is conditional rather than universal, depending on an implicit assumption frequently violated in practice: the RLHF-optimal policy must prefer human-preferred responses. When this assumption fails, DPO optimizes relative advantage over the reference policy rather than absolute alignm...

HuggingFace
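
The DPO abstract above says that DPO optimizes a margin relative to the reference policy. The sketch below implements the standard per-pair DPO loss to show what that means numerically. The function name, argument names, and the example log-probabilities are hypothetical, and the example is only an illustration of the abstract's point, not the construction used in the paper.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair: -log sigmoid(beta * margin).

    The margin compares how much the policy moved away from the reference
    on the chosen response versus the rejected one.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    z = beta * margin
    # -log(sigmoid(z)) == log(1 + exp(-z)), written to avoid overflow
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

# Hypothetical numbers: the reference strongly prefers the rejected response.
ref_chosen, ref_rejected = -5.0, -1.0
# The policy improved relative to the reference, but in absolute terms
# it still assigns higher log-probability to the rejected response.
pol_chosen, pol_rejected = -3.0, -2.0

loss = dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=1.0)
print(f"loss = {loss:.4f}, policy still prefers rejected: {pol_rejected > pol_chosen}")
```

The loss here falls below the log 2 value of an unchanged policy even though the policy still ranks the rejected response higher, which is the relative-versus-absolute gap the abstract describes.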

DrawMotion: Generating 3D Human Motions by Freehand Drawing

Text-to-motion generation, which translates textual descriptions into human motions, faces the challenge that users often struggle to precisely convey their intended motions through text alone. To address this issue, this paper introduces DrawMotion, an efficient diffusion-based framework designed for multi-condition scenarios. DrawMotion generates motions based on both a conventional text condition and a novel hand-drawing condition, which provide semantic and spatial control over the generated...

HuggingFace

OcclusionFormer: Arranging Z-Order for Layout-Grounded Image Generation

Recent layout-to-image models have achieved remarkable progress in spatial controllability. However, they still struggle with inter-object occlusion. When bounding boxes overlap, most existing methods lack explicit occlusion information, which makes the generation in intersection regions inherently ambiguous and hinders the determination of complex occlusion relationships. As a result, they often produce entangled textures or physically inconsistent layering in the overlapped areas. To address t...

HuggingFace

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

With the advancement of AI capabilities, AI reviewers are beginning to be deployed in scientific peer review, yet their capability and credibility remain in question: many scientists simply view them as probabilistic systems without the expertise to evaluate research, while other researchers are more optimistic about their readiness without concrete evidence. Understanding what AI reviewers do well, where they fall short, and what challenges remain is essential. However, existing evaluations of ...

HuggingFace

HRM-Text: Efficient Pretraining Beyond Scaling

The current pretraining paradigm for large language models relies on massive compute and internet-scale raw text, creating a significant barrier to foundational research. In contrast, biological systems demonstrate highly sample-efficient learning through multi-timescale processing, such as the functional organization of the frontoparietal loop. Taking this as inspiration, we introduce HRM-Text, which replaces standard Transformers with a Hierarchical Recurrent Model (HRM) that decouples computa...

HuggingFace

Mem-π: Adaptive Memory through Learning When and What to Generate

We present Mem-π, a framework for adaptive memory in large language model (LLM) agents, where useful guidance is generated on demand rather than retrieved from external memory stores. Existing memory-augmented agents typically rely on similarity-based retrieval from episodic memory banks or skill libraries, returning static entries that often misalign with the current context. In contrast, Mem-π uses a dedicated language or vision-language model with its own parameters, separate from the downs...

HuggingFace

OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization Under optimal Squared error quantization

The key-value (KV) cache dominates memory bandwidth and footprint in long-context autoregressive inference. Recent rotation-preconditioned codecs (TurboQuant, PolarQuant) show that a structured random rotation followed by a per-coordinate scalar quantizer matched to an analytically tractable marginal is a near-optimal recipe for KV compression. OCTOPUS advances this paradigm through joint quantization of rotated coordinate triplets. Each triplet's direction is mapped to a square via an octahedra...

HuggingFace
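
The OCTOPUS abstract above mentions mapping each triplet's direction to a square via an octahedral map. As a rough sketch, the code below uses the octahedral encoding familiar from graphics (unit vector to a point in [-1, 1]^2 and back) followed by a uniform grid quantizer. This is an assumption: the paper's exact parametrization and its squared-error-optimal quantizer may differ, the function names are hypothetical, and the sketch ignores the triplet's magnitude.

```python
import math

def _sign(x):
    return 1.0 if x >= 0.0 else -1.0

def oct_encode(v):
    """Map a non-zero 3-vector's direction to a point in the square [-1, 1]^2."""
    x, y, z = v
    s = abs(x) + abs(y) + abs(z)          # project onto the octahedron |x|+|y|+|z| = 1
    px, py = x / s, y / s
    if z < 0.0:                           # fold the lower hemisphere outward
        px, py = (1.0 - abs(py)) * _sign(px), (1.0 - abs(px)) * _sign(py)
    return px, py

def oct_decode(p):
    """Inverse of oct_encode; returns a unit 3-vector."""
    px, py = p
    z = 1.0 - abs(px) - abs(py)
    if z < 0.0:                           # unfold the lower hemisphere
        px, py = (1.0 - abs(py)) * _sign(px), (1.0 - abs(px)) * _sign(py)
    n = math.sqrt(px * px + py * py + z * z)
    return px / n, py / n, z / n

def quantize_square(p, bits=8):
    """Uniform grid on the square; a stand-in for the paper's quantizer."""
    levels = (1 << bits) - 1
    return tuple(round((c + 1.0) / 2.0 * levels) for c in p)

def dequantize_square(q, bits=8):
    levels = (1 << bits) - 1
    return tuple(k / levels * 2.0 - 1.0 for k in q)

direction = (1.0 / 3.0, 2.0 / 3.0, -2.0 / 3.0)
codes = quantize_square(oct_encode(direction))
restored = oct_decode(dequantize_square(codes))
print(codes, restored)
```

Two 8-bit codes replace three floating-point coordinates for the direction; a full codec would also have to store or quantize each triplet's norm.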

PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models

Planning is a fundamental capability for large language models (LLMs) because such complex tasks require models to coordinate goals, constraints, resources, and long-term consequences into executable and verifiable solutions. Existing planning benchmarks, however, usually treat planning data as fixed collections of instances rather than controllable generation targets. This limits scenario coverage, ties difficulty to surface-level proxies rather than structural sources, and offers limited suppo...

HuggingFace

Tutorials

Indexing a year of video locally on a 2021 MacBook with Gemma4-31B (50GB swap)

A demonstration of indexing a full year of video content locally on a 2021 MacBook using the Gemma4-31B model with 50GB of swap space. It shows that running a large model for video processing on consumer-grade hardware is feasible.

RSS

Industry News

Anthropic is expanding to Colossus2. Will use GB200

Anthropic is expanding its infrastructure to Colossus2 and will utilize NVIDIA's GB200 GPUs for enhanced computational capacity. This expansion supports the company's growing AI model training and deployment needs.

Twitter

Intuit to lay off over 3k employees to refocus on AI

Intuit is laying off over 3,000 employees as part of a strategic refocus on artificial intelligence and automation capabilities. The restructuring aims to shift company resources toward AI-driven products and services.

RSS

Waymo pauses Atlanta service as its robotaxis keep driving into floods

Waymo has paused its robotaxi service in Atlanta after multiple incidents where its autonomous vehicles drove into flooded areas. The pause highlights safety challenges in handling unexpected weather conditions and dynamic road hazards.

RSS

Cloudflare CEO on how he chooses which employees to replace with AI

Cloudflare's CEO discusses his decision-making process for identifying and replacing employees with AI systems, offering insights into corporate automation strategies. The discussion reflects broader industry trends of using AI to augment or substitute human workforce roles.

RSS