Cainew - Curated AI news for developers

Tools & Products

sapientinc/HRM-Text

HRM-Text is a 1B text generation model based on the HRM architecture, strengthened by task completion and latent space reasoning.

GitHub

raindrop-ai/workshop

Give your coding agent the power to write and run agent evals.

GitHub

Claude Code as a Daily Driver: Claude.md, Skills, Subagents, Plugins, and MCPs

Claude Code is being positioned as a viable daily driver for software development, with new features including Claude.md, Skills, Subagents, Plugins, and MCPs expanding its capabilities. Together, these additions let developers use it across the full development workflow.

RSS

nekocode/filetree-skill

A Claude Code plugin that maintains `FILETREE.md`.

GitHub

stormzhang/ai-translate

AI translate tool for Claude Code, Codex, OpenCode & Cursor. One-line install, multi-language support with speech.

GitHub

Powabase: Build AI apps with Postgres, RAG, and agents

Powabase is a backend-as-a-service for AI-native applications, combining Postgres, RAG, agents, memory, workflows, and automation primitives in one platform. It helps agencies and in-house IT teams build new AI apps or add AI automation to existing products without stitching together fragmented infrastructure. Designed to work seamlessly with modern coding agents, Powabase helps teams ship faster while building more robust, token-efficient systems.

ProductHunt

Training our own AI models

Organizations are increasingly developing their own custom AI models rather than relying solely on third-party solutions. This trend reflects growing demand for specialized, proprietary AI capabilities tailored to specific business needs.

RSS

vuejs-ai/vue-tui

The Vue framework for terminal UIs. SFC & JSX, Yoga flexbox, HMR, and testing out of the box.

GitHub

Aimino-Tech/opendocswork-mcp

Rust-native MCP server for Office document processing (Excel, Word, PowerPoint). Sub-millisecond, local-first, open source.

GitHub

Oasis Browser for Mac: A privacy-first AI browser you can train anonymously

Oasis is a refuge from noisy, scattered browsing. Privacy comes first, in an elegant experience that AI makes feel lighter and more capable, not busier. Your data is your data. Period. As you train Oasis on what matters to you, it grows sharper, quicker, and truer to your everyday flow.

ProductHunt

Coworker AI: More AI for less spend with context-aware model routing

Same AI. 5x the tokens. Coworker provides deep company context and automatically routes to the right model for every task. More chat, cowork and code with the same spend.

ProductHunt

Cloudflare Flagship

Cloudflare has launched a flagship product offering, likely representing a major enhancement or new initiative within their platform. This product aims to solidify Cloudflare's position in the competitive cloud infrastructure market.

RSS

Research Papers

Where does next-token prediction leave us?

This piece explores the implications and limitations of next-token prediction as the foundational approach for large language models. The discussion examines what this architectural choice means for the future development and capabilities of AI systems.

RSS

DeepSWE: A contamination-free benchmark for long-horizon coding agents

DeepSWE introduces a new benchmark for evaluating long-horizon coding agents while ensuring the benchmark remains free from data contamination. This tool addresses the need for reliable, standardized evaluation metrics in autonomous code generation.

RSS

ITBench-AA: Frontier Models Score Below 50% on the First Benchmark for Agentic Enterprise IT Tasks — by Artificial Analysis and IBM

HuggingFace

Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling

Test-Time Scaling (TTS) enhances the reasoning capabilities of large language models by allocating additional inference compute to explore the solution space. However, existing parallel TTS methods typically keep branches isolated during search: intermediate discoveries remain branch-private and cannot guide other branches in time. This information isolation causes substantial redundant exploration, as branches repeatedly rediscover information already found elsewhere and require more search ste...

HuggingFace

RT-Lynx: Putting the GEMM Sparsity In a Right Way for Diffusion Models

Diffusion Transformers (DiT) achieve strong performance in image generation but incur substantial inference costs. While prior work has reduced this cost via quantization and distillation, semi-structured sparsity, which can nearly halve FLOPs, remains underexplored. A key reason is that most existing approaches focus on weight sparsification, and pruning 50% of the weights can remove critical model capacity and degrade generation quality. Our study, however, shows that DiT activations are intri...

HuggingFace

MRT: Masked Region Transformer for Layered Image Generation and Editing at Scale

Layered image generation and editing is a fundamental capability that enables layer-wise reuse, editing, and composition of generated visual content, analogous to word-level editing in natural language. Despite its importance, this remains an underexplored area at scale. To address this gap, we present MRT, a 20B-parameter masked region diffusion model tailored for multi-layer transparent image generation and editing, trained on over 10M multilingual design samples spanning diverse aspect ratios...

HuggingFace

VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions

Large language models (LLMs) have evolved into interactive agents that collaborate with users in real-world tasks. Effective collaboration in such settings increasingly depends on understanding the user beyond what is explicitly stated, as user intent is often reflected in fragmented daily interactions and requires both personalized modeling and proactive interaction. However, existing agent benchmarks primarily evaluate reasoning and tool use, largely overlooking the challenges of inferring and...

HuggingFace

Gemini Embedding 2: A Native Multimodal Embedding Model from Gemini

We introduce Gemini Embedding 2, a native multimodal embedding model that allows embedding video, audio, image, and text modalities in a unified representation space. We leverage the multimodal capabilities of Gemini to produce embeddings for arbitrary combinations of interleaved inputs across all these modalities that generalize well across a wide variety of tasks. Applying large-scale contrastive learning in a multi-task multi-stage training setup, we achieve state-of-the-art performance on ke...

HuggingFace

QUACK: Questioning, Understanding, and Auditing Communicated Knowledge in Multimodal Social Deduction Agents

Social deduction games have become a popular testbed for probing reasoning, deception, coordination, and belief modeling in Large Language Model (LLM) agents. However, most environments are scored only by game outcomes such as win rates and largely remain limited to text-only interaction, making it difficult to tell whether an agent's language is actually grounded in what it perceived and did, or to identify the failure modes underlying its behavior. To address this gap, we introduce QUACK, an open-sour...

HuggingFace

MobileMoE: Scaling On-Device Mixture of Experts

Mixture-of-Experts (MoE) has become the de facto architecture for hundred-billion-parameter language models, yet its advantages at sub-billion scales for on-device deployment remain largely unexplored. To close this gap, we present MobileMoE, a family of on-device MoE language models with sub-billion active parameters (0.3-0.9B active and 1.3-5.3B total) that establish a new Pareto frontier for on-device LLMs. We first formulate an on-device MoE scaling law that jointly optimizes MoE architectur...
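To make the "active" versus "total" parameter distinction concrete, here is a minimal sketch of the usual bookkeeping for a top-k routed MoE. The function name and every number below are hypothetical and chosen only to land inside the ranges quoted above; they are not MobileMoE's actual configuration, and small terms such as router weights are ignored.

```python
def moe_param_counts(shared, per_expert, num_experts, top_k):
    """Return (total, active) parameter counts for a simple MoE model.

    shared      -- parameters every token uses (attention, embeddings, ...)
    per_expert  -- parameters in one expert, summed over all layers
    num_experts -- experts stored in the model
    top_k       -- experts actually evaluated for each token
    """
    total = shared + num_experts * per_expert   # what must be stored
    active = shared + top_k * per_expert        # what each token computes with
    return total, active


# Hypothetical sizes, for illustration only.
total, active = moe_param_counts(
    shared=200_000_000,
    per_expert=50_000_000,
    num_experts=32,
    top_k=4,
)
print(f"total:  {total / 1e9:.1f}B parameters")
print(f"active: {active / 1e9:.1f}B parameters")
```

Under these assumed sizes the model stores 1.8B parameters but computes with only 0.4B per token, which is the kind of gap that makes MoE attractive when on-device compute is the tighter constraint.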

HuggingFace

Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models

Normalization layers in modern large language models (LLMs) consist of a deterministic normalization operation and a learnable scale vector. While the normalization operation has been extensively studied, the scale vector remains poorly understood despite its ubiquitous use. In this work, we present a systematic study of scale vectors in LLMs from the perspectives of expressivity, optimization, and architectural structure. First, we show empirically that although scale vectors constitute only a ...
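As a reference point for the two components named above, here is a minimal sketch of RMSNorm, one common normalization layer in LLMs. Whether the paper studies RMSNorm, LayerNorm, or both is not stated in this excerpt, and the `learned_scale` values are hypothetical.

```python
import math


def rms_norm(x, scale, eps=1e-6):
    """Apply RMS normalization followed by a per-dimension scale vector."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    # Deterministic part: divide by the root-mean-square of x.
    # Learnable part: multiply each dimension by its entry in `scale`.
    return [v * inv_rms * g for v, g in zip(x, scale)]


hidden = [3.0, -4.0]
unit_scale = [1.0, 1.0]        # scale vector at a typical initialization
learned_scale = [0.5, 2.0]     # hypothetical values after training

normalized = rms_norm(hidden, unit_scale)
rescaled = rms_norm(hidden, learned_scale)
print(normalized)
print(rescaled)
```

The scale vector adds only one parameter per hidden dimension, which is why it is a tiny fraction of a model's total parameter count.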

HuggingFace

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

Vision-language models (VLMs) commonly formulate visual grounding and detection as a coordinate-token generation problem, serializing each 2D box into multiple 1D tokens that are learned and decoded largely independently. This token-by-token decoding mismatches the coupled structure of box geometry and creates a practical inference bottleneck due to strictly sequential generation. We introduce LocateAnything, a unified generative grounding and detection framework based on Parallel Box Decoding (...
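The coordinate-token formulation described above can be sketched as follows: each box corner coordinate is quantized into a bin index, and the box becomes a short token sequence. This illustrates only the baseline serialization, not LocateAnything's Parallel Box Decoding, and the bin count and function name are assumptions.

```python
def box_to_tokens(box, width, height, bins=1000):
    """Serialize an (x0, y0, x1, y1) pixel box into four coordinate tokens."""
    x0, y0, x1, y1 = box

    def quantize(value, size):
        # Map a pixel coordinate to a bin index in [0, bins - 1].
        index = int(value * bins / size)
        return min(bins - 1, max(0, index))

    return [
        quantize(x0, width),
        quantize(y0, height),
        quantize(x1, width),
        quantize(y1, height),
    ]


tokens = box_to_tokens((64, 48, 320, 240), width=640, height=480)
print(tokens)  # [100, 100, 500, 500]
```

With standard autoregressive decoding, each of those four tokens takes its own sequential step, so an image with many boxes needs four steps per box. That is the bottleneck the abstract refers to.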

HuggingFace

MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

Large language model (LLM) agents rely on reusable skills to solve complex tasks. However, existing skill creation approaches treat skills as isolated and static artifacts, limiting their reusability, reliability, and long-term improvement. We propose MUSE-Autoskill Agent (Memory-Utilizing Skill Evolution), a skill-centric agent framework that lets agents continuously improve their task-solving capability by creating, reusing, and refining skills under a unified lifecycle (creation, memory, mana...

HuggingFace

PRISM: Position-encoded Regressive Inverse Spectral Model for Multilayer Thin-Film Design

The inverse problem of multilayer thin-film optical coatings design represents a complex combinatorial-continuous optimization challenge. We present PRISM (Position-encoded Regressive Inverse Spectral Model), a unified decoder-only autoregressive transformer that streamlines this process by jointly predicting discrete material selection and continuous thickness regression within a single backbone. PRISM introduces two primary architectural innovations: (1) spectrum prefix conditioning, which uti...

HuggingFace

Recursive Flow Matching

Generative models have emerged as a powerful paradigm for solving physics systems and modeling complex spatiotemporal dynamics. However, achieving high physical accuracy without incurring high computational cost remains a fundamental challenge, as existing approaches face a critical speed-fidelity trade-off. In this work, we introduce Recursive Flow Matching (RecFM), a generative framework for forecasting complex spatiotemporal dynamics. RecFM enforces self-consistency to align trajectories acro...

HuggingFace

Tutorials

Building self-improving tax agents with Codex

See how OpenAI, Thrive, and Crete built a self-improving tax agent with Codex, automating filings, improving accuracy, and accelerating workflows.

OpenAI

Industry News

Bay Area mom out thousands after scammers use AI to mimic daughter's voice

A Bay Area mother lost thousands of dollars after scammers used AI voice synthesis technology to impersonate her daughter and request money. This incident highlights the growing security threats posed by deepfake voice technology.

RSS

Discussion

I think Anthropic and OpenAI have found product-market fit

Anthropic and OpenAI have achieved product-market fit with their AI offerings, indicating strong demand and alignment between their products and market needs. This suggests both companies are positioned as leaders in the commercial AI space.

RSS