HRM-Text is a 1B-parameter text generation model built on the HRM architecture and enhanced with task completion and latent-space reasoning.
TL;DR
Tools & Products
Give your coding agent the power to write and run agent evals.
Claude is being positioned as a viable daily driver for software development. Features such as CLAUDE.md, Skills, Subagents, Plugins, and MCP servers extend it to cover complete development workflows.
A Claude Code plugin that maintains `FILETREE.md`.
AI translation tool for Claude Code, Codex, OpenCode, and Cursor, with one-line installation and multi-language support including speech.
Powabase is a backend-as-a-service for AI-native applications, combining Postgres, RAG, agents, memory, workflows, and automation primitives in one platform. It helps agencies and in-house IT teams build new AI apps or add AI automation to existing products without stitching together fragmented infrastructure. Designed to work seamlessly with modern coding agents, Powabase helps teams ship faster while building more robust, token-efficient systems.
Organizations are increasingly developing their own custom AI models rather than relying solely on third-party solutions. This trend reflects growing demand for specialized, proprietary AI capabilities tailored to specific business needs.
The Vue framework for terminal UIs. SFC & JSX, Yoga flexbox, HMR, and testing out of the box.
Rust-native MCP server for Office document processing (Excel, Word, PowerPoint). Sub-millisecond, local-first, open source.
Oasis is a refuge from noisy, scattered browsing. Privacy comes first, in an elegant experience that AI makes feel lighter and more capable, not busier. Your data is your data. Period. As you train Oasis on what matters to you, it grows sharper, quicker, and truer to your everyday flow.
Same AI. 5x the tokens. Coworker provides deep company context and automatically routes to the right model for every task. More chat, cowork and code with the same spend.
Cloudflare has launched a new flagship product aimed at strengthening its position in the competitive cloud infrastructure market.
Research Papers
This piece explores the implications and limitations of next-token prediction as the foundational training approach for large language models, and what this design choice means for the future development and capabilities of AI systems.
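For reference, next-token prediction means factorizing a sequence autoregressively and training on the summed negative log-likelihood of each token given its prefix. The generic notation below is standard and is not taken from the piece itself:

```latex
p_\theta(x_1, \dots, x_T) = \prod_{t=1}^{T} p_\theta(x_t \mid x_{<t}),
\qquad
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t})
```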
DeepSWE introduces a benchmark for evaluating long-horizon coding agents that is designed to stay free of data contamination, addressing the need for reliable, standardized evaluation of autonomous code generation.
Test-Time Scaling (TTS) enhances the reasoning capabilities of large language models by allocating additional inference compute to explore the solution space. However, existing parallel TTS methods typically keep branches isolated during search: intermediate discoveries remain branch-private and cannot guide other branches in time. This information isolation causes substantial redundant exploration, as branches repeatedly rediscover information already found elsewhere and require more search ste...
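To make the isolation problem concrete, here is a toy simulation, not the paper's method. Every name in it (`parallel_search`, the round-robin schedule, the random visiting orders) is a hypothetical stand-in for real reasoning branches. Each branch explores a small search space in its own order. In isolated mode a branch may re-expand nodes another branch already found. In shared mode it skips them.

```python
import random

def parallel_search(space_size, num_branches, share, seed=0):
    """Toy parallel search: each branch visits nodes in its own random order,
    one expansion per branch per round, until every node has been found.
    Returns (total_expansions, redundant_expansions)."""
    rng = random.Random(seed)
    orders = []
    for _ in range(num_branches):
        order = list(range(space_size))
        rng.shuffle(order)
        orders.append(order)
    cursors = [0] * num_branches
    discovered = set()  # union of what all branches have found so far
    expansions = redundant = 0
    while len(discovered) < space_size:
        for b in range(num_branches):
            if len(discovered) == space_size:
                break
            order = orders[b]
            if share:
                # Shared mode: skip nodes that any branch has already discovered.
                while order[cursors[b]] in discovered:
                    cursors[b] += 1
            node = order[cursors[b]]
            cursors[b] += 1
            expansions += 1
            if node in discovered:
                redundant += 1  # rediscovered something another branch already found
            discovered.add(node)
    return expansions, redundant

print(parallel_search(100, 4, share=False))  # isolated branches
print(parallel_search(100, 4, share=True))   # shared discoveries: (100, 0)
```

With sharing, the total work equals the size of the space. Without it, every extra expansion is a rediscovery. This is the redundancy the abstract attributes to branch-private intermediate results.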
Diffusion Transformers (DiT) achieve strong performance in image generation but incur substantial inference costs. While prior work has reduced this cost via quantization and distillation, semi-structured sparsity, which can nearly halve FLOPs, remains underexplored. A key reason is that most existing approaches focus on weight sparsification, and pruning 50% of the weights can remove critical model capacity and degrade generation quality. Our study, however, shows that DiT activations are intri...
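Semi-structured sparsity usually refers to the 2:4 pattern: at most two nonzeros in every contiguous group of four values. Hardware with sparse tensor core support can exploit this to skip roughly half the multiply-accumulates. The sketch below applies that pattern to a vector of activations. It assumes a simple keep-the-two-largest-magnitudes rule, which may differ from the selection criterion the paper actually uses.

```python
def sparsify_2_of_4(values):
    """Apply 2:4 semi-structured sparsity: in every contiguous group of 4
    values, keep the 2 with the largest magnitude and zero the other 2."""
    if len(values) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    out = []
    for start in range(0, len(values), 4):
        group = values[start:start + 4]
        # Indices of the two largest-magnitude entries (ties keep the earlier index).
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        out.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return out

activations = [0.1, -2.0, 0.5, 3.0, 1.0, 1.0, -0.2, 0.0]
print(sparsify_2_of_4(activations))
# [0.0, -2.0, 0.0, 3.0, 1.0, 1.0, 0.0, 0.0]
```

Sparsifying activations rather than weights leaves the weight matrix dense. This matches the abstract's point that pruning half the weights can remove critical model capacity.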
Layered image generation and editing is a fundamental capability that enables layer-wise reuse, editing, and composition of generated visual content, analogous to word-level editing in natural language. Despite its importance, this remains an underexplored area at scale. To address this gap, we present MRT, a 20B-parameter masked region diffusion model tailored for multi-layer transparent image generation and editing, trained on over 10M multilingual design samples spanning diverse aspect ratios...
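Layer-wise reuse and composition work because transparent layers are flattened with standard alpha compositing. The sketch below implements the Porter-Duff "over" operator for a single pixel with straight (non-premultiplied) alpha. This is generic compositing, not part of MRT. The pixel format `(r, g, b, a)` in `[0, 1]` is an assumption.

```python
def over(top, bottom):
    """Porter-Duff 'over' for one pixel with straight (non-premultiplied) alpha.
    Each pixel is (r, g, b, a) with channels in [0, 1]."""
    ta, ba = top[3], bottom[3]
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)  # fully transparent result
    # Alpha-weighted blend of each color channel, renormalized by output alpha.
    rgb = tuple((tc * ta + bc * ba * (1.0 - ta)) / out_a
                for tc, bc in zip(top[:3], bottom[:3]))
    return rgb + (out_a,)

def composite(layers):
    """Flatten a list of layers ordered bottom-to-top into one pixel."""
    result = (0.0, 0.0, 0.0, 0.0)
    for layer in layers:
        result = over(layer, result)
    return result

red_background = (1.0, 0.0, 0.0, 1.0)
half_blue = (0.0, 0.0, 1.0, 0.5)
print(composite([red_background, half_blue]))  # (0.5, 0.0, 0.5, 1.0)
```

Because each layer stays separate until this final step, a single layer can be edited or replaced without regenerating the others.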
Large language models (LLMs) have evolved into interactive agents that collaborate with users in real-world tasks. Effective collaboration in such settings increasingly depends on understanding the user beyond what is explicitly stated, as user intent is often reflected in fragmented daily interactions and requires both personalized modeling and proactive interaction. However, existing agent benchmarks primarily evaluate reasoning and tool use, largely overlooking the challenges of inferring and...
We introduce Gemini Embedding 2, a native multimodal embedding model that allows embedding video, audio, image, and text modalities in a unified representation space. We leverage the multimodal capabilities of Gemini to produce embeddings for arbitrary combinations of interleaved inputs across all these modalities that generalize well across a wide variety of tasks. Applying large-scale contrastive learning in a multi-task multi-stage training setup, we achieve state-of-the-art performance on ke...
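Contrastive embedding training commonly uses an InfoNCE loss with in-batch negatives. Each query should score highest against its own paired item, and every other item in the batch serves as a negative. The sketch below is that generic loss, not Gemini Embedding 2's actual objective. The temperature of 0.05 and the toy 2-D vectors are assumptions.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce(queries, keys, temperature=0.05):
    """One-directional InfoNCE: queries[i] is paired with keys[i];
    all other keys in the batch act as negatives. Returns the mean loss."""
    q = [normalize(v) for v in queries]
    k = [normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / len(q)

aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)  # near 0 for correct pairs, large for mismatched pairs
```

In a unified multimodal space, `queries` and `keys` could come from different modalities, such as image and text. The loss is the same either way.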
Social deduction games have become a popular testbed for probing reasoning, deception, coordination, and belief modeling in Large Language Model (LLM) agents. However, most environments are scored only by game outcomes such as win rates and largely remain limited to text-only interaction, making it difficult to tell whether an agent's language is actually grounded in what it perceived and did, or to identify the failure modes underlying its behavior. To address this gap, we introduce QUACK, an open-sour...
Mixture-of-Experts (MoE) has become the de facto architecture for hundred-billion-parameter language models, yet its advantages at sub-billion scales for on-device deployment remain largely unexplored. To close this gap, we present MobileMoE, a family of on-device MoE language models with sub-billion active parameters (0.3-0.9B active and 1.3-5.3B total) that establish a new Pareto frontier for on-device LLMs. We first formulate an on-device MoE scaling law that jointly optimizes MoE architectur...
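The gap between active and total parameters (0.3-0.9B vs. 1.3-5.3B) comes from sparse routing: a router scores every expert, but each token runs through only the top-k. The sketch below shows generic top-k gating, not MobileMoE's specific router. The toy experts, the router weights, `k=2`, and the choice to renormalize gates over the selected experts are all assumptions.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, router_weights, experts, k=2):
    """Top-k MoE layer for one token.
    x: input vector; router_weights: one weight row per expert;
    experts: callables mapping a vector to a vector of the same length."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    top = sorted(range(len(experts)), key=lambda e: logits[e], reverse=True)[:k]
    gates = softmax([logits[e] for e in top])  # renormalize over the chosen experts
    out = [0.0] * len(x)
    for g, e in zip(gates, top):
        y = experts[e](x)  # only the k selected experts are evaluated
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, top

# Four hypothetical "experts" that just scale their input.
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
router = [[1.0, 0.0], [0.0, 0.0], [3.0, 0.0], [-1.0, 0.0]]
print(moe_forward([1.0, 0.0], router, experts, k=2))  # experts 2 and 0 are used
```

Only the parameters of the selected experts (plus the shared layers) count as active for a given token. This is why total capacity can grow without a proportional increase in per-token compute.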
Normalization layers in modern large language models (LLMs) consist of a deterministic normalization operation and a learnable scale vector. While the normalization operation has been extensively studied, the scale vector remains poorly understood despite its ubiquitous use. In this work, we present a systematic study of scale vectors in LLMs from the perspectives of expressivity, optimization, and architectural structure. First, we show empirically that although scale vectors constitute only a ...
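The abstract separates the deterministic normalization from the learnable scale vector. RMSNorm, used in many recent LLMs, makes that split easy to see. The normalization divides by the root-mean-square, and the scale multiplies each channel afterward. The sketch is generic and does not imply that the paper studies RMSNorm specifically. The default `eps` value is an assumption.

```python
import math

def rms_norm(x, scale, eps=1e-6):
    """RMSNorm: a deterministic normalization (divide by the root-mean-square)
    followed by a learnable per-channel scale vector."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [s * v / rms for s, v in zip(scale, x)]

x = [2.0, -2.0, 2.0, -2.0]
print(rms_norm(x, [1.0, 1.0, 1.0, 1.0], eps=0.0))  # [1.0, -1.0, 1.0, -1.0]
print(rms_norm(x, [3.0, 1.0, 1.0, 1.0], eps=0.0))  # first channel scaled by 3
```

Without the scale vector, every normalized output would have unit RMS. The per-channel gains are the only learnable way to reshape that distribution, which makes their role in expressivity and optimization worth studying.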
Vision-language models (VLMs) commonly formulate visual grounding and detection as a coordinate-token generation problem, serializing each 2D box into multiple 1D tokens that are learned and decoded largely independently. This token-by-token decoding mismatches the coupled structure of box geometry and creates a practical inference bottleneck due to strictly sequential generation. We introduce LocateAnything, a unified generative grounding and detection framework based on Parallel Box Decoding (...
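The baseline the abstract criticizes serializes each box into discrete coordinate tokens. A common scheme quantizes every coordinate into one of a fixed number of bins, so a box becomes four tokens that are decoded one after another. The sketch below shows that baseline serialization, not LocateAnything's Parallel Box Decoding. The 1000-bin vocabulary and the `(x0, y0, x1, y1)` order are assumptions.

```python
def box_to_tokens(box, image_w, image_h, num_bins=1000):
    """Serialize an (x0, y0, x1, y1) pixel box into 4 discrete coordinate tokens."""
    def quantize(v, size):
        # Map a coordinate in [0, size] to an integer bin in [0, num_bins - 1].
        return min(int(v * num_bins / size), num_bins - 1)
    x0, y0, x1, y1 = box
    return [quantize(x0, image_w), quantize(y0, image_h),
            quantize(x1, image_w), quantize(y1, image_h)]

def tokens_to_box(tokens, image_w, image_h, num_bins=1000):
    """Decode coordinate tokens back to pixels, using each bin's center."""
    sizes = [image_w, image_h, image_w, image_h]
    return [(t + 0.5) / num_bins * s for t, s in zip(tokens, sizes)]

tokens = box_to_tokens((100, 50, 300, 250), image_w=400, image_h=500)
print(tokens)                          # [250, 100, 750, 500]
print(tokens_to_box(tokens, 400, 500))
```

An autoregressive decoder emits these four tokens strictly in sequence and treats each coordinate as a separate prediction. That is the mismatch with coupled box geometry and the inference bottleneck the abstract describes.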
Large language model (LLM) agents rely on reusable skills to solve complex tasks. However, existing skill creation approaches treat skills as isolated and static artifacts, limiting their reusability, reliability, and long-term improvement. We propose MUSE-Autoskill Agent (Memory-Utilizing Skill Evolution), a skill-centric agent framework that lets agents continuously improve their task-solving capability by creating, reusing, and refining skills under a unified lifecycle (creation, memory, mana...
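As a rough illustration of treating skills as evolving artifacts rather than static ones, here is a hypothetical skill store that supports the create, reuse, and refine steps named in the abstract. The class names, fields, and success/failure bookkeeping are invented for this sketch and are not MUSE-Autoskill's design.

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    instructions: str
    version: int = 1
    successes: int = 0
    failures: int = 0
    history: list = field(default_factory=list)  # earlier instruction texts

class SkillLibrary:
    """Hypothetical skill store illustrating a create -> reuse -> refine loop."""

    def __init__(self):
        self.skills = {}

    def create(self, name, instructions):
        if name in self.skills:
            raise KeyError(f"skill {name!r} already exists")
        self.skills[name] = Skill(name, instructions)
        return self.skills[name]

    def reuse(self, name, succeeded):
        """Look up an existing skill and record the outcome of applying it."""
        skill = self.skills[name]
        if succeeded:
            skill.successes += 1
        else:
            skill.failures += 1
        return skill

    def refine(self, name, new_instructions):
        """Replace a skill's instructions, keeping the old text and bumping the version."""
        skill = self.skills[name]
        skill.history.append(skill.instructions)
        skill.instructions = new_instructions
        skill.version += 1
        return skill

lib = SkillLibrary()
lib.create("parse_csv", "Split lines on commas.")
lib.reuse("parse_csv", succeeded=False)  # failed on quoted fields
lib.refine("parse_csv", "Use a CSV parser that handles quoted fields.")
```

Keeping outcome counts and prior versions alongside each skill is one simple way to let later refinements draw on how the skill actually performed.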
The inverse problem of multilayer thin-film optical coating design is a complex combinatorial-continuous optimization challenge. We present PRISM (Position-encoded Regressive Inverse Spectral Model), a unified decoder-only autoregressive transformer that streamlines this process by jointly predicting discrete material selection and continuous thickness regression within a single backbone. PRISM introduces two primary architectural innovations: (1) spectrum prefix conditioning, which uti...
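The inverse problem runs backward through a forward model. Given a stack of layers, where each layer is a discrete material choice with a continuous thickness, the forward model predicts the reflectance spectrum. The standard characteristic-matrix (transfer-matrix) method is sketched below for normal incidence and lossless materials. It is not PRISM. The substrate index of 1.52 and the example values are assumptions.

```python
import math

def reflectance(layers, wavelength, n_incident=1.0, n_substrate=1.52):
    """Normal-incidence reflectance of a thin-film stack (transfer-matrix method).
    layers: list of (refractive_index, thickness), ordered from the incident side.
    Thickness and wavelength share units; indices are real (no absorption)."""
    # Running product of 2x2 characteristic matrices, starting from identity.
    m11, m12, m21, m22 = 1.0, 0.0, 0.0, 1.0
    for n, d in layers:
        delta = 2 * math.pi * n * d / wavelength  # phase thickness of the layer
        c, s = math.cos(delta), math.sin(delta)
        a11, a12, a21, a22 = c, 1j * s / n, 1j * n * s, c
        m11, m12, m21, m22 = (m11 * a11 + m12 * a21, m11 * a12 + m12 * a22,
                              m21 * a11 + m22 * a21, m21 * a12 + m22 * a22)
    # [B, C] = M @ [1, n_substrate]; then r = (n0*B - C) / (n0*B + C).
    b = m11 + m12 * n_substrate
    c_ = m21 + m22 * n_substrate
    r = (n_incident * b - c_) / (n_incident * b + c_)
    return abs(r) ** 2

print(reflectance([], 550.0, n_substrate=1.5))  # bare substrate: about 0.04
n_ar = math.sqrt(1.5)  # ideal single-layer anti-reflection index
print(reflectance([(n_ar, 550.0 / (4 * n_ar))], 550.0, n_substrate=1.5))  # about 0
```

An inverse-design model has to choose both the index of each layer (a discrete material) and its thickness (a continuous value) so that this function matches a target spectrum. That combination is the combinatorial-continuous structure the abstract describes.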
Generative models have emerged as a powerful paradigm for solving physics systems and modeling complex spatiotemporal dynamics. However, achieving high physical accuracy without incurring high computational cost remains a fundamental challenge, as existing approaches face a critical speed-fidelity trade-off. In this work, we introduce Recursive Flow Matching (RecFM), a generative framework for forecasting complex spatiotemporal dynamics. RecFM enforces self-consistency to align trajectories acro...
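The speed-fidelity trade-off in flow-based generators comes largely from how many integration steps sampling takes. The sketch below shows plain flow matching with a linear interpolation path. A network is trained to regress the velocity `x1 - x0` along `x_t`, and samples are produced by Euler integration. RecFM's recursive self-consistency mechanism is not reproduced here. The oracle velocity in the example is a hypothetical stand-in for a trained network.

```python
def interpolate(x0, x1, t):
    """Point on the linear path between a noise sample x0 and a data sample x1."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the velocity network: d x_t / d t = x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def euler_sample(velocity_fn, x0, steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

x0, x1 = [0.0, 0.0], [1.0, -2.0]
print(interpolate(x0, x1, 0.5))  # [0.5, -1.0]
oracle = lambda x, t: target_velocity(x0, x1)  # hypothetical perfect velocity field
print(euler_sample(oracle, x0, steps=4))  # [1.0, -2.0]
```

With a perfectly straight learned flow, a few steps suffice. Real learned fields curve, so fewer steps cost accuracy, which is the trade-off the abstract targets.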
Tutorials
See how OpenAI, Thrive, and Crete built a self-improving tax agent with Codex, automating filings, improving accuracy, and accelerating workflows.
Industry News
A Bay Area mother lost thousands of dollars after scammers used AI voice synthesis technology to impersonate her daughter and request money. This incident highlights the growing security threats posed by deepfake voice technology.
Discussion
Anthropic and OpenAI have achieved product-market fit with their AI offerings, indicating strong demand and alignment between their products and market needs. This suggests both companies are positioned as leaders in the commercial AI space.