The agentic HTML editor: your local AI agent writes the HTML, you ship it. 75 Skills × 9 Surfaces (magazine · deck · poster · XHS / tweet · prototype · data report · Hyperframes). Sandboxed preview · 1-click to WeChat / X / Zhihu / HTML / PNG. Zero API key: Claude Code / Cursor / Codex / Gemini / Copilot / OpenCode / Qwen / Aider.
TL;DR
Tools & Products
Build a modern LLM from scratch. Every line commented. Explained like we are five.
ESP32 desk dashboard that shows Claude Code usage
Hyper efficient storage for GPU workloads. Feed your GPUs at blazing fast speeds.
Deterministic textlint rules and CLI for catching prose slop in Markdown
Moor is a local MCP control plane for Mac. It gives every coding agent one safe, observable, configurable gateway to your MCP servers.
A Claude Code skill that performs in-depth security scans and detects 20+ of the most common security vulnerabilities in your source code.
Local-first desktop activity tracker: see where your hours go, with on-device AI daily summaries and optional multi-device sync
Google Antigravity 2.0 is a standalone desktop app for orchestrating multiple AI agents in parallel, with scheduled background tasks, subagent workflows, and native integrations with AI Studio, Firebase, and Android. Built for developers building production apps.
WeWeb is the only AI app builder that gives full editing control to non-coders. Prompt AI to generate your app, then refine every screen, workflow, and database in a powerful no-code editor where you always understand what's happening under the hood. No more black box.
Keep your docs moving as fast as your product. Mintlify Workflows lets teams turn on pre-built automations that update knowledge bases, generate changelogs, maintain translations, and handle repetitive documentation tasks whenever triggered. Instead of chasing every product change manually, teams can set up Workflows once and let Mintlify keep docs accurate, current, and ready for users.
Mixpanel Headless is a Python SDK that makes the entire product surface programmable, so agents and devs can dig into data without leaving their IDE.
Get instant AI recommendations to improve your design. Detect cognitive load, see where users focus, catch issues early, and compare variations - so you can confidently make and defend design decisions with data-backed insights.
CatchAll is a web search API that builds structured datasets from the open web. Submit a query, and it scans thousands of web pages, validates every result, and returns clean, deduplicated records: not a ranked list of links, but a dataset of real-world events, ready for workflows and pipelines.
A new CPU-only transcription tool enables users to transcribe videos from major platforms including YouTube, TikTok, X, and Instagram without requiring GPU resources. This approach makes transcription more accessible and cost-effective for broader audiences.
Research Papers
An OpenAI model has successfully disproved a central conjecture in discrete geometry, demonstrating AI's capability to contribute to advanced mathematical research. This represents a significant achievement in using machine learning for theoretical mathematics.
Reinforcement learning with verifiable rewards (RLVR) has become a dominant paradigm for improving reasoning in large language models (LLMs), yet the underlying geometry of the resulting parameter trajectories remains underexplored. In this work, we demonstrate that RLVR weight trajectories are extremely low-rank and highly predictable. Specifically, we find that the majority of downstream performance gains are captured by a rank-1 approximation of the parameter deltas, where the magnitude of th...
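To make the rank-1 claim in the abstract above concrete, here is a minimal sketch. It is not the paper's method. It takes a hypothetical weight delta (fine-tuned weights minus base weights) as a list of rows, finds the top singular triple by power iteration, and reports the fraction of the delta's squared Frobenius norm that the rank-1 term captures. All function names are illustrative assumptions.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def top_singular_triple(delta, iters=200, seed=0):
    """Power iteration on delta^T delta; returns (sigma, u, v)."""
    rng = random.Random(seed)
    delta_t = transpose(delta)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(delta[0]))]
    n = norm(v)
    v = [x / n for x in v]
    for _ in range(iters):
        w = matvec(delta_t, matvec(delta, v))
        n = norm(w)
        if n == 0.0:
            break  # delta is the zero matrix
        v = [x / n for x in w]
    u = matvec(delta, v)
    sigma = norm(u)
    if sigma > 0.0:
        u = [x / sigma for x in u]
    return sigma, u, v


def rank1_energy_fraction(delta):
    """Share of ||delta||_F^2 explained by the best rank-1 approximation."""
    sigma, _, _ = top_singular_triple(delta)
    total = sum(x * x for row in delta for x in row)
    return (sigma * sigma) / total if total else 0.0
```

A fraction close to 1 means the delta is nearly rank-1. Whether that also preserves downstream performance, as the abstract claims, is an empirical question this sketch does not address.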
Currently, enhancing Unified Multimodal Models (UMMs) with image understanding, generation, and editing capabilities mainly relies on mixed multi-task training. Due to inherent task conflicts, such strategy requires complex multi-stage pipelines, massive data mixing, and balancing tricks, merely resulting in a performance trade-off rather than true mutual reinforcement. To break this paradigm, we propose Uni-Edit, an intelligent image editing task that serves as the first general task for UMM tu...
Multimodal large language models (MLLMs) have shown remarkable capability in bridging visual perception and textual reasoning, enabling zero-shot understanding across diverse industrial scenarios. However, their performance in open-vocabulary industrial anomaly detection (IAD) is often limited by domain-misaligned reasoning and hallucinated structural inferences. To address these challenges, we propose IndusAgent, a tool-augmented agentic framework for open-vocabulary IAD. Specifically, we first...
Video Virtual Try-On (VVT) aims to seamlessly replace a garment on a person in a video with a new one. While existing methods have made significant strides in maintaining temporal consistency, they are predominantly confined to non-interactive scenarios where models merely showcase garments. This limitation overlooks a crucial aspect of real-world apparel presentation: active human-garment interaction. To bridge this gap, we introduce and formalize a new challenging task: Interactive Video Virtu...
As long-horizon coding agents produce more code than any developer can review, oversight collapses onto a single surface: the automated test suite. Reward hacking naturally arises in this setup, as the agent optimizes for passing tests while deviating from the user's true goal. We study this reward hacking phenomenon by decomposing software engineering tasks into three parts: (i) a natural language description of the specification, (ii) visible validation tests that exercise specified features in is...
Recent feed-forward models have significantly advanced geometry perception for inferring dense 3D structure from sensor observations. However, its essential capabilities remain fragmented across multiple incompatible paradigms, including online perception, offline reconstruction, multi-modal integration, long-horizon scalability, and metric-scale estimation. We present UniT, a unified model built upon a novel Group Autoregressive Transformer, which reformulates these seemingly disparate capabili...
Direct Preference Optimization (DPO) has emerged as a popular alternative to Reinforcement Learning from Human Feedback (RLHF), offering theoretical equivalence with simpler implementation. We prove this equivalence is conditional rather than universal, depending on an implicit assumption frequently violated in practice: the RLHF-optimal policy must prefer human-preferred responses. When this assumption fails, DPO optimizes relative advantage over the reference policy rather than absolute alignm...
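For readers who want the "relative advantage over the reference policy" point in concrete terms, the sketch below implements the standard per-pair DPO loss as commonly published, not the paper's analysis. The inputs are sequence log-probabilities, and the argument names and example numbers are illustrative assumptions.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair: -log(sigmoid(margin))."""
    # The margin only sees log-ratios against the reference policy.
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical numbers: the reference strongly prefers the rejected response.
loss = dpo_loss(policy_logp_chosen=-6.0, policy_logp_rejected=-3.0,
                ref_logp_chosen=-10.0, ref_logp_rejected=-2.0, beta=1.0)
print(loss)
```

In this made-up example the loss is small because the policy improved on the chosen response relative to the reference, even though it still assigns the rejected response a higher absolute log-probability (-3.0 versus -6.0). That is the kind of gap between relative and absolute preference the abstract describes.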
Text-to-motion generation, which translates textual descriptions into human motions, faces the challenge that users often struggle to precisely convey their intended motions through text alone. To address this issue, this paper introduces DrawMotion, an efficient diffusion-based framework designed for multi-condition scenarios. DrawMotion generates motions based on both a conventional text condition and a novel hand-drawing condition, which provide semantic and spatial control over the generated...
Recent layout-to-image models have achieved remarkable progress in spatial controllability. However, they still struggle with inter-object occlusion. When bounding boxes overlap, most existing methods lack explicit occlusion information, which makes the generation in intersection regions inherently ambiguous and hinders the determination of complex occlusion relationships. As a result, they often produce entangled textures or physically inconsistent layering in the overlapped areas. To address t...
With the advancement of AI capabilities, AI reviewers are beginning to be deployed in scientific peer review, yet their capability and credibility remain in question: many scientists simply view them as probabilistic systems without the expertise to evaluate research, while other researchers are more optimistic about their readiness without concrete evidence. Understanding what AI reviewers do well, where they fall short, and what challenges remain is essential. However, existing evaluations of ...
The current pretraining paradigm for large language models relies on massive compute and internet-scale raw text, creating a significant barrier to foundational research. In contrast, biological systems demonstrate highly sample-efficient learning through multi-timescale processing, such as the functional organization of the frontoparietal loop. Taking this as inspiration, we introduce HRM-Text, which replaces standard Transformers with a Hierarchical Recurrent Model (HRM) that decouples computa...
We present Mem-π, a framework for adaptive memory in large language model (LLM) agents, where useful guidance is generated on demand rather than retrieved from external memory stores. Existing memory-augmented agents typically rely on similarity-based retrieval from episodic memory banks or skill libraries, returning static entries that often misalign with the current context. In contrast, Mem-π uses a dedicated language or vision-language model with its own parameters, separate from the downs...
The key-value (KV) cache dominates memory bandwidth and footprint in long-context autoregressive inference. Recent rotation-preconditioned codecs (TurboQuant, PolarQuant) show that a structured random rotation followed by a per-coordinate scalar quantizer matched to an analytically tractable marginal is a near-optimal recipe for KV compression. OCTOPUS advances this paradigm through joint quantization of rotated coordinate triplets. Each triplet's direction is mapped to a square via an octahedra...
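The "structured random rotation followed by a per-coordinate scalar quantizer" recipe named in the abstract above can be sketched as follows. This is a simplified stand-in under stated assumptions, not TurboQuant, PolarQuant, or OCTOPUS: the rotation is a randomized Hadamard transform (random sign flips, then an orthonormal Walsh-Hadamard transform), and the quantizer is a plain uniform one rather than one matched to an analytical marginal. All function names are hypothetical.

```python
import math
import random


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in x]


def make_signs(n, seed=0):
    """Random +/-1 diagonal that randomizes the Hadamard rotation."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def rotate(x, signs):
    return fwht([s * v for s, v in zip(signs, x)])


def unrotate(y, signs):
    # The orthonormal Hadamard matrix is its own inverse; so is the sign flip.
    return [s * v for s, v in zip(signs, fwht(y))]


def quantize(y, bits=4):
    """Uniform per-coordinate quantizer over [-max_abs, max_abs]."""
    levels = 2 ** bits - 1
    max_abs = max(abs(v) for v in y) or 1.0
    codes = [round((v / max_abs + 1.0) / 2.0 * levels) for v in y]
    return codes, max_abs


def dequantize(codes, max_abs, bits=4):
    levels = 2 ** bits - 1
    return [(c / levels * 2.0 - 1.0) * max_abs for c in codes]
```

A vector would be compressed with `quantize(rotate(x, signs))` and recovered with `unrotate(dequantize(codes, max_abs), signs)`. The rotation preserves the vector's norm while spreading a single large coordinate across all coordinates, which tends to make a shared quantization range less wasteful.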
Planning is a fundamental capability for large language models (LLMs) because complex tasks require models to coordinate goals, constraints, resources, and long-term consequences into executable and verifiable solutions. Existing planning benchmarks, however, usually treat planning data as fixed collections of instances rather than controllable generation targets. This limits scenario coverage, ties difficulty to surface-level proxies rather than structural sources, and offers limited suppo...
Tutorials
A demonstration of efficiently indexing a full year of video content locally on a 2021 MacBook using the Gemma4-31B model with 50GB of swap space. This showcases the feasibility of running large AI models on consumer-grade hardware for video processing tasks.
Industry News
Anthropic is expanding its infrastructure to Colossus2 and will utilize NVIDIA's GB200 GPUs for enhanced computational capacity. This expansion supports the company's growing AI model training and deployment needs.
Intuit is laying off over 3,000 employees as part of a strategic refocus on artificial intelligence and automation capabilities. The restructuring aims to shift company resources toward AI-driven products and services.
Waymo has paused its robotaxi service in Atlanta after multiple incidents where its autonomous vehicles drove into flooded areas. The pause highlights safety challenges in handling unexpected weather conditions and dynamic road hazards.
Cloudflare's CEO discusses his decision-making process for identifying and replacing employees with AI systems, offering insights into corporate automation strategies. The discussion reflects broader industry trends of using AI to augment or substitute human workforce roles.