Antigravity 2.0 has achieved the top score on the OpenSCAD Architectural 3D LLM Benchmark, demonstrating superior performance in handling complex 3D architectural design tasks. This advancement shows significant progress in specialized AI model capabilities for technical design applications.
TL;DR
Tools & Products
The agentic HTML editor: your local AI agent writes the HTML, you ship it. 75 Skills × 9 Surfaces (magazine · deck · poster · XHS / tweet · prototype · data report · Hyperframes). Sandboxed preview · 1-click to WeChat / X / Zhihu / HTML / PNG. Zero API key: Claude Code / Cursor / Codex / Gemini / Copilot / OpenCode / Qwen / Aider.
Hyper efficient storage for GPU workloads. Feed your GPUs at blazing fast speeds.
Open-source memory runtime for AI agents: reproducible, provenance-tagged context bundles instead of query-time retrieval. Apache-2.0, self-hosted on Postgres + pgvector, Python + TypeScript SDKs.
MCP server that bridges clients to a real browser through CDP and a companion extension.
TestSprite generates and runs end-to-end tests for your app, autonomously. For backend, we can now generate complex integration tests with dynamic variables, auto-cleanup, and Data Flow debugging. For frontend, we now send a fleet of parallel AI agents to explore your app first, clicking through every feature like real users, then feeding results into testing. We're the first to do this. 3.0 also adds auto-heal for UI drift, auto-auth for regression, and a CLI for Claude Code and Codex users.
A lightweight, high-efficiency desktop Agent assistant with multi-task parallel capability.
Cleo is the AI product manager for founders and lean teams. It lives in Telegram and Slack - learns your tone, knows your team, and runs the PM work (standups, follow-ups, decisions) while you ship the product. What's different: every fact Cleo learns is transparent - you see the source, the confidence, and can confirm or correct it. No black-box memory. Five trust levels, from observer to operator. Free in Telegram. 1 min setup
Gemini, Antigravity 2.0, CLI, Google, terminal AI agent tool, agy, migration guide, MCP server, plugin, slash commands, Gemini 3.5 Flash, coding agent tool, free.
Unofficial MIT-licensed iOS companion for Claude Code: self-hosted relay, local-first chat, search, and session control from your iPhone. Not affiliated with Anthropic.
Claude Cowork/Code plugin for Customer Success
Nugget AI turns customer interviews into product evidence. Record or upload calls → AI extracts pain points and feature requests → synthesis surfaces themes → auto-generated PRDs with real customer quotes → dev-ready handoff to Linear & GitHub. NEW: MCP server. Connect Claude, ChatGPT, Cursor, or Codex → your AI agent can search every interview and draft specs grounded in real evidence. No more copy-pasting. Half the price of Dovetail.
You've spent years creating: newsletters, podcasts, LinkedIn posts, courses. The book is in there. You just haven't had time to assemble it. Prosed's Inkwell pipeline analyzes your voice, structures your scattered content into chapters, and produces a manuscript that actually sounds like you. Not generic AI writing or slop. Your words, your ideas, assembled into something real. Built-in editorial review. Print-ready PDF/DOCX export. Beta: $47 for the first 100 founders.
Reader Alive is an AI ebook reader for iPhone and iPad. Import EPUB, PDF, AZW3, and MOBI, then translate chapters, listen with natural text-to-speech, summarize dense sections, and ask questions grounded in the book.
Research Papers
A new paper on Multi-Stream LLMs explores methods for parallelizing and separating prompts, thinking processes, and input/output operations within large language models. This approach aims to improve efficiency and throughput in LLM processing.
CODA presents a novel approach to optimizing transformer blocks by rewriting them as GEMM-Epilogue programs, improving computational efficiency. This technique aims to enhance the performance of transformer-based models through hardware-friendly optimization.
Linear attention replaces the unbounded cache of softmax attention with a fixed-size recurrent state, reducing sequence mixing to linear time and decoding to constant memory. The hard part is not just what to forget, but how to edit this compressed memory without scrambling existing associations. Delta-rule models subtract the current read before writing a new value, and Kimi Delta Attention (KDA) sharpens forgetting with channel-wise decay. But the active edit still uses a single scalar gate to...
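The "subtract the current read before writing" step described above can be illustrated with a small sketch. This is a generic delta-rule update with an optional per-channel decay, written under assumptions: the function names `delta_rule_step` and `read_memory`, the state layout (one row per value channel, one column per key channel), and the scalar gate `beta` are illustrative choices, not the formulation of KDA or of the paper summarized here.

```python
def read_memory(state, key):
    """Return what the memory currently associates with `key` (state @ key)."""
    return [sum(s * k for s, k in zip(row, key)) for row in state]


def delta_rule_step(state, key, value, beta, decay=None):
    """One recurrent update of a fixed-size associative memory.

    state: list of rows, shape (d_v, d_k); key: length d_k; value: length d_v.
    beta:  scalar write gate in [0, 1] (hypothetical name).
    decay: optional per-key-channel forgetting factors, length d_k.
    """
    d_k = len(key)
    if decay is None:
        decay = [1.0] * d_k  # no forgetting by default
    # Forget: scale each key channel (column) of the state by its own factor.
    decayed = [[row[j] * decay[j] for j in range(d_k)] for row in state]
    # Read: the value the decayed memory currently returns for this key.
    read = read_memory(decayed, key)
    # Edit: write only the gated difference between the new value and the read.
    return [
        [decayed[i][j] + beta * (value[i] - read[i]) * key[j] for j in range(d_k)]
        for i in range(len(value))
    ]


# Store two associations under orthogonal keys, then overwrite the first one.
state = [[0.0, 0.0], [0.0, 0.0]]
state = delta_rule_step(state, [1.0, 0.0], [2.0, -1.0], beta=1.0)
state = delta_rule_step(state, [0.0, 1.0], [5.0, 5.0], beta=1.0)
state = delta_rule_step(state, [1.0, 0.0], [3.0, 3.0], beta=1.0)

print(read_memory(state, [1.0, 0.0]))  # [3.0, 3.0]
print(read_memory(state, [0.0, 1.0]))  # [5.0, 5.0]
```

With orthogonal unit keys and `beta=1.0`, overwriting the first key replaces its value and leaves the second association untouched. The state stays the same size however many steps are applied.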
Multimodal large language models (MLLMs) and diffusion models have each reached remarkable maturity: MLLMs excel at reasoning over heterogeneous multimodal inputs with strong semantic grounding, while diffusion models synthesize images and videos with photorealistic fidelity. We argue that these two families can be unified through a simple division of labor: MLLMs perform semantic planning, while diffusion models render pixels from high-level semantic guidance and low-level visual features. Buil...
Robust training and validation of Autonomous Driving Systems (ADS) require massive, diverse datasets. Proprietary data collected by Autonomous Vehicle (AV) fleets, while high-fidelity, are limited in scale, diversity of sensor configurations, as well as geographic and long-tail-behavioral coverage. In contrast, in-the-wild data from sources like dashcams offers immense scale and diversity, capturing critical long-tail scenarios and novel environments. However, this unstructured, in-the-wild vide...
Multimodal Large Language Models (MLLMs) are increasingly deployed in human-facing roles where personality perception is critical, yet existing benchmarks evaluate this capability solely on numerical Big Five score prediction, leaving open whether models truly perceive personality through behavioral understanding or merely prejudge through superficial pattern matching. We address this gap with three contributions. (i) A new task: we formalize Grounded Personality Reasoning (GPR), which requires ...
Recent development of agents has renewed demand for long-context reasoning capacity of LLMs. However, training LLMs for this capacity requires costly long-document curation or heuristic context synthesis. We observe that agents produce massive trajectories when solving problems, invoking tools and receiving environment observations across many turns. The evidence needed to answer the original question is thus scattered throughout these turns, requiring integration of distant context segments. Ne...
Model cards describe model behavior through a mixture of textual descriptions and structured artifacts, including performance, configuration, and dataset tables. Existing model search systems rely predominantly on semantic similarity over text, which can produce homogeneous result sets and limit exploration of alternatives. We argue that model search is inherently comparative: users want models that are task-aligned yet differentiated in measurable ways. We hypothesize that this balance requires...
Joint audio-visual reasoning is essential for omnimodal understanding, yet current multimodal large language models (MLLMs) still struggle when reasoning requires fine-grained evidence from both modalities. A central limitation is that explicit text-based chain-of-thought (CoT) compresses continuous audio-visual signals into discrete tokens, weakening temporal grounding and shifting intermediate reasoning toward language priors. We argue that a unified latent space is a better medium for such re...
Detecting Schwartz values in political text is difficult because implicit cues often depend on surrounding arguments and fine-grained distinctions between neighboring values. We study when context and explicit moral knowledge help sentence-level value detection. Using the ValuesML/Touché ValueEval format, we compare sentence, window, and full-document inputs; no-RAG and retrieval-augmented settings with a curated moral knowledge base; supervised DeBERTa-v3-base/large encoders; and zero-shot L...
Representation Autoencoders (RAEs) leverage frozen vision foundation models (VFMs) as tokenizer encoders, providing robust high-level representations that facilitate fast convergence and high-quality generation in latent diffusion models. However, freezing the VFM inherently constrains its spatial reconstruction capacity, limiting fine-grained generation and image editing; in contrast, incorporating reconstruction-oriented signals via fine-tuning disrupts the pretrained semantic space and degrad...
Flow matching with x-prediction -- regressing the clean data point rather than the ambient velocity -- is known to exploit low-dimensional manifold structure effectively in pixel space [li2025back]. We ask whether a pretrained representation space, while containing a low-dimensional data manifold of comparable intrinsic dimensionality, offers a distribution more favorable for flow-matching learning. Comparing pixel, SD-VAE, and DINOv2 features along four geometric axes, we find that pixel and DINO...
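For readers unfamiliar with the distinction between x-prediction and velocity prediction, the two regression targets can be written out under one common convention. This is a sketch that assumes a linear interpolation path between noise and data; the symbols $x_t$, $t$, $\epsilon$ and the networks $x_\theta$, $v_\theta$ are notation introduced here for illustration and are not taken from the paper.

```latex
\begin{align}
x_t &= t\,x + (1-t)\,\epsilon, \qquad t \in [0,1],\ \epsilon \sim \mathcal{N}(0, I) \\
v &= \frac{\mathrm{d}x_t}{\mathrm{d}t} = x - \epsilon \\
\mathcal{L}_{v} &= \mathbb{E}\,\bigl\lVert v_\theta(x_t, t) - (x - \epsilon) \bigr\rVert^2
  && \text{(velocity prediction)} \\
\mathcal{L}_{x} &= \mathbb{E}\,\bigl\lVert x_\theta(x_t, t) - x \bigr\rVert^2
  && \text{(x-prediction)} \\
v &= \frac{x - x_t}{1 - t} \qquad (t < 1)
\end{align}
```

Under this path the last line lets a predicted clean point be converted into a velocity for sampling, so the two parameterizations differ in what the network regresses, not in the flow they define.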
Fashion image retrieval is a cornerstone of modern e-commerce systems. A unified framework that supports diverse query formats and search intentions is highly desired in practice. However, existing approaches focus on narrow retrieval tasks and do not fully capture such diversity. Therefore, in this work, we aim to develop a unified framework capable of handling diverse realistic fashion retrieval scenarios, achieving truly versatile fashion image retrieval. To establish a data foundation, we fi...
Public transit route planning traditionally depends on structured map infrastructure and complex routing engines, and no existing dataset supports training models to bypass this dependency. We present TransitLM, a large-scale dataset of over 13 million transit route planning records from four Chinese cities covering 120,845 stations and 13,666 lines, released as a continual pre-training corpus and benchmark data for three evaluation tasks with complementary metrics. Experiments show that an LLM ...
How should an agent decide when and how to plan? A dominant approach builds agents as reactive policies with adaptive computation (e.g., chain-of-thought), trained end-to-end expecting planning to emerge implicitly. Without control over the presence, structure, or horizon of planning, these systems dramatically increase reasoning length, yielding inefficient token use without reliable accuracy gains. We argue efficient agentic reasoning benefits from decomposing decision-making into three system...
Industry News
DeepSeek has made its V4 Pro discount permanent, offering customers an ongoing price reduction on the model. This reflects the company's commitment to keeping advanced AI capabilities more affordable and accessible.
A recap of the 2026 I/O Dialogues, where leaders discuss the future of AI, quantum computing, robotics and creativity.
OpenAI is named a leader in the 2026 Gartner Magic Quadrant for Enterprise AI Coding Agents, with Codex recognized for innovation and enterprise-scale deployment.
Discussion
The current AI pricing model is expected to be temporary, with prices likely to decrease significantly as the market matures and competition increases. This reflects broader economic trends where specialized AI services will eventually become commoditized.