GLM-5.2 is a new language model release from zai-org, presented as an advance in general-purpose language understanding and generation.
TL;DR
Model Releases
Tools & Products
Research Papers
Industry News
Model Releases
SubQ 1.1 Small is a new compact version of the SubQ model offering improved efficiency for smaller-scale deployments.
Qwen-Robot Suite introduces a comprehensive foundation model suite designed to enhance physical world intelligence and robotics capabilities.
Tools & Products
Your AI forgets. This remembers. Spec-driven coding harness for vibecoders, product owners, CEOs and real builders — self-improving context memory, 12 agents, 32 skills. Kills context rot, ships features, not spaghetti. Claude Code & Codex. Any stack. 30 seconds
Harness coding workflow for Codex, Claude, and GitHub Copilot.
Most AI tools make you explain the context before they can help. Goldfish already has it. It privately remembers what you’ve been working on across your Mac, then helps you write better from any app. Press Option in a text field to draft replies, summarize threads, rewrite sentences, or recall important details from your recent work without copying, pasting, or re-explaining the whole backstory.
Expert-thinking AGENTS.md profiles that teach AI agents to reason like senior scientists and engineers.
Invoko is an AI desktop helper you can talk to while you work. Bring it beside anything on your screen, ask it questions, or let it handle tasks across your apps.
Hire AI employees that run 24/7 in their own container with their own memory. One-click into your Slack, Telegram, or Teams. Pre-built for support, sales, research, SEO, or anything you write yourself. Pay per call for the tools they use.
Run state-of-the-art open-source models (GLM 5.1, Kimi K2.7 Code, MiniMax M2.7, and more) in Claude Code at up to 4× the speed (up to 200 tok/s) for a flat $29/month. Set up in minutes, no code changes.
Sluggish, bloated, legacy support tools are dead. Zoona is support for modern teams — it learns from your docs and past conversations, then resolves 60%+ of tickets the second they land. No backlog. No burnout. No endless hiring to keep up. When it does need a human, it hands off with full context so the customer never repeats themselves. This is support that scales with you, not against you. Train it, go live, done.
GitHits gives coding agents access to the open-source code your app depends on. Get real implementation examples, dependency source navigation, package inspection and documentation. Agents can grep and read your codebase. They can't grep and read the open-source code your app depends on. That's where they start guessing, retrying, and looping. GitHits builds a version-aware index on demand. Agents can search, navigate, and inspect the code behind their dependencies. CLI: npx githits@latest init
Stride is the AI-native workspace for the whole build: plan, design, verify, and ship. Its AI works inside your real project data and plugs into Claude Code and Codex over MCP, so it does the work instead of just talking about it. Your team goes from idea to launch without switching tools.
Founders lose hours every day to email when they should be spending that time building and making real progress. Dirac was made to end that. Dirac is an AI-native inbox that scans your threads, drafts replies in your voice, and shows a brief with only what needs your decision, quietly dealing with the 80% of unimportant emails in the background. You run your inbox by deciding, not by being your own assistant.
How do you feel? It is the oldest question in art and the newest one we can answer in technology. MindReader takes your content and simulates, region by region, how a brain responds to it. Completely open source, and we encourage you to tinker. Exploring sales evals, neural evals for datasets, and other esoteric product experiments with madhat founders. MindReader is built on Meta FAIR's TRIBE v2 plus 35 years of neuro research. Inviting collaboration from academics et al.
Research Papers
This technical report introduces VibeThinker-3B, a compact dense model with 3B parameters developed to investigate how far verifiable reasoning can be pushed within a strictly small-model regime. Building upon the Spectrum-to-Signal post-training paradigm, we systematically enhance the model through an optimized pipeline that includes curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation. Experimental evaluations demonstrate that VibeThinker-...
We introduce TuneJury, an open, instance-level pairwise reward model for text-to-music that predicts a music preference score from a text prompt and an audio clip. The released checkpoint is trained on publicly available human-preference labels covering arena-style (A vs. B) votes, metric-alignment preference pairs, crowdsourced pairwise comparisons, and expert aesthetic ratings. The predicted score margin between two clips is well calibrated on our held-out test split, supporting data filtering...
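To make the idea of a "score margin" concrete, here is a minimal sketch of how the margin between two scalar scores can be read as a preference probability. It assumes a Bradley-Terry-style logistic link, which is a common choice for pairwise reward models but is not stated in the abstract; the function name `preference_probability` and the example scores are hypothetical and do not come from the TuneJury release.

```python
import math


def preference_probability(score_a: float, score_b: float) -> float:
    """Probability that clip A is preferred over clip B.

    Assumption: a Bradley-Terry-style logistic link on the score margin.
    This is an illustration, not the TuneJury API.
    """
    margin = score_a - score_b
    # Numerically stable sigmoid: avoid exp() overflow for large |margin|.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


# Hypothetical scores for two clips generated from the same text prompt.
p_equal = preference_probability(0.0, 0.0)   # no margin, so a coin flip
p_margin = preference_probability(2.0, 1.0)  # margin of 1.0 favours clip A
```

Under this assumption, "well calibrated" would mean that among pairs with a predicted probability near some value, roughly that fraction of human votes favour clip A.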
Sparse reward reinforcement learning (RL) has become a standard tool for improving LLM reasoning, but its success depends critically on the coverage present in the base model. In practice, models are often primed for RL through mid-training on curated reasoning traces that teach useful primitive skills such as decomposition, verification, or self-correction. Although effective, this strategy requires manually specifying what the model should learn, and it remains unclear whether such primitive c...
DreamX-World 1.0 is a general-purpose interactive text/image-to-video world model for controllable long-horizon generation. It supports camera navigation, revisits to previously observed regions, and promptable events across photorealistic, game-style, and stylized domains. Our data engine combines camera-accurate Unreal Engine rendering, action-rich gameplay recordings, and real-world videos with recovered camera geometry. For camera control, we introduce E-PRoPE, a lightweight variant of proje...
Multi-task learning (MTL) is essential in recommender systems to enable complementary learning among diverse user feedback. While modern industrial practices have shifted from DNNs to Transformer-centric architectures to strengthen sequence modeling and scaling capacity, they still decouple feature encoding from multi-task prediction, treating the Transformer as a task-agnostic encoder. This design fundamentally limits the performance and scalability by (1) creating an information bottleneck und...
Masked Diffusion Language Models (MDLMs) have emerged as a distinct paradigm for sequence generation. As MDLMs become diverse in capabilities and knowledge coverage, an important question is how to combine their knowledge. Toward this, we first investigate the unique decoding dynamics of MDLMs. We find that successful generations exhibit stable confidence dynamics over answer-relevant positions, while unreliable trajectories can often be corrected by injecting promising intermediate states from ...
Visual world models (VWMs) synthesize interactive, action-conditioned rollouts from a single context image. However, it remains an open question how robust these models are to adversarial perturbations. Standard adversarial attacks fail to assess this vulnerability because attackers lack ground-truth future videos and cannot predict subsequent user controls. We introduce BadWorld, a label-free adversarial framework tailored for autoregressive VWMs that systematically overcomes both constraints. ...
As LLMs advance, post-training reinforcement learning (RL) increasingly relies on multi-dimensional rewards to cultivate comprehensive capabilities. This shift demands new algorithms capable of optimizing diverse and potentially competing objectives simultaneously. To address this, existing methods such as Group reward-Decoupled Policy Optimization (GDPO) decompose the overall score into independent reward groups, then compute the RL loss separately within each group. However, this strategy stil...
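The general idea of decoupling reward dimensions can be illustrated with a small sketch. The abstract is truncated, so this does not reproduce GDPO's loss or the paper's proposed method; it only contrasts normalizing a summed reward with normalizing each reward dimension separately across a group of rollouts. All function names and the reward values are hypothetical.

```python
from statistics import mean, pstdev


def normalize(values, eps=1e-8):
    """Zero-mean, unit-variance normalization across a group of rollouts."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / (s + eps) for v in values]


def summed_advantages(rewards):
    """Baseline: add the reward dimensions first, then normalize once."""
    return normalize([sum(r) for r in rewards])


def decoupled_advantages(rewards):
    """Illustrative decoupling: normalize each dimension, then add."""
    per_dim = [normalize(list(dim)) for dim in zip(*rewards)]
    return [sum(col) for col in zip(*per_dim)]


# Hypothetical group of four rollouts with two reward dimensions.
# The second dimension uses a 10x larger scale than the first.
rewards = [[1.0, 0.0], [0.0, 10.0], [1.0, 10.0], [0.0, 0.0]]
summed = summed_advantages(rewards)
decoupled = decoupled_advantages(rewards)
```

In this toy group, the summed version ranks the second rollout well above the first only because its reward lives on a larger scale, while the decoupled version treats the two rollouts as equally good, since each one satisfies exactly one dimension.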
Consistent video generation under editing operations requires persistence: when edits modify scene appearance or layout, subsequent generations should remain coherent across time and viewpoints. However, existing memory designs struggle to maintain long-term consistency after such modifications, as stored contexts may become outdated or invalid. To address this, we propose PermaVid, a novel framework built upon a multi-modal context memory that disentangles spatial context into semantic appearan...
Multi-turn LLM serving accumulates dialogue history whose Key-Value (KV) cache grows with every turn and every user, quickly exceeding the model weights themselves and making memory -- not compute -- the binding constraint on throughput. Non-uniform KV compression, which allocates heterogeneous budgets across attention heads, preserves accuracy far better than uniform schemes, yet remains impractical: modern serving stacks assume identical KV lengths across heads, so heterogeneity traps freed me...
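A back-of-the-envelope calculation shows why the KV cache can outgrow the weights. The sketch below uses the standard size formula for a decoder-only transformer (two tensors, keys and values, per layer, per KV head, per cached token); the model configuration, context length, and session count are hypothetical and are not taken from the paper.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Bytes needed to cache keys and values for every layer and token.

    The leading factor of 2 covers one tensor for keys and one for values.
    bytes_per_elem=2 corresponds to 16-bit storage such as fp16 or bf16.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical configuration: 32 layers, 8 KV heads, head dimension 128.
per_token = kv_cache_bytes(num_layers=32, num_kv_heads=8,
                           head_dim=128, seq_len=1)

# 64 concurrent sessions, each holding 32,768 tokens of dialogue history.
total = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128,
                       seq_len=32_768, batch_size=64)

per_token_kib = per_token / 2**10  # 128.0 KiB per cached token
total_gib = total / 2**30          # 256.0 GiB across all sessions
```

With these assumed numbers, the cache for 64 long sessions reaches 256 GiB, far more than the 16-bit weights of a model of this shape would typically occupy, which is the regime where memory rather than compute limits throughput.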
Vision language models are serving as general-purpose interfaces for complex multimodal tasks. However, deployment still faces three gaps: VLMs typically incur high latency and cost when processing dense video frames and long prompts, the agent scaffold remains static after deployment, and standard video-QA benchmarks do not test whether agents can use visual evidence inside tool-using workspaces. We present VisualClaw, a self-evolving multimodal agent built around two principles. First, hybrid ...
In this paper, we introduce SP^3, a novel Plug-and-Play algorithm that accelerates maximum a posteriori image restoration by replacing denoisers with Spherical Encoders (SE) as generative priors. SP^3 approximates the intractable proximal prior step by utilizing the SE tightly structured latent space as a robust projection onto the natural image manifold. Alternating this projection with a closed-form data-consistency step, via Half-Quadratic Splitting, achieves stable convergence without requir...
Humans can grasp objects effortlessly, whereas multi-fingered robots are far from this level of generality. We argue that the most natural source of robot grasping data is from humans, who pick up thousands of objects every day. We present HUG, a flow-matching model that generates diverse human grasps for any user-specified object in a single RGB-D image captured from a stereo camera. Using smart glasses, we first collect 1M-HUGs, an egocentric dataset of human grasps spanning 1M frames (27.8 hr...
Large Language Models (LLMs) are increasingly adopted as backbones for Generative Recommendation (GR), promising access to pretrained world knowledge. Yet reliably invoking this knowledge for GR remains poorly understood. A key obstacle is that LLM-based GR typically represents items with Semantic IDs (SIDs), disrupting LLMs' natural-language reasoning interface because these tokens are unseen by the LLM during pretraining. Existing approaches address this with expensive multi-stage pipelines th...
Humans naturally understand object physics through everyday interactions, but faithfully predicting complex deformable dynamics, such as elastic materials and fabrics, remains a major challenge for computer vision and robotics. We present EgoPhys, a framework that constructs deformable physical digital twins from egocentric RGB-only video using generalizable priors. EgoPhys overcomes the limitations of existing methods to enable controllable deformable digital twin generation from egocentric vid...
Industry News
OpenAI's losses grew nearly eightfold in 2025, with annual spending reaching $34 billion, reflecting the company's aggressive expansion and investment in AI development.
Anthropic's Claude models have been experiencing elevated error rates across multiple versions, affecting reliability for users who depend on the AI assistant.
SpaceX announced its acquisition of Cursor, a popular AI-powered code editor, for $60 billion as part of its expansion into software and AI development tools.
Amazon announced a multibillion-dollar investment in a new data center facility in Missouri to support growing cloud computing and AI infrastructure demands.
Discussion
An article detailing the author's decision to discontinue using Google services, likely exploring alternative platforms and tools.
As AI code reviews have become more expensive to conduct, rewrites and automated fixes have become comparatively cheaper, shifting development cost dynamics.