Cainew

Curated AI news for developers

Model Releases

GLM-5.2 is a new language model release from zai-org, positioned as an advance in general-purpose language understanding and generation.

HuggingFace

SubQ 1.1 Small is a new compact version of the SubQ model offering improved efficiency for smaller-scale deployments.

RSS

Tools & Products

Your AI forgets. This remembers. A spec-driven coding harness for vibecoders, product owners, CEOs, and real builders, with self-improving context memory, 12 agents, and 32 skills. Kills context rot, ships features, not spaghetti. Claude Code and Codex. Any stack. 30 seconds.

GitHub

Most AI tools make you explain the context before they can help. Goldfish already has it. It privately remembers what you’ve been working on across your Mac, then helps you write better from any app. Press Option in a text field to draft replies, summarize threads, rewrite sentences, or recall important details from your recent work without copying, pasting, or re-explaining the whole backstory.

ProductHunt

Expert-thinking AGENTS.md profiles that teach AI agents to reason like senior scientists and engineers.

GitHub

Invoko is an AI desktop helper you can talk to while you work. Bring it beside anything on your screen, ask it questions, or let it handle tasks across your apps.

ProductHunt

Sluggish, bloated, legacy support tools are dead. Zoona is support for modern teams: it learns from your docs and past conversations, then resolves 60%+ of tickets the second they land. No backlog. No burnout. No endless hiring to keep up. When it does need a human, it hands off with full context so the customer never repeats themselves. This is support that scales with you, not against you. Train it, go live, done.

ProductHunt

GitHits gives coding agents access to the open-source code your app depends on: real implementation examples, dependency source navigation, package inspection, and documentation. Agents can grep and read your codebase, but not the open-source code behind your dependencies, and that is where they start guessing, retrying, and looping. GitHits builds a version-aware index on demand so agents can search, navigate, and inspect the code behind their dependencies. CLI: npx githits@latest init

ProductHunt

Stride is the AI-native workspace for the whole build: plan, design, verify, and ship. Its AI works inside your real project data and plugs into Claude Code and Codex over MCP, so it does the work instead of just talking about it. Your team goes from idea to launch without switching tools.

ProductHunt

Founders lose hours every day to email when they should be spending that time building and making real progress. Dirac was made to end that. Dirac is an AI-native inbox that scans your threads, drafts replies in your voice, and shows a brief with only what needs your decision, while quietly handling the 80% of unimportant email in the background. You run your inbox by deciding, not by being your own assistant.

ProductHunt

How do you feel? It is the oldest question in art and the newest one technology can answer. MindReader takes your content and simulates, region by region, how a brain responds to it. It is completely open source, and we encourage you to tinker. We are exploring sales evals, neural evals for datasets, and other esoteric product experiments with madhat founders. MindReader is built on Meta FAIR's TRIBE v2 plus 35 years of neuro research. We invite collaboration from academics and others.

ProductHunt

Research Papers

This technical report introduces VibeThinker-3B, a compact dense model with 3B parameters developed to investigate how far verifiable reasoning can be pushed within a strictly small-model regime. Building upon the Spectrum-to-Signal post-training paradigm, we systematically enhance the model through an optimized pipeline that includes curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation. Experimental evaluations demonstrate that VibeThinker-...

HuggingFace

We introduce TuneJury, an open, instance-level pairwise reward model for text-to-music that predicts a music preference score from a text prompt and an audio clip. The released checkpoint is trained on publicly available human-preference labels covering arena-style (A vs. B) votes, metric-alignment preference pairs, crowdsourced pairwise comparisons, and expert aesthetic ratings. The predicted score margin between two clips is well calibrated on our held-out test split, supporting data filtering...

HuggingFace
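The TuneJury entry describes a pairwise reward model whose predicted score margin between two clips is calibrated. A common way to read such a margin, assumed here rather than taken from the paper, is the Bradley-Terry link: the probability that clip A is preferred over clip B is the logistic sigmoid of the score difference, and training minimizes the negative log of that probability for the human-preferred clip. The function names and scores below are hypothetical.

```python
import math

def preference_prob(score_a: float, score_b: float) -> float:
    """P(A preferred over B) under an assumed Bradley-Terry model: sigmoid(score_a - score_b)."""
    margin = score_a - score_b
    # Numerically stable logistic sigmoid for large positive or negative margins.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    z = math.exp(margin)
    return z / (1.0 + z)

def pairwise_loss(score_preferred: float, score_rejected: float) -> float:
    """Negative log-likelihood of an observed preference: -log sigmoid(margin)."""
    return -math.log(preference_prob(score_preferred, score_rejected))

# Hypothetical reward-model scores for two clips generated from the same text prompt.
print(preference_prob(1.2, 0.2))  # margin of 1.0, about 0.731
print(pairwise_loss(1.2, 0.2))
```

Under this reading, "well calibrated" means that among pairs with a given margin, the fraction humans actually prefer matches the sigmoid of that margin, which is what makes the margin usable as a filtering threshold.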

Sparse reward reinforcement learning (RL) has become a standard tool for improving LLM reasoning, but its success depends critically on the coverage present in the base model. In practice, models are often primed for RL through mid-training on curated reasoning traces that teach useful primitive skills such as decomposition, verification, or self-correction. Although effective, this strategy requires manually specifying what the model should learn, and it remains unclear whether such primitive c...

HuggingFace
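The sparse-reward RL entry hinges on the base model's coverage: if no sampled rollout for a prompt ever succeeds, a sparse reward carries no learning signal for it. One common way to quantify coverage, assumed here rather than taken from the paper, is pass@k, estimated without bias from n samples of which c pass the verifier as 1 - C(n-c, k) / C(n, k). The per-prompt results below are hypothetical.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples with c correct (requires 1 <= k <= n)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical per-prompt results: (samples drawn, samples that passed the verifier).
for n, c in [(16, 0), (16, 1), (16, 8)]:
    print(n, c, round(pass_at_k(n, c, k=4), 3))
```

A prompt with c = 0 has pass@k = 0 for every k, which is exactly the case where sparse-reward RL has nothing to reinforce; mid-training on reasoning traces is one way to move such prompts above zero before RL starts.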

DreamX-World 1.0 is a general-purpose interactive text/image-to-video world model for controllable long-horizon generation. It supports camera navigation, revisits to previously observed regions, and promptable events across photorealistic, game-style, and stylized domains. Our data engine combines camera-accurate Unreal Engine rendering, action-rich gameplay recordings, and real-world videos with recovered camera geometry. For camera control, we introduce E-PRoPE, a lightweight variant of proje...

HuggingFace

Multi-task learning (MTL) is essential in recommender systems to enable complementary learning among diverse user feedback. While modern industrial practices have shifted from DNNs to Transformer-centric architectures to strengthen sequence modeling and scaling capacity, they still decouple feature encoding from multi-task prediction, treating the Transformer as a task-agnostic encoder. This design fundamentally limits the performance and scalability by (1) creating an information bottleneck und...

HuggingFace

Masked Diffusion Language Models (MDLMs) have emerged as a distinct paradigm for sequence generation. As MDLMs become diverse in capabilities and knowledge coverage, an important question is how to combine their knowledge. Toward this, we first investigate the unique decoding dynamics of MDLMs. We find that successful generations exhibit stable confidence dynamics over answer-relevant positions, while unreliable trajectories can often be corrected by injecting promising intermediate states from ...

HuggingFace

Visual world models (VWMs) synthesize interactive, action-conditioned rollouts from a single context image. However, it remains an open question how robust these models are to adversarial perturbations. Standard adversarial attacks fail to assess this vulnerability because attackers lack ground-truth future videos and cannot predict subsequent user controls. We introduce BadWorld, a label-free adversarial framework tailored for autoregressive VWMs that systematically overcomes both constraints. ...

HuggingFace

As LLMs advance, post-training reinforcement learning (RL) increasingly relies on multi-dimensional rewards to cultivate comprehensive capabilities. This shift demands new algorithms capable of optimizing diverse and potentially competing objectives simultaneously. To address this, existing methods such as Group reward-Decoupled Policy Optimization (GDPO) decompose the overall score into independent reward groups, then compute the RL loss separately within each group. However, this strategy stil...

HuggingFace
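The multi-dimensional reward entry contrasts normalizing a single summed score with handling each reward group separately. The sketch below illustrates that difference using GRPO-style group-relative advantages (reward minus group mean, divided by group standard deviation). It is an assumed, simplified illustration of the decoupling idea, not the GDPO algorithm or the paper's method; the two reward dimensions and their values are hypothetical.

```python
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-6):
    """GRPO-style group-relative advantage over the rollouts of one prompt."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def coupled_advantages(reward_vectors):
    """Sum the reward components of each rollout first, then normalize the totals."""
    return group_advantages([sum(vec) for vec in reward_vectors])

def decoupled_advantages(reward_vectors):
    """Normalize each reward component within the group separately, then sum per rollout."""
    per_component = [group_advantages(list(column)) for column in zip(*reward_vectors)]
    return [sum(parts) for parts in zip(*per_component)]

# Hypothetical group of 4 rollouts scored as (correctness in {0, 1}, style score).
group = [(1, 4.0), (1, 6.0), (0, 5.0), (0, 7.0)]
print([round(a, 2) for a in coupled_advantages(group)])    # [-1.0, 1.0, -1.0, 1.0]
print([round(a, 2) for a in decoupled_advantages(group)])  # [-0.34, 1.45, -1.45, 0.34]
```

With the summed score, the first (correct) and third (incorrect) rollouts receive the same advantage because their style scores offset the correctness gap; normalizing per component keeps the correctness signal visible.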

Consistent video generation under editing operations requires persistence: when edits modify scene appearance or layout, subsequent generations should remain coherent across time and viewpoints. However, existing memory designs struggle to maintain long-term consistency after such modifications, as stored contexts may become outdated or invalid. To address this, we propose PermaVid, a novel framework built upon a multi-modal context memory that disentangles spatial context into semantic appearan...

HuggingFace

Multi-turn LLM serving accumulates dialogue history whose Key-Value (KV) cache grows with every turn and every user, quickly exceeding the model weights themselves and making memory, not compute, the binding constraint on throughput. Non-uniform KV compression, which allocates heterogeneous budgets across attention heads, preserves accuracy far better than uniform schemes, yet remains impractical: modern serving stacks assume identical KV lengths across heads, so heterogeneity traps freed me...

HuggingFace
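The KV-cache entry's claim that dialogue history can outgrow the model weights is easy to check with back-of-the-envelope arithmetic. The sketch assumes a hypothetical 8B-parameter fp16 model with 32 layers and 8 KV heads of dimension 128, serving 64 concurrent users with 4,096 tokens of history each. It also assumes, purely for illustration, a cache layout that pads every head to the longest per-head budget, which is one way identical-length assumptions can waste the memory that non-uniform compression frees.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem):
    """KV-cache bytes per token: one key and one value vector per layer and KV head."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem

def padded_vs_packed(per_head_budgets):
    """Token slots used if every head is padded to the longest budget, vs. packed exactly."""
    return len(per_head_budgets) * max(per_head_budgets), sum(per_head_budgets)

# Hypothetical 8B-parameter fp16 model: 32 layers, 8 KV heads, head dimension 128.
per_token = kv_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128, bytes_per_elem=2)
weights = 8_000_000_000 * 2          # parameters x 2 bytes (fp16)
users, history_tokens = 64, 4096     # concurrent conversations x tokens of history each
cache = users * history_tokens * per_token

print(per_token // 1024, "KiB per token")           # 128 KiB per token
print(cache / 2**30, "GiB of KV cache")             # 32.0 GiB of KV cache
print(round(weights / 2**30, 1), "GiB of weights")  # 14.9 GiB of weights

# Non-uniform compression: four heads keep different numbers of tokens.
print(padded_vs_packed([4096, 1024, 512, 512]))     # (16384, 6144)
```

In this hypothetical setup the cache is already about twice the size of the weights, and padding the four heads to a common length stores 16,384 token slots where 6,144 would suffice.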

Vision language models are serving as general-purpose interfaces for complex multimodal tasks. However, deployment still faces three gaps: VLMs typically incur high latency and cost when processing dense video frames and long prompts, the agent scaffold remains static after deployment, and standard video-QA benchmarks do not test whether agents can use visual evidence inside tool-using workspaces. We present VisualClaw, a self-evolving multimodal agent built around two principles. First, hybrid ...

HuggingFace

In this paper, we introduce SP^3, a novel Plug-and-Play algorithm that accelerates maximum a posteriori image restoration by replacing denoisers with Spherical Encoders (SE) as generative priors. SP^3 approximates the intractable proximal prior step by utilizing the SE's tightly structured latent space as a robust projection onto the natural image manifold. Alternating this projection with a closed-form data-consistency step, via Half-Quadratic Splitting, achieves stable convergence without requir...

HuggingFace
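The SP^3 entry relies on Half-Quadratic Splitting (HQS), which alternates a closed-form data-consistency step with a prior step. Below is a minimal Plug-and-Play HQS sketch for 1-D inpainting, assuming a masking forward operator (for which the data step reduces to an elementwise formula) and a fixed penalty weight mu. A simple moving-average smoother stands in for the prior step; it is a hypothetical placeholder, not a Spherical Encoder projection, and the signal is made up.

```python
def data_step(y, mask, z, sigma2, mu):
    """Closed-form x-update for the masking operator A = diag(mask), mask entries in {0, 1}.

    Elementwise minimizer of mask * (y - x)**2 / (2 * sigma2) + mu / 2 * (x - z)**2.
    """
    return [(m * yi / sigma2 + mu * zi) / (m / sigma2 + mu)
            for yi, m, zi in zip(y, mask, z)]

def prior_step(x):
    """Hypothetical stand-in for the learned prior: a 3-tap moving-average smoother."""
    out = []
    for i in range(len(x)):
        window = x[max(0, i - 1):i + 2]
        out.append(sum(window) / len(window))
    return out

def pnp_hqs(y, mask, sigma2=0.01, mu=1.0, iters=50):
    """Plug-and-Play HQS: alternate the data-consistency step and the prior step."""
    z = list(y)  # initialize the auxiliary variable from the observation
    for _ in range(iters):
        x = data_step(y, mask, z, sigma2, mu)
        z = prior_step(x)
    return x

# Hypothetical 1-D signal with two missing samples (mask 0; their y values are ignored).
y    = [0.0, 0.2, 0.0, 0.6, 0.8, 0.0, 0.8, 0.6]
mask = [1,   1,   0,   1,   1,   0,   1,   1]
print([round(v, 2) for v in pnp_hqs(y, mask)])
```

Observed samples stay close to y because the 1/sigma2 term dominates there, while the two missing samples are filled with values between their neighbors (about 0.4 and 0.8). In SP^3 the prior step would instead be the projection through the Spherical Encoder's latent space.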

Humans can grasp objects effortlessly, whereas multi-fingered robots are far from this level of generality. We argue that the most natural source of robot grasping data is from humans, who pick up thousands of objects every day. We present HUG, a flow-matching model that generates diverse human grasps for any user-specified object in a single RGB-D image captured from a stereo camera. Using smart glasses, we first collect 1M-HUGs, an egocentric dataset of human grasps spanning 1M frames (27.8 hr...

HuggingFace
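HUG is described as a flow-matching model. The sketch below shows the generic flow-matching recipe with a straight-line (rectified-flow style) path from noise to data: training regresses a velocity field onto x1 - x0 at interpolated points, and sampling integrates that field with Euler steps. The path choice, the 3-D "grasp" vector, and the oracle velocity standing in for a trained, image-conditioned network are all assumptions for illustration, not details of HUG.

```python
import random

def interpolate(x0, x1, t):
    """Point at time t on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight path, used as the regression target: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(velocity_fn, x0, x1, t):
    """Squared error between the predicted and target velocity at time t."""
    pred = velocity_fn(interpolate(x0, x1, t), t)
    return sum((p - u) ** 2 for p, u in zip(pred, target_velocity(x0, x1)))

def euler_sample(velocity_fn, x0, steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

# Hypothetical 3-D grasp vector; the oracle plays the role of a trained network.
GRASP = [0.5, -1.0, 2.0]

def oracle_velocity(x, t):
    """Exact velocity toward GRASP along straight paths; valid for t < 1."""
    return [(g - xi) / (1 - t) for g, xi in zip(GRASP, x)]

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in GRASP]
print(euler_sample(oracle_velocity, noise))                      # reaches GRASP up to rounding
print(flow_matching_loss(oracle_velocity, noise, GRASP, t=0.3))  # about 0.0
```

A real model would learn the velocity from many (image, grasp) pairs and, because different noise samples lead to different endpoints, could produce diverse grasps for the same object.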

Large Language Models (LLMs) are increasingly adopted as backbones for Generative Recommendation (GR), promising access to pretrained world knowledge. Yet reliably invoking this knowledge for GR remains poorly understood. A key obstacle is that LLM-based GR typically represents items with Semantic IDs (SIDs), disrupting LLMs' natural-language reasoning interface because these tokens are unseen by the LLM during pretraining. Existing approaches address this with expensive multi-stage pipelines th...

HuggingFace
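The generative-recommendation entry notes that LLM-based GR represents items with Semantic IDs (SIDs), tokens the LLM never saw during pretraining. One common way to build SIDs, assumed here because the abstract does not say how the paper builds them, is residual quantization of an item embedding: each level picks the nearest codeword and passes the leftover residual to the next level, yielding a short tuple of codes. The tiny hand-written codebooks are hypothetical; real systems learn them, for example with an RQ-VAE.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to vec in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)))

def semantic_id(embedding, codebooks):
    """Residual quantization: pick the nearest codeword per level, then quantize the remainder."""
    residual, codes = list(embedding), []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(codes)

# Hypothetical 2-level codebooks over 2-D item embeddings.
codebooks = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],  # level 1: coarse clusters
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]],  # level 2: refinement of the residual
]
print(semantic_id([1.1, 0.0], codebooks))   # (1, 1)
print(semantic_id([0.0, 0.98], codebooks))  # (2, 0)
```

Items that share a first code fall in the same coarse cluster. Each code is then typically mapped to a new vocabulary token per level, which is why the LLM has no pretrained meaning for these tokens.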

Humans naturally understand object physics through everyday interactions, but faithfully predicting complex deformable dynamics, such as elastic materials and fabrics, remains a major challenge for computer vision and robotics. We present EgoPhys, a framework that constructs deformable physical digital twins from egocentric RGB-only video using generalizable priors. EgoPhys overcomes the limitations of existing methods to enable controllable deformable digital twin generation from egocentric vid...

HuggingFace

Industry News

SpaceX announced its acquisition of Cursor, a popular AI-powered code editor, for $60 billion as part of its expansion into software and AI development tools.

RSS

Discussion

An article in which the author explains their decision to stop using Google services, likely covering the alternative platforms and tools they adopted instead.

RSS