Cainew - Curated AI news for developers

TL;DR

Model Releases

Tools & Products

Research Papers

Industry News

Discussion

Model Releases

GLM-5.2

GLM-5.2 is a general-purpose language model from zai-org for language understanding and generation.

HuggingFace

SubQ 1.1 Small

SubQ 1.1 Small is a new compact version of the SubQ model offering improved efficiency for smaller-scale deployments.

RSS

Qwen-Robot Suite: A Foundation Model Suite for Physical World Intelligence

Qwen-Robot Suite introduces a comprehensive foundation model suite designed to enhance physical world intelligence and robotics capabilities.

RSS

Tools & Products

withkynam/vibecode-pro-max-kit

Your AI forgets. This remembers. Spec-driven coding harness for vibecoders, product owners, CEOs and real builders — self-improving context memory, 12 agents, 32 skills. Kills context rot, ships features, not spaghetti. Claude Code & Codex. Any stack. 30 seconds

GitHub

HangYu8123/HarnessFlow

Harness coding workflow for Codex, Claude, and GitHub Copilot.

GitHub

Goldfish: Press Option. It knows your work and replies like you

Most AI tools make you explain the context before they can help. Goldfish already has it. It privately remembers what you’ve been working on across your Mac, then helps you write better from any app. Press Option in a text field to draft replies, summarize threads, rewrite sentences, or recall important details from your recent work without copying, pasting, or re-explaining the whole backstory.

ProductHunt

K-Dense-AI/scientific-agents

Expert-thinking AGENTS.md profiles that teach AI agents to reason like senior scientists and engineers.

GitHub

Invoko: A little hand on your Mac

Invoko is an AI desktop helper you can talk to while you work. Bring it beside anything on your screen, ask it questions, or let it handle tasks across your apps.

ProductHunt

MakersClaw: Hire AI employees that live in your Slack, Teams, Telegram

Hire AI employees that run 24/7 in their own container with their own memory. One-click into your Slack, Telegram, or Teams. Pre-built for support, sales, research, SEO, or anything you write yourself. Pay per call for the tools they use.

ProductHunt

Edgee Turbo Models: Use Claude Code with Kimi K2.7 Code, MiniMax M2.7, and more

Run state-of-the-art open-source models (GLM 5.1, Kimi K2.7 Code, MiniMax M2.7, and more) in Claude Code at up to 4× the speed (up to 200 tok/s) for a flat $29/month. Set up in minutes, no code changes.

ProductHunt

Zoona AI: Automated support that learns from docs + past conversations

Sluggish, bloated, legacy support tools are dead. Zoona is support for modern teams — it learns from your docs and past conversations, then resolves 60%+ of tickets the second they land. No backlog. No burnout. No endless hiring to keep up. When it does need a human, it hands off with full context so the customer never repeats themselves. This is support that scales with you, not against you. Train it, go live, done.

ProductHunt

GitHits beta 0.9: Give your AI coding agent access to open-source code

GitHits gives coding agents access to the open-source code your app depends on. Get real implementation examples, dependency source navigation, package inspection and documentation. Agents can grep and read your codebase. They can't grep and read the open-source code your app depends on. That's where they start guessing, retrying, and looping. GitHits builds a version-aware index on demand. Agents can search, navigate, and inspect the code behind their dependencies. CLI: npx githits@latest init

ProductHunt

Stride: The AI workspace that plans, designs and ships with you.

Stride is the AI-native workspace for the whole build: plan, design, verify, and ship. Its AI works inside your real project data and plugs into Claude Code and Codex over MCP, so it does the work instead of just talking about it. Your team goes from idea to launch without switching tools.

ProductHunt

Dirac: The AI inbox that briefs founders every morning

Founders lose hours every day to email when they should be spending that time building and making real progress. Dirac was made to end that. Dirac is an AI-native inbox that scans your threads, drafts replies in your voice, and shows a brief with only what needs your decision, quietly dealing with the 80% of unimportant emails in the background. You run your inbox by deciding, not by being your own assistant.

ProductHunt

MindReader v1: Read minds (simulated fMRI data, channeled to neuro-metrics)

How do you feel? It is the oldest question in art and the newest one we can answer in technology. MindReader takes your content and simulates, region by region, how a brain responds to it. Completely Open Source - we encourage you to tinker. Exploring sales evals, neural evals for datasets and other esoteric product experiments w/ madhat founders. MindReader is built on Meta FAIR's TRIBE v2 + 35 years of neuro research. Inviting collab from academics et al.

ProductHunt

Research Papers

VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models

This technical report introduces VibeThinker-3B, a compact dense model with 3B parameters developed to investigate how far verifiable reasoning can be pushed within a strictly small-model regime. Building upon the Spectrum-to-Signal post-training paradigm, we systematically enhance the model through an optimized pipeline that includes curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation. Experimental evaluations demonstrate that VibeThinker-...

HuggingFace

TuneJury: An Open Metric for Improving Music Generation Preference Alignment

We introduce TuneJury, an open, instance-level pairwise reward model for text-to-music that predicts a music preference score from a text prompt and an audio clip. The released checkpoint is trained on publicly available human-preference labels covering arena-style (A vs. B) votes, metric-alignment preference pairs, crowdsourced pairwise comparisons, and expert aesthetic ratings. The predicted score margin between two clips is well calibrated on our held-out test split, supporting data filtering...

HuggingFace
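
As a rough illustration of how a score margin between two clips could be used for the data filtering the TuneJury abstract mentions, the sketch below converts a margin into a preference probability. It assumes a Bradley-Terry style interpretation (sigmoid of the margin), which the abstract does not confirm; the function names, the example scores, and the 0.7 threshold are hypothetical.

```python
import math


def preference_probability(score_a: float, score_b: float) -> float:
    """P(clip A preferred over clip B), ASSUMING a Bradley-Terry model:
    the probability is the sigmoid of the score margin."""
    margin = score_a - score_b
    # Numerically stable sigmoid for large positive or negative margins.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


def keep_confident_pairs(pairs, min_confidence=0.7):
    """Keep only pairs whose predicted winner is at least min_confidence likely.
    `pairs` is a list of (score_a, score_b) tuples; the threshold is hypothetical."""
    kept = []
    for score_a, score_b in pairs:
        p = preference_probability(score_a, score_b)
        if max(p, 1.0 - p) >= min_confidence:
            kept.append((score_a, score_b))
    return kept


# Hypothetical scores, for illustration only.
example_pairs = [(2.0, 0.0), (0.1, 0.0), (-1.5, 1.0)]
filtered = keep_confident_pairs(example_pairs)
```

Pairs with a near-zero margin fall close to a probability of 0.5 and are dropped, which is one simple way a calibrated margin can serve as a filter.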

ExpRL: Exploratory RL for LLM Mid-Training

Sparse reward reinforcement learning (RL) has become a standard tool for improving LLM reasoning, but its success depends critically on the coverage present in the base model. In practice, models are often primed for RL through mid-training on curated reasoning traces that teach useful primitive skills such as decomposition, verification, or self-correction. Although effective, this strategy requires manually specifying what the model should learn, and it remains unclear whether such primitive c...

HuggingFace

DreamX-World 1.0: A General-Purpose Interactive World Model

DreamX-World 1.0 is a general-purpose interactive text/image-to-video world model for controllable long-horizon generation. It supports camera navigation, revisits to previously observed regions, and promptable events across photorealistic, game-style, and stylized domains. Our data engine combines camera-accurate Unreal Engine rendering, action-rich gameplay recordings, and real-world videos with recovered camera geometry. For camera control, we introduce E-PRoPE, a lightweight variant of proje...

HuggingFace

OneRank: Unified Transformer-Native Ranking Architecture for Multi-Task Recommendation

Multi-task learning (MTL) is essential in recommender systems to enable complementary learning among diverse user feedback. While modern industrial practices have shifted from DNNs to Transformer-centric architectures to strengthen sequence modeling and scaling capacity, they still decouple feature encoding from multi-task prediction, treating the Transformer as a task-agnostic encoder. This design fundamentally limits the performance and scalability by (1) creating an information bottleneck und...

HuggingFace

Who Should Lead Decoding Now? Tracking Reliable Trajectories for Ensembling Masked Diffusion Language Models

Masked Diffusion Language Models (MDLMs) have emerged as a distinct paradigm for sequence generation. As MDLMs become diverse in capabilities and knowledge coverage, an important question is how to combine their knowledge. Toward this, we first investigate the unique decoding dynamics of MDLMs. We find that successful generations exhibit stable confidence dynamics over answer-relevant positions, while unreliable trajectories can often be corrected by injecting promising intermediate states from ...

HuggingFace

BadWorld: Adversarial Attacks on World Models

Visual world models (VWMs) synthesize interactive, action-conditioned rollouts from a single context image. However, it remains an open question how robust these models are to adversarial perturbations. Standard adversarial attacks fail to assess this vulnerability because attackers lack ground-truth future videos and cannot predict subsequent user controls. We introduce BadWorld, a label-free adversarial framework tailored for autoregressive VWMs that systematically overcomes both constraints. ...

HuggingFace

GD^2PO: Mitigating Multi-Reward Conflicts via Group-Dynamic reward-Decoupled Policy Optimization

As LLMs advance, post-training reinforcement learning (RL) increasingly relies on multi-dimensional rewards to cultivate comprehensive capabilities. This shift demands new algorithms capable of optimizing diverse and potentially competing objectives simultaneously. To address this, existing methods such as Group reward-Decoupled Policy Optimization (GDPO) decompose the overall score into independent reward groups, then compute the RL loss separately within each group. However, this strategy stil...

HuggingFace

PermaVid: Consistent Video Generation Across Edits via Disentangled Context Memory

Consistent video generation under editing operations requires persistence: when edits modify scene appearance or layout, subsequent generations should remain coherent across time and viewpoints. However, existing memory designs struggle to maintain long-term consistency after such modifications, as stored contexts may become outdated or invalid. To address this, we propose PermaVid, a novel framework built upon a multi-modal context memory that disentangles spatial context into semantic appearan...

HuggingFace

Tangram: Unlocking Non-Uniform KV Cache Compression for Efficient Multi-turn LLM Serving

Multi-turn LLM serving accumulates dialogue history whose Key-Value (KV) cache grows with every turn and every user, quickly exceeding the model weights themselves and making memory -- not compute -- the binding constraint on throughput. Non-uniform KV compression, which allocates heterogeneous budgets across attention heads, preserves accuracy far better than uniform schemes, yet remains impractical: modern serving stacks assume identical KV lengths across heads, so heterogeneity traps freed me...

HuggingFace
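
To make the memory pressure described in the Tangram abstract concrete, here is a back-of-the-envelope sketch of KV cache size. It uses the standard accounting (keys plus values, per layer, per KV head, per token); the model shape and sequence length are hypothetical, and the non-uniform helper only counts bytes, it does not reproduce Tangram's method.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Bytes of KV cache for one sequence with the same length in every head.
    The leading factor 2 accounts for storing both keys and values."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


def kv_cache_bytes_nonuniform(tokens_kept_per_head, head_dim, bytes_per_elem=2):
    """Bytes of KV cache when each head keeps its own number of tokens.
    `tokens_kept_per_head` is a flat list over all layers and KV heads."""
    return 2 * head_dim * bytes_per_elem * sum(tokens_kept_per_head)


# Hypothetical model shape, chosen only for illustration (fp16, 2 bytes per element).
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

per_token = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, seq_len=1)
one_dialogue = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, seq_len=32_000)

print(per_token)             # 131072 bytes per token
print(one_dialogue / 2**30)  # 3.90625 GiB for a single 32,000-token dialogue

# Non-uniform budgets: half the heads keep all tokens, half keep a quarter.
budgets = [32_000] * 128 + [8_000] * 128
compressed = kv_cache_bytes_nonuniform(budgets, HEAD_DIM)
print(compressed / one_dialogue)  # 0.625 of the uncompressed size
```

Multiplying the per-dialogue figure by the number of concurrent users shows why memory, rather than compute, can become the limiting factor in multi-turn serving.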

VisualClaw: A Real-Time, Personalized Agent for the Physical World

Vision language models are serving as general-purpose interfaces for complex multimodal tasks. However, deployment still faces three gaps: VLMs typically incur high latency and cost when processing dense video frames and long prompts, the agent scaffold remains static after deployment, and standard video-QA benchmarks do not test whether agents can use visual evidence inside tool-using workspaces. We present VisualClaw, a self-evolving multimodal agent built around two principles. First, hybrid ...

HuggingFace

SP^3: Spherical Priors for Plug-and-Play Restoration

In this paper, we introduce SP^3, a novel Plug-and-Play algorithm that accelerates maximum a posteriori image restoration by replacing denoisers with Spherical Encoders (SE) as generative priors. SP^3 approximates the intractable proximal prior step by utilizing the SE tightly structured latent space as a robust projection onto the natural image manifold. Alternating this projection with a closed-form data-consistency step, via Half-Quadratic Splitting, achieves stable convergence without requir...

HuggingFace

Human Universal Grasping

Humans can grasp objects effortlessly, whereas multi-fingered robots are far from this level of generality. We argue that the most natural source of robot grasping data is from humans, who pick up thousands of objects every day. We present HUG, a flow-matching model that generates diverse human grasps for any user-specified object in a single RGB-D image captured from a stereo camera. Using smart glasses, we first collect 1M-HUGs, an egocentric dataset of human grasps spanning 1M frames (27.8 hr...

HuggingFace

Implicit Reasoning for Large Language Model-based Generative Recommendation

Large Language Models (LLMs) are increasingly adopted as backbones for Generative Recommendation (GR), promising access to pretrained world knowledge. Yet reliably invoking this knowledge for GR remains poorly understood. A key obstacle is that LLM-based GR typically represents items with Semantic IDs (SIDs), disrupting LLMs' natural-language reasoning interface because these tokens are unseen by the LLM during pretraining. Existing approaches address this with expensive multi-stage pipelines th...

HuggingFace

EgoPhys: Learning Generalizable Physics Models of Deformable Objects from Egocentric Video

Humans naturally understand object physics through everyday interactions, but faithfully predicting complex deformable dynamics, such as elastic materials and fabrics, remains a major challenge for computer vision and robotics. We present EgoPhys, a framework that constructs deformable physical digital twins from egocentric RGB-only video using generalizable priors. EgoPhys overcomes the limitations of existing methods to enable controllable deformable digital twin generation from egocentric vid...

HuggingFace

Industry News

OpenAI Losses Increased Nearly 8X in 2025, with Spending Hitting $34B

OpenAI's losses increased nearly eightfold in 2025, with annual spending reaching $34 billion, reflecting the company's aggressive expansion and investment in AI development.

RSS

Claude: Elevated errors across many models

Anthropic reported elevated error rates across many Claude models, affecting reliability for users who depend on the AI assistant.

RSS

SpaceX to buy Cursor for $60B

SpaceX announced its acquisition of Cursor, a popular AI-powered code editor, for $60 billion as part of its expansion into software and AI development tools.

RSS

Amazon Announces Multibillion-Dollar Data Center in Missouri

Amazon announced a multibillion-dollar investment in a new data center facility in Missouri to support growing cloud computing and AI infrastructure demands.

RSS

Discussion

I Fired Google

An article detailing the author's decision to discontinue using Google services, likely exploring alternative platforms and tools.

RSS

Reviews have become expensive, rewrites have become cheap

As AI code reviews have become more expensive to conduct, rewrites and automated fixes have become comparatively cheaper, shifting development cost dynamics.

RSS