Cainew - Curated AI news for developers

TL;DR

Model Releases

Tools & Products

Research Papers

Tutorials

GLM-5.2 – How to Run Locally

Industry News

Meta pauses AI training program tracking employee keystrokes after internal leak

Discussion

Model Releases

VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO

VibeThinker is a 3-billion-parameter language model that outperforms Anthropic's Claude Opus 4.5 on reasoning benchmarks using a novel SFT+GRPO training approach.

ArXiv

Mistral OCR 4

Mistral OCR 4 is the latest version of Mistral's optical character recognition model with improved accuracy and performance.

RSS

OpenAI DayBreak – GPT-5.5-Cyber

OpenAI DayBreak – GPT-5.5-Cyber is a new specialized version of OpenAI's GPT model focused on cybersecurity capabilities and applications.

OpenAI

Claude Tag

Claude Tag represents a new tagging or categorization feature for Claude, Anthropic's AI assistant.

Anthropic

Ultralytics YOLO26: Unified Real-Time End-to-End Vision Models

Ultralytics introduces YOLO26, a unified real-time vision model designed to perform end-to-end object detection and visual understanding tasks with improved efficiency and accuracy.

ArXiv

AlperKTS/Krea2_FP8

AlperKTS/Krea2_FP8 is an FP8-quantized variant of Krea2. Quantizing to FP8 reduces model size for more efficient computation and deployment, with the aim of maintaining performance.

HuggingFace

Tools & Products

Bluerails Discovery: The rails AI agents use to find and pay you

Most "AI visibility" tools stop at telling you if AI mentions your brand. Bluerails goes further. We make you discoverable to AI agents and ready to get paid by them, on the rails we already run for marketplaces. What stands out:

• Discovery: a peer-reviewed AI-visibility score from 400 samples, not a one-off guess. Free, no signup.
• Agent-ready checkout + global settlement
• Compliance built in

Try your free Discovery report today; agent payments roll out next.

ProductHunt

Cotypist: Local AI Autocomplete in your voice, anywhere on your Mac

Cotypist is smart autocomplete for the Mac apps you already write in: Mail, Slack, Notes, docs, even AI prompts. Press Tab when a suggestion fits, or keep typing and watch it update in real time. Runs locally on your Mac. No cloud, no API calls.

ProductHunt

Latitude: Fix what's breaking in your AI agent

Open-source AI agent monitoring platform. Latitude automatically detects all the ways your agents fail at scale, and gives your coding agent the tools to fix it.

ProductHunt

OpenArt Director: Direct cinematic videos through chat

OpenArt Director lets you create cinematic AI videos simply by chatting. Generate videos up to 5 minutes long with consistent characters, scenes, voice, music, and visual style throughout. Director develops story arcs, plans scenes, maintains continuity, and helps refine videos through natural conversation - acting more like a creative director than a traditional video generator. You're not generating clips anymore - you're directing stories.

ProductHunt

Hush: Open-source noise suppression for voice AI agents

Hush removes competing voices, background noise, and audio interference from real-time calls so your voice AI agents always hear what matters.

ProductHunt

Jotform AI App Builder: Turn ideas into powerful apps within seconds

Build complete apps by describing what you need. Jotform AI App Builder generates pages, forms, workflows, and data management automatically, then lets you refine everything with AI or manual edits. If you need something more advanced, AI can automatically generate custom widgets for dashboards, charts, calculators, and interactive tools, or let you create your own with AI Widget Creator. Combine forms, tables, AI agents, and custom widgets in a single app.

ProductHunt

Blazly SEO: Dominate SEO with an AI content operating system

Blazly SEO is the AI Content Operating System that helps marketers plan, write, optimize, humanize, and publish content from one platform. Discover keywords, build SEO strategies, generate blogs in bulk, automate workflows, connect Google Search Console, improve page speed, and publish directly to WordPress, Webflow, and more.

ProductHunt

Conduit: Fix the tool-list bloat slowing your AI agent

Your agent got slower the more MCP servers you added, and it's not the model. Every server dumps its whole tool list into context on every request: 3 servers cost ~24k tokens before you even say hi. Conduit puts them behind one local gateway that exposes 3 meta-tools the agent searches on demand. Measured: 97% less tool overhead per request, ~90% fewer tokens, same task success. Cloud or local, one tool or five. Keys in your OS keychain. Free and open source.

ProductHunt
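The gateway idea in the Conduit blurb can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Gateway` class, the three meta-tool names (`search_tools`, `describe_tool`, `call_tool`), and the keyword scoring are hypothetical and are not taken from Conduit itself. The point is only that the agent sees three small entry points instead of every server's full tool list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    name: str
    description: str
    handler: Callable[..., object]


class Gateway:
    """Hypothetical gateway that hides the full tool list behind three meta-tools."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    # Meta-tool 1: keyword search over tool names and descriptions.
    def search_tools(self, query: str, limit: int = 5) -> List[str]:
        words = query.lower().split()
        scored = []
        for tool in self._tools.values():
            text = f"{tool.name} {tool.description}".lower()
            score = sum(1 for word in words if word in text)
            if score:
                scored.append((score, tool.name))
        # Highest score first, ties broken alphabetically.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [name for _, name in scored[:limit]]

    # Meta-tool 2: fetch one tool's description only when it is needed.
    def describe_tool(self, name: str) -> str:
        return self._tools[name].description

    # Meta-tool 3: forward the call to the underlying tool.
    def call_tool(self, name: str, **kwargs: object) -> object:
        return self._tools[name].handler(**kwargs)


gateway = Gateway()
gateway.register(Tool("read_file", "Read a text file from disk", lambda path: f"contents of {path}"))
gateway.register(Tool("send_email", "Send an email message", lambda to, body: f"sent to {to}"))
gateway.register(Tool("list_issues", "List open issues in a repository", lambda repo: [f"{repo}#1"]))

matches = gateway.search_tools("email")
result = gateway.call_tool(matches[0], to="a@example.com", body="hi")
```

In this sketch the context cost per request stays constant as more tools are registered, because only the three meta-tool signatures would be advertised to the model. A real gateway would likely use a better search than substring matching.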

Sipcode: Keep Claude Code's context clean for sharper answers

Context hygiene for Claude Code. Caps verbose tool output and dedupes same-session re-reads so the model sees signal, not noise. Anthropic measures 29% quality lift from cleaner context. Proof: 62.6% median tool-output savings on a locked 20-task benchmark. MIT.

ProductHunt
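The two techniques named in the Sipcode blurb, capping verbose tool output and deduplicating same-session re-reads, can be sketched as follows. This is an illustrative assumption, not Sipcode's implementation: the `ContextFilter` class, its `max_chars` parameter, and the placeholder strings are all hypothetical.

```python
import hashlib
from typing import Dict


class ContextFilter:
    """Hypothetical filter: caps long outputs and skips unchanged re-reads."""

    def __init__(self, max_chars: int = 200) -> None:
        self.max_chars = max_chars
        self._seen: Dict[str, str] = {}  # path -> hash of last content shown

    def cap(self, output: str) -> str:
        # Keep the head and tail of long output, replacing the middle with a marker.
        if len(output) <= self.max_chars:
            return output
        half = self.max_chars // 2
        omitted = len(output) - 2 * half
        return f"{output[:half]}\n[... {omitted} chars omitted ...]\n{output[-half:]}"

    def read(self, path: str, content: str) -> str:
        # If the same content was already shown this session, return a short stub.
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if self._seen.get(path) == digest:
            return f"[{path} unchanged since last read]"
        self._seen[path] = digest
        return self.cap(content)


ctx = ContextFilter(max_chars=20)
first = ctx.read("app.py", "x" * 100)   # long output: capped
second = ctx.read("app.py", "x" * 100)  # identical re-read: replaced by a stub
third = ctx.read("app.py", "changed")   # content changed: shown again
```

Hashing the content rather than tracking paths alone means an edited file is shown again in full, which seems the safer default for a coding agent.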

BestDefense.io: Pentest and patch every deploy with AI

AI attacks don’t wait for your next sprint. BestDefense continuously pentests every deploy, proves which vulnerabilities are actually exploitable, and generates fixes so high-compliance SaaS teams can patch real risks before remediation windows close. Unlike static scanners, BestDefense validates exploits through execution, cuts false positives, and helps developers move from finding issues to fixing them faster.

ProductHunt

jebi: A supercharged terminal for Mac with built-in local AI

jebi is a supercharged Mac terminal with built-in local AI — no API key, no subscription, no cloud. After every command, it suggests what to run next. Hit an error? jebi explains it in plain English and tells you how to fix it. Type /ask to chat with AI right in your terminal. All AI runs on-device with Qwen, Phi-3, and Gemma — your commands never leave your Mac. Beautiful UI, split panes, tabs, custom themes, grain texture, and slash commands like /ls and /ports.

ProductHunt

How Omio is building the future of conversational travel

Discover how Omio uses OpenAI to power conversational travel experiences, accelerate product development, and transform into an AI-native company.

OpenAI

Research Papers

Lift4D: Harmonizing Single-View 3D Estimation for 4D Reconstruction In-the-Wild

Lift4D is a method for harmonizing 3D estimation from single-view images to improve 4D reconstruction of dynamic scenes in unconstrained settings.

RSS

Show HN: Neural Particle Automata

Neural Particle Automata is a novel approach that combines neural networks with cellular automata principles to create emergent computational systems.

RSS

UniverSat: Resolution- and Modality-Agnostic Transformers for Earth Observation

Vision Transformers (ViT) dominate computer vision. However, their reliance on rigid patch projectors hinders transfer to Earth Observation (EO), where input modalities, scales, and resolutions vary widely. We introduce UniverSat, a ViT-style backbone built around a Universal Patch Encoder that maps patches from arbitrary spatial, spectral, and temporal resolutions, and from both optical and non-optical sensors, into a shared embedding space with a shared set of weights. This enables training a ...

HuggingFace

Safe Few-Step Generation via Velocity Editing

Flow matching has recently emerged as a strong paradigm for state-of-the-art text-to-image (T2I) generation, enabling high-quality generation with a small number of sampling steps. As these models are increasingly integrated into real-world applications, ensuring safe and non-sensitive content generation has become a critical requirement. However, adapting safety and concept removal methods to this new generation framework remains an open challenge. Specifically, prior methods largely rely on it...

HuggingFace

CLI-Universe: Towards Verifiable Task Synthesis Engine for Terminal Agents

While recent LLM-based terminal agents have demonstrated promising capabilities, the scarcity of high-quality, executable training data remains a critical bottleneck. Existing synthesis pipelines typically scale by retrofitting surface-level artifacts into tasks, frequently yielding ambiguous instructions, shallow execution paths, and brittle tests that provide weak learning signals. To overcome this, we introduce CLI-Universe, a principled synthesis engine that constructs terminal-agent tasks. ...

HuggingFace

When Agents Commit Too Soon: Diagnosing Premature Commitment in LLM Agents

Long-horizon LLM agents can fail quietly: they settle on one reading of the evidence early, then spend the rest of the run defending it. We call this premature commitment. Final-answer scoring misses the failure mode because it sees only the answer, not whether the process has already collapsed to a stable path. We define representational commitment as cross-run hidden-state convergence at a fixed reasoning step, and use it as an early diagnostic of trajectory consistency. On Llama-3.1-70B runni...

HuggingFace

Dense Reward for Multi-View 3D Reasoning with Global Maps and Local Views

Multi-view 3D Visual Question Answering (MV3D-VQA) requires integrating partial observations into a coherent 3D scene representation and selecting informative viewpoints for multi-step spatial reasoning. However, current multimodal LLMs are typically trained with sparse, answer-level supervision, which often yields inconsistent cross-view reasoning and brittle view selection. We present DR-MV3D (Dense Reward for MV3D-VQA), a map-grounded learning framework that provides dense, verifiable rewards...

HuggingFace

Training Open Models for Agentic Phone Use

Phones are becoming an important execution surface for general-purpose agents, but training open models for reliable phone use remains difficult because the environment that matters at deployment, real devices running real apps, is slow, stateful, side-effectful, and hard to reset or verify, while scalable mock environments only approximate real behavior. We present PhoneBuddy, a training recipe and open-model line for agentic phone use that combines a real-app environment with a mock-app enviro...

HuggingFace

Causal Discovery in the Era of Agents

Recent attempts to combine large language models (LLMs) with causal discovery ask models to infer pairwise directions, propose graph structures, or inject language-model outputs as priors and constraints. These approaches promise faster analysis, but they also obscure whether causal evidence is supported by data and assumptions or by textual associations, prompt artifacts and hallucinated mechanisms. We argue for a different role for agents in causal discovery. Agents should inspect data, retr...

HuggingFace

ShotcreteDepth: A Bi-modal Dataset for Robust Robotic Depth Perception in Shotcrete Construction Environments

We introduce ShotcreteDepth, a bi-modal dataset from the construction domain that captures both an active shotcreting process and general construction environments. The dataset comprises stereo RGB imagery and LiDAR point clouds acquired under harsh real-world conditions, including high turbidity and poor illumination. Such conditions adversely affect sensor measurements, leading to incomplete and noisy observations that pose significant challenges for perception systems in autonomous applicatio...

HuggingFace

KaLM-Reranker-V1: Fast but Not Late Interaction for Compressed Document Reranking

As retrieval systems scale, high-quality reranking becomes increasingly important. However, most existing rerankers, whether encoder-based or decoder-based, jointly encode the query and passage, tightly coupling their computation and limiting deployment efficiency as well as flexibility. We present KaLM-Reranker-V1, a fast but not late-interaction (FBNL) reranker that decouples query and passage computation while retaining expressive relevance modeling. Built on an encoder-decoder architecture, ...

HuggingFace

Self-Compacting Language Model Agents

Long agent traces composed of chains of thought and tool calls accumulate stale content that anchors subsequent generations, and eventually outgrow the context window. Existing scaffolds mitigate it with fixed-interval compaction triggered at a token threshold. Such triggers pay no heed to trajectory structure, risking discard of partial results mid-derivation or mid-search. We propose SelfCompact, a scaffold that allows the model itself to decide when and how to compact. Specifically, it pairs t...

HuggingFace

Tmax: A simple recipe for terminal agents

Terminal-using agents have quickly become the most popular downstream application of language models (LMs). Despite their prevalence, relatively little academic work has examined RL-based training of these models, likely due to difficult benchmarks, a lack of data, and a lack of simple baseline recipes. We present Tmax, the strongest open RL recipe for terminal agents to date, bringing open data recipes closer to the frontier. While simple, our recipe achieves 27% on Terminal-Bench 2.0 with onl...

HuggingFace

Capable but Careless: Do Computer-Use Agents Follow Contextual Integrity?

Computer-use agents (CUAs) now act on a user's behalf across personal applications such as email, calendars, and to-do lists. This cross-application access is useful, but it also creates a privacy risk that has been largely overlooked: when an agent works in one context, it can pull in information from another that is inappropriate in that context. Hence, we introduce AgentCIBench, an evaluation harness that turns this risk into executable, deterministically scored scenarios. We target three com...

HuggingFace

HAKARI-Bench: A Lightweight Benchmark for Comparing Retrieval Architectures and Efficiency Settings under Unified Conditions

With the rapid spread of retrieval-augmented generation and semantic search, choosing the right embedding and retrieval configuration is increasingly hard. Large retrieval benchmarks are comprehensive but too heavy to rerun during development, and there is little infrastructure for comparing production settings--dimensionality reduction, quantization, reranking--across many models under identical conditions. We present HAKARI-Bench, a lightweight benchmark that reconstructs existing retrieval su...

HuggingFace

Tutorials

GLM-5.2 – How to Run Locally

GLM-5.2 is a large language model that can be run locally, offering users an option to deploy the model on their own infrastructure.

RSS

Industry News

Meta pauses AI training program tracking employee keystrokes after internal leak

Meta has paused its AI training program that tracked employee keystrokes after an internal data leak exposed the privacy-invasive practice.

RSS

Discussion

Elevated error rate across multiple models

An elevated error rate has been reported across multiple AI models, indicating potential issues with model performance or infrastructure.

RSS

AI's Affordability Crisis

This article explores the growing challenge of AI accessibility, discussing how high costs associated with advanced AI systems are creating barriers for widespread adoption and use.

Ask HN: Anthropic banned me from using Claude Code and I don't know what to do

A user discusses being banned from Anthropic's Claude Code feature and seeks advice on understanding the reasons behind the restriction and potential solutions.

RSS

How GPT-5 helped immunologist Derya Unutmaz solve a 3-year-old mystery

GPT-5 Pro helped solve a 3-year-old immunology mystery, offering insights into T cell behavior. The breakthrough could support cancer and autoimmune research.

OpenAI