Cainew - Curated AI news for developers

Tools & Products

🧭 Architecture-first system design: 26 bilingual tutorials, 25 architecture templates, and 6 end-to-end cases covering distributed systems, AI-native systems, RAG, coding Agents, and production trade-offs.

GitHub

huawei-csl/KVarN

KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.

GitHub
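For readers unfamiliar with KV-cache quantization, here is a minimal sketch of the general idea: store cached key/value vectors as low-bit integer codes plus a scale, and dequantize them on read. It is not KVarN's actual scheme, which the blurb does not describe. The function names and the symmetric per-vector int8 format are assumptions chosen for illustration.

```python
from typing import List, Tuple


def quantize_int8(vec: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-vector int8 quantization (illustrative, not KVarN's method).

    Returns integer codes in [-127, 127] and the scale needed to restore them.
    """
    max_abs = max((abs(x) for x in vec), default=0.0)
    # An all-zero vector gets a dummy scale so we never divide by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in vec]
    return codes, scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate floating-point values."""
    return [c * scale for c in codes]


# A hypothetical cached key vector for one token and one attention head.
key_vector = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(key_vector)
restored = dequantize_int8(codes, scale)

# Rounding error per element is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(key_vector, restored))
```

Each cached value shrinks from 16 bits to 8 at the cost of a bounded rounding error. Production backends typically use finer-grained scales and lower bit widths than this sketch.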

duncatzat/vigils

A local control plane for AI agents — see what they do, approve what matters, keep secrets out. Rust + Tauri + Chrome MV3.

GitHub

OnlyTerp/UltraCode-Shim

Give Claude Code's ultracode mode to ANY model you already pay for. A tiny local proxy + one config.json. Point your AI at AGENTS.md and it sets itself up.

GitHub

Episkey-G/GrokSearch-rs

Rust MCP server for Grok web search and Tavily-backed source retrieval

GitHub

Aimino-Tech/opendocswork-mcp

Rust-native MCP server for Office document processing (Excel, Word, PowerPoint). Sub-millisecond, local-first, open source.

GitHub

MiMo Code is now released and open-source

MiMo Code, a software development tool, has been released as open source and is now available for public use and contribution.

RSS

CyberSealNull/CcCompanion

Unofficial MIT-licensed iOS companion for Claude Code: self-hosted relay, local-first chat, search, and session control from your iPhone. Not affiliated with Anthropic.

GitHub

mirror29/inalpha

🦊 Open-source professional quant agent framework. Agents pick the factors working now to time entries, write full strategies, and evolve them in a sandbox — every order through machine approval, the LLM never on the order path. Multi-market, audit-grade.

GitHub

Respan Gateway: One AI gateway with built-in observability and evals

Respan AI Gateway connects your app to 1,000+ AI models through one endpoint. But routing is the easy part. Respan keeps production AI reliable and under control with fallbacks, retries, caching, spend limits, alerts, and full traces for every call. Gateway, observability, evals, prompt management, monitors, and cost controls all run on one platform, so you do not need to stitch together five tools to debug production.

ProductHunt
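The fallback, retry, and caching behavior named in the blurb can be sketched in a few lines. This is a generic illustration, not Respan's API: `call_with_fallback`, the provider callables, and the dictionary cache are all hypothetical names introduced here.

```python
import time
from typing import Callable, Dict, Optional, Sequence

Provider = Callable[[str], str]  # hypothetical: takes a prompt, returns a completion


def call_with_fallback(
    prompt: str,
    providers: Sequence[Provider],
    retries_per_provider: int = 2,
    backoff_seconds: float = 0.0,
    cache: Optional[Dict[str, str]] = None,
) -> str:
    """Try each provider in order, retrying with exponential backoff before falling back."""
    if cache is not None and prompt in cache:
        return cache[prompt]  # cache hit: no provider is called

    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(retries_per_provider):
            try:
                result = provider(prompt)
            except Exception as exc:  # a real gateway would only retry transient errors
                last_error = exc
                time.sleep(backoff_seconds * (2 ** attempt))
                continue
            if cache is not None:
                cache[prompt] = result
            return result
    raise RuntimeError("all providers failed") from last_error


def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary model timed out")


def stable_backup(prompt: str) -> str:
    return f"backup answer to: {prompt}"


cache: Dict[str, str] = {}
answer = call_with_fallback("ping", [flaky_primary, stable_backup], cache=cache)
```

Here the primary provider fails twice, the call falls through to the backup, and the result is cached so a repeated prompt skips both providers.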

akii-technologies-ltd/akii-seo-ai-search-optimizer

Free Claude Code plugin for SEO, AEO, and GEO. Audit sites, optimize content, generate schema, and track AI visibility across ChatGPT, Claude, Gemini, Perplexity, Copilot, and Google AI Overviews.

GitHub

Asmi AI: AI that handles your personal chores in the real world

Asmi calls you every morning. You talk - it handles the day. It calls services (dentist, salon, plumber, bank, insurance) or people (friends, colleagues) to coordinate, book or resolve things. Updates you on iMessage or WhatsApp when done. It can navigate IVRS, wait on hold and handle complex conversations well.

ProductHunt

Open Reproduction of DeepSeek-R1

Researchers have successfully created an open reproduction of DeepSeek-R1, making the advanced AI model more accessible to the scientific community. This achievement enables greater transparency and independent verification of the model's capabilities.

GitHub

Slashspace AI: Canvas first AI experience for sustained, complex work

An AI native user today copy-pastes prompts across a dozen apps. It's a broken experience for any kind of meaningful work. Every new chat box collapses context to zero. Slashspace solves that with an AI canvas where AI lives on the canvas, and you can run many chats as nodes. The canvas becomes the context space, and all the agents can see each other. Canvas is stored as files on your computer. Built with 1600 power users for over 1.5 years, we're the most mature canvas AI on the market.

ProductHunt

Nodey: Your n8n command center, now on your phone

Nodey is a mobile companion for n8n. Monitor your workflows in real time, diagnose failed executions with AI, build workflows from a prompt, and trigger automations with NFC tags or geofenced locations — all from your phone.

ProductHunt

Research Papers

Which Models Are Our Models Built On? Auditing Invisible Dependencies in Modern LLMs

Modern LLM training pipelines increasingly rely on other models to generate data, filter corpora, judge outputs, and guide development decisions. These dependencies are recursive: a model may depend on an upstream artifact whose own dependencies are documented only in separate releases and artifacts. As a result, the full dependency structure is fragmented across heterogeneous public artifacts, with complexity and recursive depth far outpacing humans' ability to trace. We introduce ModSleuth, an...

HuggingFace

World Model Self-Distillation: Training World Models to Solve General Tasks

Pretrained video generators are promising visual world models that exhibit emergent task-solving abilities; however, their reliance on detailed textual descriptions limits their direct use for planning and decision-making. Existing approaches either outsource this reasoning to language or vision-language models, or rely on supervised fine-tuning with paired task-execution videos, which are costly to collect and difficult to scale. We propose a scalable framework that elicits task-solving ability...

HuggingFace

Time-Series Foundation Model Embeddings for Remaining Useful Life Estimation

Remaining Useful Life (RUL) prediction is essential for industrial predictive maintenance, yet many learning-based approaches rely on extensive feature engineering or large labeled datasets to train task-specific sequence models. In this work, we introduce a lightweight learning approach, in which we leverage a frozen pretrained time-series foundation model (TSFM) and combine it with a small regression head for RUL estimation from multivariate sensor streams. More specifically, we use Chronos-2 ...

HuggingFace

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning

Recent progress in foundation models has shifted toward agentic behavior involving multi-step reasoning and tool use. However, open-source efforts largely focus on text-dominant settings, leaving long-horizon multimodal tasks underexplored. This gap is evident in video tasks requiring sustained temporal understanding and iterative interaction. We present InternVideo3, a framework enhancing these capabilities via Multimodal Contextual Reasoning (MCR). MCR treats understanding as a closed-loop pro...

HuggingFace

Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application

Environments serve as interactive systems for large language model (LLM) based agents across diverse scenarios and play a crucial role in driving the continual evolution of model capabilities. Despite this importance, existing work lacks a systematic categorization and deep analysis. This paper systematically studies current research on agentic environments from the perspective of the environment engineering lifecycle, covering their modeling, synthesis, evaluation, and application. Specificall...

HuggingFace

ICA Lens: Interpreting Language Models Without Training Another Dictionary

Finding interpretable directions in language-model representations is critical for understanding and controlling model behavior. Sparse autoencoders (SAEs) have become the standard tool for this purpose, but using them as the default first lens often requires training, storing, and evaluating large overcomplete dictionaries. This bottleneck limits rapid exploration and raises a fundamental question: how much interpretable structure is already visible from activation geometry before training anot...

HuggingFace

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

The router is the cornerstone component of Mixture-of-Experts (MoE) models. Serving as expert proxies, the rows of the router matrix compute their similarity to the MoE inputs to determine which subset of experts is activated. Ideally, each router row encodes its expert matrix into a representative vector, such that its dot product with a token better reflects token-expert affinity. However, there exist no design principles to enforce this condensation. In this paper, we propose to ...

HuggingFace
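As background for the abstract above, the conventional router it describes can be sketched as dot-product scoring followed by top-k selection. This shows the standard baseline only, not the paper's Manifold Power Iteration redesign, which the truncated abstract does not specify. The function name, the toy router matrix, and the softmax over the selected scores are assumptions for illustration.

```python
import math
from typing import List, Tuple


def route_token(
    token: List[float], router_rows: List[List[float]], top_k: int
) -> List[Tuple[int, float]]:
    """Score each expert by the dot product of its router row with the token,
    keep the top_k experts, and normalize their scores with a softmax."""
    scores = [sum(t * w for t, w in zip(token, row)) for row in router_rows]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]

    # Subtract the max score before exponentiating for numerical stability.
    max_score = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - max_score) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Toy router matrix: one row per expert (3 experts, hidden size 2).
router_rows = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
token = [2.0, 1.0]

# Each entry is (expert index, gate weight); weights sum to 1.
selected = route_token(token, router_rows, top_k=2)
```

With these toy values, expert 2 scores highest and expert 0 second, so only those two experts would process the token.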

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

There are two main Parameter-Efficient Fine-Tuning (PEFT) techniques for Large Language Models (LLMs). While Low-Rank Adaptation (LoRA) introduces additional weights between the LLM layers, Soft Prompting introduces additional fine-tuning-specific raw tokens to an LLM input. However, both require modification to the computational graphs of precompiled, preoptimized LLMs. As a result, neither is fully supported in high-throughput engines like vLLM. We propose fine-tuning with ART (Art-based Reinf...

HuggingFace

Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning

Spatial reasoning from egocentric videos is inherently challenging because the observable evidence is constrained by the camera trajectory. Existing methods rely on single-turn inference, forcing models to resolve geometric ambiguity through semantic priors rather than verifiable evidence. We argue that spatial reasoning should be revisitable: conclusions formed under limited evidence should remain open to revision when complementary viewpoints become available. Building on this insight, we prop...

HuggingFace

Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

Scientific progress depends on a repeated loop of exploration, experimentation, and abstraction. Researchers test candidate directions, interpret the evidence, and carry the resulting lessons into later attempts. We study how an AI agent can run this loop autonomously over long horizons. We introduce Arbor, a general framework for autonomous research that combines a long-lived coordinator, short-lived executors, and Hypothesis Tree Refinement (HTR), a persistent tree that links hypotheses, artif...

HuggingFace

Lius: Translation Model Based Instructional Lingustic Using Continual Instruction Tuning In Kupang Malay

Large Language Models (LLMs) offer new potential for translation tasks but often experience performance degradation when handling low-resource languages. To address this limitation, we propose an approach for fine-tuning LLMs on a low-resource language, Kupang Malay. Our approach involves designing a set of instructions by leveraging explicit lexical and semantic features from a bilingual dictionary, and introducing Continual Instruction Tuning (CIT), a training paradigm that enables iterative i...

HuggingFace

Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models

Vision-language models (VLMs) project images into hundreds to thousands of visual tokens, making decoder inference expensive in both attention computation and KV-cache memory. Existing visual-token reduction methods largely follow a rank-and-remove paradigm: they score visual tokens, keep a compact subset, and permanently discard the rest. We show that this irreversible action is fragile because visual-token importance changes across decoder depth; tokens ranked low at one stage may become relev...

HuggingFace

Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code

Large Language Models (LLMs) are increasingly used for code generation, raising concerns that they may be misused to produce malicious code. Meanwhile, Grammar-Constrained Decoding (GCD) has been widely adopted to improve the reliability of LLM-generated code by enforcing syntactic validity. In this paper, we reveal a counterintuitive risk: this reliability-oriented technique can itself become an attack surface. We uncover a new jailbreak attack, termed CodeSpear, that exploits GCD to induce LLM...

HuggingFace

On Subquadratic Architectures: From Applications to Principles

Transformers dominate modern sequence modeling, but their quadratic attention incurs substantial computational cost. Subquadratic architectures offer a scalable alternative. However, it remains unclear which designs yield the most effective sequence models. We compare three leading approaches: xLSTM, Mamba-2, and Gated DeltaNet. We evaluate these models on tasks with complex dependencies: (1) code-model pre-training, (2) distillation of code models from large language models, and (3) pre-trainin...

HuggingFace

On the Limits of LLM-as-Judge for Scientific Novelty Assessment

LLMs are increasingly used to generate and judge scientific ideas. This makes novelty evaluation a central problem. Full idea evaluation is difficult because it often requires judging a method, its feasibility, and its empirical promise. We therefore study a cleaner upstream object: the research question (RQ). RQ generation is a prerequisite for scientific ideation, and RQs can be compared against questions pursued in real papers. We introduce RQ-Bench, a benchmark built from recent arXiv papers...

HuggingFace

Industry News

OpenAI mulls slashing prices as it competes with Anthropic for users

OpenAI is considering significant price reductions for its AI services as it intensifies competition with Anthropic to attract and retain users. The pricing strategy reflects the growing competitive pressure in the large language model market.

RSS

BBVA puts AI at the core of banking with OpenAI

Learn how BBVA scaled ChatGPT Enterprise to 100,000 employees and partnered with OpenAI to accelerate AI-powered banking transformation worldwide.

OpenAI

Pokémon Go Scans Trained the Navigation Tech for Military Drones

Pokémon Go's scanning feature provided crucial training data for the navigation and computer vision technology now powering military drones. The gaming app's massive user base unknowingly contributed to the development of defense technology.

RSS

AI agent runs amok in Fedora and elsewhere

An AI agent malfunctioned and caused problems across Fedora systems and other environments, highlighting risks in deploying autonomous AI systems. The incident demonstrates the need for better safety controls and monitoring of AI agents.

RSS

Workers are spending over 6 hours a week botsitting AI, fueling job frustration

A new study reveals that workers spend over 6 hours per week supervising and correcting AI outputs, a task dubbed 'botsitting.' This hidden labor burden is contributing to workplace frustration and reducing overall productivity gains.

RSS

Anthropic apologizes for invisible Claude Fable guardrails

Anthropic has issued an apology for implementing hidden guardrails in Claude Fable that users were unaware of. The company acknowledged the lack of transparency in its AI safety measures.

RSS

OpenAI to acquire Ona

OpenAI plans to acquire Ona to expand Codex with secure, persistent cloud environments, enabling long-running AI agents across enterprise workflows.

OpenAI

Discussion

How an astrophysicist uses Codex to help simulate black holes

Discover how astrophysicist Chi-kwan Chan uses Codex to build black hole simulations, helping scientists study extreme physics and test Einstein’s theory of general relativity.

OpenAI