Cainew - Curated AI news for developers

TL;DR

Model Releases

Tools & Products

Research Papers

Tutorials

Industry News

Discussion

Model Releases

Claude Fable is relentlessly proactive

Claude Fable takes initiative in its interactions instead of only responding to user prompts, which sets it apart in engagement and helpfulness.

RSS

Kimi K2.7-Code: open-source coding model with better token efficiency

Kimi K2.7-Code is an open-source coding model that demonstrates improved token efficiency compared to existing alternatives. The model aims to make code generation more accessible and cost-effective.

HuggingFace

Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF

Jackrong/Qwopus3.6-27B-Coder-MTP-GGUF is a quantized large language model variant optimized for coding tasks, available in GGUF format for efficient deployment.

HuggingFace

Tools & Products

DietrichGebert/ponytail

Makes your AI agent think like the laziest senior dev in the room. The best code is the code you never wrote.

GitHub

huawei-csl/KVarN

KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.
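
To see why a smaller KV cache translates into more context, the rough arithmetic below compares per-token cache size at FP16 with a lower bit width. The model shape (32 layers, 8 KV heads, head dimension 128), the 4-bit width, and the 16 GiB budget are illustrative assumptions, not KVarN's configuration, and the sketch ignores the scale and zero-point metadata that real quantizers also store.

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, bits: int) -> float:
    """Bytes of KV cache that one token occupies across all layers."""
    elements = 2 * layers * kv_heads * head_dim  # 2 = one key vector + one value vector
    return elements * bits / 8


def max_context_tokens(budget_bytes: int, bytes_per_token: float) -> int:
    """How many tokens fit in a fixed KV-cache memory budget."""
    return int(budget_bytes // bytes_per_token)


# Hypothetical model shape and memory budget, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128
BUDGET = 16 * 1024**3  # 16 GiB reserved for the KV cache

fp16 = kv_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, bits=16)
low_bit = kv_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, bits=4)

print(f"FP16 : {fp16:.0f} B/token, {max_context_tokens(BUDGET, fp16)} tokens")
print(f"4-bit: {low_bit:.0f} B/token, {max_context_tokens(BUDGET, low_bit)} tokens")
```

Under these assumptions the 4-bit cache holds four times as many tokens in the same memory, which falls inside the 3-5x range the project quotes.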

GitHub

johnbean393/KeyType

An open-source Cotypist with macOS system-wide AI autocomplete

GitHub

aqua5230/usage

macOS menu bar app pinning Claude Code & Codex quota, tokens, and cost to your screen. Local-only, zero API calls. HTML reports, 9 themes.

GitHub

NeuralInverse/neuralinverse

Code Modern. Code Legacy. Code Firmware. - open-source AI-native IDE with agentic coding, Power Mode, legacy modernization, and firmware development

GitHub

vuejs-ai/vue-tui

The Vue framework for terminal UIs. SFC & JSX, Yoga flexbox, HMR, and testing out of the box.

GitHub

jqtangust/Robust-U1

🚀🚀🚀 [ICML 2026] Official Implementation of Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?

GitHub

hanxiao/knowledge-graph-extractor

Turn any document or a whole zip into an interactive knowledge graph, using a self-hosted Qwen3.6-35B-A3B-MTP on a single NVIDIA L4

GitHub

dongxutang918-afk/agentwatch

Apple Watch notifications for Claude Code and AI agent workflows.

GitHub

john-rocky/coreai-model-zoo

Community model zoo + knowledge base for Apple Core AI (iOS/macOS 27): Qwen3.5 & Gemma 4 converted end-to-end, verified on-device (iPhone 17 Pro GPU/ANE), conversion gotchas, custom Metal kernels, Swift runner

GitHub

Qursor: Point at any UI to send exact context to your AI

I kept wasting AI tokens describing UI changes to agents that edited the wrong element. So I built Qursor. Point at any element, copy structured context (selectors, classes, styles, fonts, colors), paste into your AI agent. No vague screenshots. No burned credits.
- Inspect fonts, colors, spacing
- Copy AI-ready element context
- Extract components as HTML/CSS/JSX
- Color picker and font detector
- Download assets from any page

ProductHunt

Pond: Fundraising, GTM, and bounties for startups

Pond is the market infrastructure for the new startup economy. Verified founders raise capital, acquire customers, and access 20,000+ contributors in one place.
· Markets: Stripe-verified metrics + Vault-protected funding. One startup raised $150K in 3 minutes.
· Bounties: 10,000+ submissions, $36K+ distributed. Ethereum Foundation, GPTZero, PhotoBase.
· AI Growth Agent: CRM, pipeline, and GTM on autopilot.
400+ startups · 154 countries · Archetype & Coinbase Ventures.

ProductHunt

Meet Warren 3.0: Your voice-supported AI financial planning partner

One voice conversation. Free financial plan. We built Warren because financial planning was broken for anyone without a six-figure portfolio. IFAs charge £200/hr. Spreadsheets go stale. Generic apps tell you what you already know. Warren shows you two futures: what happens if you do nothing and what changes if you act. Then Warren gives you a set of next steps, tracks your progress, and monitors your plan against economic changes. Join 3K+ Brits already using Warren. 10 mins to start.

ProductHunt

Bob's CLI: A local-first AI coding CLI that adapts to you

Bob's CLI runs on your own hardware with zero API costs, zero data leaving your machine. Bob lives in your terminal, sees your actual files, and writes code only with your explicit approval. What makes it different: auto-detect local AI models, behavioral DNA profiling that adapts to how YOU work, autonomous code review + auto-fix, conversation forking, deep dives, and SovereignLink — remote execution from any device while your code stays home. Free to start. Sovereign by design.

ProductHunt

Slack Data Agent: Ask about your data without leaving Slack

Basedash for Slack is your AI data analyst inside Slack — now in the official Slack Marketplace. Mention @Basedash in any channel and it queries your real data sources, thinks in the thread, and replies with an answer and a chart, right where your team is talking. Automations deliver scheduled reports to your channels, and insights surface anomalies automatically — charts included. Ask in Slack. Answered by your data.

ProductHunt

Research Papers

HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers

Holistic visual tokenizers are fundamental to unified multimodal models (UMMs) as they map diverse visual inputs into a unified representation space. In this paper, we present HYDRA-X, the first UMM that unifies image and video tokenization within a single Vision Transformer (ViT). Our design is driven by two core challenges: efficiently injecting spatiotemporal reconstruction capability into a native ViT, and embedding image- and video-level semantic awareness into the latent space. To address ...

HuggingFace

Surflo: Consistent 3D Surface Flow Model with Global State

Geometry is invariant to viewpoint, which makes any collection of images a redundant encoding of a single 3D state. Existing feed-forward reconstruction models fail to exploit this: per-view methods emit overlapping, unaligned pointmaps that grow linearly with input count, while global-latent methods commit to a fixed, low-resolution output. We introduce Surflo, which compresses a variable number of unposed RGB views into K latent tokens (one global state) and decodes oriented 3D surface points by...

HuggingFace

MiniMax Sparse Attention

Ultra-long-context capability is becoming indispensable for frontier LLMs: agentic workflows, repository-scale code reasoning, and persistent memory all require the model to jointly attend over hundreds of thousands to millions of tokens, yet the quadratic cost of softmax attention makes this untenable at deployment scale. We introduce MiniMax Sparse Attention (MSA), a blockwise sparse attention built upon Grouped Query Attention (GQA). A lightweight Index Branch scores key-value blocks and inde...
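
The general idea of blockwise sparse attention can be sketched for a single query as follows. This is a generic illustration, not the MSA architecture: the block score used here (the query dotted with each block's mean key) is an assumed stand-in for the paper's Index Branch, and the sketch uses one head with no GQA grouping.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(query, keys, values, block_size, top_k):
    """Attend to only the top_k highest-scoring key/value blocks for one query."""
    n = len(keys)
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    # Assumed scoring rule: query dotted with the mean key of each block.
    def block_score(block):
        mean_key = [sum(keys[i][d] for i in block) / len(block)
                    for d in range(len(query))]
        return dot(query, mean_key)

    ranked = sorted(range(len(blocks)),
                    key=lambda b: block_score(blocks[b]), reverse=True)
    selected = sorted(ranked[:top_k])            # block indices that are kept
    idx = [i for b in selected for i in blocks[b]]

    # Ordinary scaled softmax attention, restricted to the selected positions.
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in idx]
    peak = max(logits)                           # subtract max for stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    out = [sum(w * values[i][d] for w, i in zip(weights, idx)) / total
           for d in range(len(values[0]))]
    return out, selected


keys = [[1, 0], [1, 0], [0, 1], [0, 1], [-1, 0], [-1, 0]]
values = [[1.0], [1.0], [2.0], [2.0], [3.0], [3.0]]
out, selected = block_sparse_attention([1, 0], keys, values, block_size=2, top_k=1)
print(out, selected)  # → [1.0] [0]
```

The softmax runs over `top_k * block_size` positions instead of all `n`, which is where the savings over dense attention come from.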

HuggingFace

LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories

Scientific laboratories increasingly rely on AI systems to reason about experiments, but the physical act of doing science remains largely outside their reach. AI can help read literature, generate hypotheses, and plan protocols, yet the execution of those protocols at the bench still requires a human operator. Vision-Language-Action (VLA) models provide one possible interface between written protocols and robot execution, but existing policies are trained mostly on household and tabletop demons...

HuggingFace

WEAVER, Better, Faster, Longer: An Effective World Model for Robotic Manipulation

The potential impacts of world models (WMs, i.e., learned simulators) on robotics are far-reaching: policy evaluation, policy improvement, and test-time planning, all with limited real-world interaction. To unlock these downstream capabilities, a WM needs to jointly satisfy three desiderata: (i) fidelity (i.e., producing simulated trajectories that correlate with reality), (ii) consistency (i.e., producing simulated trajectories that are coherent over long horizons), and (iii) efficiency (i....

HuggingFace

ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages

Multimodal Large Language Models (MLLMs) have shown promising reasoning capabilities in general domains, yet their performance remains limited in specialized settings such as healthcare, especially in multilingual and low-resource scenarios. This gap is critical in regions like rural India, where patients often express complex medical queries in native Indic languages and rely on multimodal inputs such as medical images. Existing English-centric MLLMs struggle to support such use cases, limiting...

HuggingFace

Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning

Latent chain-of-thought compresses reasoning by replacing visible reasoning traces with continuous hidden-state recurrence, but existing formulations are difficult to optimize with standard on-policy reinforcement learning (RL) and hard to interpret causally. Our key insight is that a single pair of explicit boundary tokens can address both issues at once: discrete entry and exit anchors make the latent block compatible with standard on-policy RL, and the same anchors offer a natural foothold fo...

HuggingFace

EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dynamic, requiring agents to continually align their knowledge, skills, and behavior with changing environments and updated task conditions. To address this gap, we introduce EvoArena, a benchmark suite that models environment changes as sequences of progressive updates across terminal, software, and so...

HuggingFace

EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

LLM-based agents have shown increasing potential in automating scientific discovery. Given an optimizable metric and an execution environment, they can propose, validate, and iterate scientific solutions, and have produced results that outperform human-designed approaches. As model capabilities continue to improve, we argue that the bottleneck for autonomous scientific discovery is shifting from prescribing agent workflows to designing agent environments: the resources, constraints, and interfac...

HuggingFace

Getting Better at Working With You: Compiling User Corrections into Runtime Enforcement for Coding Agents

Interactive LLM agents are becoming part of daily work, but they do not reliably become easier to work with over time: a correction remembered in one session may still be violated in the next. We study this gap between preference access and preference compliance. In tasks derived from anonymized real-user friction cases, Mem0 memory still leaves 57.5% of applicable preference checks violated. We introduce Test-time Rule Acquisition and Compiled Enforcement (TRACE), a drop-in skill-layer pipeline...

HuggingFace

MoVerse: Real-Time Video World Modeling with Panoramic Gaussian Scaffold

We present MoVerse, a real-time video world model that creates an interactively navigable scene from a single narrow-field-of-view image. This setting is challenging because the input observes only a small fraction of the environment, while interactive roaming requires a complete surrounding world, persistent geometry, controllable camera motion, and temporally coherent high-fidelity observations. MoVerse addresses this problem by separating world construction from observation rendering. It firs...

HuggingFace

HarnessBridge: Learnable Bidirectional Controller for LLM Agent Harness

Large language models are increasingly deployed as agents for long-horizon tasks, yet their performance is shaped not only by model capability and environment design, but also by the harness that mediates agent-environment interaction. Existing harnesses are largely manually engineered, making them difficult to scale as trajectories grow longer and interactions become more complex. In this work, we ask whether a harness can be generated by a learnable plug-in module that can be trained in an end-...

HuggingFace

EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge

Search Agents (large language models augmented with search tools) have intensified the need for future-proof evaluation benchmarks. Existing benchmarks such as BrowseComp rely on static knowledge, making them vulnerable to test-set contamination and parametric memorization. Consequently, models can achieve high scores through fact recall rather than genuine retrieval, obscuring true browsing competence via reasoning shortcuts. In this paper, we introduce EvoBrowseComp, an evolving benchmar...

HuggingFace

RepWAM: World Action Modeling with Representation Visual-Action Tokenizers

This work presents RepWAM, a representation-centric world action model (WAM) built on representation visual-action tokenizers. Existing WAMs typically inherit reconstruction-oriented video tokenizers from pretrained video generation models. Although these tokenizers preserve visual fidelity, pixel reconstruction alone provides limited guidance for learning instruction-following dynamics that connect future prediction with robot control. To address this, we explore a semantic visual-action latent...

HuggingFace

InterleaveThinker: Reinforcing Agentic Interleaved Generation

Recent image generators have demonstrated impressive photorealism and instruction-following capabilities in single-image generation and editing. However, constrained by their architectures, they cannot achieve interleaved generation (text-image sequence), which has crucial applications in visual narratives, guidance, and embodied manipulation. Even the latest open-source Unified Multimodal Models (UMMs) exhibit limited performance in this regard. In this paper, we introduce InterleaveThinker, th...

HuggingFace

Tutorials

How to automate Instagram engagements with computer vision (and get banned)

The article explores using computer vision to automate Instagram engagement but warns that such automation violates the platform's terms of service and results in account bans. It serves as a cautionary tale about the consequences of automation on social media.

RSS

Industry News

Why I'm Forced to Say Farewell: Google Management Has Lost Its Moral Compass

A Google employee announces their departure, criticizing Google management for losing its ethical direction and moral principles. The farewell post reflects broader concerns about corporate values in the tech industry.

RSS

Tesla Full Self Driving uses bicycle lane in official Denmark approval video

Tesla's Full Self-Driving feature was shown using bicycle lanes in Denmark's official approval demonstration video. This raises concerns about the system's lane recognition and safety in real-world conditions.

RSS

TCS and Anthropic partner to bring Claude to regulated industries

Anthropic

Discussion

AI agent bankrupted their operator while trying to scan DN42

An AI agent attempting to scan the DN42 network caused significant financial losses to its operator through unexpectedly high resource consumption. The incident highlights the risks of deploying autonomous agents without proper safeguards.
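
One kind of safeguard this points to is a hard spending cap checked before every billable action. The sketch below is a generic illustration: the `BudgetGuard` class, the `run_agent` loop, and all cost figures are hypothetical and not taken from the post.

```python
class BudgetExceeded(RuntimeError):
    """Raised when an action would push spending past the cap."""


class BudgetGuard:
    def __init__(self, cap_usd: float) -> None:
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record a cost, refusing it if the cap would be exceeded."""
        if cost_usd < 0:
            raise ValueError("cost must be non-negative")
        if self.spent_usd + cost_usd > self.cap_usd:
            raise BudgetExceeded(
                f"{cost_usd:.2f} USD would exceed the {self.cap_usd:.2f} USD cap"
            )
        self.spent_usd += cost_usd


def run_agent(actions, guard):
    """Run (name, estimated_cost_usd) actions, halting at the first refusal."""
    completed = []
    for name, cost in actions:
        try:
            guard.charge(cost)  # check the budget before acting, not after
        except BudgetExceeded:
            break               # stop entirely instead of skipping ahead
        completed.append(name)
    return completed


guard = BudgetGuard(cap_usd=1.0)
plan = [("scan-a", 0.5), ("scan-b", 0.25), ("scan-c", 0.5), ("scan-d", 0.1)]
done = run_agent(plan, guard)
print(done, guard.spent_usd)  # → ['scan-a', 'scan-b'] 0.75
```

Halting on the first refused action, rather than skipping it and continuing, is a deliberate choice here: an agent that keeps going after hitting its cap can still spend the remainder on smaller actions.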

RSS

Tailwind and slop apps

This piece discusses the proliferation of AI-generated or low-effort 'slop' applications built with Tailwind CSS. It examines the impact of easy app-building tools on software quality and market saturation.

RSS

Shall we play a game? My AI nuclear simulation

The author created an AI-powered nuclear simulation game to explore strategic decision-making and conflict scenarios. This interactive experience demonstrates how AI can be used for educational simulation and game design.

RSS