Cainew - Curated AI news for developers

TL;DR

Model Releases

Magenta RealTime 2: Open and Local Live Music Models

Tools & Products

Research Papers

Tutorials

Fine-tuning an LLM to write docs like it's 1995

Industry News

Model Releases

Magenta RealTime 2: Open and Local Live Music Models

Google's Magenta team releases RealTime 2, a collection of open, locally runnable models for live music generation and manipulation. These models enable real-time creative applications without requiring cloud infrastructure.

RSS

Tools & Products

2aronS/Duel-Agents

CLI, SDK, and IDE plugins for Duel Agents

GitHub

DaoyuanLi2816/can-i-finetune-this

Estimate whether a Hugging Face model fits and fine-tunes on your local GPU.

GitHub
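
To make the "does it fit" question concrete, here is a minimal back-of-the-envelope sketch. It is not the tool's actual method, which the blurb does not describe. It assumes full fine-tuning with Adam in mixed precision at roughly 16 bytes per parameter (fp16 weights and gradients plus fp32 master weights and two optimizer moments) and ignores activations, so it is only a lower bound. The function names and the `headroom` parameter are hypothetical.

```python
def estimate_full_finetune_gib(num_params: int, bytes_per_param: int = 16) -> float:
    """Rough lower bound on GPU memory (GiB) for full fine-tuning.

    Assumption: ~16 bytes per parameter for mixed-precision Adam
    (2 fp16 weights + 2 fp16 grads + 12 for fp32 weights and two moments).
    Activation memory is deliberately left out.
    """
    return num_params * bytes_per_param / 2**30


def fits(num_params: int, gpu_gib: float, headroom: float = 0.8) -> bool:
    """True if the estimate fits within a fraction of the GPU's memory.

    `headroom` is a hypothetical safety margin for activations and buffers.
    """
    return estimate_full_finetune_gib(num_params) <= gpu_gib * headroom


if __name__ == "__main__":
    for params in (1_000_000_000, 7_000_000_000):
        need = estimate_full_finetune_gib(params)
        print(f"{params:,} params: ~{need:.1f} GiB, fits on 24 GiB: {fits(params, 24)}")
```

Parameter-efficient methods such as LoRA or quantized training change the bytes-per-parameter figure substantially, so treat the default as one assumption among several.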

Anthropic's open-source framework for AI-powered vulnerability discovery

Anthropic releases an open-source framework that uses AI to discover software vulnerabilities. The tool aims to improve security by automating the detection process.

GitHub

Shiyao-Huang/awesome-agent-evolution

Open survey and evidence map for AI agent evolution, self-evolving agents, memory, skills, harnesses, benchmarks, and agent-swarm systems.

GitHub

Open Code Review – An AI-powered code review CLI tool

Open Code Review is an AI-powered CLI tool that automates code review processes and provides intelligent feedback on code quality. The tool helps developers identify issues and improve their code before submission.

GitHub

Leni: The world’s most accurate AI for investors

Leni is the most accurate and verifiable AI for serious investment work. Built on 21,000+ decision traces and processing 100M+ rows daily, it delivers finance-grade outputs with full auditability through source links, timestamps, and grounded comps. Leni outperforms GPT, Claude, and Manus on independent benchmarks for accuracy, modeling, and valuation while giving teams the trust they need when millions are on the line. Leni is part of Google Startups and a serious machine for investors.

ProductHunt

vuejs-ai/vue-tui

The Vue framework for terminal UIs. SFC & JSX, Yoga flexbox, HMR, and testing out of the box.

GitHub

Infini-AI-Lab/astraflow

Dataflow-Oriented Reinforcement Learning for (Multi-)Agentic LLMs

GitHub

tomfunk/fungible

Terminal UI for personal finance — Plaid sync, CSV import, AI assistant, and MCP server

GitHub

Minimi: Your ambient memory for Claude

Every great Claude response starts with context. Minimi listens across your Mac - docs, calls, messages, tabs - and gives Claude the full picture. No prompting. All on-device and private.

ProductHunt

Veltrix AI: AI finance copilot for cash flow, margins, and growth

Veltrix AI gives founders and finance teams instant clarity on cash flow, profitability, burn, and business performance. Connect QuickBooks, Xero, Shopify, Square, and HubSpot, then ask finance questions in plain English to get source-backed answers, anomalies, and recommended next steps. Replace spreadsheet chaos and static dashboards with real-time financial intelligence built to help you make faster, smarter business decisions.

ProductHunt

wahahaazhe/KnowGT

GitHub Trending Daily Briefing Skills for Claude Code

GitHub

Nemotron 3 Ultra by NVIDIA: Powers faster, efficient reasoning for long-running agents

A 550B MoE frontier-intelligence open model built for long-running agents. It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models. Ultra excels at complex tasks like coding and deep research. Long-running agents spend their time planning, using tools, recovering from failures, and deciding what to do next.

ProductHunt

Agent Mode on Arena: Get real-world tasks done with autonomous AI agents

Most AI benchmarks test models in controlled environments. Agent Mode tests them on complex tasks to get more work done. Run autonomous agents that browse, research, code, use files, and complete multi-step workflows from a single prompt. Then watch each workflow unfold step by step. Every run contributes to the Agent Arena Leaderboard, ranking frontier models by real-world agentic performance.

ProductHunt

WEIFENG2333/phistory

Phistory automatically archives versioned system prompt snapshots from agent CLIs like Claude Code, Codex, OpenClaw, and Hermes.

GitHub

Research Papers

Do transformers need three projections? Systematic study of QKV variants

Researchers conduct a systematic study examining whether transformer models require all three projection matrices (Query, Key, Value) or if some can be eliminated. The findings could optimize transformer architecture efficiency.

ArXiv
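
For readers who want the three projections in front of them, the sketch below implements single-head scaled dot-product attention in plain Python and then reuses the query matrix as the key matrix. The shared-Q/K variant is only an illustration of what "dropping a projection" could look like. The summary does not say which variants the paper evaluates, and all names here are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(x, w_q, w_k, w_v):
    """Single-head attention: softmax(Q K^T / sqrt(d)) V."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s * scale for s in row]) for row in scores]
    return matmul(weights, v)


def projection_params(d_model, n_projections):
    """Parameter count for n square d_model x d_model projections (no biases)."""
    return n_projections * d_model * d_model


x = [[1.0, 0.0], [0.0, 1.0]]      # two tokens, model width 2
w_q = [[1.0, 0.0], [0.0, 1.0]]
w_k = [[0.0, 1.0], [1.0, 0.0]]
w_v = [[0.5, 0.5], [0.0, 1.0]]

standard = attention(x, w_q, w_k, w_v)   # three separate projections
shared_qk = attention(x, w_q, w_q, w_v)  # illustrative variant: K reuses W_q
```

In this toy setup, sharing one projection cuts the projection parameters from `projection_params(d, 3)` to `projection_params(d, 2)`. Whether quality holds up is the empirical question the paper addresses.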

Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution

Code language models need repository-level context to resolve imports, APIs, and project conventions. Existing methods inject this knowledge as long inputs (retrieved through RAG or dependency analysis) or through per-repository fine-tuning and LoRA -- costly at repository scale and brittle to evolving codebases. We introduce Code2LoRA, a hypernetwork framework that generates repository-specific LoRA adapters, effectively injecting repository knowledge with zero inference-time token overhead. Co...

HuggingFace
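
As background on why LoRA adapters are cheap enough for a hypernetwork to generate, here is a small sketch of the standard LoRA parameter count: a rank-r update to a d_in x d_out weight matrix stores two thin matrices instead of a full matrix. It does not reflect Code2LoRA's hypernetwork design, which the truncated abstract does not detail. The function names and the example sizes are assumptions chosen for illustration.

```python
def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in a dense d_in x d_out weight matrix."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in a LoRA update: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    d_in, d_out, rank = 4096, 4096, 8  # illustrative sizes, not from the paper
    full = full_param_count(d_in, d_out)
    lora = lora_param_count(d_in, d_out, rank)
    print(f"full: {full:,}  lora: {lora:,}  ratio: {full // lora}x")
```

With these illustrative sizes the adapter is 256 times smaller than the matrix it modifies.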

Towards One-to-Many Temporal Grounding

Temporal Grounding (TG) aims to localize video segments corresponding to a textual query. Prior research predominantly focuses on single-segment retrieval. Real-world scenarios, however, often require localizing multiple disjoint segments for a single query -- a setting we term One-to-Many Temporal Grounding (OMTG). Previous state-of-the-art MLLMs, optimized for one-to-one settings, struggle in this context, often yielding near-zero scores due to a lack of event cardinality perception. To bridge...

HuggingFace

AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints

Planning for real-world problems by language models often involves both world and user constraints, which may not be fully specified upfront and are progressively disclosed through interaction. However, existing benchmarks still underexplore adaptive planning under such progressively revealed dual constraints. To address this gap, we introduce AdaPlanBench, a dynamic interactive benchmark for evaluating whether Large Language Model (LLM) agents can adaptively plan and re-plan under progressively...

HuggingFace

MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery

Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery and machine learning engineering (MLE), where sustained self-evolution becomes a key capability. However, existing MLE agents suffer from inter-branch information isolation, memoryless search, and lack of hierarchical control, which together hinder long-horizon optimization. We present MLEvolve, an LLM-based self-evolving multi-agent framework for end-to-end machine learning algorithm di...

HuggingFace

Discrete-WAM: Unified Discrete Vision-Action Token Editing for World-Policy Learning

Autonomous driving requires reasoning about how ego actions shape the evolution of the surrounding world. However, most end-to-end methods rely on direct state-to-action mappings, capturing correlations without explicitly modeling action-conditioned dynamics. Conversely, continuous-latent world models often lack compositional structure for causal reasoning across counterfactual futures. We introduce Discrete-WAM, a unified latent vision-action world policy that represents future visual states an...

HuggingFace

AURA: Intent-Directed Probing for Implicit-Need Surfacing in Situated LLM Agents

A situated query like "where is Lin Wei?" often encodes more than its literal content: the user may also want to know whether Lin Wei is free, in a good mood, or worth interrupting now. Standard tool-use agents answer the literal question and stop. AURA inserts an inference step between scene perception and tool use that produces an IntentFrame: a structured estimate of the implicit need with a scalar gap score that controls per-query probe budget and tool selection. On a 100-query four-scene im...

HuggingFace

Revising Context, Shifting Simulated Stance: Auditing LLM-Based Stance Simulation in Online Discussions

Large language models are increasingly used to simulate social media users and infer how individuals may respond to online discussions. However, it remains unclear whether these simulations reflect precise user-specific beliefs or whether they are highly sensitive to semantically independent changes in conversational contexts. In this work, we study counterfactual context revision as a framework for auditing LLM-based stance simulation. Given an original online conversation, we first infer a tar...

HuggingFace

Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?

Video generation models have made impressive strides in synthesizing visually compelling content, yet their outputs remain confined to the virtual domain. A natural question follows: how well do these models reflect the physical world when their generated videos leave the screen and enter reality? We propose robotic manipulation as a concrete, measurable window onto this question: if a model has truly internalized physical laws, the motion it depicts should translate into executable robot behavi...

HuggingFace

Towards Truly Multilingual ASR: Generalizing Code-Switching ASR to Unseen Language Pairs

Automatic Speech Recognition (ASR) has become a key technology for human--AI interaction. However, code-switching ASR (CS-ASR) remains particularly challenging due to the severe scarcity of multilingual CS speech resources across diverse language pairs. Existing approaches primarily improve CS-ASR performance through synthetic CS speech generation or pair-specific fine-tuning on limited bilingual datasets. Nevertheless, these approaches face an inherent scalability limitation, as support for CS ...

HuggingFace

LoomVideo: Unifying Multimodal Inputs into Video Generation and Editing

Developing unified video generation and editing models capable of interpreting interleaved multimodal inputs is a promising yet challenging frontier field. Existing unified frameworks predominantly rely on massive models (typically 13B parameters or more) and incorporate source video conditions for editing by concatenating sequence tokens. This concatenation inevitably doubles the sequence length, quadrupling the computational complexity of the self-attention mechanism and introducing prohibitiv...

HuggingFace
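
The "doubles the sequence length, quadrupling the computational complexity" claim in the LoomVideo abstract follows from self-attention's quadratic cost in sequence length. A minimal sketch of the arithmetic, counting only the two attention matrix products and using hypothetical names and sizes:

```python
def attention_matmul_ops(seq_len: int, head_dim: int) -> int:
    """Multiply-adds for Q K^T plus (attention weights) V in one head.

    Each product costs seq_len * seq_len * head_dim multiply-adds.
    Projections and softmax are left out of this count.
    """
    return 2 * seq_len * seq_len * head_dim


if __name__ == "__main__":
    n, d = 4096, 128  # illustrative token count and head width
    base = attention_matmul_ops(n, d)
    concatenated = attention_matmul_ops(2 * n, d)  # source video tokens appended
    print(concatenated // base)  # 4
```

Because the cost depends on the square of the sequence length, the factor of four holds for any `n` and `d`.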

AffordanceVLA: A Vision-Language-Action Model Empowering Action Generation through Affordance-Aware Understanding

Vision-Language-Action (VLA) models leverage the rich world knowledge of pretrained vision-language models (VLMs) to enable instruction-following robotic manipulation. However, the structural mismatch between VLM semantic spaces and embodied control policies often hinders the learning of precise perception--action mappings. To address this challenge, we propose AffordanceVLA, a unified framework that introduces structured affordance forecasting as a task-oriented intermediate representation to e...

HuggingFace

Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction

Video event prediction (VEP) requires models to infer unobserved future states from partial video evidence. Existing video MLLMs usually verbalize intermediate future reasoning in text space: once visual evidence is verbalized, fine-grained motion, geometry, and interaction cues can be lost, leading to plausible but visually ungrounded hallucinations. We introduce Future-L1, an interleaved latent visual reasoning framework that lets an MLLM alternate between language tokens and continuous latent...

HuggingFace

ForeSci: Evaluating LLM Agents for Forward-Looking AI Research Judgment

AI research often requires decisions before future evidence exists: which bottleneck to attack, which direction to pursue, or where a project should be positioned. We introduce ForeSci, a temporally controlled benchmark for evaluating whether LLM agents can make such forward-looking research judgements from historical evidence. ForeSci contains 500 tasks across four fast-moving AI domains and four decision families. Each task is paired with a cutoff-aligned offline knowledge base; post-cutoff pa...

HuggingFace

Learning Geometric Representations from Videos for Spatial Intelligent Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) excel at 2D semantic understanding but lack intrinsic 3D awareness, resulting in representations that fail to maintain geometric and spatial consistency across video frames. Given the scarcity of large-scale 3D data, we present GeoVR, a novel framework that learns geometric representations using purely 2D video sequences. This approach effectively restructures the semantic latent space within MLLMs to unlock spatial intelligence. Rather than employing sup...

HuggingFace

Tutorials

Fine-tuning an LLM to write docs like it's 1995

A developer demonstrates fine-tuning an LLM to generate documentation in the style of 1995 web design and writing conventions. The project showcases creative applications of model customization for nostalgic or unconventional outputs.

RSS

Industry News

The latest AI news we announced in May 2026

Here are Google’s latest AI updates from May 2026

RSS

Meta ships facial recognition on smart glasses

Meta has integrated facial recognition technology into its smart glasses products for enhanced user identification and features. This deployment raises privacy and ethical considerations regarding biometric data collection.

RSS

South Korean forums will need to scan every image with AI censorship tools

South Korean online forums are being required to implement mandatory AI-powered image scanning and censorship tools to comply with new regulations. This policy aims to monitor and filter prohibited content automatically.

RSS

The Pentagon is running an AI propaganda mill targeting Latin America

The Pentagon has been operating an AI-powered propaganda system designed to target and influence audiences in Latin America through coordinated disinformation campaigns. This initiative raises significant concerns about the militarization of AI and its use in spreading manipulated content.

RSS

NSA using Anthropic's Mythos for cyber attacks

The NSA has reportedly been utilizing Anthropic's Mythos AI system to conduct cyber attacks and enhance offensive cybersecurity operations. This revelation highlights tensions between AI safety commitments and government intelligence agency applications.

RSS

Leak Reveals Microsoft Wants Its AI to Be 'Addictive'

A leaked document shows that Microsoft is explicitly designing its AI systems to be psychologically addictive, incorporating engagement tactics similar to social media platforms. The disclosure raises ethical questions about AI product design and user manipulation.

RSS