Cainew - Curated AI news for developers

TL;DR

Model Releases

Tools & Products

Research Papers

PersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agents
RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation
RoboEvolve: Co-Evolving Planner-Simulator for Robotic Manipulation with Limited Data

Industry News

Discussion

The other half of AI safety

Model Releases

snwy/SD1.5-DALLE-2

Judging by its name, this is an image generation model that combines Stable Diffusion 1.5 with DALL-E 2 in some way. The item likely concerns advances or comparisons in AI image synthesis.

HuggingFace

Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality

Granite Embedding Multilingual R2 is an open-source, Apache 2.0 licensed embedding model supporting multiple languages with a 32K token context window. This multilingual embedding solution enables more comprehensive semantic understanding across languages.

HuggingFace

Tools & Products

nduckmink/arkon

Arkon: Enterprise AI Knowledge Hub & MCP Server. Self-hosted knowledge base for teams to manage RAG contexts, access policies, and AI skills. Connect Claude and other LLMs via Model Context Protocol (MCP) for automated, secure organizational knowledge integration.

GitHub

nexu-io/html-anything

✨ The agentic HTML editor — your local AI agent writes the HTML, you ship it. 🚀 75 Skills × 9 Surfaces (magazine · deck · poster · XHS / tweet · prototype · data report · Hyperframes) 🛡️ Sandboxed preview · 📤 1-click to WeChat / X / Zhihu / HTML / PNG 🔑 Zero API key — Claude Code / Cursor / Codex / Gemini / Copilot / OpenCode / Qwen / Aider.

GitHub

sno-ai/llmix

Production LLM call layer for AI agents and tools: keep OpenAI/Anthropic/AI SDK/LiteLLM, hot-swap models with MDA presets, and add cache, retries, circuit breakers, key rotation, singleflight, and Python/TypeScript/Rust parity.

GitHub

Siigari/claude-heartbeat

Keep Claude Code alive and autonomous without -p. Heartbeat hook + inbox/outbox pattern.

GitHub

ndcorder/outputguard

Validate, repair, and retry LLM structured outputs. 13 repair strategies for common JSON malformations, JSON Schema validation, and retry-with-feedback prompts.

GitHub

Spellar 3.0: AI Meeting companion with cross-meeting memory

Most meeting tools give you notes. Spellar AI gives you memory. It joins your calls, captures every word, and builds context across all your meetings. Ask what a client said three calls ago. Find decisions from last week. See what’s still open. Organize by client, use templates, and choose the AI you trust — OpenAI, Anthropic, Perplexity, Gemini and more!

ProductHunt

Naptick AI: AI sleep companion that helps fall asleep without struggle

Naptick is a smart bedside AI sleep companion designed for founders, professionals, light sleepers, and anyone struggling with nighttime stress or doomscrolling. It combines circadian light therapy, 1000+ adaptive soundscapes, room condition intelligence, app-locking, and an on-device AI sleep coach to help users fall asleep faster and wake up refreshed. Unlike passive sleep trackers, Naptick is built phone-free by design and actively helps improve sleep before the night begins.

ProductHunt

lucidrains/d4rt

Implementation of D4RT, Efficiently Reconstructing Dynamic Scenes, from DeepMind

GitHub

Tendem by Toloka: AI platform to hand off any task to a human expert

Tendem is a platform where human experts and AI agents complete high-stakes tasks. Submit a task in plain language. AI agents handle the volume. Human experts level up the final output. What comes back is complete, accurate, and ready to act on. Built by Toloka.ai, a company that has spent more than a decade building human-in-the-loop quality systems for frontier AI labs. Trusted by founders, operators, and AI-native users who need reliable results.

ProductHunt

Causo for Fundraising: Pitch the right VCs, skip the grind

90% of startups die from no money, not bad products. Causo's AI agents find matching investors and email them for you while you ship. Upload your deck or website, get matched with specific partners at relevant VC funds, send your pitch. All on autopilot. Let our raccoons do the work while you ship product and talk to customers.

ProductHunt

Notion Developer Platform: Build on Notion, not just inside it

Notion Developer Platform lets teams build on Notion with CLI, Workers, database syncs, agent tools, webhook triggers, MCP, and External Agents APIs, so data, workflows, and agents can operate inside the same shared workspace.

ProductHunt

Fei Design Mode: Directly edit and tweak UI pixels live with AI agents

Design Mode gives designers direct ownership of what ships. Point to any element in the live preview, tweak styles visually, and push straight to your codebase from Figma or Claude Design. No handoff. No translation layer. What you designed is what ships. Finally, real superpowers for designers.

ProductHunt

Asteroid: Build Browser, Linux and Windows AI agents in seconds

Asteroid lets ops teams and engineers build computer-use agents for browser, Linux, and Windows workflows in minutes. Our meta-agent, Astro, builds the agents, writes scripts as it goes, and makes repeat runs faster and cheaper. Last month, Asteroid agents completed 150,000+ executions across EHRs, benefits portals, insurance carriers, Citrix, desktop apps, and VPN-protected environments.

ProductHunt

Arena AI Model ELO History

Arena AI Model ELO History tracks the rankings of AI models over time using Elo ratings. The data shows how different models have competed and how their standing has changed.

RSS

Helping ChatGPT better recognize context in sensitive conversations

Learn how new ChatGPT safety updates improve context awareness in sensitive conversations, helping detect risk over time and respond more safely.

OpenAI

Research Papers

PersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agents

We introduce PersonalAI 2.0 (PAI-2), a novel framework, designed to enhance large language model (LLM) based systems through integration of external knowledge graphs (KG). The proposed approach addresses key limitations of existing Graph Retrieval-Augmented Generation (GraphRAG) methods by incorporating a dynamic, multistage query processing pipeline. The central point of PAI-2 design is its ability to perform adaptive, iterative information search, guided by extracted entities, matched graph ve...

HuggingFace

RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation

Intensive care units (ICU) generate long, dense and evolving streams of clinical information, where physicians must repeatedly reassess patient states under time pressure, underscoring a clear need for reliable AI decision support. Existing ICU benchmarks typically treat historical clinician actions as ground truth. However, these actions are made under incomplete information and limited temporal context of the underlying patient state, and may therefore be suboptimal, making it difficult to ass...

HuggingFace

RoboEvolve: Co-Evolving Planner-Simulator for Robotic Manipulation with Limited Data

The scalability of robotic manipulation is fundamentally bottlenecked by the scarcity of task-aligned physical interaction data. While vision-language models (VLMs) and video generation models (VGMs) hold promise for autonomous data synthesis, they suffer from semantic-spatial misalignment and physical hallucinations, respectively. To bridge this gap, we introduce RoboEvolve, a novel framework that couples a VLM planner and a VGM simulator into a mutually reinforcing co-evolutionary loop. Operat...

HuggingFace

Many-Shot CoT-ICL: Making In-Context Learning Truly Learn

In-context learning (ICL) adapts large language models (LLMs) to new tasks by conditioning on demonstrations in the prompt without parameter updates. With long-context models, many-shot ICL can use dozens to hundreds of examples and achieve performance comparable to fine-tuning, yet current understanding of its scaling behavior is largely derived from non-reasoning tasks. We study many-shot chain-of-thought in-context learning (CoT-ICL) for reasoning and show that standard many-shot rules do not...

HuggingFace

Retrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) has become a standard approach for knowledge-intensive question answering, but existing systems remain brittle on multi-hop questions, where solving the task requires chaining multiple retrieval and reasoning steps. Key challenges are that current methods represent reasoning through free-form natural language, where intermediate states are implicit, retrieval queries can drift from intended entities, and errors are detected by the same model that produces the...

HuggingFace

MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning

Current interactive LLM agents rely on goal-conditioned stepwise planning, where environmental understanding is acquired reactively during execution rather than established beforehand. This temporal inversion leads to Delayed Environmental Perception: agents must infer environmental constraints through trial-and-error, resulting in an Epistemic Bottleneck that traps them in inefficient failure cycles. Inspired by human affordance perception and cognitive map theory, we propose the Map-then-Act P...

HuggingFace

FrameSkip: Learning from Fewer but More Informative Frames in VLA Training

Vision-Language-Action (VLA) policies are commonly trained from dense robot demonstration trajectories, often collected through teleoperation, by sampling every recorded frame as if it provided equally useful supervision. We argue that this convention creates a temporal supervision imbalance: long low-change segments dominate the training stream, while manipulation-critical transitions such as alignment, contact, grasping, and release appear only sparsely. We introduce FrameSkip, a data-layer fr...

HuggingFace

EVA-Bench: A New End-to-end Framework for Evaluating Voice Agents

Voice agents, artificial intelligence systems that conduct spoken conversations to complete tasks, are increasingly deployed across enterprise applications. However, no existing benchmark jointly addresses two core evaluation challenges: generating realistic simulated conversations, and measuring quality across the full scope of voice-specific failure modes. We present EVA-Bench, an end-to-end evaluation framework that addresses both. On the simulation side, EVA-Bench orchestrates bot-to-bot aud...

HuggingFace

Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context

Long-context modeling is becoming a core capability of modern large vision-language models (LVLMs), enabling sustained context management across long-document understanding, video analysis, and multi-turn tool use in agentic workflows. Yet practical training recipes remain insufficiently explored, particularly for designing and balancing long-context data mixtures. In this work, we present a systematic study of long-context continued pre-training for LVLMs, extending a 7B model from 32K to 128K ...

HuggingFace

AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation

Evaluation of software engineering (SWE) agents is dominated by a binary signal: whether the final patch passes the tests. This outcome-only view treats a principled solution and a chaotic trial-and-error process as equivalent. We show that this equivalence is empirically false. We evaluate 2,614 OpenHands trajectories from eight model backends on 60 SWE-bench Verified tasks. Of these, 47 have enough passing trajectories to construct task-level process references, yielding a 1,815-trajectory eva...

HuggingFace

Asymmetric Flow Models

Flow-based generation in high-dimensional spaces is difficult because velocity prediction requires modeling high-dimensional noise, even when data has strong low-rank structure. We present Asymmetric Flow Modeling (AsymFlow), a rank-asymmetric velocity parameterization that restricts noise prediction to a low-rank subspace while keeping data prediction full-dimensional. From this asymmetric prediction, AsymFlow analytically recovers the full-dimensional velocity without changing the network arch...

HuggingFace

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

Few-step video generation has been significantly advanced by consistency distillation. However, the performance of consistency-distilled models often degrades as more sampling steps are allocated at test time, limiting their effectiveness for any-step video diffusion. This limitation arises because consistency distillation replaces the original probability-flow ODE trajectory with a consistency-sampling trajectory, weakening the desirable test-time scaling behavior of ODE sampling. To address th...

HuggingFace

Useful Memories Become Faulty When Continuously Updated by LLMs

Learning from past experience benefits from two complementary forms of memory: episodic traces -- raw trajectories of what happened -- and consolidated abstractions distilled across many episodes into reusable, schema-like lessons. Recent agentic-memory systems pursue the consolidated form: an LLM rewrites past trajectories into a textual memory bank that it continuously updates with new interactions, promising self-improving agents without parameter updates. Yet we find that such consolidated m...

HuggingFace

IndicMedDialog: A Parallel Multi-Turn Medical Dialogue Dataset for Accessible Healthcare in Indic Languages

Most existing medical dialogue systems operate in a single-turn question--answering paradigm or rely on template-based datasets, limiting conversational realism and multilingual applicability. We introduce IndicMedDialog, a parallel multi-turn medical dialogue dataset spanning English and nine Indic languages: Assamese, Bengali, Gujarati, Hindi, Marathi, Punjabi, Tamil, Telugu, and Urdu. The dataset extends MDDial with LLM-generated synthetic consultations, translated using TranslateGemma, verif...

HuggingFace

F-GRPO: Factorized Group-Relative Policy Optimization for Unified Candidate Generation and Ranking

Traditional retrieval pipelines optimize utility through stages of candidate retrieval and reranking, where ranking operates over a predefined candidate set. Large Language Models (LLMs) broaden this into a generative process: given a candidate pool, an LLM can generate a subset and order it within a single autoregressive pass. However, this flexibility introduces a new optimization challenge: the model must search a combinatorial output space while receiving utility feedback only after the full...

HuggingFace

Industry News

Sam Altman's Business Dealings Under GOP Scrutiny Ahead of OpenAI's IPO

Sam Altman's business activities are facing scrutiny from GOP members as OpenAI prepares for a potential IPO. The examination raises questions about potential conflicts of interest ahead of the company's public offering.

RSS

Anthropic forms $200M partnership with the Gates Foundation

Anthropic has announced a $200 million partnership with the Gates Foundation to advance AI development and research initiatives. This collaboration aims to leverage AI for addressing global challenges and societal impact.

Anthropic

Medicare's new payment model is built for AI. Most of the tech world has no idea

Medicare has implemented a new payment model designed with AI capabilities in mind, but many technology companies remain largely unaware of this opportunity. The development represents significant potential for AI integration in healthcare reimbursement systems.

RSS

The Whole Anthropic Kerfuffle

This piece discusses recent controversies and disputes involving Anthropic and its leadership or policies. It analyzes the key issues and disagreements within or surrounding the company.

Twitter

2028: Two scenarios for global AI leadership

Published May 14, 2026 under Policy.

Anthropic

Discussion

The other half of AI safety

This piece explores often-overlooked aspects of AI safety beyond technical alignment concerns. It highlights the importance of institutional, social, and deployment-related safety considerations in AI development.

RSS