Cainew - Curated AI news for developers

TL;DR

Model Releases

Starship V3

Tools & Products

Research Papers

Industry News

Model Releases

Starship V3 is the latest iteration of SpaceX's fully reusable spacecraft design, aimed at rapid, low-cost space transportation. It continues development toward reliable orbital refueling and deep-space missions.

Twitter

Tools & Products

mukul975/cve-mcp-server

Production-grade MCP server that gives Claude 27 security intelligence tools across 21 APIs: CVE lookup, EPSS scoring, CISA KEV, MITRE ATT&CK, Shodan, VirusTotal, and more.

GitHub

evanklem/evanflow

A TDD-driven iterative feedback loop for software development. 16 cohesive Claude Code skills walk an idea from brainstorm → plan → execute → iterate, with checkpoints throughout.

GitHub

gameworld-project/gameworld

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents

GitHub

DioCrafts/OpenFoundry

🏭 The open-source Palantir Foundry alternative. Connect any data source, build ontologies, create pipelines, visualize with dashboards, and make AI-powered decisions. Self-hosted. Built with Rust + Svelte.

GitHub

Memoket Gem: An AI wearable that remembers your conversations all day

We’re opening 50 free spots in our Founding Member Program for founders and SMB owners who want to try Memoket Gem early. Memoket Gem is an all-day AI wearable that captures meetings, calls, coffee chats, and decisions on the go. It summarizes key moments, connects context across conversations, and turns them into tasks, notes, and follow-ups in the tools you already use. Join us and help shape the future of real-world AI memory.

ProductHunt

benmaster82/Kwipu

Ask questions across your Markdown notes using a fully local Graph RAG engine. Built for Obsidian vaults, works with any folder of Markdown files. Extracts entity-relation triples from wikilinks & YAML frontmatter, retrieves answers via hybrid search (vector + BM25 + temporal). Multilingual. No cloud. Runs on Ollama.

GitHub
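The Kwipu entry describes hybrid retrieval over three signals (vector, BM25, temporal). A minimal sketch of one way to fuse them is below. It assumes min-max normalization, a fixed weighted sum, and exponential recency decay; the weights, the 30-day half-life, the toy 2-d "embeddings", and all function names are hypothetical and are not taken from Kwipu's code.

```python
import math
from collections import Counter
from datetime import date


def tokenize(text):
    # Naive whitespace tokenizer; a real engine would handle punctuation and languages.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of `query` against each document in `docs`."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many docs contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(toks) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def min_max(xs):
    # Rescale to [0, 1] so signals on different scales can be mixed.
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]


def hybrid_rank(query, query_vec, notes, today, weights=(0.5, 0.3, 0.2), half_life_days=30.0):
    """Return note indices ordered by a weighted mix of BM25, vector, and recency scores."""
    lexical = min_max(bm25_scores(query, [note["text"] for note in notes]))
    semantic = min_max([cosine(query_vec, note["vec"]) for note in notes])
    # Exponential decay: a note loses half its temporal score every `half_life_days`.
    recency = min_max([0.5 ** ((today - note["modified"]).days / half_life_days) for note in notes])
    w_lex, w_sem, w_time = weights
    combined = [w_lex * lex + w_sem * sem + w_time * rec
                for lex, sem, rec in zip(lexical, semantic, recency)]
    return sorted(range(len(notes)), key=lambda i: combined[i], reverse=True)


# Toy notes; the "vec" fields stand in for embeddings from a local model.
notes = [
    {"text": "graph rag over obsidian wikilinks", "vec": [0.9, 0.1], "modified": date(2025, 1, 10)},
    {"text": "grocery list eggs milk", "vec": [0.0, 1.0], "modified": date(2025, 1, 12)},
    {"text": "notes on rag evaluation", "vec": [0.7, 0.3], "modified": date(2024, 6, 1)},
]
print(hybrid_rank("graph rag", [1.0, 0.0], notes, today=date(2025, 1, 15)))  # → [0, 2, 1]
```

The recent grocery note gets the top temporal score but no lexical or semantic support, so it ranks last; reciprocal rank fusion is a common alternative to the weighted sum used here.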

openlake-project/openlake

Hyper-efficient storage for GPU workloads. Feed your GPUs at blazing-fast speeds.

GitHub

Tomotsugu-dev/Hindsight

Local-first desktop activity tracker: see where your hours go, with on-device AI daily summaries and optional multi-device sync.

GitHub

CraftBot with Living UI: Grow your own software that is alive.

Living UI is a brand-new system that lets CraftBot (a general AI agent) build, import, or evolve custom apps and dashboards that live inside CraftBot itself. The agent stays aware of the Living UI's state and can read, write, and act on it directly. A Living UI is never "finished": ask CraftBot to add features or redesign a view as your needs grow. Living UI turns software from something users buy and adapt to into something CraftBot creates and adapts around them.

ProductHunt

Frontdesk AI: AI COO to run your business like a Fortune 500 enterprise

Enter your website and get all the AI agents you need to grow your business: one AI that calls, texts, and emails all of your customers 24/7, plus a CRM, a ticketing system, and even a website builder.

ProductHunt

Googlebook: A new kind of laptop designed for Gemini Intelligence

A new category of laptops built from the ground up for Gemini Intelligence. These devices feature the Magic Pointer for contextual suggestions and custom widgets to help you organize your tasks. Keep an eye on googlebook.com for more updates before the devices launch this fall.

ProductHunt

Blaze 2.0: AI marketer for SMBs complete w/ strategy, content, and ads

Blaze 2.0 is the marketing solution for people who don't have time to do marketing. It learns your business, your audience, and your voice — then creates and manages your entire content strategy, automatically. Like having a full-time marketer on your team without the salary.

ProductHunt

Liminary: Ground your AI in saved knowledge as you work

Liminary turns everything you’ve saved into working memory for AI. Unlike chatbots, meeting tools, or project-based notebooks, it gives your knowledge one shared memory across writing, meetings, and research. It surfaces relevant context automatically as you work, helping expert knowledge workers reuse their best thinking, avoid starting from scratch, and produce source-grounded work with traceable citations.

ProductHunt

Pipali: An AI coworker for any computer work

Pipali is an AI coworker that lives on your computer. It interacts with your files, browser, and apps to get real work done. Pipali can handle most computer work: deep research, polished docs, browser tasks, and routine errands. Teach it your workflows with Skills, run recurring tasks with Routines, and integrate with apps like Linear, Slack, and GitHub via MCP.

ProductHunt

Using OR-Tools CP-SAT for Scheduling Problems

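CP-SAT, the constraint programming solver in Google's OR-Tools, is well suited to scheduling problems in operations research. Tasks are modelled as interval variables, requirements such as task precedence and no overlap on a shared machine are stated as constraints, and the solver searches for a feasible or optimal schedule, for example one that minimizes the makespan.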

RSS

Research Papers

Beyond Semantic Similarity

Beyond Semantic Similarity examines methods for comparing meaning in text that go beyond traditional similarity metrics, targeting the limits of conventional semantic analysis in capturing nuanced relationships between concepts.

ArXiv

A Causal Language Modeling Detour Improves Encoder Continued Pretraining

When adapting an encoder to a new domain, the standard approach is to continue training with Masked Language Modeling (MLM). We show that temporarily switching to Causal Language Modeling (CLM) followed by a short MLM decay improves downstream performance. On biomedical texts with ModernBERT, this CLM detour outperforms MLM baselines trained on identical data and compute across 8 French and 11 English biomedical tasks, by +1.2 to +2.8 pp and +0.3 to +0.8 pp respectively, depending on model size. We invest...

HuggingFace
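The two objectives in the CLM-detour abstract differ mainly in what the model sees and what it must predict. The sketch below contrasts them on a list of token ids. It is a simplification under stated assumptions: `MASK_ID` is a hypothetical token id, BERT's 80/10/10 replacement rule is omitted, and `objective_for_step` with its 10% default is an illustrative stand-in for the "short MLM decay", not the paper's schedule.

```python
import random

MASK_ID = 999   # hypothetical id of the [MASK] token in a toy vocabulary
IGNORE = -100   # label skipped by the loss (PyTorch CrossEntropyLoss's default ignore_index)


def mlm_example(tokens, mask_prob=0.15, rng=None):
    """Masked LM: hide random positions, predict only those, with bidirectional attention."""
    rng = rng or random.Random(0)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK_ID)
            labels.append(tok)      # loss is computed at masked positions
        else:
            inputs.append(tok)
            labels.append(IGNORE)   # no loss on visible positions
    n = len(inputs)
    attention = [[1] * n for _ in range(n)]  # every position attends to every other
    return inputs, labels, attention


def clm_example(tokens):
    """Causal LM: predict token i+1 from tokens 0..i under a lower-triangular mask."""
    inputs, labels = tokens[:-1], tokens[1:]
    n = len(inputs)
    attention = [[1 if j <= i else 0 for j in range(n)] for i in range(n)]
    return inputs, labels, attention


def objective_for_step(step, total_steps, mlm_decay_fraction=0.1):
    """Hypothetical detour schedule: CLM for most steps, then a short final MLM phase."""
    return "mlm" if step >= (1 - mlm_decay_fraction) * total_steps else "clm"


print(clm_example([10, 11, 12, 13]))
# → ([10, 11, 12], [11, 12, 13], [[1, 0, 0], [1, 1, 0], [1, 1, 1]])
```

Under CLM every position produces a training signal, whereas MLM only scores the masked subset; the final MLM phase restores the bidirectional attention the encoder uses downstream.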

Multi-Stream LLMs: Unblocking Language Models with Parallel Streams of Thoughts, Inputs and Outputs

The continued improvements in language model capability have unlocked their widespread use as drivers of autonomous agents, for example in coding or computer use applications. However, the core of these systems has not changed much since early instruction-tuned models like ChatGPT. Even advanced AI agents function on message exchange formats, successively exchanging messages with users, systems, with itself (i.e. chain-of-thought) and tools in a single stream of computation. This bottleneck to a...

HuggingFace

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

Long-term memory is crucial for agents in specialized web environments, where success depends on recalling interface affordances, state dynamics, workflows, and recurring failure modes. However, existing memory benchmarks for agents mostly focus on user histories, short traces, or downstream task success, leaving open how to directly evaluate whether memory systems effectively internalize environment-specific experience. To address this gap, we introduce LongMemEval-V2 (LME-V2), a benchmark for ...

HuggingFace

L2P: Unlocking Latent Potential for Pixel Generation

Pixel diffusion models have recently regained attention for visual generation. However, training advanced pixel-space models from scratch demands prohibitive computational and data resources. To address this, we propose the Latent-to-Pixel (L2P) transfer paradigm, an efficient framework that directly harnesses the rich knowledge of pre-trained LDMs to build powerful pixel-space models. Specifically, L2P discards the VAE in favor of large-patch tokenization and freezes the source LDM's intermedia...

HuggingFace

MEME: Multi-entity & Evolving Memory Evaluation

LLM-based agents increasingly operate in persistent environments where they must store, update, and reason over information across many sessions. While prior benchmarks evaluate only single-entity updates, MEME defines six tasks spanning the full space defined by the multi-entity and evolving axes, including three not scored by prior work: Cascade and Absence (dependency reasoning) and Deletion (post-removal state). Evaluating six memory systems spanning three memory paradigms on 100 controlled ...

HuggingFace

Agent-BRACE: Decoupling Beliefs from Actions in Long-Horizon Tasks via Verbalized State Uncertainty

Large language models (LLMs) are increasingly deployed on long-horizon tasks in partially observable environments, where they must act while inferring and tracking a complex environment state over many steps. This leads to two challenges: partial observability requires maintaining uncertainty over unobserved world attributes, and long interaction history causes context to grow without bound, diluting task-relevant information. A principled solution to both challenges is a belief state: a posteri...

HuggingFace
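For reference, the classical belief state from POMDP theory is a posterior over hidden states given the action-observation history, updated recursively with a transition model $T$ and an observation model $O$. This is the textbook formulation, not Agent-BRACE's method; the paper's title indicates it verbalizes state uncertainty rather than computing this update numerically.

```latex
b_{t+1}(s') = \frac{O(o_{t+1} \mid s', a_t) \sum_{s} T(s' \mid s, a_t)\, b_t(s)}
                   {\sum_{s''} O(o_{t+1} \mid s'', a_t) \sum_{s} T(s'' \mid s, a_t)\, b_t(s)}
```

Because $b_t$ is a distribution over a fixed state space, its size does not grow with $t$, which is why a belief state addresses both challenges named in the abstract: it encodes uncertainty about unobserved attributes and replaces an ever-growing history.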

Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation

We introduce Pion, a spectrum-preserving optimizer for large language model (LLM) training based on orthogonal equivalence transformation. Unlike additive optimizers such as Adam and Muon, Pion updates each weight matrix through left and right orthogonal transformations, preserving its singular values throughout training. This yields an optimization mechanism that modulates the geometry of weight matrices while keeping their spectral norm fixed. We derive the Pion update rule, systematically exa...

HuggingFace
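The Pion abstract's claim that left and right orthogonal transformations preserve singular values follows directly from the SVD. The derivation below shows only that property; the matrices $U$ and $V$ are generic orthogonal matrices, and Pion's actual update rule (elided in the abstract) is not reproduced.

```latex
W = P\,\Sigma\,Q^{\top}, \qquad P^{\top}P = I, \quad Q^{\top}Q = I
W' = U\,W\,V = (UP)\,\Sigma\,(V^{\top}Q)^{\top}, \qquad U^{\top}U = I_m, \quad V^{\top}V = VV^{\top} = I_n
(UP)^{\top}(UP) = P^{\top}U^{\top}U P = I, \qquad (V^{\top}Q)^{\top}(V^{\top}Q) = Q^{\top}V V^{\top}Q = I
\Rightarrow \sigma_i(W') = \sigma_i(W) \ \ \forall i, \qquad \|W'\|_2 = \|W\|_2
```

An additive step offers no such guarantee: with $W = I_2$ and $\Delta W = \varepsilon I_2$, the update $W + \Delta W = (1+\varepsilon) I_2$ scales every singular value to $1+\varepsilon$.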

AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward

In this paper, we propose AlphaGRPO, a novel framework that applies Group Relative Policy Optimization (GRPO) to AR-Diffusion Unified Multimodal Models (UMMs) to enhance multimodal generation capabilities without an additional cold-start stage. Our approach unlocks the model's intrinsic potential to perform advanced reasoning tasks: Reasoning Text-to-Image Generation, where the model actively infers implicit user intents, and Self-Reflective Refinement, where it autonomously diagnoses and correc...

HuggingFace
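GRPO, which AlphaGRPO builds on, scores each sampled output against the other samples for the same prompt rather than against a learned value function. A minimal sketch of that group-relative advantage follows. The `decomposed_reward` helper, which averages pass/fail sub-checks, is a hypothetical stand-in and not the paper's Decompositional Verifiable Reward; the choice of population standard deviation is also an assumption, since implementations differ.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: reward minus the group mean, divided by the group's std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


def decomposed_reward(checks):
    """Hypothetical verifiable reward: fraction of independent pass/fail sub-checks satisfied."""
    return sum(bool(c) for c in checks) / len(checks)


# Four hypothetical generations for one prompt, each scored by three sub-checks.
group_checks = [
    [True, True, True],
    [True, False, False],
    [True, True, False],
    [False, False, False],
]
rewards = [decomposed_reward(c) for c in group_checks]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Samples that beat their group's average get positive advantages and are reinforced; a group where every sample scores the same yields all-zero advantages and contributes no gradient.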

World Action Models: The Next Frontier in Embodied AI

Vision-Language-Action (VLA) models have achieved strong semantic generalization for embodied policy learning, yet they learn reactive observation-to-action mappings without explicitly modeling how the physical world evolves under intervention. A growing body of work addresses this limitation by integrating world models, predictive models of environment dynamics, into the action generation pipeline. We term this emerging paradigm World Action Models (WAMs): embodied foundation models that unify ...

HuggingFace

Debiased Model-based Representations for Sample-efficient Continuous Control

Model-based representations recently stand out as a promising framework that embeds latent dynamics information into the representations for downstream off-policy actor-critic learning. It implicitly combines the advantages of both model-free and model-based approaches while avoiding the training costs associated with model-based methods. Nevertheless, existing model-based representation methods can fail to capture sufficient information about relevant variables and can overfit to early experien...

HuggingFace

One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue

Hidden malicious intent in multi-turn dialogue poses a growing threat to deployed large language models (LLMs). Rather than exposing a harmful objective in a single prompt, increasingly capable attackers can distribute their intent across multiple benign-looking turns. Recent studies show that even modern commercial models with advanced guardrails remain vulnerable to such attacks despite advances in safety alignment and external guardrails. In this work, we address this challenge by detecting t...

HuggingFace

On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment

Tool-using LLM agents fail through trajectories rather than only final responses, as they may execute unsafe tool calls, follow injected instructions, comply with harmful requests, or over-refuse benign tasks despite producing a seemingly safe answer. Existing safety-alignment signals are largely response-level or off-policy, and often incur a safety-utility trade-off: improving agent safety comes at the cost of degraded task performance. Such sparse and single-objective rewards severely limit r...

HuggingFace

Lite3R: A Model-Agnostic Framework for Efficient Feed-Forward 3D Reconstruction

Transformer-based 3D reconstruction has emerged as a powerful paradigm for recovering geometry and appearance from multi-view observations, offering strong performance across challenging visual conditions. As these models scale to larger backbones and higher-resolution inputs, improving their efficiency becomes increasingly important for practical deployment. However, modern 3D transformer pipelines face two coupled challenges: dense multi-view attention creates substantial token-mixing overhead...

HuggingFace
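The "token-mixing overhead" named in the Lite3R abstract can be quantified with simple arithmetic. Assuming global attention over all patch tokens from $V$ views of an $H \times W$ image with patch size $p$, the number of query-key scores per head per layer is $N^2$. The resolution and patch size below are hypothetical values chosen for illustration, not Lite3R's settings.

```latex
T = \frac{H}{p}\cdot\frac{W}{p}, \qquad N = V\,T, \qquad \text{scores per head per layer} = N^{2}
H = W = 518,\ p = 14 \;\Rightarrow\; T = 37^{2} = 1369
V = 8 \;\Rightarrow\; N = 10\,952, \qquad N^{2} \approx 1.2 \times 10^{8}
V = 16 \;\Rightarrow\; N = 21\,904, \qquad N^{2} \approx 4.8 \times 10^{8}
```

Doubling the number of views quadruples the attention cost, which is why efficiency matters more as these models scale to more views and higher-resolution inputs.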

VidSplat: Gaussian Splatting Reconstruction with Geometry-Guided Video Diffusion Priors

Gaussian Splatting has achieved remarkable progress in multi-view surface reconstruction, yet it exhibits notable degradation when only few views are available. Although recent efforts alleviate this issue by enhancing multi-view consistency to produce plausible surfaces, they struggle to infer unseen, occluded, or weakly constrained regions beyond the input coverage. To address this limitation, we present VidSplat, a training-free generative reconstruction framework that leverages powerful vide...

HuggingFace

Industry News

50K Tahoe residents need power as utility eyes redirecting lines to data centers

A utility company is considering redirecting power lines away from Tahoe residents to supply data centers, potentially leaving 50,000 residents without adequate power. This decision highlights the growing conflict between energy demands from AI and tech infrastructure versus residential communities' needs.

RSS

The US is winning the AI race where it matters most: commercialization

The United States maintains a competitive edge in artificial intelligence by achieving greater commercial success and market adoption compared to other nations. This advantage is demonstrated through the widespread deployment of AI technologies across various industries and the growth of AI-driven businesses.

RSS