Cainew - Curated AI news for developers

TL;DR

Model Releases

internlm/Intern-S2-Preview

Tools & Products

Research Papers

Tutorials

How Claude Code works in large codebases

Industry News

Discussion

Model Releases

internlm/Intern-S2-Preview

Intern-S2-Preview is a preview release of a multimodal model from InternLM that takes both vision and language inputs for understanding and generation tasks.

HuggingFace

Tools & Products

nexu-io/open-design

🎨 Local-first, open-source alternative to Anthropic's Claude Design. ⚡ 19 Skills · ✨ 71 brand-grade Design Systems 🖼 Generate web · desktop · mobile prototypes · slides · images · videos · HyperFrames 📦 Sandboxed preview · HTML/PDF/PPTX/MP4 export 🤖 Runs on Claude Code / Codex / Cursor / Gemini / OpenCode / Qwen / Copilot / Hermes / Kimi CLI.

GitHub

browser-use/browser-harness

Browser Harness | Self-healing harness that enables LLMs to complete any task.

GitHub

YouMind-OpenLab/awesome-gpt-image-2

🚀 World's largest GPT Image 2 prompt library, updated daily — 2000+ curated prompts with preview images, 16 languages. OpenAI's next-gen image model with pixel-perfect text rendering, cross-image consistency, and commercial-grade illustration. Free & open source.

GitHub

esengine/DeepSeek-Reasonix

DeepSeek-native AI coding agent for your terminal. Engineered around prefix-cache stability — leave it running.

GitHub

lightseekorg/tokenspeed

TokenSpeed is a speed-of-light LLM inference engine.

GitHub

future-agi/future-agi

Open-source, end-to-end platform for evaluating, observing, and improving LLM and AI agent applications. Tracing · Evals · Simulations · Datasets · Gateway · Guardrails. Self-hostable. Apache 2.0.

GitHub

jmerelnyc/Photo-agents

Autonomous self-evolving agents. Vision-grounded layered memory and self-written skills for LLM agents that operate your computer.

GitHub

alvinunreal/openpets

Desktop pets for AI coding agents. Install pets, connect Claude Code via MCP, and see live coding status on your desktop.

GitHub

smaramwbc/statewave

Open-source memory runtime for AI agents — reproducible, provenance-tagged context bundles instead of query-time retrieval. Apache-2.0, self-hosted on Postgres + pgvector, Python + TypeScript SDKs.

GitHub

Show HN: Find the best local LLM for your hardware, ranked by benchmarks

This tool helps users pick a local LLM for their hardware by ranking candidate models on performance benchmarks, which simplifies choosing an efficient model for a specific hardware configuration.

GitHub

Claude for Legal

Claude, an AI assistant, is now being adapted for legal applications and can help with legal research, document review, and analysis. This specialized implementation demonstrates AI's growing role in professional services.

GitHub

NeuralInverse/neuralinverse

Code Modern. Code Legacy. Code Firmware. - open-source AI-native IDE with agentic coding, Power Mode, legacy modernization, and firmware development

GitHub

MemTensor/MemPrivacy

MemPrivacy is a privacy-preserving personalized memory management framework for edge-cloud agents.

GitHub

HasData: Web scraping service for AI agents

HasData is the managed web scraping service for data pipelines and AI agents. Send any URL, get clean JSON or Markdown back in one API call. We handle proxies, browser rendering, retries, and anti-bot. 50+ ready scrapers cover Google Search, Maps, News, Zillow, Indeed, and major e-commerce. AI extraction handles any other URL from a plain-text prompt. Use it from Claude, ChatGPT, or your own AI agent via MCP. CLI for everything else.

ProductHunt

Lensmor: Turn exhibitor data into pre-booked sales meetings

Unlike generic contact databases, Lensmor starts with exhibitor data, helping teams discover relevant events, find exhibiting companies, identify decision-makers, reveal verified emails, and book meetings before the show begins. Standout features include 160,000+ global events, exhibitor search, reverse company-to-event lookup, CSV export, and an AI agent for lead discovery and outreach planning.

ProductHunt

Research Papers

The sigmoids won't save you

This piece argues that sigmoid activation functions, commonly used in neural networks, are not sufficient safeguards against AI failures or misalignment. The title suggests mathematical tricks alone cannot solve fundamental AI safety challenges.

RSS

Dynamic Latent Routing

We investigate the temporal concatenation of sub-policies in Markov Decision Processes (MDP) with time-varying reward functions. We introduce General Dijkstra Search (GDS), and prove that globally optimal goal-reaching policies can be recovered through temporal composition of intermediate optimal sub-policies. Motivated by the "search, select, update" principle underlying GDS, we propose Dynamic Latent Routing (DLR), a language-model post-training method that jointly learns discrete latent codes...
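
The composition claim has a familiar analogue in classic shortest-path search with fixed costs: a shortest path is made of shortest sub-paths. The sketch below is plain Dijkstra in Python. It is not the paper's General Dijkstra Search, which the abstract sets in MDPs with time-varying reward functions, and it is not DLR. The toy graph, its weights, and the names `dijkstra` and `composed_cost` are illustrative assumptions.

```python
import heapq

def dijkstra(graph, source):
    """Return the shortest-path cost from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a cheaper route was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

# Hypothetical toy graph: node -> list of (neighbor, non-negative edge cost).
GRAPH = {
    "start": [("a", 2), ("b", 5)],
    "a": [("b", 1), ("goal", 7)],
    "b": [("goal", 2)],
    "goal": [],
}

def composed_cost(graph, source, goal):
    """Best cost obtained by joining an optimal source->mid leg to an optimal mid->goal leg."""
    from_source = dijkstra(graph, source)
    best = float("inf")
    for mid, cost_to_mid in from_source.items():
        from_mid = dijkstra(graph, mid)
        if goal in from_mid:
            best = min(best, cost_to_mid + from_mid[goal])
    return best

direct = dijkstra(GRAPH, "start")["goal"]
composed = composed_cost(GRAPH, "start", "goal")
print(direct, composed)
```

On this graph the direct search and the composition of optimal legs give the same cost, which is the fixed-cost version of the property. The abstract claims a corresponding result for goal-reaching policies, and the sketch does not attempt to reproduce that proof.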

HuggingFace

LiSA: Lifelong Safety Adaptation via Conservative Policy Induction

As AI agents move from chat interfaces to systems that read private data, call tools, and execute multi-step workflows, guardrails become a last line of defense against concrete deployment harms. In these settings, guardrail failures are no longer merely answer-quality errors: they can leak secrets, authorize unsafe actions, or block legitimate work. The hardest failures are often contextual: whether an action is acceptable depends on local privacy norms, organizational policies, and user expect...

HuggingFace

Quantitative Video World Model Evaluation for Geometric-Consistency

Generative video models are increasingly studied as implicit world models, yet evaluating whether they produce physically plausible 3D structure and motion remains challenging. Most existing video evaluation pipelines rely heavily on human judgment or learned graders, which can be subjective and weakly diagnostic for geometric failures. We introduce PDI-Bench (Perspective Distortion Index), a quantitative framework for auditing geometric coherence in generated videos. Given a generated clip, we ...

HuggingFace

Beyond Individual Intelligence: Surveying Collaboration, Failure Attribution, and Self-Evolution in LLM-based Multi-Agent Systems

LLM-based autonomous agents have demonstrated strong capabilities in reasoning, planning, and tool use, yet remain limited when tasks require sustained coordination across roles, tools, and environments. Multi-agent systems address this through structured collaboration among specialized agents, but tighter coordination also amplifies a less explored risk: errors can propagate across agents and interaction rounds, producing failures that are difficult to diagnose and rarely translate into structu...

HuggingFace

Warp-as-History: Generalizable Camera-Controlled Video Generation from One Training Video

Camera-controlled video generation has made substantial progress, enabling generated videos to follow prescribed viewpoint trajectories. However, existing methods usually learn camera-specific conditioning through camera encoders, control branches, or attention and positional-encoding modifications, which often require post-training on large-scale camera-annotated videos. Training-free alternatives avoid such post-training, but often shift the cost to test-time optimization or extra denoising-ti...

HuggingFace

PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation

Generating realistic human motion is a central yet unsolved challenge in video generation. While reinforcement learning (RL)-based post-training has driven recent gains in general video quality, extending it to human motion remains bottlenecked by a reward signal that cannot reliably score motion realism. Existing video rewards primarily rely on 2D perceptual signals, without explicitly modeling the 3D body state, contact, and dynamics underlying articulated human motion, and often assign high s...

HuggingFace

Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models

Vision-Language-Action (VLA) models achieve remarkable flexibility and generalization beyond classical control paradigms. However, most prevailing VLAs are trained under a single-frame observation paradigm, which leaves them structurally blind to temporal dynamics. Consequently, these models degrade severely in non-stationary scenarios, even when trained or finetuned on dynamic datasets. Existing approaches either require expensive retraining or suffer from latency bottlenecks and poor temporal ...

HuggingFace

MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models

Memory is essential for large vision-language models (LVLMs) to handle long, multimodal interactions, with two method directions providing this capability: long-context LVLMs and memory-augmented agents. However, no existing benchmark conducts a systematic comparison of the two on questions that genuinely require multimodal evidence. To close this gap, we introduce MEMLENS, a comprehensive benchmark for memory in multimodal multi-session conversations, comprising 789 questions across five memory...

HuggingFace

Orchard: An Open-Source Agentic Modeling Framework

Agentic modeling aims to transform LLMs into autonomous agents capable of solving complex tasks through planning, reasoning, tool use, and multi-turn interaction with environments. Despite major investment, open research remains constrained by infrastructure and training gaps. Many high-performing systems rely on proprietary codebases, models, or services, while most open-source frameworks focus on orchestration and evaluation rather than scalable agent training. We present Orchard, an open-sour...

HuggingFace

RAVEN: Real-time Autoregressive Video Extrapolation with Consistency-model GRPO

Causal autoregressive video diffusion models support real-time streaming generation by extrapolating future chunks from previously generated content. Distilling such generators from high-fidelity bidirectional teachers yields competitive few-step models, yet a persistent gap between the history distributions encountered during training and those arising at inference constrains generation quality over long horizons. We introduce the Real-time Autoregressive Video Extrapolation Network (RAVEN), a ...

HuggingFace

Nexus: An Agentic Framework for Time Series Forecasting

Time series forecasting is not just numerical extrapolation, but often requires reasoning with unstructured contextual data such as news or events. While specialized Time Series Foundation Models (TSFMs) excel at forecasting based on numerical patterns, they remain unaware of real-world textual signals. Conversely, while LLMs are emerging as zero-shot forecasters, their performance remains uneven across domains and contextual grounding. To bridge this gap, we introduce Nexus, a multi-agent forec...

HuggingFace

SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer

We introduce SANA-WM, an efficient 2.6B-parameter open-source world model natively trained for one-minute generation, synthesizing high-fidelity, 720p, minute-scale videos with precise camera control. SANA-WM achieves visual quality comparable to large-scale industrial baselines such as LingBot-World and HY-WorldPlay, while significantly improving efficiency. Four core designs drive our architecture: (1) Hybrid Linear Attention combines frame-wise Gated DeltaNet (GDN) with softmax attention for ...

HuggingFace

Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image

Generating a street-level 3D scene from a single satellite image is a crucial yet challenging task. Current methods present a stark trade-off: geometry-colorization models achieve high geometric fidelity but are typically building-focused and lack semantic diversity. In contrast, proxy-based models use feed-forward image-to-3D frameworks to generate holistic scenes by jointly learning geometry and texture, a process that yields rich content but coarse and unstable geometry. We attribute these ge...

HuggingFace

FutureSim: Replaying World Events to Evaluate Adaptive Agents

AI agents are being increasingly deployed in dynamic, open-ended environments that require adapting to new information as it arrives. To efficiently measure this capability for realistic use-cases, we propose building grounded simulations that replay real-world events in the order they occurred. We build FutureSim, where agents forecast world events beyond their knowledge cutoff while interacting with a chronological replay of the world: real news articles arriving and questions resolving over t...

HuggingFace

Tutorials

How Claude Code works in large codebases

This article covers how Claude Code works with large codebases, with a focus on context management and code comprehension, so that developers can use AI assistance on more complex projects.

RSS

Industry News

Elevated error rates on Opus 4.7

Claude Opus 4.7 has been experiencing elevated error rates, which points to reliability issues with this model version. Users may encounter more frequent failures or inconsistent responses.

RSS

New arXiv policy: 1-year ban for hallucinated references

arXiv has implemented a new policy that bans researchers for one year if they submit papers containing hallucinated or fabricated references. This enforcement aims to maintain academic integrity and combat the spread of misinformation in scientific literature.

Twitter

Ontario auditors find doctors' AI note takers routinely blow basic facts

Ontario auditors discovered that AI-powered note-taking tools used by doctors frequently make significant errors in recording basic medical facts. This finding raises serious concerns about the reliability of AI assistants in healthcare settings.

RSS

Amazon workers under pressure to up their AI usage are making up tasks

Amazon workers are reportedly fabricating tasks to meet pressure from management to increase their use of AI tools in the workplace. This highlights concerns about artificial quotas and employee well-being under productivity mandates.

RSS

UK sovereign LLM inference

The UK is developing sovereign LLM inference capabilities to ensure independent and secure language model deployment within national infrastructure. This initiative aims to reduce reliance on foreign AI providers.

RSS

Welcome to the Strip Mining Era of OSS Security

The tech industry is entering a Strip Mining Era of open-source software security, where developers are extracting value from OSS without adequately maintaining or securing it. This unsustainable approach threatens the foundation of modern software infrastructure.

RSS

Have a Coherent AI Policy

This piece discusses why organizations need clear, consistent AI policies to ensure responsible development and deployment. A coherent policy framework helps align AI initiatives with organizational values.

RSS

Discussion

Access to frontier AI will soon be limited by economic and security constraints

Access to cutting-edge AI models will increasingly be restricted by economic costs and security concerns rather than open availability. This shift suggests that frontier AI capabilities will become concentrated among well-resourced organizations.

RSS

“Too dangerous to release” or just too expensive?

This article examines whether certain AI models are withheld from release due to genuine safety concerns or primarily because of economic considerations around deployment costs. It questions the true motivations behind restricting access to advanced AI systems.

RSS

Sea's View on the Future of Agentic Software Development with Codex

Sea Limited's CPO explains why the company is deploying Codex across engineering teams to accelerate AI-native software development in Asia.

OpenAI