Cainew

Curated AI news for developers

February 8, 2026 Weekly

TL;DR

Model Releases

openbmb/MiniCPM-o-4_5 is a compact multimodal model from OpenBMB, aimed at efficient inference across a range of language and perception tasks.

HuggingFace

FutureMa/Eva-4B-V2 is a 4-billion-parameter language model reporting improved results on standard natural language processing benchmarks.

HuggingFace

Comfy-Org/ace_step_1.5_ComfyUI_files packages the ACE-Step 1.5 model files for use in ComfyUI, the node-based workflow UI for generative models.

HuggingFace

meituan-longcat/LongCat-Image-Edit-Turbo is a model from Meituan's LongCat team for fast, high-quality image editing and manipulation.

HuggingFace

unsloth/Qwen3-Coder-Next-FP8-Dynamic is an FP8 dynamically quantized build of the Qwen3-Coder-Next coding model from Unsloth, cutting memory requirements for inference.

HuggingFace

OpenAI Frontier is an enterprise platform for building, deploying, and managing AI agents with shared context, onboarding, permissions, and governance.

OpenAI

GPT‑5.3-Codex is the most capable agentic coding model to date, combining the frontier coding performance of GPT‑5.2-Codex with the reasoning and professional knowledge capabilities of GPT‑5.2.

OpenAI

Tools & Products

Samurai-inspired multi-agent system for Claude Code. Orchestrate parallel AI tasks via tmux with shogun → karo → ashigaru hierarchy.

GitHub

Supercharge AI coding agents with portable skills. Install, translate, and share skills across Claude Code, Cursor, Codex, Copilot, and 28 more.

GitHub

A lightweight, highly extensible AI coding agent, built in Rust.

GitHub

The Ultimate Collection of 700+ Agentic Skills for Claude Code/Antigravity/Cursor. Battle-tested, high-performance skills for AI agents including official skills from Anthropic and Vercel.

GitHub

AI Tool Use API for Anima anime/illustration image generation. Supports MCP Server, HTTP API, and CLI.

GitHub

Smart LLM router — save 78% on inference costs. 30+ models, one wallet, x402 micropayments.

GitHub

The high-performance Python web framework. The simplicity of Streamlit, minus the reruns.

GitHub

Research Papers

The Waymo World Model is a large-scale simulation environment for training autonomous-driving AI.

RSS

Large language models (LLMs) are increasingly evaluated in interactive environments to test their social intelligence. However, existing benchmarks often assume idealized communication between agents, limiting our ability to diagnose whether LLMs can maintain and repair interactions in more realistic, imperfect settings. To close this gap, we present SocialVeil, a social learning environment that can simulate social interaction under cognitive-difference-induced communication barriers. Grounded ...

HuggingFace

Autoregressive large language models (LLMs) deliver strong performance but require inherently sequential decoding, leading to high inference latency and poor GPU utilization. Speculative decoding mitigates this bottleneck by using a fast draft model whose outputs are verified in parallel by the target LLM; however, existing methods still rely on autoregressive drafting, which remains sequential and limits practical speedups. Diffusion LLMs offer a promising alternative by enabling parallel gener...

HuggingFace
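As a minimal illustration of the draft-and-verify loop this abstract builds on, here is a toy sketch of greedy speculative decoding. The draft and target "models" are stand-in functions invented for this example, not anything from the paper; a real system would verify all drafted tokens in one parallel forward pass of the target LLM.

```python
def draft_model(prefix: str) -> str:
    # Toy draft model: cheap and fast, always proposes 'a'.
    return "a"

def target_model(prefix: str) -> str:
    # Toy target model: a deterministic rule standing in for the
    # expensive LLM's greedy choice at each position.
    return "a" if len(prefix) % 2 else "b"

def speculative_decode(prefix: str, k: int = 4) -> str:
    """Draft k tokens cheaply, then verify them against the target.

    Accept the longest draft prefix matching what the target would have
    produced greedily; at the first mismatch, substitute the target's
    token and stop. Output is identical to plain greedy decoding, but
    accepted runs cost only one (parallel) target verification.
    """
    draft, ctx = [], prefix
    for _ in range(k):
        tok = draft_model(ctx)
        draft.append(tok)
        ctx += tok

    accepted, ctx = [], prefix
    for tok in draft:  # in practice: one batched target forward pass
        expect = target_model(ctx)
        if tok == expect:
            accepted.append(tok)
            ctx += tok
        else:
            accepted.append(expect)  # correct the mismatch, then stop
            break
    return prefix + "".join(accepted)
```

The guarantee the abstract relies on is visible here: the output matches token-by-token greedy decoding from the target, so speedups come purely from how many drafted tokens get accepted per verification.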

RL-based post-training with GRPO is widely used to improve large language models on individual reasoning tasks. However, real-world deployment requires reliable performance across diverse tasks. A straightforward multi-task adaptation of GRPO often leads to imbalanced outcomes, with some tasks dominating optimization while others stagnate. Moreover, tasks can vary widely in how frequently prompts yield zero advantages (and thus zero gradients), which further distorts their effective contribution...

HuggingFace
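The zero-gradient issue this abstract describes follows directly from how GRPO computes group-relative advantages. A minimal sketch (standard GRPO normalization, not the paper's proposed fix):

```python
from statistics import mean, pstdev

def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages for one prompt's sampled response group.

    Each response's advantage is its reward minus the group mean, scaled
    by the group standard deviation. When every response earns the same
    reward (the prompt is too easy or too hard), every advantage is
    exactly zero, so that prompt contributes no gradient -- the source of
    the cross-task imbalance described above.
    """
    mu = mean(rewards)
    sd = pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]
```

A task whose prompts frequently produce all-correct or all-incorrect groups is effectively silenced during multi-task training, while tasks with mixed outcomes dominate the update.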

Training-time privileged information (PI) can enable language models to succeed on tasks they would otherwise fail, making it a powerful tool for reinforcement learning in hard, long-horizon settings. However, transferring capabilities learned with PI to policies that must act without it at inference time remains a fundamental challenge. We study this problem in the context of distilling frontier models for multi-turn agentic environments, where closed-source systems typically hide their interna...

HuggingFace

Recent applications of Reinforcement Learning with Verifiable Rewards (RLVR) to Large Language Models (LLMs) and Vision-Language Models (VLMs) have demonstrated significant success in enhancing reasoning capabilities for complex tasks. During RLVR training, an increase in response length is often regarded as a key factor contributing to the growth of reasoning ability. However, the patterns of change in response length vary significantly across different RLVR algorithms during the training proce...

HuggingFace

Humans rarely plan whole-body interactions with objects at the level of explicit whole-body movements. High-level intentions, such as affordance, define the goal, while coordinated balance, contact, and manipulation can emerge naturally from underlying physical and motor priors. Scaling such priors is key to enabling humanoids to compose and generalize loco-manipulation skills across diverse contexts while maintaining physically coherent whole-body coordination. To this end, we introduce InterPr...

HuggingFace

The rapid evolution of large language models (LLMs) has expanded their capabilities from basic dialogue to advanced scientific reasoning. However, existing benchmarks in biology often fail to assess a critical skill required of researchers: the ability to integrate experimental results with contextual knowledge to derive meaningful conclusions. To address this gap, we introduce BABE(Biology Arena BEnchmark), a comprehensive benchmark designed to evaluate the experimental reasoning capabilities o...

HuggingFace

Recent approaches to real-time long video generation typically employ streaming tuning strategies, attempting to train a long-context student using a short-context (memoryless) teacher. In these frameworks, the student performs long rollouts but receives supervision from a teacher limited to short 5-second windows. This structural discrepancy creates a critical student-teacher mismatch: the teacher's inability to access long-term history prevents it from guiding the student on global temporal de...

HuggingFace

Deep research agents have emerged as powerful systems for addressing complex queries. Meanwhile, LLM-based retrievers have demonstrated strong capability in following instructions or reasoning. This raises a critical question: can LLM-based retrievers effectively contribute to deep research agent workflows? To investigate this, we introduce SAGE, a benchmark for scientific literature retrieval comprising 1,200 queries across four scientific domains, with a 200,000-paper retrieval corpus. We evalu...

HuggingFace

Tutorials

Industry News

A personal essay explaining why the author joined OpenAI.

RSS

Discussion

An Anthropic piece on Claude as a space for thoughtful discussion and exploration.

Anthropic

A new AI model that aims to fill a key gap in current language models by focusing on abstract reasoning.

RSS