February 8, 2026 Weekly
TL;DR
Model Releases
Tools & Products
Research Papers
Tutorials
Industry News
Discussion
Model Releases
openbmb/MiniCPM-o-4_5 is a compact omni-modal model that can run efficiently across a range of language, vision, and audio tasks.
OpenClaw is a system built on a cascade of large language models that has the potential to cause significant disruption.
The time to train a GPT-2-scale model has dropped to 2.91 hours, reflecting continuing gains in AI training efficiency.
FutureMa/Eva-4B-V2 is a 4 billion parameter language model with improved performance on various natural language processing benchmarks.
Comfy-Org/ace_step_1.5_ComfyUI_files contains the files needed to run ACE-Step 1.5 in ComfyUI, a node-based interface for working with generative AI models.
meituan-longcat/LongCat-Image-Edit-Turbo is a model for efficient, high-quality image editing and manipulation.
unsloth/Qwen3-Coder-Next-FP8-Dynamic is an FP8 dynamically quantized build of the Qwen3-Coder-Next coding model, reducing memory requirements for local inference.
OpenAI Frontier is an enterprise platform for building, deploying, and managing AI agents with shared context, onboarding, permissions, and governance.
An autonomous lab combining OpenAI’s GPT-5 with Ginkgo Bioworks’ cloud automation cut cell-free protein synthesis costs by 40% through closed-loop experimentation.
GPT‑5.3-Codex is the most capable agentic coding model to date, combining the frontier coding performance of GPT‑5.2-Codex with the reasoning and professional knowledge capabilities of GPT‑5.2.
Tools & Products
Samurai-inspired multi-agent system for Claude Code. Orchestrate parallel AI tasks via tmux with shogun → karo → ashigaru hierarchy.
Supercharge AI coding agents with portable skills. Install, translate & share skills across Claude Code, Cursor, Codex, Copilot & 28 more.
Data Science and Machine Learning Bootcamp.
A lightweight, highly extensible AI code agent, built in Rust.
The Ultimate Collection of 700+ Agentic Skills for Claude Code/Antigravity/Cursor. Battle-tested, high-performance skills for AI agents including official skills from Anthropic and Vercel.
AI Tool Use API for Anima anime/illustration image generation. Supports MCP Server, HTTP API, and CLI.
Smart LLM router — save 78% on inference costs. 30+ models, one wallet, x402 micropayments.
Excalidraw MCP App Server — hand-drawn diagrams for Claude
MCP Server and CLI for accessing Work IQ
The High-Performance Python Web Framework. The simplicity of Streamlit, minus the reruns
Mega Scale Multimodal DataPipeline for SOTA models
An argument that coding agents are replacing many of the software frameworks and tools developers have traditionally relied on.
Research Papers
The Waymo World Model is a large-scale generative world model that simulates driving environments for training self-driving AI.
Research on reinforcement learning from human feedback to train AI systems.
Researchers used teams of agents built on Opus 4.6 to build a C compiler.
Hypernetworks, neural networks that generate the weights of other networks, can model hierarchical data structures more effectively.
Evaluating and mitigating the growing risk of zero-day vulnerabilities discovered by large language models (LLMs).
Large language models (LLMs) are increasingly evaluated in interactive environments to test their social intelligence. However, existing benchmarks often assume idealized communication between agents, limiting our ability to diagnose whether LLMs can maintain and repair interactions in more realistic, imperfect settings. To close this gap, we present SocialVeil, a social learning environment that can simulate social interaction under cognitive-difference-induced communication barriers. Grounded ...
Autoregressive large language models (LLMs) deliver strong performance but require inherently sequential decoding, leading to high inference latency and poor GPU utilization. Speculative decoding mitigates this bottleneck by using a fast draft model whose outputs are verified in parallel by the target LLM; however, existing methods still rely on autoregressive drafting, which remains sequential and limits practical speedups. Diffusion LLMs offer a promising alternative by enabling parallel gener...
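The draft-and-verify loop this abstract describes can be sketched in a few lines. This is a toy illustration, not the paper's method: `draft_model` and `target_model` below are hypothetical stand-ins (a random proposer and a deterministic greedy target), chosen only to show the control flow where the target accepts the longest agreeing prefix of the draft and then emits one token of its own, so every iteration makes progress.

```python
import random

random.seed(0)

VOCAB = list("abcde")

def draft_model(prefix, k):
    # Hypothetical fast draft model: proposes k tokens cheaply.
    return [random.choice(VOCAB) for _ in range(k)]

def target_model(prefix):
    # Hypothetical target model: greedy, deterministic next token.
    return VOCAB[len(prefix) % len(VOCAB)]

def speculative_decode(prompt, max_len=10, k=4):
    out = list(prompt)
    while len(out) < max_len:
        proposal = draft_model(out, k)
        # Verify all k drafted tokens in one (conceptually parallel)
        # target pass: accept the longest prefix the target agrees with.
        accepted = 0
        for i, tok in enumerate(proposal):
            if tok == target_model(out + proposal[:i]):
                accepted += 1
            else:
                break
        out.extend(proposal[:accepted])
        # On rejection (or full acceptance), emit one token from the
        # target itself, so each loop iteration always makes progress.
        out.append(target_model(out))
        if len(out) > max_len:
            out = out[:max_len]
    return "".join(out)

print(speculative_decode("ab"))  # → "abcdeabcde" (matches greedy target output)
```

Note that the output is identical to what the target would produce alone; speculation only changes how many target passes are needed, which is the latency win the abstract refers to.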
RL-based post-training with GRPO is widely used to improve large language models on individual reasoning tasks. However, real-world deployment requires reliable performance across diverse tasks. A straightforward multi-task adaptation of GRPO often leads to imbalanced outcomes, with some tasks dominating optimization while others stagnate. Moreover, tasks can vary widely in how frequently prompts yield zero advantages (and thus zero gradients), which further distorts their effective contribution...
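The zero-advantage problem the abstract mentions is easy to see numerically. A minimal sketch of GRPO-style group-relative advantages (a simplification, not the paper's algorithm): when every rollout in a group gets the same reward, all advantages are zero and the prompt contributes no gradient.

```python
def grpo_advantages(rewards):
    # Group-relative advantage: reward minus the group mean,
    # normalized by the group standard deviation.
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    if std == 0:
        # All rollouts got the same reward (e.g. all fail or all
        # succeed): every advantage is zero, so this prompt yields
        # zero gradient for the policy update.
        return [0.0] * n
    return [(r - mean) / std for r in rewards]

# A mixed group produces a learning signal ...
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))  # [1.0, -1.0, 1.0, -1.0]
# ... but a uniform group is silent, which is why tasks whose prompts
# often yield all-zero (or all-one) reward groups stagnate.
print(grpo_advantages([0.0, 0.0, 0.0, 0.0]))  # [0.0, 0.0, 0.0, 0.0]
```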
Training-time privileged information (PI) can enable language models to succeed on tasks they would otherwise fail, making it a powerful tool for reinforcement learning in hard, long-horizon settings. However, transferring capabilities learned with PI to policies that must act without it at inference time remains a fundamental challenge. We study this problem in the context of distilling frontier models for multi-turn agentic environments, where closed-source systems typically hide their interna...
Recent applications of Reinforcement Learning with Verifiable Rewards (RLVR) to Large Language Models (LLMs) and Vision-Language Models (VLMs) have demonstrated significant success in enhancing reasoning capabilities for complex tasks. During RLVR training, an increase in response length is often regarded as a key factor contributing to the growth of reasoning ability. However, the patterns of change in response length vary significantly across different RLVR algorithms during the training proce...
Humans rarely plan whole-body interactions with objects at the level of explicit whole-body movements. High-level intentions, such as affordance, define the goal, while coordinated balance, contact, and manipulation can emerge naturally from underlying physical and motor priors. Scaling such priors is key to enabling humanoids to compose and generalize loco-manipulation skills across diverse contexts while maintaining physically coherent whole-body coordination. To this end, we introduce InterPr...
The rapid evolution of large language models (LLMs) has expanded their capabilities from basic dialogue to advanced scientific reasoning. However, existing benchmarks in biology often fail to assess a critical skill required of researchers: the ability to integrate experimental results with contextual knowledge to derive meaningful conclusions. To address this gap, we introduce BABE(Biology Arena BEnchmark), a comprehensive benchmark designed to evaluate the experimental reasoning capabilities o...
Recent approaches to real-time long video generation typically employ streaming tuning strategies, attempting to train a long-context student using a short-context (memoryless) teacher. In these frameworks, the student performs long rollouts but receives supervision from a teacher limited to short 5-second windows. This structural discrepancy creates a critical student-teacher mismatch: the teacher's inability to access long-term history prevents it from guiding the student on global temporal de...
Deep research agents have emerged as powerful systems for addressing complex queries. Meanwhile, LLM-based retrievers have demonstrated strong capability in following instructions or reasoning. This raises a critical question: can LLM-based retrievers effectively contribute to deep research agent workflows? To investigate this, we introduce SAGE, a benchmark for scientific literature retrieval comprising 1,200 queries across four scientific domains, with a 200,000 paper retrieval corpus. We evalu...
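Retrieval benchmarks like the one described above are typically scored with rank-based metrics such as recall@k. A minimal sketch (an illustration of the standard metric, not SAGE's published evaluation code; the paper IDs are made up):

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    # Fraction of the relevant papers that appear in the
    # top-k retrieved results.
    hits = sum(1 for pid in ranked_ids[:k] if pid in relevant_ids)
    return hits / len(relevant_ids)

ranked = ["p7", "p3", "p9", "p1", "p4"]   # retriever output, best first
relevant = {"p3", "p1", "p8"}             # gold labels for the query
# Only p3 is in the top 3, so recall@3 is 1/3.
print(recall_at_k(ranked, relevant, 3))
```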
Tutorials
An article discussing how to write quality code with the help of AI tools and techniques.
Industry News
A Waymo executive admits that remote operators in the Philippines help guide the company's self-driving cars in the US.
An article explaining why the author joined OpenAI, a leading AI research company.
Indian female workers are watching hours of abusive content to train AI systems, exposing them to harmful material.
The continued plunge in Amazon's stock price is fueling fears of an AI technology bubble burst.
The growth of AI is causing shortages of computing power, chips, and other resources needed for broader technology development.
The FBI was unable to access the iPhone of a Washington Post reporter due to the Lockdown Mode feature.
A breakdown of the supply chain attack on Notepad++, a popular text editor.
The European Commission is testing a new platform called Matrix to potentially replace Microsoft Teams for communications and collaboration.
OpenAI shares its approach to AI localization, showing how globally shared frontier models can be adapted to local languages, laws, and cultures without compromising safety.
Discussion
Claude is a platform designed to provide a space for thoughtful discussion and exploration.
AI didn't break copyright law, it just exposed how broken the current copyright system is.
Users are advised to stop using OpenClaw, formerly known as Moltbot.
A critique suggests that current coding assistants are solving the wrong problem, and that the focus should shift to improving overall software development workflows.
Large language models (LLMs) have the potential to be used as compilers, but experts caution that this may not be the best application for these models.
The article argues that, as of 2022, C functions less as a true programming language than as a de facto interoperability protocol that every other language must speak.
A new article explores the idea of making writing tests a joyful experience for developers.
A man was arrested after videotaping himself BASE jumping in Yosemite; he claimed the footage was generated by an AI system.
Locating data centers in space is considered impractical and ineffective.
An article describing the author's AI adoption journey and their experiences.
The Wyden Siren, a pattern in Senator Wyden's cryptic letters to the CIA, has a perfect track record of predicting future events.
Product and design are now the main bottlenecks for many companies as technological advances have made engineering less of a constraint.
A new AI model that aims to fill a key gap in current language models by focusing on abstract reasoning.
A family shares how ChatGPT helped them prepare for critical cancer treatment decisions for their son alongside expert guidance from his doctors.