TL;DR
Model Releases
openbmb/MiniCPM-o-4_5 is a compact model in OpenBMB's MiniCPM-o omni-modal series, designed for efficient use across vision, speech, and language tasks.
OpenAI Frontier is an enterprise platform for building, deploying, and managing AI agents with shared context, onboarding, permissions, and governance.
An autonomous lab combining OpenAI’s GPT-5 with Ginkgo Bioworks’ cloud automation cut cell-free protein synthesis costs by 40% through closed-loop experimentation.
OpenAI describes GPT-5.3-Codex as its most capable agentic coding model to date, combining the frontier coding performance of GPT-5.2-Codex with the reasoning and professional-knowledge capabilities of GPT-5.2.
OpenClaw is an AI agent system built on a cascade of large language models, and it could prove disruptive.
The time to train a GPT-2-level model has dropped to 2.91 hours, a sign of continued gains in training efficiency.
FutureMa/Eva-4B-V2 is a 4-billion-parameter language model with improved performance on various natural language processing benchmarks.
Comfy-Org/ace_step_1.5_ComfyUI_files packages the files needed to run ACE-Step 1.5, an open-source music generation model, in ComfyUI, a node-based interface for generative AI workflows.
meituan-longcat/LongCat-Image-Edit-Turbo is an image-editing model from Meituan's LongCat team, built for fast, high-quality image edits.
unsloth/Qwen3-Coder-Next-FP8-Dynamic is Unsloth's FP8 dynamically quantized release of Qwen3-Coder-Next, a coding model from the Qwen team, which reduces the memory needed to run it.
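The "FP8-Dynamic" suffix in the Unsloth release most likely names a quantization scheme. Under that scheme, values are stored in 8-bit floating point, and scales are computed on the fly at inference time instead of from a calibration set. 

The minimal sketch below illustrates dynamic quantization in general, not Unsloth's actual pipeline. It makes three assumptions:
- a single per-tensor scale is used;
- the FP8 E4M3 format (3 mantissa bits, largest finite value 448) is simulated with NumPy rounding;
- the helper names are hypothetical.

Real kernels use hardware FP8 types and often per-channel or per-token scales.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format

def round_to_e4m3(x: np.ndarray) -> np.ndarray:
    """Simulate rounding to FP8 E4M3: 3 mantissa bits, minimum normal exponent -6."""
    x = np.clip(x, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    mag = np.abs(x)
    # Exponent of each value; zeros and subnormals share the minimum exponent -6.
    exp = np.floor(np.log2(np.maximum(mag, 2.0 ** -6)))
    step = 2.0 ** (exp - 3)  # spacing between representable values at that exponent
    return np.round(x / step) * step  # round to the nearest representable value

def dynamic_fp8_quantize(t: np.ndarray):
    """Per-tensor dynamic quantization: the scale comes from t's current max magnitude."""
    amax = float(np.max(np.abs(t)))
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return round_to_e4m3(t / scale), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map simulated FP8 values back to the original range."""
    return q * scale

# Usage: quantize a batch of activations whose range is only known at runtime.
rng = np.random.default_rng(0)
activations = rng.normal(size=(4, 8)).astype(np.float32)
q, scale = dynamic_fp8_quantize(activations)
print("scale:", scale, "max abs error:", np.max(np.abs(dequantize(q, scale) - activations)))
```

Because the scale maps the largest magnitude onto 448, each tensor uses the full FP8 range. The cost is computing a max at runtime.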
Tools & Products
The Ultimate Collection of 700+ Agentic Skills for Claude Code/Antigravity/Cursor. Battle-tested, high-performance skills for AI agents including official skills from Anthropic and Vercel.
Orchestration layer for coding agents (multiple Claude Code instances)
Enable Claude Code to learn in real time, update its knowledge, and grow with you, using supermemory.
Smart LLM router — save 78% on inference costs. 30+ models, one wallet, x402 micropayments.
Excalidraw MCP App Server — hand-drawn diagrams for Claude
MCP Server and CLI for accessing Work IQ
Skill and MCP server for searching and retrieving icons
The High-Performance Python Web Framework. The simplicity of Streamlit, minus the reruns
An article argues that coding agents have replaced many of the software frameworks and tools developers once relied on.
Quash is an intent-driven mobile testing tool that lets you write and run tests in plain language instead of scripts. You can run tests on real devices, cloud devices, or local emulators. Quash uses built-in self-healing to adapt when the UI changes, understands app behavior across builds, and supports backend validations, reusable test data, test suites, and parallel test runs. Every run generates a detailed execution report with step-level intent, actions, and screenshots.
Nativeline is the first AI platform that builds native apps for iPhone, iPad, and Mac, all in one place. Other tools stop at iPhone. Most output web wrappers. Nativeline builds real native Swift for every Apple platform. Mac apps with menus and multiple windows. iPad apps that use the full screen. iPhone apps that feel like they belong. Choose your platform. Describe your idea. Ship to the App Store. The Apple ecosystem. Unlocked.
We built a no-code AI lab where you can train your own AI models with your own data. NeuroBlock OS offers an integrated ecosystem: generate and access datasets, train and deploy models, and download them to run anywhere: on your computer, server, or smartphone, or through our NeuroAI cloud inference framework, ready to integrate into workflows. AI you own, cheap to run, and built to perform exactly the way you want.
Upload your messy deck. AI redesigns every page with premium layouts, smooth animations, and brand-matched styling. Export fully editable PPTX, Google Slides, or Keynote. Keep editing forever.
PinMe helps you publish sites in seconds. You can upload sites from your browser with drag and drop, or deploy from your terminal with a single command. Deploy, get a link, and share. PinMe focuses on a fast, clean deployment experience without locking you into an all-in-one platform. No accounts, no sign-ups, no logins, no payments required.
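The smart LLM router entry above promises lower inference costs but gives no mechanism. One common approach to cost-aware routing is to send each request to the cheapest model judged capable of handling it. 

The sketch below illustrates that idea under explicit assumptions:
- the model names, prices, capability tiers, and difficulty heuristic are all hypothetical;
- it is not the product's method;
- it does not model x402 micropayments;
- it does not reproduce the 78% figure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_tokens: float  # hypothetical blended price
    capability: int                # hypothetical quality tier; higher is stronger

# Hypothetical catalog; a real router would have many more models.
MODELS = [
    Model("small-model", 0.20, 1),
    Model("mid-model", 1.00, 2),
    Model("frontier-model", 10.00, 3),
]

def required_tier(prompt: str) -> int:
    """Toy difficulty heuristic (assumption): long or code-like prompts need stronger models."""
    if len(prompt) > 2000 or "def " in prompt:
        return 3
    if len(prompt) > 300:
        return 2
    return 1

def route(prompt: str, models=MODELS) -> Model:
    """Pick the cheapest model whose capability meets the prompt's required tier."""
    tier = required_tier(prompt)
    eligible = [m for m in models if m.capability >= tier]
    return min(eligible, key=lambda m: m.usd_per_million_tokens)

def savings(prompts, tokens_per_prompt: int = 1_000) -> float:
    """Fraction of cost saved versus sending every prompt to the most expensive model."""
    top = max(MODELS, key=lambda m: m.usd_per_million_tokens)
    baseline = len(prompts) * tokens_per_prompt * top.usd_per_million_tokens / 1e6
    routed = sum(tokens_per_prompt * route(p).usd_per_million_tokens / 1e6 for p in prompts)
    return 1 - routed / baseline

# Usage: route a small mixed workload and report savings.
prompts = ["hi", "Summarize this paragraph: " + "text " * 80, "def f(x): return x * 2  # review this"]
for p in prompts:
    print(route(p).name)
print(f"savings vs. frontier-only: {savings(prompts):.0%}")
```

Actual savings depend entirely on the traffic mix and on how well the difficulty estimate matches real model quality.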
Research Papers
Research on reinforcement learning from human feedback to train AI systems.
The Waymo World Model is a large-scale map and simulation environment for training self-driving car AI.
Researchers used teams of agents running Claude Opus 4.6 to build a C compiler.
Hypernetworks, neural networks that generate the weights of another network, can model hierarchical data structures more effectively.
Evaluating and mitigating the growing risk of zero-day vulnerabilities discovered by large language models (LLMs).
Deep research agents have emerged as powerful systems for addressing complex queries. Meanwhile, LLM-based retrievers have demonstrated strong capability in following instructions or reasoning. This raises a critical question: can LLM-based retrievers effectively contribute to deep research agent workflows? To investigate this, we introduce SAGE, a benchmark for scientific literature retrieval comprising 1,200 queries across four scientific domains, with a 200,000-paper retrieval corpus. We evalu...
RL-based post-training with GRPO is widely used to improve large language models on individual reasoning tasks. However, real-world deployment requires reliable performance across diverse tasks. A straightforward multi-task adaptation of GRPO often leads to imbalanced outcomes, with some tasks dominating optimization while others stagnate. Moreover, tasks can vary widely in how frequently prompts yield zero advantages (and thus zero gradients), which further distorts their effective contribution...
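GRPO samples a group of responses per prompt and normalizes each response's reward against its own group. A prompt whose responses all earn the same reward (all correct or all wrong) therefore yields zero advantage and no gradient. 

The sketch below shows:
- the standard group-normalized advantage;
- a per-task rate of such zero-advantage prompts, the quantity the abstract says distorts each task's contribution.

The toy reward arrays and the helper names are made up for illustration. The paper's own multi-task method is not shown.

```python
import numpy as np

def group_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """GRPO-style advantages: normalize each prompt's rewards within its sampled group.

    rewards has shape (num_prompts, group_size)."""
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)

def zero_advantage_rate(rewards: np.ndarray) -> float:
    """Fraction of prompts whose group rewards are all identical, so every advantage is 0."""
    all_same = np.all(rewards == rewards[:, :1], axis=1)
    return float(all_same.mean())

# Hypothetical binary rewards for two tasks, 4 samples per prompt.
math_rewards = np.array([[1, 0, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0]], dtype=float)
code_rewards = np.array([[1, 0, 0, 0], [0, 1, 1, 0]], dtype=float)
for task, r in {"math": math_rewards, "code": code_rewards}.items():
    print(task, "zero-advantage rate:", zero_advantage_rate(r))
```

Here two of the three math prompts contribute no gradient, while every code prompt does. A naive multi-task mix would therefore let the code task dominate the update.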
Autoregressive large language models (LLMs) deliver strong performance but require inherently sequential decoding, leading to high inference latency and poor GPU utilization. Speculative decoding mitigates this bottleneck by using a fast draft model whose outputs are verified in parallel by the target LLM; however, existing methods still rely on autoregressive drafting, which remains sequential and limits practical speedups. Diffusion LLMs offer a promising alternative by enabling parallel gener...
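The abstract builds on standard speculative decoding. Below is a minimal greedy version of one draft-and-verify round. The draft proposes k tokens one at a time, and the target accepts the longest matching prefix. At the first mismatch it substitutes its own token; if every draft token is accepted, it adds one bonus token. 

Assumptions:
- the toy `draft` and `target` functions are hypothetical stand-ins for real models;
- sampling-based acceptance is omitted;
- the paper's diffusion-based drafter is not shown.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # greedy model: context -> next token id

def speculative_step(context: List[int], draft: NextToken, target: NextToken, k: int = 4) -> List[int]:
    """One round of greedy speculative decoding; returns the newly accepted tokens.

    A real system scores all k draft positions in a single parallel target pass;
    this sketch loops over them for clarity."""
    proposal, ctx = [], list(context)
    for _ in range(k):  # sequential (autoregressive) drafting
        token = draft(ctx)
        proposal.append(token)
        ctx.append(token)

    accepted, ctx = [], list(context)
    for token in proposal:
        expected = target(ctx)
        if expected != token:
            accepted.append(expected)  # first mismatch: keep the target's token, stop
            return accepted
        accepted.append(token)
        ctx.append(token)
    accepted.append(target(ctx))  # all drafts accepted: one bonus token from the target
    return accepted

# Hypothetical toy models: the target counts up mod 10; the draft errs after token 3.
def target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10

def draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10

print(speculative_step([0], draft, target))
print(speculative_step([5], draft, target))
```

With greedy acceptance, the output always matches what the target would produce on its own. The speedup comes from accepting several tokens per target pass, which is why a faster, parallel drafter matters.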
Humans rarely plan whole-body interactions with objects at the level of explicit whole-body movements. High-level intentions, such as affordance, define the goal, while coordinated balance, contact, and manipulation can emerge naturally from underlying physical and motor priors. Scaling such priors is key to enabling humanoids to compose and generalize loco-manipulation skills across diverse contexts while maintaining physically coherent whole-body coordination. To this end, we introduce InterPr...
Recent approaches to real-time long video generation typically employ streaming tuning strategies, attempting to train a long-context student using a short-context (memoryless) teacher. In these frameworks, the student performs long rollouts but receives supervision from a teacher limited to short 5-second windows. This structural discrepancy creates a critical student-teacher mismatch: the teacher's inability to access long-term history prevents it from guiding the student on global temporal de...
The rapid evolution of large language models (LLMs) has expanded their capabilities from basic dialogue to advanced scientific reasoning. However, existing benchmarks in biology often fail to assess a critical skill required of researchers: the ability to integrate experimental results with contextual knowledge to derive meaningful conclusions. To address this gap, we introduce BABE (Biology Arena BEnchmark), a comprehensive benchmark designed to evaluate the experimental reasoning capabilities o...
Training-time privileged information (PI) can enable language models to succeed on tasks they would otherwise fail, making it a powerful tool for reinforcement learning in hard, long-horizon settings. However, transferring capabilities learned with PI to policies that must act without it at inference time remains a fundamental challenge. We study this problem in the context of distilling frontier models for multi-turn agentic environments, where closed-source systems typically hide their interna...
Large language models (LLMs) are increasingly evaluated in interactive environments to test their social intelligence. However, existing benchmarks often assume idealized communication between agents, limiting our ability to diagnose whether LLMs can maintain and repair interactions in more realistic, imperfect settings. To close this gap, we present SocialVeil, a social learning environment that can simulate social interaction under cognitive-difference-induced communication barriers. Grounded ...
Recent applications of Reinforcement Learning with Verifiable Rewards (RLVR) to Large Language Models (LLMs) and Vision-Language Models (VLMs) have demonstrated significant success in enhancing reasoning capabilities for complex tasks. During RLVR training, an increase in response length is often regarded as a key factor contributing to the growth of reasoning ability. However, the patterns of change in response length vary significantly across different RLVR algorithms during the training proce...
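In RLVR, the reward comes from a programmatic check of the final answer rather than from a learned reward model. Response length is a separate quantity, tracked alongside the reward during training. 

The sketch below computes both for one batch. Assumptions, all for illustration:
- answers are wrapped in `\boxed{...}`;
- correctness is exact string match;
- length is a whitespace token count;
- the function names are hypothetical.

None of the RLVR algorithms the paper compares are shown.

```python
import re
from statistics import mean

def extract_answer(response: str):
    """Return the last \\boxed{...} answer in a response, or None (assumed answer format)."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", response)
    return matches[-1].strip() if matches else None

def verifiable_reward(response: str, reference: str) -> float:
    """Binary RLVR-style reward: 1.0 if the extracted answer exactly matches the reference."""
    answer = extract_answer(response)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0

def batch_stats(responses, references):
    """Mean reward and mean response length (whitespace tokens) for one training step."""
    rewards = [verifiable_reward(r, ref) for r, ref in zip(responses, references)]
    lengths = [len(r.split()) for r in responses]
    return mean(rewards), mean(lengths)

# Usage: statistics for a tiny hypothetical batch.
responses = [r"Adding gives \boxed{12}", r"Let me think step by step... so \boxed{13}"]
print(batch_stats(responses, ["12", "12"]))
```

Logging both numbers at every step is what makes it possible to observe how length and accuracy co-evolve under different RLVR algorithms.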
Tutorials
An article discussing how to write quality code with the help of AI tools and techniques.
Industry News
A Waymo executive admits that remote operators in the Philippines help guide the company's self-driving cars in the US.
An article explaining why the author joined OpenAI, a leading AI research company.
The growth of AI is causing shortages of computing power, chips, and other resources needed for broader technology development.
Indian female workers are watching hours of abusive content to train AI systems, exposing them to harmful material.
The continued plunge in Amazon's stock price is fueling fears of an AI technology bubble burst.
The FBI was unable to access the iPhone of a Washington Post reporter due to the Lockdown Mode feature.
A breakdown of the supply chain attack on Notepad++, a popular text editor.
OpenAI shares its approach to AI localization, showing how globally shared frontier models can be adapted to local languages, laws, and cultures without compromising safety.
Google Cloud built an industry-first AI tool to help U.S. Ski and Snowboard athletes.
A balanced perspective on the potential and challenges of the Clawdbot/OpenClaw AI agent.
Anthropic, an AI research company, experienced a system outage that affected its operations.
The release of an AI tool by Anthropic has led to a sell-off in the software and broader market.
Discussion
Claude is a platform designed to provide a space for thoughtful discussion and exploration.
AI didn't break copyright law, it just exposed how broken the current copyright system is.
Users are advised to stop using OpenClaw, formerly known as Moltbot.
A critique suggests that current coding assistants are solving the wrong problem, and that the focus should shift to improving overall software development workflows.
A man who videotaped himself BASE jumping in Yosemite was arrested; he claimed the footage was generated by an AI system.
An article describing the author's AI adoption journey and their experiences.
Large language models (LLMs) have the potential to be used as compilers, but experts caution that this may not be the best application for these models.
A family shares how ChatGPT helped them prepare for critical cancer treatment decisions for their son alongside expert guidance from his doctors.