Starship V3 represents the latest iteration of SpaceX's fully reusable spacecraft design aimed at enabling rapid, low-cost space transportation. The advancement continues development toward achieving reliable orbital refueling and deep space missions.
TL;DR
Model Releases
Tools & Products
Production-grade MCP server giving Claude 27 security intelligence tools across 21 APIs: CVE lookup, EPSS scoring, CISA KEV, MITRE ATT&CK, Shodan, VirusTotal, and more.
A TDD-driven iterative feedback loop for software development. 16 cohesive Claude Code skills walk an idea from brainstorm → plan → execute → iterate, with checkpoints throughout.
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents
The open-source Palantir Foundry alternative. Connect any data source, build ontologies, create pipelines, visualize with dashboards, and make AI-powered decisions. Self-hosted. Built with Rust + Svelte.
We're opening 50 free spots in our Founding Member Program for founders and SMB owners who want to try Memoket Gem early. Memoket Gem is an all-day AI wearable that captures meetings, calls, coffee chats, and decisions on the go. It summarizes key moments, connects context across conversations, and turns them into tasks, notes, and follow-ups in the tools you already use. Join us and help shape the future of real-world AI memory.
Ask questions across your Markdown notes using a fully local Graph RAG engine. Built for Obsidian vaults, works with any folder of Markdown files. Extracts entity-relation triples from wikilinks & YAML frontmatter, retrieves answers via hybrid search (vector + BM25 + temporal). Multilingual. No cloud. Runs on Ollama.
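As a rough illustration of the wikilink step described above, here is a minimal sketch in Python. It is not the tool's actual code: the function name `wikilink_triples`, the relation label `links_to`, and the regular expression are all assumptions made for this example, and YAML frontmatter handling is left out.

```python
import re

# Matches [[Target]], [[Target|alias]] and [[Target#Heading]]; captures only the target.
# The pattern is an assumption for this sketch, not taken from the tool itself.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def wikilink_triples(note_name, text, relation="links_to"):
    """Return (source, relation, target) triples, one per distinct link target."""
    targets = []
    for match in WIKILINK.finditer(text):
        target = match.group(1).strip()
        if target and target not in targets:
            targets.append(target)  # keep first-seen order, drop repeats
    return [(note_name, relation, target) for target in targets]


# Hypothetical note content used only for the demonstration.
note = "Met [[Ada Lovelace|Ada]] to discuss [[Analytical Engine#Design]] and [[Ada Lovelace]]."
triples = wikilink_triples("2024-01-05", note)
for triple in triples:
    print(triple)
```

Triples like these could then feed a graph index, with the vector, BM25, and temporal signals handled by separate retrieval components.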
Hyper efficient storage for GPU workloads. Feed your GPUs at blazing fast speeds.
Local-first desktop activity tracker: see where your hours go, with on-device AI daily summaries and optional multi-device sync.
Living UI is a brand-new system that lets CraftBot (general AI agent) build, import, or evolve custom apps/dashboards that live inside CraftBot itself. The agent stays context-aware of the Living UI's state and can read, write, and act on its Living UI directly. A Living UI is never "finished". Ask CraftBot to add features or redesign a view as your needs grow. Living UI turns software from something users buy and adapt to into something CraftBot creates and adapts around them.
Enter your website and get all the AI agents you need to grow your business. One AI that calls, texts, and emails all of your customers 24/7. A CRM, a ticketing system, even a website builder.
A new category of laptops built from the ground up for Gemini intelligence. These devices feature the Magic Pointer for contextual suggestions and custom widgets to help you organize your tasks. Keep an eye on googlebook.com for more updates before the devices launch this fall.
Blaze 2.0 is the marketing solution for people who don't have time to do marketing. It learns your business, your audience, and your voice, then creates and manages your entire content strategy automatically. Like having a full-time marketer on your team without the salary.
Liminary turns everything you've saved into working memory for AI. Unlike chatbots, meeting tools, or project-based notebooks, it gives your knowledge one shared memory across writing, meetings, and research. It surfaces relevant context automatically as you work, helping expert knowledge workers reuse their best thinking, avoid starting from scratch, and produce source-grounded work with traceable citations.
Pipali is an AI coworker that lives on your computer. It interacts with your files, browser and apps to get real work done. Pipali can handle most computer work: deep research, polished docs, browser tasks and routine errands. Teach it your workflows with Skills, run recurring tasks with Routines and integrate with your apps like Linear, Slack and GitHub via MCP.
OR-Tools CP-SAT is a constraint programming solver that can be effectively used to tackle complex scheduling problems in operations research. It provides a powerful optimization tool for solving real-world scheduling challenges efficiently.
Research Papers
Beyond Semantic Similarity explores advanced methods for understanding and comparing meaning in text that go beyond traditional similarity metrics. It addresses the limitations of conventional semantic analysis approaches in capturing nuanced relationships between concepts.
When adapting an encoder to a new domain, the standard approach is to continue training with Masked Language Modeling (MLM). We show that temporarily switching to Causal Language Modeling (CLM) followed by a short MLM decay improves downstream performance. On biomedical texts with ModernBERT, this CLM detour outperforms MLM baselines trained on identical data and compute across 8 French and 11 English biomedical tasks, by +1.2-2.8pp and +0.3-0.8pp respectively, depending on model size. We invest...
The continued improvements in language model capability have unlocked their widespread use as drivers of autonomous agents, for example in coding or computer use applications. However, the core of these systems has not changed much since early instruction-tuned models like ChatGPT. Even advanced AI agents function on message exchange formats, successively exchanging messages with users, systems, with itself (i.e. chain-of-thought) and tools in a single stream of computation. This bottleneck to a...
Long-term memory is crucial for agents in specialized web environments, where success depends on recalling interface affordances, state dynamics, workflows, and recurring failure modes. However, existing memory benchmarks for agents mostly focus on user histories, short traces, or downstream task success, leaving open how to directly evaluate whether memory systems effectively internalize environment-specific experience. To address this gap, we introduce LongMemEval-V2 (LME-V2), a benchmark for ...
Pixel diffusion models have recently regained attention for visual generation. However, training advanced pixel-space models from scratch demands prohibitive computational and data resources. To address this, we propose the Latent-to-Pixel (L2P) transfer paradigm, an efficient framework that directly harnesses the rich knowledge of pre-trained LDMs to build powerful pixel-space models. Specifically, L2P discards the VAE in favor of large-patch tokenization and freezes the source LDM's intermedia...
LLM-based agents increasingly operate in persistent environments where they must store, update, and reason over information across many sessions. While prior benchmarks evaluate only single-entity updates, MEME defines six tasks spanning the full space defined by the multi-entity and evolving axes, including three not scored by prior work: Cascade and Absence (dependency reasoning) and Deletion (post-removal state). Evaluating six memory systems spanning three memory paradigms on 100 controlled ...
Large language models (LLMs) are increasingly deployed on long-horizon tasks in partially observable environments, where they must act while inferring and tracking a complex environment state over many steps. This leads to two challenges: partial observability requires maintaining uncertainty over unobserved world attributes, and long interaction history causes context to grow without bound, diluting task-relevant information. A principled solution to both challenges is a belief state: a posteri...
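For readers unfamiliar with the term, a belief state in the classical sense is a probability distribution over hidden states that is revised after each observation. The sketch below shows one discrete Bayesian update in Python. It illustrates the general concept only and is not the paper's method; the state names, the likelihood values, and the function name `update_belief` are hypothetical.

```python
def update_belief(belief, likelihood):
    """One Bayesian update: posterior(s) is proportional to P(obs | s) * prior(s)."""
    unnormalized = {s: belief[s] * likelihood.get(s, 0.0) for s in belief}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under the current belief")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical two-state world: where is the key?
prior = {"key_in_drawer": 0.5, "key_on_table": 0.5}
# Assumed P(observation | state) for an observation such as "the table looks empty".
likelihood = {"key_in_drawer": 0.9, "key_on_table": 0.1}

posterior = update_belief(prior, likelihood)
print(posterior)
```

Because the belief has a fixed size, it can stand in for an ever-growing interaction history, which is the appeal the abstract points to.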
We introduce Pion, a spectrum-preserving optimizer for large language model (LLM) training based on orthogonal equivalence transformation. Unlike additive optimizers such as Adam and Muon, Pion updates each weight matrix through left and right orthogonal transformations, preserving its singular values throughout training. This yields an optimization mechanism that modulates the geometry of weight matrices while keeping their spectral norm fixed. We derive the Pion update rule, systematically exa...
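The spectrum-preservation claim follows from a standard linear algebra fact, sketched here in LaTeX. The symbols U, V, P, Q, and Sigma are notation chosen for this note; the excerpt does not give the actual Pion update rule, so this shows only why left and right orthogonal transformations leave singular values unchanged.

```latex
\begin{align*}
W' &= U\,W\,V^{\top}, \qquad U^{\top}U = I, \quad V^{\top}V = I \\
W  &= P\,\Sigma\,Q^{\top} \quad \text{(singular value decomposition of } W\text{)} \\
W' &= (UP)\,\Sigma\,(VQ)^{\top} \\
(UP)^{\top}(UP) &= P^{\top}U^{\top}U\,P = I, \qquad (VQ)^{\top}(VQ) = Q^{\top}V^{\top}V\,Q = I \\
\Rightarrow\ \sigma_i(W') &= \sigma_i(W) \ \text{ for all } i, \qquad \|W'\|_2 = \|W\|_2
\end{align*}
```

Since UP and VQ again have orthonormal columns, the third line is itself a valid SVD of W', which is why the singular values and therefore the spectral norm stay fixed.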
In this paper, we propose AlphaGRPO, a novel framework that applies Group Relative Policy Optimization (GRPO) to AR-Diffusion Unified Multimodal Models (UMMs) to enhance multimodal generation capabilities without an additional cold-start stage. Our approach unlocks the model's intrinsic potential to perform advanced reasoning tasks: Reasoning Text-to-Image Generation, where the model actively infers implicit user intents, and Self-Reflective Refinement, where it autonomously diagnoses and correc...
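As background on GRPO itself, its core step scores each sampled output relative to the other outputs in its group instead of using a learned value baseline. The Python sketch below shows that group-relative advantage. It is a minimal sketch and not AlphaGRPO's implementation: the reward values are made up, and the use of the population standard deviation and the `eps` term are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four outputs sampled from the same prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print(advantages)
```

Outputs that beat the group average get a positive advantage and are reinforced, while below-average outputs are pushed down.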
Vision-Language-Action (VLA) models have achieved strong semantic generalization for embodied policy learning, yet they learn reactive observation-to-action mappings without explicitly modeling how the physical world evolves under intervention. A growing body of work addresses this limitation by integrating world models, predictive models of environment dynamics, into the action generation pipeline. We term this emerging paradigm World Action Models (WAMs): embodied foundation models that unify ...
Model-based representations recently stand out as a promising framework that embeds latent dynamics information into the representations for downstream off-policy actor-critic learning. It implicitly combines the advantages of both model-free and model-based approaches while avoiding the training costs associated with model-based methods. Nevertheless, existing model-based representation methods can fail to capture sufficient information about relevant variables and can overfit to early experien...
Hidden malicious intent in multi-turn dialogue poses a growing threat to deployed large language models (LLMs). Rather than exposing a harmful objective in a single prompt, increasingly capable attackers can distribute their intent across multiple benign-looking turns. Recent studies show that even modern commercial models with advanced guardrails remain vulnerable to such attacks despite advances in safety alignment and external guardrails. In this work, we address this challenge by detecting t...
Tool-using LLM agents fail through trajectories rather than only final responses, as they may execute unsafe tool calls, follow injected instructions, comply with harmful requests, or over-refuse benign tasks despite producing a seemingly safe answer. Existing safety-alignment signals are largely response-level or off-policy, and often incur a safety-utility trade-off: improving agent safety comes at the cost of degraded task performance. Such sparse and single-objective rewards severely limit r...
Transformer-based 3D reconstruction has emerged as a powerful paradigm for recovering geometry and appearance from multi-view observations, offering strong performance across challenging visual conditions. As these models scale to larger backbones and higher-resolution inputs, improving their efficiency becomes increasingly important for practical deployment. However, modern 3D transformer pipelines face two coupled challenges: dense multi-view attention creates substantial token-mixing overhead...
Gaussian Splatting has achieved remarkable progress in multi-view surface reconstruction, yet it exhibits notable degradation when only few views are available. Although recent efforts alleviate this issue by enhancing multi-view consistency to produce plausible surfaces, they struggle to infer unseen, occluded, or weakly constrained regions beyond the input coverage. To address this limitation, we present VidSplat, a training-free generative reconstruction framework that leverages powerful vide...
Industry News
A utility company is considering redirecting power lines away from Tahoe residents to supply data centers, potentially leaving 50,000 residents without adequate power. This decision highlights the growing conflict between energy demands from AI and tech infrastructure versus residential communities' needs.
The United States maintains a competitive edge in artificial intelligence by achieving greater commercial success and market adoption compared to other nations. This advantage is demonstrated through the widespread deployment of AI technologies across various industries and the growth of AI-driven businesses.