Discover new Codex plugins, sites, and annotations that help analysts, marketers, designers, investors, and other teams get more done with AI.
TL;DR
Model Releases
Tools & Products
Research Papers
Industry News
Discussion
Model Releases
Tools & Products
✨ The agentic HTML editor: your local AI agent writes the HTML, you ship it. 75 Skills × 9 Surfaces (magazine · deck · poster · XHS / tweet · prototype · data report · Hyperframes). Sandboxed preview · 1-click to WeChat / X / Zhihu / HTML / PNG. Zero API key: Claude Code / Cursor / Codex / Gemini / Copilot / OpenCode / Qwen / Aider.
Bot that bridges Feishu/Lark messenger with a local Claude Code CLI: streaming cards, per-chat sessions, multiple workspaces.
AI-powered modular Active Directory red-team framework for authorized penetration testing, AD enumeration, attack-path analysis, Kerberos/ADCS workflows, reporting, operator automation, and MCP server integration.
Code Modern. Code Legacy. Code Firmware. - open-source AI-native IDE with agentic coding, Power Mode, legacy modernization, and firmware development
Fundraisly: the ultimate AI agent for fundraising. It analyzes 300K+ investors and millions of deals, identifies the relevant ones actively investing in your space, maps warm paths to them from your own network, then covers the rest with targeted cold outreach. The result: 20-40 qualified investor meetings. Built by founders who raised over $1B.
OpenAI's frontier models and Codex are now accessible through Amazon Web Services (AWS), expanding availability of these powerful AI tools to a broader range of enterprises and developers.
GBase: Recursive Self-Improvement Agent Framework. Memory, evolution, quality gates, identity system, and 40+ auto-registered tools.
A local control plane for AI agents: see what they do, approve what matters, keep secrets out. Rust + Tauri + Chrome MV3.
Your Codex and my Codex can't talk, so we play human telephone in Slack: copy prompts, paste summaries, ask for reviews, and lose the run. Vokal brings 10x teammates and their agents into one live workspace in minutes, whether they run local Codex, Claude Code, or Hermes, or run in the cloud. Name your agents, give them roles, access, and memory, and work will happen in a shared collaboration space instead of through copy-paste handoffs.
Gigacatalyst.com's AI builder learns your APIs and embeds in your product, so your sales and CS teams can build the missing features that customers need on your platform. When your software adapts to every customer's workflow, customers use it more, stay longer, and expand faster, because they get the most customized implementation for their exact use case.
AI agents can ship quickly, but without the right product context, they're often flying blind. Brief gives product teams a living source of truth that captures decisions, preserves product intent, and serves relevant context to humans and agents through chat, Slack, CLI, and MCP. It keeps strategy, decisions, and execution connected from vision to impact.
Privacy-first, lightning fast, searchable, and available across all devices. Save snippets, sync securely, and boost productivity with smart shortcuts and instant paste history.
The Mac app started as a dumb question: can you use font ligatures to turn AI into 💩? Turns out yes. Ironically, I used AI to figure out how. The Chrome extension came after; web fonts don't always cooperate. So mostly this is me poking fun. It's also a small nudge to be a little more mindful of the din around AI. Starting, apparently, with your font files.
Rodeo by TwelveLabs is the AI video intelligence platform for creators and teams who produce at scale. Stop wasting hours scrubbing footage. Go from raw clips to a first cut in minutes using plain language. It's structured creation, not manual review. Unlike transcript-first tools, Rodeo's multimodal AI understands visuals, audio, speech, and text simultaneously, making it perfect for visual-first content. Your video library is now instantly queryable for humans and agents.
Branda turns a name and idea into a complete brand identity in minutes: strategy, logo, palette, type, and full brand kit. Start from scratch or import existing assets. ✨ 200 free credits on signup ✨ Let AI lead with Lucky, or guide strategy, sketches, and vector concepts yourself. Use visual prompts to keep every generation consistent, extract elements, vectorize to SVG, upscale, export, and share a public showcase.
Research Papers
Affordance understanding bridges visual perception and physical action, serving as an explainable interface for robot manipulation in open and unstructured real-world environments. Yet, building an affordance foundation model that not only understands where and how the interaction should happen, but also generalizes across diverse environments, objects, and tasks, remains a long-standing research challenge. Existing methods typically address only part of this challenge, either localizing task-re...
Building capable visual web agents requires long-horizon reasoning, precise grounding, and robust interaction with dynamic real-world websites. Despite rapid progress, the strongest systems remain largely proprietary, while open agents still depend heavily on supervised post-training over large collections of curated web trajectories. This dependence creates a major scalability bottleneck: high-quality demonstrations are expensive to collect, and static datasets offer limited coverage of the div...
Large language models solve complex problems by generating lengthy chains of explicit reasoning tokens. While effective, this makes reasoning expensive, length-sensitive, and constrained to (discrete) natural language. While latent reasoning offers a continuous alternative, determining useful structures for intermediate latent states is an open challenge. In this paper, we formulate latent reasoning as a geometric path-approximation problem within the model's pretrained token-embedding space. We...
Learning a shared representation between spoken text and gesture is central to co-speech gesture retrieval, synthesis, and understanding, but remains challenging for semantically meaningful gestures whose communicative intent is not captured by motion alone. Direct contrastive alignment between transcripts and continuous motion embeddings often overemphasizes low-level kinematics and misses the symbolic content of semantic gestures. We propose semantic motion anchors, natural-language abstractio...
In open-ended environments, exploration is fundamental for autonomous agents, yet current language model agents struggle with this. Effective exploration requires memory, but retaining raw interaction histories is computationally expensive over long trajectories. While latent memory offers a solution to compress interaction histories, its training lacks reliable supervisory signals. We introduce Joint Agent Memory and Exploration Learning (JAMEL), a framework that trains agentic memory and explo...
Inverse graphics is a longstanding and highly underconstrained problem that seeks to reconstruct images as editable 3D scenes which can be rendered, relit, and manipulated. In this work, we investigate whether pretrained vision-language models (VLMs) can perform executable inverse graphics directly from a single image by reconstructing a scene as an editable Blender program, without relying on specialized 2D or 3D foundation models, differentiable rendering, or multi-view supervision. We introdu...
The Model Context Protocol (MCP) has emerged as a transformative standard for connecting large language models (LLMs) with external data sources and tools, and has been rapidly adopted across personal applications and development platforms. However, existing benchmarks predominantly focus on generic information-seeking tools and fail to capture the practical challenges posed by personal social applications, where tools interact with individual accounts or local databases. To bridge this critical...
Selecting the best response from multiple small-model samples using a stronger scorer is a simple inference-time strategy, but fails when the small model has already committed to incorrect reasoning paths. PRM guided search avoids this by scoring candidate continuations during generation, but requires a reward model trained with step-level labels. We propose Chunk-Level Guided Generation, a training-free alternative that uses an off-the-shelf large language model as a process scorer. At each s...
Search agents are often trained as policies over growing transcripts: the model must decide how to search while also remembering what it has seen, which evidence is useful, which constraints remain open, and which claims have actually been checked. We argue that this formulation puts too much routine state management inside the policy: reinforcement learning is forced to optimize both semantic search decisions and recoverable bookkeeping that the environment can maintain more reliably. We introd...
Deep Research Agents have shown strong capability in multi-step information retrieval, reasoning, and long-form report generation, but existing benchmarks and systems remain predominantly text-centric, with limited evaluation of whether visual elements are factually reliable and well aligned with the surrounding analysis. To address this gap, we introduce TVIR (Text--Visual Interleaved Report Generation), which includes TVIR-Bench, a benchmark of 100 expert-curated multimodal deep research tasks...
Computer use agents (CUAs) today are primarily deployed as single serial agents. This setup is suboptimal for complex long-horizon tasks that benefit from task decomposition, parallel execution, and consistent re-planning based on new information. In this paper, we argue that we should instead move towards evaluating and building multi-agent computer use (MACU) systems. These systems, which emphasize planning and parallel execution, alleviate many of the shortcomings of single-agent CUAs. We pro...
Reinforcement learning (RL) improves large language model (LLM) agents by teaching them which actions lead to high rewards, but provides little supervision on what those actions do to the environment. World modeling (WM) can fill this gap, yet existing approaches often require separate simulators, extra training stages, or additional inference-time computation. We observe that on-policy RL rollouts already contain the needed signal: each transition pairs an action with its resulting next observa...
Autoregressive (AR) video diffusion enables variable-length synthesis, but long-horizon generation often suffers from accumulated errors and identity drift. For efficiency, existing methods commonly adopt sliding-window attention during generation. This creates an irreversible generation trajectory: once the active window accumulates appearance errors, subsequent generations can only condition on this degraded trajectory and drift further away. We address this limitation by formulating long vide...
Human annotation is the empirical foundation of much NLP research, from dataset construction to model evaluation, but papers often leave unclear who produced the annotations and how the annotation process was controlled. We provide the first large-scale, task-level audit of human annotation reporting across major NLP venues, asking which annotation details are documented, which are missing, and how reporting varies across time, topic, venue, and intended use of human judgment. We introduce a uni...
The recent "Reasoning with Video" paradigm utilizes Video Generation Models (VGMs) to generate temporally coherent visual trajectories to complete reasoning tasks. Although state-of-the-art VGMs excel at visual quality, they often struggle to understand and follow task-specific rules, leading to logical failures across diverse reasoning scenarios. Existing efforts try to utilize Vision-Language Models (VLMs) as problem pre-solvers to produce or refine textual guidance for the VGM. However, textu...
Industry News
Alphabet announces an $80 billion equity capital raise specifically dedicated to expanding its AI infrastructure and computational capacity to support advanced AI development.
President Trump signs a streamlined AI executive order following weeks of policy deliberation and modifications, indicating the administration's effort to establish a regulatory framework for artificial intelligence development.
The article examines whether the stock market has sufficient capacity and valuation to accommodate major AI companies like Anthropic, SpaceX, and OpenAI as they continue their rapid growth.
Groq, an AI accelerator company, is securing additional funding to support its continued growth and its development of specialized hardware for AI inference and computation.
Investor Michael Burry disputes the $1 trillion valuations attributed to SpaceX and Anthropic, suggesting these companies are significantly overvalued in the current market.
Communities and policymakers, lacking clear strategies for regulating AI itself, are instead focusing opposition on data centers and their environmental and community impacts as a proxy for AI concerns.
Travelers built an AI-powered Claim Assistant with OpenAI to guide customers through filing claims, provide 24/7 support, and scale operations during peak demand.
Discussion
This appears to be a documentary or media piece titled 'Why Janet?' from 2023, likely exploring the story or significance of someone named Janet, though the specific context would require viewing the actual content.