Cainew - Curated AI news for developers

TL;DR

Model Releases

Tools & Products

Research Papers

Bag of Dims: Training-Free Mechanistic Interpretability via Dimension-Level Sign Patterns
Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games
Morpheus: A Morphology-Aware Neural Tokenizer and Word Embedder for Turkish

Industry News

Jun 18, 2026 - Frontier Red Team - Project Fetch: Phase two

Discussion

Model Releases

DeepSeek Introduces Vision

DeepSeek has introduced vision capabilities to its AI models, enabling them to process and analyze images alongside text. This multimodal expansion allows DeepSeek models to perform tasks requiring visual understanding.

RSS

Improving health intelligence in ChatGPT

Learn how GPT-5.5 Instant improves ChatGPT’s health and wellness responses with stronger reasoning, better context, clearer communication, and physician-informed evaluations.

OpenAI

Tools & Products

superloglabs/superlog

Open-source observability tool that uses AI agents to self-heal your software

GitHub

fivetaku/fablize

A Claude Code plugin that makes Opus behave like Fable — completion, evidence, and verification enforced as procedure. Ships only what a Fable-vs-Opus comparison proved transferable.

GitHub

agentic-in/inferoa

Inference-native Tokenmaxxing Agent Harness for Loop Engineering

GitHub

Shiyao-Huang/awesome-agent-evolution

Open survey and evidence map for AI agent evolution, self-evolving agents, memory, skills, harnesses, benchmarks, and agent-swarm systems.

GitHub

Upstream: The inbox designed for humans and agents

Finally, an inbox you'll look forward to. Agents sort your messages, draft your replies, and clear the grunt work behind the scenes, all in a client so well-crafted that email feels light, fast, fun.

ProductHunt

NazzarenoGiannelli/tuiboard

Terminal kanban on plain markdown, with optional Today/Tomorrow planner, 24h agenda + calendar, and a live Claude Code agent view. Use only the panels you want.

GitHub

RightNow-AI/AutoMegaKernel

An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode. Paper: https://arxiv.org/abs/2606.09682

GitHub

Jesse: Stop building Apollo/Clay lists. Search the live internet.

Sales teams have been stuck with stale databases for 15 years. Jesse changes everything: it is the first internet-wide search engine built for sales & marketing. Ask in plain English: "Find newly opened soccer facilities in the Midwest needing turf solutions." Jesse scans the live web and finds the right buyers in the market today. We are an anti-database company: we don’t scrape and store stale databases and sell them at a premium. Every lead is found fresh from the live internet and delivered.

ProductHunt

Elvin: Proactive AI that finds and finishes work before you ask

Stop acting as the routing layer between AI and your actual work. Elvin is proactive AI that finds coordination work across your tools, handles the messy multi-step parts, and asks before taking action. It turns scattered context from messages, meetings, docs, and task tools into ready-to-approve drafts, follow-ups, updates, and next steps.

ProductHunt

Millijethwani18/fable-opus-desktop-kit

Claude Fable 5 Desktop 2026 – Ultimate AI Storytelling & Code Assistant Tool

GitHub

Viktor for Microsoft Teams: The most powerful AI employee, now in Microsoft Teams

An autonomous AI employee that lives in Microsoft Teams and does real work across 3,000+ tools: reports, reconciliations, approvals, recurring ops. Not a copilot that drafts and waits. It ships. Live today, $100 in credits, no card.

ProductHunt

Agentic videos by D-ID: Interactive videos that talk back

Turn any video into an interactive AI experience. With Agentic Videos, viewers don't just watch - they pause, ask questions, get real-time answers, and interact with the presenter inside the video itself. Viewers experience content in a fully personalized way. Creators gain a new world of insight into knowledge gaps and intent: a viewer who asks three questions tells you more than a thousand passive completions ever could. Now on the D-ID platform, built with industry-leading expressive avatars.

ProductHunt

We built a persistent agent memory layer on Elasticsearch with 0.89 recall

This project demonstrates a persistent agent memory layer built on Elasticsearch that achieves 0.89 recall, enabling AI agents to effectively store and retrieve past interactions. The implementation provides reliable memory management for stateful agent systems.
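For readers unfamiliar with the metric, recall for a memory lookup is the fraction of the relevant stored items that the retrieval step actually returns. The summary does not say how the 0.89 figure was measured, so the following is only a minimal sketch of a recall@k calculation; the function name `recall_at_k` and the memory IDs are hypothetical and not taken from the project.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant items that appear in the top-k retrieved results."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("relevant set must be non-empty")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


# Hypothetical example: IDs returned by a memory search, best match first.
retrieved = ["m7", "m2", "m9", "m4", "m1"]
# Hypothetical ground truth: memories that should have been found.
relevant = {"m2", "m4", "m5"}

# Two of the three relevant memories are in the top 5.
print(recall_at_k(retrieved, relevant, k=5))
```

Averaging this value over many queries gives a single recall number for a retrieval setup.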

RSS

Juno: Free, local AI powered Voice to Text w/ live transcriptions

Juno is a local, open-source voice writing app for Mac. It is the only voice dictation tool with live transcriptions. Speak naturally in Mail, Slack, Notes, Cursor, or the app you’re already using; Juno writes clean text, rewrites selected passages, uses snippets, and creates Notes, Reminders, and Alarms. No login required; it runs offline and is free forever.

ProductHunt

Genie Mentions: AI that gets you *and* the people in your life, together

Genie was built on the belief that to truly "get" you, a meaningful social product must also "get" the people in your life. Introducing Genie Mentions: the first AI that treats your circle as part of who you are. Your taste. Your friends' taste. What you're all into. Genie keeps you updated on big moves your friends are making, trips they’re taking, dreams they’re conjuring. If you want AI for work, ChatGPT awaits you. If you want to know what's going on in your world, tag Genie in.

ProductHunt

Research Papers

Bag of Dims: Training-Free Mechanistic Interpretability via Dimension-Level Sign Patterns

We show the standard basis of transformer hidden states already provides a training-free, architecture-general feature basis. Individual dimensions encode semantic content via their signs (+/-1) and confidence via their magnitudes, acting as independent binary registers; a feature is a subset of dimensions with a consistent sign pattern, read by counting sign agreements with no learned rotation. We validate this Bag of Dims framework across seven models spanning language (Qwen 3.5-4B, Gemma 3-4B...
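The readout described above, counting sign agreements over a chosen subset of dimensions, can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the paper's code: the function name `sign_agreement`, the 8-dimensional hidden state, and the chosen dimensions and pattern are all hypothetical.

```python
def sign(x):
    """Return +1, -1, or 0 depending on the sign of x."""
    return (x > 0) - (x < 0)


def sign_agreement(hidden, dims, pattern):
    """Count how many selected dimensions match the expected sign pattern."""
    return sum(1 for d, s in zip(dims, pattern) if sign(hidden[d]) == s)


# Hypothetical hidden state (real models have thousands of dimensions).
h = [0.9, -1.2, 0.1, -0.4, 2.0, -0.3, 0.7, -1.5]

# Hypothetical feature: a subset of dimensions with an expected sign each.
feature_dims = [0, 1, 4, 7]
feature_pattern = [+1, -1, +1, -1]

print(sign_agreement(h, feature_dims, feature_pattern))  # 4
```

No rotation or learned projection is applied; the score depends only on the signs of the raw coordinates, which matches the "no learned rotation" description in the abstract.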

HuggingFace

Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games

Deploying multimodal foundation models as closed-loop policies increasingly requires conditioning actions on observations that are no longer visible. However, existing benchmarks either expose the full state, conflate hidden-state reconstruction with other agent skills, or test recall only after an episode has ended. We introduce RNG-Bench (Reconstructive Non-Markov Games), a benchmark suite designed to isolate a base model's ability to reconstruct past observations and act on them during multi-...

HuggingFace

Morpheus: A Morphology-Aware Neural Tokenizer and Word Embedder for Turkish

Turkish is agglutinative: meaning is carried by morphemes, yet the subword tokenizers that drive modern language models split words by corpus statistics, fragmenting semantically loaded suffixes and -- in the case of WordPiece and rule-based analyzers -- failing to decode their output back to the original text. This paper presents Morpheus, a neural morpheme-boundary model for Turkish that is at once a lossless, morphology-aware tokenizer and a word-embedding producer. A differentiable Poisson-b...

HuggingFace

The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL

Score- and flow-matching models often rely on preference-based reinforcement learning for two purposes: aligning with subjective preferences and, surprisingly, recovering properties such as visual realism and coherent object structure that matching-based training is intended to learn from the data itself. We argue that this reflects a structural mismatch. Matching losses measure ℓ2 regression error on the velocity or score field under training-time marginals, a proxy poorly aligned with the v...

HuggingFace

Native Active Perception as Reasoning for Omni-Modal Understanding

Passive models for long video understanding typically rely on a "watch-it-all" paradigm, processing frames uniformly regardless of query difficulty, causing computational cost to grow with video duration. Although interactive frameworks have emerged, they often rely on global pre-scanning, and their context cost still scales with video length. We propose OmniAgent, the first native omni-modal agent that formulates video understanding as a POMDP-based iterative Observation-Thought-Action cycle. O...

HuggingFace

Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation

On-policy self-distillation (OPSD) trains a model on its own rollouts and uses a frozen copy to provide dense token-level targets conditioned on a reference target. This works well for LLM reasoning, but a direct extension to multimodal large language models (MLLMs) can create a shortcut: the privileged target may guide tokens mainly based on the text reference target rather than the image. We propose ViGOS, a visually grounded OPSD framework for MLLM post-training. The student first writes a vi...

HuggingFace

Sumi: Open Uniform Diffusion Language Model from Scratch

Diffusion models have become a promising alternative to autoregressive models. Among these, uniform diffusion language models (UDLMs) permit any token to be updated at any step, in principle enabling more flexible generation. However, no UDLM has yet been pretrained from scratch at both large parameter scale and large token budget. Both autoregressive modeling and masked diffusion modeling already have capable models at scale that the community can study and build on; uniform diffusion has none....

HuggingFace

Learning User Simulators with Turing Rewards

Learning to simulate human users in interactive settings could advance the training of agent assistants, evaluation of personalization systems, research in the social sciences, and more. Existing approaches generally do so by training a large language model (LLM) to match a single ground truth response, either by maximizing the log probability or by using a similarity reward. We instead propose Turing-RL: a Turing-Test-based reinforcement learning approach for training user simulator models. ...

HuggingFace

STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability

Reinforcement Learning with Verifiable Rewards algorithms like GRPO have emerged as the dominant post-training paradigm for complex reasoning in LLMs, yet commonly suffer from policy entropy collapse during training. We conduct a first-order gradient analysis of token-level entropy dynamics under GRPO and identify a token-level credit assignment mismatch: the per-token entropy variation decomposes into the product of the trajectory-level advantage and an entropy sensitivity function over the nex...

HuggingFace

RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents

Multi-turn tool-use RL is bottlenecked by the rapid depletion of informative samples in static datasets. We observe that the gradient signal in GRPO concentrates on tasks with the highest rollout reward variance, a consequence of the Popoviciu upper bound. Consequently, samples near the agent's capability boundary -- where successes and failures are roughly balanced -- contribute disproportionately large policy gradients. As training progresses, this boundary continuously shifts, which gradually...
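The variance argument can be checked numerically. Popoviciu's inequality states that a variable bounded in [m, M] has variance at most (M - m)^2 / 4, and for binary rewards that maximum is reached when successes and failures are evenly split. The sketch below assumes 0/1 rewards and uses hypothetical rollout groups; it illustrates the bound only and is not the RODS method.

```python
def reward_variance(rewards):
    """Population variance of a group of rollout rewards."""
    mean = sum(rewards) / len(rewards)
    return sum((r - mean) ** 2 for r in rewards) / len(rewards)


def popoviciu_bound(low, high):
    """Upper bound on the variance of any variable bounded in [low, high]."""
    return (high - low) ** 2 / 4


# Hypothetical groups of 8 rollouts per task, with binary success rewards.
groups = {
    "too_easy": [1, 1, 1, 1, 1, 1, 1, 1],
    "boundary": [1, 0, 1, 0, 1, 0, 1, 0],
    "too_hard": [0, 0, 0, 0, 0, 0, 0, 1],
}

for name, rewards in groups.items():
    print(name, reward_variance(rewards), "<=", popoviciu_bound(0, 1))
```

Only the evenly split group reaches the bound of 0.25, while the all-success group has zero variance, which is consistent with the abstract's claim that tasks near the capability boundary carry the most signal.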

HuggingFace

EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts

Reinforcement learning (RL) has become a representative post-training paradigm for LLMs, enabling strong reasoning and agentic capabilities. However, rollout generation remains a dominant latency bottleneck because autoregressive sampling decodes responses sequentially and a small number of long-tailed generations often determine completion time. Speculative decoding (SD) offers a natural way to address this bottleneck, as it is a well-established technique for serving fixed LLMs that reduces la...

HuggingFace

Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness

AI systems can increasingly automate scientific workflows, but the reasoning that links prior evidence, generated ideas, experiments and final claims often remains implicit inside model inference. Here we introduce Xcientist, a research harness that externalizes research synthesis and experimental validation into inspectable, contract-governed processes. Xcientist organizes literature evidence, idea states, implementation plans, ablation records and repair traces as persistent research artifacts...

HuggingFace

Physics-IQ Verified

Video generative models (VGMs) have become a new frontier that can be used not just for video generation but for a multitude of downstream tasks, including world modeling. To advance these tasks, a good video model must understand the physical reality of the world. Evaluating this understanding is an emerging field and has led to the Physics-IQ benchmark, which quantifies this explicitly by comparing model-generated videos to real-world videos of physical experiments. In this work, we present a...

HuggingFace

MolmoMotion: Forecasting Point Trajectories in 3D with Language Instruction

Motion forecasting is central to visual intelligence: agents must anticipate how objects will move in order to plan actions, reason about physical interactions, and synthesize realistic futures. We argue that 3D points in world coordinates provide a general representation that is class-agnostic, view-stable, compact, and directly useful for downstream tasks. We formalize the task of goal-conditioned 3D point motion forecasting: given a short visual history, a set of 3D query points on an object ...

HuggingFace

Industry News

Jun 18, 2026 - Frontier Red Team - Project Fetch: Phase two

Anthropic

Discussion

Local Qwen isn't a worse Opus, it's a different tool

This piece argues that local Qwen models should not be judged as inferior versions of Claude Opus but recognized as tools for different use cases. Local deployment offers distinct advantages for certain applications, even though the models' capabilities differ.

RSS

ChatGPT's image generator can be manipulated to produce violent, sexual content

Security researchers have demonstrated that ChatGPT's image generator can be manipulated through prompt engineering to produce violent and sexual content that violates OpenAI's usage policies. This vulnerability highlights potential risks in multimodal AI safety.

RSS