
Podcast
Ashish's AI News Briefings.
Morning and evening AI news briefings, plus research paper deep dives.
RSS feed
Subscribe in Apple Podcasts, Spotify, or any app via RSS.
Episodes
Research, Jun 2
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
The ability to accurately interpret complex visual information is a crucial topic of multimodal large language models (MLLMs). Recent work indicates that enhanced visual perception significantly reduces hallucinations and improves performance on resolution-sensitive tasks, such as optical character recognition and document analysis. A number of recent MLLMs achieve this goal using a mixture of vision encoders. Despite their success, there is a lack of systematic comparisons and detailed ablation studies addressing critical aspects, such as expert selection and the integration of multiple vision experts. This study provides an extensive exploration of the design space for MLLMs using a mixture of vision encoders and resolutions. Our findings reveal several underlying principles common to various existing strategies, leading to a streamlined yet effective design approach. We discover that simply concatenating visual tokens from a set of complementary vision encoders is as effective as more complex mixing architectures or strategies. We additionally introduce Pre-Alignment to bridge the gap between vision-focused encoders and language tokens, enhancing model coherence. The resulting family of MLLMs, Eagle, surpasses other leading open-source models on major MLLM benchmarks.
Evening, Jun 2
2026-06-02
Microsoft Build 2026: Microsoft consolidates agent context, models, governance, Windows sandboxes, and developer workflows; Promoting Advanced Artificial Intelligence Innovation and Security; Enterprise Software Leaders Build AI Agents With NVIDIA; Expanding Project Glasswing; Workday Launches Agent Passport to Test, Verify, and Continuously Monitor Every AI Agent in the Enterprise
Morning, Jun 2
2026-06-02
Anthropic’s browser agent got hijacked 31.5% of the time before safeguards engaged; Why we Built our own Cloud Agent Infrastructure; Develop Physical AI Reasoning, World, and Action Models with NVIDIA Cosmos 3; Anthropic Files to Go Public, Setting Stage for Huge I.P.O.; MiniMax M3 - Coding & Agentic Frontier, 1M Context, Multimodal
Research, Jun 1
VLM3: Vision Language Models Are Native 3D Learners
Vision Language Models (VLMs) enable a unified model to solve various vision tasks through prompting. They have shown promising performance in semantic understanding. However, 3D understanding still largely relies on expert vision models with complex task-specific designs. The key argument this work wants to make is that VLMs are native 3D learners. Our in-depth large scale study shows that 1) focal length unification, 2) text-based pixel reference and 3) data mixture and scaling, are all you need for effective 3D learning. Model architecture changes, large models, heavy data augmentations, and complex losses including the regression formulation, many of which form the foundation of expert vision models, are actually not necessary conditions. As a result, we propose VLM3, a scalable method with the simplest design that enables standard VLMs to master diverse 3D tasks. VLM3 not only advances the VLM depth estimation accuracy by a large margin (0.84 -> 0.9), but also enables diverse 3D tasks such as pixel correspondence, camera pose estimation and object-level 3D understanding, matching expert vision model accuracy while maintaining standard architectures and text-based training. We believe VLM3 opens up a new paradigm for simple and scalable 3D learning.
Evening, Jun 1
2026-06-01
Anthropic confidentially submits draft S-1 to the SEC; NVIDIA at COMPUTEX 2026: RTX Spark + DLSS 4.5 updates; Deepening OpenAI collaboration with U.S. Department of Energy; GitHub Copilot usage-based billing now live; Organize My Files in Drive now generally available
Morning, Jun 1
2026-06-01
Mistral AI launches Vibe, expands into industrial AI and announces data center push to challenge OpenAI; From data overload to actionable insights: How Verizon Connect scaled agentic AI to 100,000 users; Solving Long-Context Evals for Production Agents; AI’s Impact on SaaS Will Be Uneven. Here’s What Leaders Need to Know; AI Is Already Rewiring the Aftermarket and Services
Research, May 31
LongCat-Video Technical Report
Video generation is a critical pathway toward world models, with efficient long video inference as a key capability. Toward this end, we introduce LongCat-Video, a foundational video generation model with 13.6B parameters, delivering strong performance across multiple video generation tasks. It particularly excels in efficient and high-quality long video generation, representing our first step toward world models. Key features include: Unified architecture for multiple tasks: Built on the Diffusion Transformer (DiT) framework, LongCat-Video supports Text-to-Video, Image-to-Video, and Video-Continuation tasks with a single model; Long video generation: Pretraining on Video-Continuation tasks enables LongCat-Video to maintain high quality and temporal coherence in the generation of minutes-long videos; Efficient inference: LongCat-Video generates 720p, 30fps videos within minutes by employing a coarse-to-fine generation strategy along both the temporal and spatial axes. Block Sparse Attention further enhances efficiency, particularly at high resolutions; Strong performance with multi-reward RLHF: Multi-reward RLHF training enables LongCat-Video to achieve performance on par with the latest closed-source and leading open-source models. Code and model weights are publicly available to accelerate progress in the field.
Evening, May 31
2026-05-31
Model Release Notes | OpenAI Help Center; Governor Newsom signs first-of-its-kind executive order to prepare workers and businesses for potential AI disruption; SoftBank says it will invest up to €75 billion to build French data centers
Research, May 30
stable-worldmodel-v1: Reproducible World Modeling Research and Evaluation
World Models have emerged as a powerful paradigm for learning compact, predictive representations of environment dynamics, enabling agents to reason, plan, and generalize beyond direct experience. Despite recent interest in World Models, most available implementations remain publication-specific, severely limiting their reusability, increasing the risk of bugs, and reducing evaluation standardization. To mitigate these issues, we introduce stable-worldmodel (SWM), a modular, tested, and documented world-model research ecosystem that provides efficient data-collection tools, standardized environments, planning algorithms, and baseline implementations. In addition, each environment in SWM enables controllable factors of variation, including visual and physical properties, to support robustness and continual learning research. Finally, we demonstrate the utility of SWM by using it to study zero-shot robustness in DINO-WM.
Evening, May 30
2026-05-30
NVIDIA Unveils New Open Models, Data and Tools to Advance AI Across Every Industry; Strengthening societal resilience with Rosalind Biodefense; Runway started by helping filmmakers — now it wants to beat Google at AI
Research, May 29
minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models
Recent video diffusion foundation models have achieved remarkable progress in high-quality video generation, yet turning them into real-time interactive video world models remains challenging. Interactive world models require controllable, causal, and low-latency rollout, which in practice demands a full pipeline spanning data construction, controllable fine-tuning, autoregressive training, few-step distillation, and streaming inference. In this work, we present minWM, a full-stack open-source framework for building real-time interactive video world models. minWM provides an end-to-end pipeline that converts existing bidirectional T2V/TI2V video foundation models into camera-controllable few-step autoregressive world models. Specifically, minWM first fine-tunes a bidirectional video diffusion model with camera control, and then applies the Causal Forcing / Causal Forcing++ pipeline, including AR diffusion training, causal ODE or causal consistency distillation, and asymmetric DMD, to distill it into a few-step autoregressive generator for low-latency rollout. The framework is modular and architecture-extensible: we instantiate it on representative open backbones, including Wan2.1-T2V-1.3B and HY1.5-TI2V-8B, covering both cross-attention-based condition injection and MMDiT-style architectures. minWM also supports adapting existing video world models, such as HY-WorldPlay, to new data distributions, training recipes, and latency targets. Beyond releasing runnable scripts, checkpoints, documentation, and inference code, we provide practical ablations on camera trajectory quality, controllability training steps, and minimal batch-size requirements. We hope minWM serves as a reproducible and extensible recipe for building and adapting real-time interactive video world models.
Evening, May 29
2026-05-29
Building self-improving tax agents with Codex; Building a safe, effective sandbox to enable Codex on Windows; News — Google DeepMind (May 2026 updates index)
Research, May 28
OSP-Next: Efficient High-Quality Video Generation with Sparse Sequence Parallelism, HiF8 Quantization, and Reinforcement Learning
Diffusion Transformers achieve strong video generation quality, but the quadratic cost of full attention limits efficiency. We introduce OSP-Next, an efficient text-to-video generation model that integrates sparse attention, parallelism, quantization, and reinforcement learning. OSP-Next uses a hybrid full-sparse attention architecture, where the sparse component is implemented with Skiparse-2D Attention. This fixed-pattern mechanism applies token-wise and group-wise sparse attention along spatial dimensions, leveraging locality while maintaining native compatibility with FlashAttention kernels. Based on the local equivalence of rearrangement in Skiparse-2D Attention, we further propose Sparse Sequence Parallelism (SSP), which partitions subsequences across ranks and switches sparse patterns through a single All-to-All communication. Compared with Ulysses Sequence Parallelism (SP), SSP provides a native parallel strategy for sparse attention and reduces communication volume by 75%. OSP-Next also incorporates HiF8 quantization to enable stable joint training with 8-bit quantization and sparse fine-tuning, and applies Mix-GRPO post-training to improve the performance of the sparse model. Experiments show that OSP-Next achieves a VBench total score of 83.73%, surpassing the Wan2.1 baseline. Under the 5-second 720P and 5-second 768P settings, OSP-Next achieves up to 1.64x single-GPU speedup and over 1.52x eight-GPU speedup on NVIDIA H200 GPUs. In addition, with only a 0.4% drop in VBench total score, OSP-Next-HiF8 achieves 1.69x and 2.27x speedups under the two settings on a single Ascend 950PR, demonstrating the efficiency and performance of OSP-Next across hardware platforms.
Morning, May 28
2026-05-28
Building self-improving tax agents with Codex; Is a compute crunch coming?; Introducing 1-bit and Ternary Bonsai Image 4B: Image Generation for Local Devices; CFOs Funded the AI Revolution. Now They’re Joining It.; Choosing to Stay Human
Morning, May 27
2026-05-27
A terminal is all you need for web agents; How Glance turns hours of video into mobile-ready clips with AI; Introducing Grok Build; Some ideas for what comes next, May 2026
Research, May 26
TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction
Sparse-view 3D reconstruction is increasingly addressed with feed-forward splatting networks that predict explicit primitives directly from images. Yet most existing methods remain centered on Gaussian primitives and expose surfaces only indirectly: extracting a usable mesh for downstream simulation, physics reasoning, or embodied interaction still requires expensive post-hoc steps that break the feed-forward promise. This limitation is especially pronounced in pose-free settings, where scene structure and camera parameters must be estimated jointly from sparse observations. We present TriSplat, a feed-forward reconstruction network that represents scenes with oriented triangle primitives and directly exports simulation-ready mesh scenes from a single forward pass. Given input images, the network predicts local 3D point maps, triangle attributes, camera poses, and optional intrinsics. Rather than regressing triangle orientation as an unconstrained latent variable, our approach constructs geometry normals from the predicted point maps, refines them with an image-conditioned normal head, and converts them into stable local frames for triangle parameterization. A mono-normal bootstrap schedule further stabilizes early training, while opacity and blur scheduling progressively sharpens the learned surface representation for direct mesh extraction. Experiments on RealEstate10K and DL3DV show that this representation produces more geometry-faithful reconstructions than Gaussian feed-forward baselines while maintaining competitive novel-view rendering quality. Because the rendering primitives are themselves surface triangles, the output can be directly ingested by physics engines, collision detectors, and standard rendering pipelines without any conversion, making it a practical simulation-ready solution for feed-forward 3D scene reconstruction.
Evening, May 26
2026-05-26
China's DeepSeek to make permanent 75% price cut on flagship V4‑Pro AI model; Trending Papers - Hugging Face (SkillOpt spotlight)
Morning, May 26
2026-05-26
SkillOpt: Executive Strategy for Self-Evolving Agent Skills; AdventHealth advances whole-person care with OpenAI; The Best Manufacturers Build AI with Workers, Not for Them; How I Choose Which Cloudflare Employees to Replace With AI; How AI is forcing McKinsey and its peers to rethink pricing
Research, May 25
SkillOpt: Executive Strategy for Self-Evolving Agent Skills
Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision, none of which behaves like a deep-learning optimizer for the skill, and none of which reliably improves over its starting point under feedback. We argue the skill should instead be trained as the external state of a frozen agent, with the same discipline that makes weight-space optimization reproducible. SkillOpt is, to our knowledge, the first systematic controllable text-space optimizer for agent skills: a separate optimizer model turns scored rollouts into bounded add/delete/replace edits on a single skill document, and an edit is accepted only when it strictly improves a held-out validation score. A textual learning-rate budget, rejected-edit buffer, and epoch-wise slow/meta update make skill training stable while adding zero inference-time model calls at deployment. Across six benchmarks, seven target models, and three execution harnesses (direct chat, Codex, Claude Code), SkillOpt is best or tied on all 52 evaluated (model, benchmark, harness) cells and beats every per-cell competitor among human, one-shot LLM, Trace2Skill, TextGrad, GEPA, and EvoSkill skills. On GPT-5.5 it lifts the average no-skill accuracy by +23.5 points in direct chat, by +24.8 inside the Codex agentic loop, and by +19.1 inside Claude Code. Transfer experiments further show that optimized skill artifacts retain value when moved across model scales, between Codex and Claude Code execution environments, and to a nearby math benchmark without further optimization.
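The acceptance rule described in the abstract (bounded add/delete/replace edits, kept only when they strictly improve a held-out validation score) can be illustrated with a toy loop. This is a minimal sketch, not SkillOpt's implementation: the names `apply_edit`, `optimize_skill`, and `validate` are hypothetical, the candidate edits are supplied as a fixed list rather than produced by an optimizer model, and the scoring function is a stand-in for scored rollouts.

```python
def apply_edit(doc, edit):
    """Return a copy of the skill document (a list of lines) with one edit applied."""
    op, index, text = edit
    new_doc = list(doc)
    if op == "add":
        new_doc.insert(index, text)
    elif op == "delete":
        del new_doc[index]
    elif op == "replace":
        new_doc[index] = text
    else:
        raise ValueError(f"unknown edit operation: {op}")
    return new_doc


def optimize_skill(doc, candidate_edits, validate):
    """Try edits in order; keep one only if it strictly raises the validation score."""
    best_doc, best_score = list(doc), validate(doc)
    rejected = []  # hypothetical stand-in for a rejected-edit buffer
    for edit in candidate_edits:
        trial = apply_edit(best_doc, edit)
        trial_score = validate(trial)
        if trial_score > best_score:  # ties are rejected, not accepted
            best_doc, best_score = trial, trial_score
        else:
            rejected.append(edit)
    return best_doc, best_score, rejected


# Toy validation score (assumption): number of required instructions present.
REQUIRED = {"Read the task.", "Check units.", "Show your work."}


def validate(doc):
    return len(REQUIRED & set(doc))


start = ["Read the task.", "Answer quickly."]
edits = [
    ("add", 2, "Check units."),
    ("replace", 1, "Answer fast."),
    ("replace", 1, "Show your work."),
    ("delete", 0, ""),
]
final_doc, final_score, rejected = optimize_skill(start, edits, validate)
print(final_doc, final_score, rejected)
```

Because only strict improvements are accepted, the returned document can never score below the starting one on the validation set used here.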
Evening, May 25
2026-05-25
100 things we announced at I/O 2026; All the news from the Google I/O 2026 Developer keynote; From AI pilots to enterprise impact: Why execution is the new differentiator; Anthropic in talks to use Microsoft's AI chips, The Information reports
Morning, May 25
2026-05-25
State of AI: May 2026; Think 2026: IBM Delivers the Blueprint for the AI Operating Model as the AI Divide Widens; AI Updates Today (May 2026) – Latest AI Model Releases
Research, May 24
L2P: Unlocking Latent Potential for Pixel Generation
Pixel diffusion models have recently regained attention for visual generation. However, training advanced pixel-space models from scratch demands prohibitive computational and data resources. To address this, we propose the Latent-to-Pixel (L2P) transfer paradigm, an efficient framework that directly harnesses the rich knowledge of pre-trained LDMs to build powerful pixel-space models. Specifically, L2P discards the VAE in favor of large-patch tokenization and freezes the source LDM's intermediate layers, exclusively training shallow layers to learn the latent-to-pixel transformation. By utilizing LDM-generated synthetic images as the sole training corpus, L2P fits an already smooth data manifold, enabling rapid convergence with zero real-data collection. This strategy allows L2P to seamlessly migrate massive latent priors to the pixel space using only 8 GPUs. Furthermore, eliminating the VAE memory bottleneck unlocks native 4K ultra-high resolution generation. Extensive experiments across mainstream LDM architectures show that L2P incurs negligible training overhead, yet performs on par with the source LDM on DPG-Bench and reaches 93% performance on GenEval.
Evening, May 24
2026-05-24
A new era for AI Search; A new personal finance experience in ChatGPT
Research, May 23
Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation
Despite rapid advances in automatic speech recognition (ASR) and large audio-language models, robust recognition in real-world environments remains limited by an "acoustic robustness bottleneck": models often lose acoustic grounding and produce omissions or hallucinations under severe, compositional distortions. We propose Mega-ASR, a unified ASR-in-the-wild framework that combines scalable compound-data construction with progressive acoustic-to-semantic optimization. We introduce Voices-in-the-Wild-2M, covering 7 classic acoustic phenomena and 54 physically plausible compound scenarios, and train Mega-ASR with Acoustic-to-Semantic Progressive Supervised Fine-Tuning and Dual-Granularity WER-Gated Policy Optimization. Extensive experiments demonstrate that Mega-ASR achieves significant advantages over prior state-of-the-art systems on adverse-condition ASR benchmarks (45.69% vs. 54.01% on VOiCES R4-B-F, and 21.49% vs. 29.34% on NOIZEUS Sta-0). On complex compositional acoustic scenarios, Mega-ASR further delivers over 30% relative WER reduction against strong open- and closed-source baselines, establishing a scalable paradigm for robust ASR in-the-wild.
Evening, May 23
2026-05-23
CAISI Evaluation of DeepSeek V4 Pro; The Art of Building Verifiers for Computer Use Agents; State of AI: May 2026
Research, May 22
PhysX-Omni: Unified Simulation-Ready Physical 3D Generation for Rigid, Deformable, and Articulated Objects
Simulation-ready physical 3D assets have emerged as a promising direction owing to their broad applicability in downstream tasks. However, most existing 3D generation methods either neglect physical properties or are limited to a single asset category, e.g., rigid, deformable, or articulated objects. To address these limitations, we introduce PhysX-Omni, a unified framework for simulation-ready physical 3D generation across diverse asset types. Specifically, we develop a novel and efficient geometry representation tailored for Vision-Language Models, which directly encodes high-resolution 3D structures without compression, significantly improving generation performance. In addition, we construct the first general simulation-ready 3D dataset, PhysXVerse, covering diverse indoor and outdoor categories. Furthermore, to comprehensively and flexibly evaluate both generative and understanding capabilities in the wild, we propose PhysX-Bench, which encompasses six key attributes: geometry, absolute scale, material, affordance, kinematics, and function description. Extensive experiments with conventional metrics and PhysX-Bench show that PhysX-Omni performs strongly in both generation and understanding. Moreover, additional studies further validate the potential of PhysX-Omni for applications in simulation-ready scene generation and robotic policy learning. We believe PhysX-Omni can significantly advance a wide range of downstream applications, particularly in embodied AI and physics-based simulation.
Evening, May 22
2026-05-22
Gemini 3.5: frontier intelligence with action; OpenAI named a Leader in enterprise coding agents by Gartner; An OpenAI model has disproved a central conjecture in discrete geometry; Center for AI Standards and Innovation (CAISI) frontier AI testing posture
Morning, May 22
2026-05-22
Stable Audio 3.0, the model family built for artistic experimentation with open-weight models; How Ramp engineers accelerate code review with Codex; Presien reduces critical safety events on construction sites by 70%+ with Claude; Qwen3.7: The Agent Frontier; Your AI Change Is Actually a People Change
Research, May 21
LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation
We present LongLive-2.0, an NVFP4-based parallel infrastructure throughout the full training and inference workflow of long video generation, addressing speed and memory bottlenecks. For training, we introduce sequence-parallel autoregressive (AR) training, instantiated as Balanced SP, which co-designs the efficient teacher-forcing layout with SP execution by pairing clean-history and noisy-target temporal chunks on each rank, enabling a natural teacher-forcing mask with SP-aware chunked VAE encoding. Combined with NVFP4 precision, it reduces GPU memory cost and accelerates GEMM computation during training, the proportion of which increases as video length grows. Moreover, we show that a high-quality infrastructure and dataset enable a remarkably clean training pipeline. Unlike existing Self-Forcing series methods that rely on ODE initialization and subsequent distribution matching distillation (DMD), LongLive-2.0 directly tunes a diffusion model into a long, multi-shot, interactive auto-regressive (AR) diffusion model. It can be further converted to real-time generation (4 to 2 denoising steps) with standalone LoRA weights. For inference on Blackwell GPUs, we enable W4A4 NVFP4 inference, quantize KV cache into NVFP4 for memory savings, and boost end-to-end throughput with asynchronous streaming VAE decoding. On non-Blackwell GPU architectures, we deploy SP inference to match the speed on Blackwell GPUs, while the quantized KV cache can lower inter-GPU communication of SP. Experiments show up to 2.15x speedup in training, and 1.84x in inference. LongLive-2.0-5B achieves 45.7 FPS inference while attaining strong performance on benchmarks. To our knowledge, LongLive-2.0 is the first NVFP4 training and inference system for long video generation.
Evening, May 21
2026-05-21
An OpenAI model has disproved a central conjecture in discrete geometry; Co-Scientist: A multi-agent AI partner to accelerate research; Accelerating scientific discovery with Co-Scientist (Nature); Vera Arrives: NVIDIA’s First CPU Built for Agents Lands at Top AI Labs
Morning, May 20
2026-05-20
General Agent: A Self-Evolving, Synthetic Agent Environment
Research, May 19
Lance: Unified Multimodal Modeling by Multi-Task Synergy
We present Lance, a lightweight native unified model supporting multimodal understanding, generation, and editing for both images and videos. Rather than relying on model capacity scaling or text-image-dominant designs, Lance explores a practical paradigm for unified multimodal modeling via collaborative multi-task training. It is grounded in two core principles: unified context modeling and decoupled capability pathways. Specifically, Lance is trained from scratch and employs a dual-stream mixture-of-experts architecture on shared interleaved multimodal sequences, enabling joint context learning while decoupling the pathways for understanding and generation. We further introduce modality-aware rotary positional encoding to mitigate interference among heterogeneous visual tokens and boost cross-task alignment. During training, Lance adopts a staged multi-task training paradigm with capability-oriented objectives and adaptive data scheduling to strengthen both semantic comprehension and visual generation performance. Experimental results demonstrate that Lance substantially outperforms existing open-source unified models in image and video generation, while retaining strong multimodal understanding capabilities.
Evening, May 19
2026-05-19
I/O 2026: Welcome to the agentic Gemini era; Advancing content provenance for a safer, more transparent AI ecosystem; Anthropic acquires Stainless; The 13 biggest announcements at Google I/O 2026; AI Act | Shaping Europe’s digital future
Morning, May 19
2026-05-19
Project Glasswing: what Mythos showed us; Introducing Composer 2.5; Starchild-1: The First Real-Time Multimodal World Model; How Claude Code works in large codebases: Best practices and where to start; A new personal finance experience in ChatGPT
Research, May 18
MMSkills: Towards Multimodal Skills for General Visual Agents
Reusable skills have become a core substrate for improving agent capabilities, yet most existing skill packages encode reusable behavior primarily as textual prompts, executable code, or learned routines. For visual agents, however, procedural knowledge is inherently multimodal: reuse depends not only on what operation to perform, but also on recognizing the relevant state, interpreting visual evidence of progress or failure, and deciding what to do next. We formalize this requirement as multimodal procedural knowledge and address three practical challenges: (I) what a multimodal skill package should contain; (II) where such packages can be derived from public interaction experience; and (III) how agents can consult multimodal evidence at inference time without excessive image context or over-anchoring to reference screenshots. We introduce MMSkills, a framework for representing, generating, and using reusable multimodal procedures for runtime visual decision making. Each MMSkill is a compact, state-conditioned package that couples a textual procedure with runtime state cards and multi-view keyframes. To construct these packages, we develop an agentic trajectory-to-skill Generator that transforms public non-evaluation trajectories into reusable multimodal skills through workflow grouping, procedure induction, visual grounding, and meta-skill-guided auditing. To use them, we introduce a branch-loaded multimodal skill agent: selected state cards and keyframes are inspected in a temporary branch, aligned with the live environment, and distilled into structured guidance for the main agent. Experiments across GUI and game-based visual-agent benchmarks show that MMSkills consistently improve both frontier and smaller multimodal agents, suggesting that external multimodal procedural knowledge complements model-internal priors.
Evening, May 18
2026-05-18
NASA’s new AI space chip could let spacecraft think for themselves; We’re launching the Google DeepMind Accelerator program in Asia Pacific to tackle environmental risks.; Google I/O 2026: How to Watch the Keynote and What to Expect; New Models Today — AI & LLM Releases Last 24 Hours
Morning, May 18
2026-05-18
Work with Codex from anywhere
Research, May 17
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory
Large Language Models (LLMs) have demonstrated remarkable prowess in generating contextually coherent responses, yet their fixed context windows pose fundamental challenges for maintaining consistency over prolonged multi-session dialogues. We introduce Mem0, a scalable memory-centric architecture that addresses this issue by dynamically extracting, consolidating, and retrieving salient information from ongoing conversations. Building on this foundation, we further propose an enhanced variant that leverages graph-based memory representations to capture complex relational structures among conversational elements. Through comprehensive evaluations on the LOCOMO benchmark, we systematically compare our approaches against six baseline categories: (i) established memory-augmented systems, (ii) retrieval-augmented generation (RAG) with varying chunk sizes and k-values, (iii) a full-context approach that processes the entire conversation history, (iv) an open-source memory solution, (v) a proprietary model system, and (vi) a dedicated memory management platform. Empirical results show that our methods consistently outperform all existing memory systems across four question categories: single-hop, temporal, multi-hop, and open-domain. Notably, Mem0 achieves a 26% relative improvement in the LLM-as-a-Judge metric over OpenAI, while Mem0 with graph memory achieves around 2% higher overall score than the base configuration. Beyond accuracy gains, we also markedly reduce computational overhead compared to the full-context method. In particular, Mem0 attains a 91% lower p95 latency and saves more than 90% token cost, offering a compelling balance between advanced reasoning capabilities and practical deployment constraints. Our findings highlight the critical role of structured, persistent memory mechanisms for long-term conversational coherence, paving the way for more reliable and efficient LLM-driven AI agents.
Evening, May 17
2026-05-17
OpenAI launches the OpenAI Deployment Company to help businesses build around intelligence; GPT-5.5 Instant: smarter, clearer, and more personalized; Introducing Claude Design by Anthropic Labs; Gemini Robotics 1.5 brings AI agents into the physical world
Research, May 16
SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer
We introduce SANA-Video, a small diffusion model that can efficiently generate videos up to 720x1280 resolution and minute-length duration. SANA-Video synthesizes high-resolution, high-quality and long videos with strong text-video alignment at a remarkably fast speed, deployable on an RTX 5090 GPU. Two core designs ensure our efficient, effective and long video generation: (1) Linear DiT: We leverage linear attention as the core operation, which is more efficient than vanilla attention given the large number of tokens processed in video generation. (2) Constant-Memory KV cache for Block Linear Attention: we design a block-wise autoregressive approach for long video generation by employing a constant-memory state, derived from the cumulative properties of linear attention. This KV cache provides the Linear DiT with global context at a fixed memory cost, eliminating the need for a traditional KV cache and enabling efficient, minute-long video generation. In addition, we explore effective data filters and model training strategies, narrowing the training cost to 12 days on 64 H100 GPUs, which is only 1% of the cost of MovieGen. Given its low cost, SANA-Video achieves competitive performance compared to modern state-of-the-art small diffusion models (e.g., Wan 2.1-1.3B and SkyReel-V2-1.3B) while being 16x faster in measured latency. Moreover, SANA-Video can be deployed on RTX 5090 GPUs with NVFP4 precision, accelerating the inference speed of generating a 5-second 720p video from 71s to 29s (2.4x speedup). In summary, SANA-Video enables low-cost, high-quality video generation.
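The "constant-memory state, derived from the cumulative properties of linear attention" can be illustrated with generic causal linear attention. The sketch below is not SANA-Video's code: it updates the state per token rather than per block, the feature map `phi` (elu(x) + 1) is an assumed common choice, and the function names are hypothetical. It shows that a running state of fixed size (a d_k by d_v matrix plus a d_k vector) reproduces the result of summing over every earlier token.

```python
import math


def phi(x):
    # elu(x) + 1: a positive feature map often used for linear attention (assumption)
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_attention_streaming(queries, keys, values):
    """Causal linear attention with a fixed-size running state."""
    d_k, d_v = len(keys[0]), len(values[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    z = [0.0] * d_k                        # running sum of phi(k)
    outputs = []
    for q, k, v in zip(queries, keys, values):
        fk = phi(k)
        for a in range(d_k):
            z[a] += fk[a]
            for b in range(d_v):
                S[a][b] += fk[a] * v[b]
        fq = phi(q)
        denom = sum(fq[a] * z[a] for a in range(d_k))
        outputs.append(
            [sum(fq[a] * S[a][b] for a in range(d_k)) / denom for b in range(d_v)]
        )
    return outputs


def linear_attention_reference(queries, keys, values):
    """Same result computed by explicitly revisiting all earlier tokens."""
    d_v = len(values[0])
    outputs = []
    for i, q in enumerate(queries):
        fq = phi(q)
        weights = [
            sum(a * b for a, b in zip(fq, phi(keys[j]))) for j in range(i + 1)
        ]
        total = sum(weights)
        outputs.append(
            [sum(w * values[j][b] for j, w in enumerate(weights)) / total
             for b in range(d_v)]
        )
    return outputs


q = [[0.1, -0.2], [0.5, 0.3], [-1.0, 0.8]]
k = [[0.4, 0.0], [-0.3, 0.9], [0.2, -0.7]]
v = [[1.0, 2.0, 3.0], [0.0, -1.0, 0.5], [2.0, 2.0, -2.0]]
streamed = linear_attention_streaming(q, k, v)
reference = linear_attention_reference(q, k, v)
print(streamed)
```

The streaming version stores only `S` and `z`, so its memory does not grow with the number of tokens already processed, whereas the reference version needs every earlier key and value.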
Evening, May 16
2026-05-16
Introducing GPT-5; Anthropic forms $200 million partnership with the Gates Foundation; A smarter, more proactive Android with Gemini Intelligence; OpenAI to give EU access to new cyber model but Anthropic still holding out on Mythos
Research, May 15
OpenHands: An Open Platform for AI Software Developers as Generalist Agents
Software is one of the most powerful tools that we humans have at our disposal; it allows a skilled programmer to interact with the world in complex and profound ways. At the same time, thanks to improvements in large language models (LLMs), there has also been rapid development in AI agents that interact with and effect change in their surrounding environments. In this paper, we introduce OpenHands (f.k.a. OpenDevin), a platform for the development of powerful and flexible AI agents that interact with the world in similar ways to those of a human developer: by writing code, interacting with a command line, and browsing the web. We describe how the platform allows for the implementation of new agents, safe interaction with sandboxed environments for code execution, coordination between multiple agents, and incorporation of evaluation benchmarks. Based on our currently incorporated benchmarks, we perform an evaluation of agents over 15 challenging tasks, including software engineering (e.g., SWE-BENCH) and web browsing (e.g., WEBARENA), among others. Released under the permissive MIT license, OpenHands is a community project spanning academia and industry with more than 2.1K contributions from over 188 contributors.
EveningMay 15
2026-05-15
Grok Model Retirement on May 15, 2026; Defense at AI speed: Microsoft’s new multi-model agentic security system tops leading industry benchmark
ResearchMay 14
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
Recent large vision-language models (VLMs) remain fundamentally constrained by a persistent dichotomy: understanding and generation are treated as distinct problems, leading to fragmented architectures, cascaded pipelines, and misaligned representation spaces. We argue that this divide is not merely an engineering artifact, but a structural limitation that hinders the emergence of native multimodal intelligence. Hence, we introduce SenseNova-U1, a native unified multimodal paradigm built upon NEO-unify, in which understanding and generation evolve as synergistic views of a single underlying process. We launch two native unified variants, SenseNova-U1-8B-MoT and SenseNova-U1-A3B-MoT, built on dense (8B) and mixture-of-experts (30B-A3B) understanding baselines, respectively. Designed from first principles, they rival top-tier understanding-only VLMs across text understanding, vision-language perception, knowledge reasoning, agentic decision-making, and spatial intelligence. Meanwhile, they deliver strong semantic consistency and visual fidelity, excelling in conventional or knowledge-intensive any-to-image (X2I) synthesis, complex text-rich infographic generation, and interleaved vision-language generation, with or without think patterns. Beyond performance, we show detailed model design, data preprocessing, pre-/post-training, and inference strategies to support community research. Last but not least, preliminary evidence demonstrates that our models extend beyond perception and generation, performing strongly in vision-language-action (VLA) and world model (WM) scenarios. This points toward a broader roadmap where models do not translate between modalities, but think and act across them in a native manner. Multimodal AI is no longer about connecting separate systems, but about building a unified one and trusting the necessary capabilities to emerge from within.
EveningMay 14
2026-05-14
Exclusive: US clears H200 chip sales to 10 China firms as Nvidia CEO looks for breakthrough; OpenAI explores legal options against Apple, source says; Changelog – Codex | OpenAI Developers; Anthropic and Gates Foundation launch $200 million partnership for AI in health, education; U.S. clears H200 chip sales to 10 China firms as Nvidia CEO looks for breakthrough: Reuters exclusive (BNN Bloomberg syndication)
MorningMay 14
2026-05-14
Protect your enterprise now from the Shai-Hulud worm and npm vulnerability in 6 actionable steps; The end of the trade-off: How AI agents broke the onboarding trilemma; The Math Behind the Cost of AI Agents; Is Software Losing Its Head?; Soon, access to frontier AI will be scarce and selective
MorningMay 13
2026-05-13
Interaction Models: A Scalable Approach to Human-AI Collaboration; How Miro uses Amazon Bedrock to boost software bug routing accuracy and improve time-to-resolution from days to hours; The Inference Shift; The Memory Curse: How Expanded Recall Erodes Cooperative Intent in LLM Agents; Introducing Perceptron Mk1
ResearchMay 12
Pixal3D: Pixel-Aligned 3D Generation from Images
Recent advances in 3D generative models have rapidly improved image-to-3D synthesis quality, enabling higher-resolution geometry and more realistic appearance. Yet fidelity, which measures pixel-level faithfulness of the generated 3D asset to the input image, still remains a central bottleneck. We argue this stems from an implicit 2D-3D correspondence issue: most 3D-native generators synthesize shape in canonical space and inject image cues via attention, leaving pixel-to-3D associations ambiguous. To tackle this issue, we draw inspiration from 3D reconstruction and propose Pixal3D, a pixel-aligned 3D generation paradigm for high-fidelity 3D asset creation from images. Instead of generating in a canonical pose, Pixal3D directly generates 3D in a pixel-aligned way, consistent with the input view. To enable this, we introduce a pixel back-projection conditioning scheme that explicitly lifts multi-scale image features into a 3D feature volume, establishing direct pixel-to-3D correspondence without ambiguity. We show that Pixal3D is not only scalable and capable of producing high-quality 3D assets, but also substantially improves fidelity, approaching the fidelity level of reconstruction. Furthermore, Pixal3D naturally extends to multi-view generation by aggregating back-projected feature volumes across views. Finally, we show pixel-aligned generation benefits scene synthesis, and present a modular pipeline that produces high-fidelity, object-separated 3D scenes from images. Pixal3D for the first time demonstrates 3D-native pixel-aligned generation at scale, and provides an inspiring new path towards high-fidelity 3D generation of objects or scenes from single or multi-view images.
EveningMay 12
2026-05-12
Thomson Reuters + Anthropic: MCP integration connects Claude with CoCounsel Legal; Google DeepMind publishes expanded AlphaEvolve impact metrics; Reuters: Isomorphic Labs raises $2.1B for AI drug discovery scale-up; Reuters: Anthropic expands Claude legal tooling for law firms
MorningMay 12
2026-05-12
EMO: Pretraining mixture of experts for emergent modularity; Uber uses OpenAI to help people earn smarter and book faster; Halliburton enhances seismic workflow creation with Amazon Bedrock and Generative AI; What Are Your Company’s AI Nightmares?; What Corporate Functions of the Future Won’t Look Like Functions at All
MorningMay 11
2026-05-11
Teaching Claude why; ZAYA1-74B-Preview: Scaling Pretraining on AMD; Redesigning Your Marketing Organization for the Agentic Age; Cracking the Code of Campaign Success with Google’s AlphaEvolve Agent; Learning on the Shop floor
ResearchMay 8
ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration
This report describes ARIS (Auto-Research-in-sleep), an open-source research harness for autonomous research, including its architecture, assurance mechanisms, and early deployment experience. The performance of agent systems built on LLMs depends on both the model weights and the harness around them, which governs what information to store, retrieve, and present to the model. For long-horizon research workflows, the central failure mode is not a visible breakdown but a plausible unsupported success: a long-running agent can produce claims whose evidential support is incomplete, misreported, or silently inherited from the executor's framing. Therefore, we present ARIS as a research harness that coordinates machine-learning research workflows through cross-model adversarial collaboration as a default configuration: an executor model drives forward progress while a reviewer from a different model family is recommended to critique intermediate artifacts and request revisions. ARIS has three architectural layers. The execution layer provides more than 65 reusable Markdown-defined skills, model integrations via MCP, a persistent research wiki for iterative reuse of prior findings, and deterministic figure generation. The orchestration layer coordinates five end-to-end workflows with adjustable effort settings and configurable routing to reviewer models. The assurance layer includes a three-stage process for checking whether experimental claims are supported by evidence: integrity verification, result-to-claim mapping, and claim auditing that cross-checks manuscript statements against the claim ledger and raw evidence, as well as a five-pass scientific-editing pipeline, mathematical-proof checks, and visual inspection of the rendered PDF. A prototype self-improvement loop records research traces and proposes harness improvements that are adopted only after reviewer approval.
EveningMay 8
2026-05-08
Running Codex safely at OpenAI; Advancing voice intelligence with new models in the API; Advancing AI evaluation with CAISI (US) and AISI (UK); Google, Microsoft and xAI agree to US government AI testing programme; OpenAI introduces GPT-5.5-Cyber (limited preview)
MorningMay 8
2026-05-08
Natural Language Autoencoders: Turning Claude’s thoughts into text; Agents that transact: Introducing Amazon Bedrock AgentCore payments, built with Coinbase and Stripe; Parloa builds service agents customers want to talk to; Notes from inside China’s AI labs; The New Rules of Customer Experience in the Age of AI
ResearchMay 7
DFlash: Block Diffusion for Flash Speculative Decoding
Autoregressive large language models (LLMs) deliver strong performance but require inherently sequential decoding, leading to high inference latency and poor GPU utilization. Speculative decoding mitigates this bottleneck by using a fast draft model whose outputs are verified in parallel by the target LLM; however, existing methods still rely on autoregressive drafting, which remains sequential and limits practical speedups. Diffusion LLMs offer a promising alternative by enabling parallel generation, but current diffusion models typically underperform compared with autoregressive models. In this paper, we introduce DFlash, a speculative decoding framework that employs a lightweight block diffusion model for parallel drafting. By generating draft tokens in a single forward pass and conditioning the draft model on context features extracted from the target model, DFlash enables efficient drafting with high-quality outputs and higher acceptance rates. Experiments show that DFlash achieves over 6x lossless acceleration across a range of models and tasks, delivering up to 2.5x higher speedup than the state-of-the-art speculative decoding method EAGLE-3.
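The draft-then-verify loop that speculative decoding relies on can be sketched with toy models. This is a minimal sketch of the generic greedy verification scheme, not DFlash itself: target_next and draft_block are hypothetical stand-ins (DFlash's drafter is a block diffusion model, which is not modelled here), and the parallel target pass is emulated with a plain loop.

```python
def target_next(prefix):
    # Hypothetical toy target model: a deterministic greedy next token.
    return (sum(prefix) * 31 + len(prefix)) % 7

def draft_block(prefix, k):
    # Hypothetical toy drafter: proposes k tokens at once. It mimics the
    # target but is deliberately wrong at some positions.
    ctx, block = list(prefix), []
    for _ in range(k):
        tok = target_next(ctx)
        if len(ctx) % 5 == 4:
            tok = (tok + 1) % 7  # injected drafting error
        block.append(tok)
        ctx.append(tok)
    return block

def greedy_decode(prompt, n_new):
    # Baseline: one target call per generated token.
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target_next(seq))
    return seq

def speculative_decode(prompt, n_new, k):
    seq, target_passes = list(prompt), 0
    while len(seq) - len(prompt) < n_new:
        draft = draft_block(seq, k)
        target_passes += 1  # one verification pass covers the whole block
        accepted = []
        for tok in draft:
            expected = target_next(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first mismatch, stop
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the target adds one extra token.
            accepted.append(target_next(seq + accepted))
        seq.extend(accepted)
    return seq[:len(prompt) + n_new], target_passes
```

Because every accepted token equals what the target would have chosen, the output matches plain greedy decoding exactly; the saving comes from needing fewer target passes than generated tokens.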
EveningMay 7
2026-05-07
Advancing voice intelligence with new models in the API; Anthropic strikes SpaceX data center deal as it plows ahead on AI coding; What Google Cloud announced in AI this month – and how it helps you
MorningMay 7
2026-05-07
Higher usage limits for Claude and a compute deal with SpaceX
ResearchMay 6
PersonaLive! Expressive Portrait Image Animation for Live Streaming
Current diffusion-based portrait animation models predominantly focus on enhancing visual quality and expression realism, while overlooking generation latency and real-time performance, which restricts their application range in the live streaming scenario. We propose PersonaLive, a novel diffusion-based framework towards streaming real-time portrait animation with multi-stage training recipes. Specifically, we first adopt hybrid implicit signals, namely implicit facial representations and 3D implicit keypoints, to achieve expressive image-level motion control. Then, a fewer-step appearance distillation strategy is proposed to eliminate appearance redundancy in the denoising process, greatly improving inference efficiency. Finally, we introduce an autoregressive micro-chunk streaming generation paradigm equipped with a sliding training strategy and a historical keyframe mechanism to enable low-latency and stable long-term video generation. Extensive experiments demonstrate that PersonaLive achieves state-of-the-art performance with a 7-22x speedup over prior diffusion-based portrait animation models.
EveningMay 6
2026-05-06
OpenAI expands ChatGPT ads with self-serve buying, CPC, and conversion measurement; Reuters: Anthropic reportedly commits $200B to Google Cloud/chips over five years; Microsoft and OpenAI amend partnership terms (cloud, IP license, revenue-share mechanics); Google AI April roundup highlights enterprise agent platform, TPUs, Gemma 4, Deep Research Max; White House releases U.S. national AI legislative framework
MorningMay 6
2026-05-06
Five contrarian ideas about GenAI in the workplace
ResearchMay 5
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
We introduce MinerU2.5, a 1.2B-parameter document parsing vision-language model that achieves state-of-the-art recognition accuracy while maintaining exceptional computational efficiency. Our approach employs a coarse-to-fine, two-stage parsing strategy that decouples global layout analysis from local content recognition. In the first stage, the model performs efficient layout analysis on downsampled images to identify structural elements, circumventing the computational overhead of processing high-resolution inputs. In the second stage, guided by the global layout, it performs targeted content recognition on native-resolution crops extracted from the original image, preserving fine-grained details in dense text, complex formulas, and tables. To support this strategy, we developed a comprehensive data engine that generates diverse, large-scale training corpora for both pretraining and fine-tuning. Ultimately, MinerU2.5 demonstrates strong document parsing ability, achieving state-of-the-art performance on multiple benchmarks, surpassing both general-purpose and domain-specific models across various recognition tasks, while maintaining significantly lower computational overhead.
EveningMay 5
2026-05-05
Google DeepMind UK workers vote to unionize amid military-AI concerns
MorningMay 5
2026-05-05
Anthropic and OpenAI are both launching joint ventures for enterprise AI services
MorningMay 4
2026-05-04
EveningMay 4
2026-05-03
Remote agents in Vibe. Powered by Mistral Medium 3.5; OpenAI models, Codex, and Managed Agents come to AWS; Amazon Bedrock now offers OpenAI models, Codex, and Managed Agents (Limited Preview); Gemini 3 — Google DeepMind
EveningMay 4
2026-05-02
Exclusive: US officials weigh cutting deadlines to fix digital flaws amid worries over AI-powered hacking, sources say; Anthropic Economic Index report: Economic primitives; Bringing AI to the next generation of fusion energy
ResearchMay 1
Geometric Context Transformer for Streaming 3D Reconstruction
Streaming 3D reconstruction aims to recover 3D information, such as camera poses and point clouds, from a video stream, which necessitates geometric accuracy, temporal consistency, and computational efficiency. Motivated by the principles of Simultaneous Localization and Mapping (SLAM), we introduce LingBot-Map, a feed-forward 3D foundation model for reconstructing scenes from streaming data, built upon a geometric context transformer (GCT) architecture. A defining aspect of LingBot-Map lies in its carefully designed attention mechanism, which integrates an anchor context, a pose-reference window, and a trajectory memory to address coordinate grounding, dense geometric cues, and long-range drift correction, respectively. This design keeps the streaming state compact while retaining rich geometric context, enabling stable, efficient inference at around 20 FPS on 518 x 378 resolution inputs over long sequences exceeding 10,000 frames. Extensive evaluations across a variety of benchmarks demonstrate that our approach achieves superior performance compared to both existing streaming and iterative optimization-based approaches.
EveningMay 1
2026-05-01
Introducing Advanced Account Security; Pentagon tech chief says Anthropic is still blacklisted, but Mythos is a separate issue; Huawei expects AI chip revenue to jump at least 60% this year, FT reports; Beacon Biosignals is mapping the brain during sleep
MorningMay 1
2026-05-01
Enabling a new model for healthcare with AI co-clinician; Build programmatic agents with the Cursor SDK; Writer launches AI agents that can act without prompts, taking on Amazon, Microsoft and Salesforce; Sun Finance automates ID extraction and fraud detection with generative AI on AWS; From scan to fix, done seamlessly
ResearchApr 30
Kronos: A Foundation Model for the Language of Financial Markets
The success of the large-scale pre-training paradigm, exemplified by Large Language Models (LLMs), has inspired the development of Time Series Foundation Models (TSFMs). However, their application to financial candlestick (K-line) data remains limited, often underperforming non-pre-trained architectures. Moreover, existing TSFMs often overlook crucial downstream tasks such as volatility prediction and synthetic data generation. To address these limitations, we propose Kronos, a unified, scalable pre-training framework tailored to financial K-line modeling. Kronos introduces a specialized tokenizer that discretizes continuous market information into token sequences, preserving both price dynamics and trade activity patterns. We pre-train Kronos using an autoregressive objective on a massive, multi-market corpus of over 12 billion K-line records from 45 global exchanges, enabling it to learn nuanced temporal and cross-asset representations. Kronos excels in a zero-shot setting across a diverse set of financial tasks. On benchmark datasets, Kronos boosts price series forecasting RankIC by 93% over the leading TSFM and 87% over the best non-pre-trained baseline. It also achieves a 9% lower MAE in volatility forecasting and a 22% improvement in generative fidelity for synthetic K-line sequences. These results establish Kronos as a robust, versatile foundation model for end-to-end financial time series analysis. Our pre-trained model is publicly available.
EveningApr 30
2026-04-30
OpenAI: “Building the compute infrastructure for the Intelligence Age”; Google/Alphabet Q1 2026: AI full-stack monetization acceleration; Anthropic Research: BioMysteryBench for agentic bioinformatics
MorningApr 30
2026-04-30
Remote agents in Vibe. Powered by Mistral Medium 3.5; Introducing NVIDIA Nemotron 3 Nano Omni; Generative AI in healthcare: Adoption matures as agentic AI emerges; How Popsa used Amazon Nova to inspire customers with personalised title suggestions; Shifting from AI-assisted coding to AI-assisted delivery with IBM Bob
EveningApr 29
2026-04-29
Microsoft + OpenAI rewrite economics/governance of the flagship AI alliance; OpenAI GPT-5.5: higher autonomy at similar latency envelope; Google DeepMind Deep Research Max productizes high-compute research agents; Google grants broader Pentagon classified-network AI access; Anthropic Project Glasswing: frontier cyber capability redirected toward defense
ResearchApr 30
VibeVoice Technical Report
This report presents VibeVoice, a novel model designed to synthesize long-form speech with multiple speakers by employing next-token diffusion, which is a unified method for modeling continuous data by autoregressively generating latent vectors via diffusion. To enable this, we introduce a novel continuous speech tokenizer that, when compared to the popular Encodec model, improves data compression by 80 times while maintaining comparable performance. The tokenizer effectively preserves audio fidelity while significantly boosting computational efficiency for processing long sequences. Thus, VibeVoice can synthesize long-form speech for up to 90 minutes (in a 64K context window length) with a maximum of 4 speakers, capturing the authentic conversational "vibe" and surpassing open-source and proprietary dialogue models.
EveningApr 28
2026-04-28
Google signs classified AI deal with Pentagon; Microsoft-OpenAI partnership terms reset; Anthropic releases Claude Opus 4.7 GA; NVIDIA details enterprise Codex/GPT-5.5 deployment pattern
MorningApr 28
2026-04-28
How Delivery Hero's agent merges 100+ pull requests a day with Claude; How SAP Concur automates expense reporting with agentic AI; Learning to Orchestrate Agents in Natural Language with the Conductor; Lowe’s Enhances Customer Experience With Gen AI and Digital Twins; It’s the Age of Electricity and America Isn’t Ready
EveningApr 27
2026-04-27
DeepMind + Republic of Korea: national AI science partnership; AWS: Bedrock AgentCore adds managed harness + CLI + coding-agent skills; Reuters: DeepSeek-V4 marks normalization of low-cost challenger dynamics
MorningApr 27
2026-04-27
Orchestrating AI Code Review at scale; Project Deal; Context decay, orchestration drift, and silent AI failures; Tech Services Buyer Survey: Betting Big on AI and Resilience; The End of One-Size-Fits-All Enterprise Software
EveningApr 26
2026-04-26
Reuters: Google to invest up to $40B in Anthropic; OpenAI: Codex for (almost) everything; Anthropic + NEC partnership
EveningApr 24
2026-04-24
OpenAI: GPT-5.5 API rollout and benchmarked agentic gains; Google DeepMind: Decoupled DiLoCo for resilient distributed training; DeepSeek: V4 Preview (Pro + Flash), 1M-context default, open-weight release; Reuters: DeepSeek V4 + Huawei adaptation marks compute-sovereignty inflection
MorningApr 24
2026-04-24
Introducing GPT-5.5; Introducing workspace agents in ChatGPT; Managing context in long-run agentic applications; Building agents that reach production systems with MCP; The CPU Bottleneck in Agentic AI and Why Server CPUs Matter More Than Ever
EveningApr 23
2026-04-23
OpenAI: ChatGPT release notes bundle (Fast Answers, Clinicians, Images 2.0, packaging updates); Anthropic + Amazon: compute expansion to 5GW and deeper silicon lock-in; Google Cloud Next ’26: Gemini Enterprise Agent Platform and agentic stack unification; DeepMind: Gemini Robotics-ER 1.6 improves embodied reasoning for production robotics tasks; Microsoft: A$25B Australia AI infrastructure expansion through 2029
MorningApr 23
2026-04-23
Vercel breach exposes OAuth governance failure class; Google introduces Gemini Enterprise Agent Platform; AWS/Trend Micro: company-wise memory in Bedrock with Neptune + Mem0; a16z: Why We Need Continual Learning; OpenAI: Introducing ChatGPT Images 2.0
EveningApr 22
2026-04-22
Anthropic launches Project Glasswing for AI-native cyber defense; OpenAI releases GPT-5.4 mini and nano; Google DeepMind unveils Gemini Robotics-ER 1.6; Reuters: Meta to collect employee interaction data for AI training; NVIDIA ecosystem demonstrates full-stack industrial AI at Hannover Messe
MorningApr 22
2026-04-22
Deep Research Max: a step change for autonomous research agents; Chronicle from OpenAI; RBC Capital Markets + NVIDIA agentic AI case study; OffDeal powers every stage of M&A advisory with one Claude-based agent; Dive into Claude Code: The Design Space of Today’s and Future AI Agent Systems (arXiv)
MorningApr 17
2026-04-17
Introducing Claude Opus 4.7; What 27,000 AI Sessions Taught Us About How People Use Agents; Lyra 2.0: Explorable Generative 3D Worlds; Musk asks suppliers to move at “light speed” on new chipmaking plan; When Creating an AI Strategy, Don’t Overlook Employee Perception
MorningApr 16
2026-04-16
[ESSENTIAL] a16z: Physical AI Entering Its Own Scaling Regime; [ESSENTIAL] OpenAI Agents SDK Gets Native Sandbox + Harness Upgrades; [ESSENTIAL] China's AI Coding Subscription Economics Are Structurally Broken; AUDIT AGENT HARNESS ECONOMICS [Engineering/Finance | This Week]; EVALUATE OPENAI AGENTS SDK AS PRODUCTION BASELINE [Platform Engineering | This Sprint]
EveningApr 15
2026-04-15
CYBER AI ARMS RACE ACCELERATES: OpenAI launches GPT-5.4-Cyber (https://openai.com/index/scaling-trusted-access-for-cyber-defense/) just one week after Anthropic; US-CHINA AI PARITY ACHIEVED: Stanford's 2026 AI Index (https://www.technologyreview.com/2026/04/13/1135675/want-to-understand-the-current-state-of-ai-check-out-; EMBODIED AI GOES INDUSTRIAL: DeepMind's Gemini Robotics-ER 1.6 (https://deepmind.google/blog/gemini-robotics-er-1-6/) introduces instrument reading for Boston D; LEGAL PRIVILEGE DOESN'T EXTEND TO AI: Federal judge rules (https://www.reuters.com/legal/government/ai-ruling-prompts-warnings-us-lawyers-your-chats-could-be-us; BENCHMARKS ARE BROKEN: Models now defeat benchmarks faster than researchers can build them. SWE-bench went from 60% to ~100% in one year. Measurement infrastruc
MorningApr 15
2026-04-15
Databricks tested a stronger model against its multi-step agent on hybrid queries. The stronger model still lost by 21%.; Gemini Robotics-ER 1.6: Powering real-world robotics tasks through enhanced embodied reasoning; Evaluating agents for scientific discovery; The Hidden Demand for AI Inside Your Company; Turn your best AI prompts into one-click tools in Chrome
EveningApr 15
2026-04-14
OpenAI scales TAC and introduces GPT-5.4-Cyber for verified defenders; Reuters: OpenAI unveils GPT-5.4-Cyber one week after Anthropic’s Mythos move; BoE governor flags Anthropic Mythos as major cyber-risk concern for regulators; NAACP sues xAI over unpermitted turbine operation for Colossus 2 power; Google expands Gemini in Classroom to all Classroom-supported languages
EveningApr 12
2026-04-12
UK regulators move to assess systemic risk from Anthropic’s latest model; HumanX field signal: Claude currently has disproportionate technical mindshare in agentic coding
EveningApr 11
2026-04-11
OpenAI identifies a third-party supply-chain issue affecting its macOS signing workflow; ChatGPT Search may fall under stricter EU Digital Services Act obligations; Anthropic’s Mythos Preview / Project Glasswing frames cyber-capable models as a restricted deployment tier
EveningApr 10
2026-04-10
CoreWeave signs a multi-year infrastructure agreement with Anthropic; TSMC Q1 revenue confirms that AI chip demand is still outrunning capacity; South Africa opens public comment on a national AI policy framework; Google upgrades Colab’s Gemini integration with Learn Mode and notebook-scoped Custom Instructions
MorningApr 10
2026-04-10
LG's first publicly released Vision Language Model, EXAONE 4.5; Build a FinOps agent using Amazon Bedrock AgentCore; Managers and Executives Disagree on AI—and It’s Costing Companies; The Art of Building Verifiers for Computer Use Agents; The AI transformation manifesto
EveningApr 10
2026-04-09
LG AI Research releases EXAONE 4.5, a 33B multimodal model tuned for document-heavy enterprise reasoning; Anthropic productizes long-horizon agent infrastructure with Claude Managed Agents; OpenAI publishes its enterprise operating model: Frontier control plane + AI superapp; Meta locks in another $21B of AI cloud with CoreWeave and gets early Vera Rubin access
MorningApr 9
2026-04-09
Introducing Muse Spark: Scaling Towards Personal Superintelligence; Claude Managed Agents: get to production 10x faster; AI Adoption by the Numbers; From isolated alerts to contextual intelligence: Agentic maritime anomaly analysis with generative AI; The ATOM Project
Guest appearances
On other shows.
Microsoft Innovation Podcast
Democratizing AI: Ashish Bhatia's Journey from Microsoft to Power Automate and the Evolution of AI Builder
Jan 2024