Note: This post was generated by AI. Each week, I use an automated pipeline to collect and synthesize the latest AI news from blogs, newsletters, and podcasts into a single digest. The goal is to keep up with the most important AI developments from the past week. For my own writing, see my other posts.

TL;DR

  • OpenAI’s model solved an 80-year-old math problem using original reasoning, not a specialized math tool, suggesting AI is approaching genuine research-level thinking across domains.
  • Anthropic’s Project Glasswing found 10,000+ high- or critical-severity software vulnerabilities in one month using its Claude Mythos model and roughly 50 partners, including bugs in Firefox and infrastructure used by billions of devices. The bottleneck is now human capacity to fix them, not AI capacity to find them.
  • Google I/O delivered a major AI push: Gemini 3.5 Flash launched immediately across all products, paired with new background agent capabilities and a multimodal video model. Google processes 7x more AI tokens than a year ago.
  • Anthropic signed large-scale deals with KPMG (all 276,000 employees) and PwC in the same week, signaling that large professional services firms are moving from AI pilots to firm-wide deployments.
  • AI labs are no longer just model companies: OpenAI, Google, Anthropic, and even DeepSeek are all building agents, interfaces, and infrastructure on top of their models, reshaping who benefits from AI progress.

Story of the Week: AI Finds Security Holes Faster Than Humans Can Patch Them

Anthropic’s Project Glasswing crossed a threshold this week that matters to anyone whose organization depends on software. In just one month, Anthropic’s Claude Mythos model (an unreleased, higher-capability version of Claude) and roughly 50 partners found more than 10,000 high- or critical-severity vulnerabilities in the most widely used software in the world. Cloudflare alone found 2,000 bugs, with a lower false-positive rate than human testers. Mozilla found 271 vulnerabilities in Firefox using Mythos, more than ten times as many as it found in the previous version using an older model.

The phrase that captures this moment: “Progress on software security used to be limited by how quickly we could find new vulnerabilities. Now it’s limited by how quickly we can verify, disclose, and patch them.” Some open-source maintainers have asked Anthropic to slow down disclosures because they cannot keep up. That is a new kind of problem. The AI has become the fast part of the equation. Human review, coordination, and deployment are now the bottleneck.

What this means for you: if your organization uses open-source software, commercial platforms, or cloud infrastructure (almost everyone does), the pace at which vulnerabilities are found and the pace at which patches must ship are both accelerating. Security teams need to think about patch velocity, not just whether a patch exists. And if you work in finance or another regulated industry, this week also brought news that a Glasswing partner bank used Mythos to detect and stop a $1.5 million fraudulent wire transfer in real time, a preview of AI’s role in operational security beyond code.


An AI Solved an Open Research Problem in Mathematics

For the first time, a general-purpose AI model produced an original, verified mathematical proof that resolved a long-standing open problem. OpenAI announced that an internal model disproved the Erdős planar unit distance conjecture, an 80-year-old problem in combinatorial geometry (the study of how geometric shapes can be counted and arranged). Fields Medalist Timothy Gowers called it “a milestone in AI mathematics.” External mathematicians confirmed the proof and said they would accept it in any journal without hesitation.

What makes this notable is not that an AI did math, but how it did it. The model applied original ideas from algebraic number theory (a branch of mathematics dealing with abstract number systems) to a geometric question, a connection no one had thought to try. It was not a specialized math solver or a system designed for this problem; it was the same kind of general reasoning model that answers questions and drafts text. The proof reportedly runs 125 pages and cost under $1,000 in compute, per AINews.

Why does this matter outside of mathematics? It is evidence that AI is developing the capacity for original insight, not just synthesis or pattern-matching. Strategy, law, research, policy: any domain where the highest-value work involves connecting ideas in ways that haven’t been tried before now faces a different question about what AI can contribute. The timeline for meaningful AI contributions in those fields just moved closer.


The Week’s Biggest Enterprise Play: Professional Services Firms Go All-In on Claude

Two of the world’s largest professional services firms announced firm-wide Claude deployments within days of each other. KPMG will give all 276,000 employees access to Claude, embedding it directly into Digital Gateway, the platform where KPMG professionals do client work in tax, legal, and private equity. Building a tax regulation agent, which used to take weeks of switching between tools, now takes minutes. PwC announced a similar deal, rolling out Claude Code and Cowork to U.S. teams first and then to hundreds of thousands of staff globally, and certifying 30,000 professionals on Claude.

These are not pilot programs or innovation-lab experiments. Both deals involve actual client work in audit, tax, legal, and deal-making, where accuracy and liability are non-negotiable. The framing of KPMG’s joint research with UT Austin is also worth noting: the firms are explicitly studying what humans should be doing alongside AI, not assuming the answer is obvious. If you work in professional services, strategy consulting, or any adjacent field, the pressure to demonstrate AI fluency in client contexts is now institutional, not just aspirational.


Google I/O: The Everything Announcement

Google used its annual developer conference to announce more AI products in one week than most companies announce in a year. The practical summary for non-technical professionals:

Gemini 3.5 Flash is now live across all Google products (Gemini app, Search, Workspace, Android) and is faster than its predecessor while handling more complex tasks. It has a context window of 1 million tokens, meaning it can process roughly 750,000 words in a single session, which is useful for long documents, contracts, or research threads. It is available today, not through a staged rollout. Google reports it processes 3.2 quadrillion tokens per month, up 7x from a year ago, and the Gemini app has 900 million monthly users, per AINews.

Gemini Spark is Google’s answer to background agents: tasks that run while your device is closed, on Google Cloud virtual machines. This is the infrastructure for “give the AI a task and come back when it’s done” workflows. Google also launched Gemini Omni, a model that handles video input and output, and demonstrated an agent stack that built a functioning operating system in 12 hours using 93 parallel sub-agents for under $1,000 in compute. Antigravity 2.0, Google’s coding agent, is now available as a desktop app and CLI.

The practical takeaway: Google is not just improving its chatbot. It is rebuilding Search, Android, Workspace, and its developer platform around agents that take multi-step actions over time. If you use Google products professionally, the interfaces you work with are in active redesign.


Andrej Karpathy Joins Anthropic

A short item, but a significant one. Andrej Karpathy, one of the most respected AI educators and researchers alive (former head of Tesla Autopilot, founding team member at OpenAI), announced he is joining Anthropic. This generated more Hacker News points than almost any other story this week. Talent moves at this level tend to signal where serious people think the most important work is happening.


The Model-to-Agent Transition

A quieter but structurally important story this week: every major AI lab is becoming an agent company, not just a model company. AINews documented that OpenAI, Anthropic, Google, and even DeepSeek are now explicitly building agent harnesses, interfaces, and workflows on top of their underlying models. AI21, a smaller lab, shut down its model team and pivoted entirely to agents. The practical observation from builders: “the model alone is no longer the product.” Winning requires model plus orchestration plus memory plus workflow plus interface.

For non-technical professionals, this matters in two ways. First, the AI tools you encounter increasingly involve chains of actions, not single responses. Understanding how to set up, supervise, and correct those chains is a new professional skill. Second, Chinese models are getting much cheaper: DeepSeek made a price cut permanent this week, leaving DeepSeek-V4-Pro roughly 19x cheaper than Claude Opus 4.7 for comparable intelligence tasks, per AINews pricing analysis. Price pressure on AI-driven workflows is building very fast.


Security Headlines Worth Watching

Two supply-chain security incidents this week are worth noting for anyone in IT, operations, or risk:

Neither incident involves AI directly, but both are reminders that the attack surface for organizations is expanding in parallel with AI capabilities. Glasswing’s findings (see Story of the Week) and these breaches are part of the same picture.


Quick Hits

  • Elon Musk lost his lawsuit against OpenAI. A unanimous jury found his claims were filed too late. TechCrunch reports this removes one major uncertainty before OpenAI’s anticipated IPO.
  • Cohere released Command A+ as a fully open-weight model (meaning companies can download and run it themselves) under the Apache 2.0 license, its most permissive release yet. Strong on reducing hallucinations; enterprise-focused teams building private deployments should evaluate it.
  • Isomorphic Labs (a DeepMind spinout applying AI to drug discovery) raised $2.1 billion. Drug discovery timelines could compress significantly in the next three to five years.
  • Microsoft released MagenticLite, an experimental AI agent that works across your browser and local file system, built to run on smaller, more affordable models. The agent pauses and asks permission before irreversible actions like logins or form submissions.
  • AI infrastructure unicorns: Exa (raised $250M at a $2.2B valuation), Modal (raised $355M at a $4.7B valuation), and Turbopuffer ($100M in annual recurring revenue, profitable) all hit major milestones this week, per AINews. The plumbing for AI agents is becoming big business.
  • A “positive alignment” paper co-authored by researchers at Oxford, Google DeepMind, OpenAI, and Anthropic argues that keeping AI safe from harm is necessary but not sufficient. The next research frontier: making AI actively good for human flourishing, not just non-harmful. Jack Clark’s Import AI has a good summary.

What to Watch

  • OpenAI’s IPO filing is expected imminently. The end of the Musk lawsuit and recent math and research breakthroughs set up a significant narrative moment. Watch for how OpenAI frames the transition from “AI assistant” to “AI researcher.”
  • Glasswing’s patch bottleneck is a slow-moving crisis. As more organizations deploy Mythos-class models for security scanning, the volume of discovered vulnerabilities will outpace patching capacity industry-wide. Expect policy conversations about coordinated disclosure timelines and software liability.
  • Agent pricing war: DeepSeek’s 75% permanent price cut and Cohere’s open-weight release both squeeze the business case for proprietary AI APIs. Organizations building internal AI tools should revisit their build-versus-buy assumptions. The economics changed this week.
  • Andrej Karpathy at Anthropic will likely accelerate Anthropic’s education and developer tooling efforts, given his track record. Watch for new learning resources and possibly a shift in how Anthropic communicates technical concepts to non-specialists.
  • Gemini 3.5 Pro is coming next month. If the Flash model is already competing at the frontier, the Pro release could meaningfully change the competitive landscape again.