Note: This post was generated by AI. Each week, I use an automated pipeline to collect and synthesize the latest AI news from blogs, newsletters, and podcasts into a single digest. The goal is to keep up with the most important AI developments from the past week. For my own writing, see my other posts.

TL;DR

  • GPT-5.5 launched, with meaningfully better autonomous task execution and a major upgrade to OpenAI’s Codex app, which can now browse the web, edit spreadsheets, and work through multi-hour tasks with less hand-holding.
  • DeepSeek V4 arrived as the most capable open-weight model yet, handling million-token contexts at a fraction of the memory cost, and designed to run on Chinese chips, not just NVIDIA hardware.
  • Anthropic’s run-rate revenue surpassed $30B, and the company signed a massive compute deal with Amazon, signaling it is scaling infrastructure to match surging demand.
  • Google and others poured billions more into Anthropic, with Bloomberg reporting Google plans to invest up to $40B, as the race to back frontier AI labs accelerates.
  • AI agents are starting to do research autonomously: Anthropic published results showing Claude agents outperformed human researchers on an AI safety problem, at a cost of $22 per hour of AI work.

Story of the Week: The AI Assistant Race Intensifies

OpenAI launched GPT-5.5 this week, and based on Ethan Mollick’s early-access writeup, the upgrade is real. The headline change is not raw intelligence but autonomy: the model is noticeably better at executing long, multi-step tasks without constant correction. Mollick fed it a decade of disorganized research data, and four prompts later he had a draft academic paper, complete with a real literature review and sophisticated statistics. His verdict: it would have passed as a strong second-year PhD project.

Just as significant is what happened to Codex, OpenAI’s coding and task agent. This week Codex gained the ability to browse the web, control a computer, edit Google Sheets and Slides, and run multi-hour tasks with an automatic quality-checking agent in the background, per AINews. The net effect: Codex is evolving from a coding assistant into a general-purpose work agent. If you use Codex today, the version you log into next week can do considerably more. If you haven’t tried it, the gap between what it could do six months ago and what it can do now is worth experiencing firsthand.

The practical implication: professionals who have been waiting for AI to “get good enough” to handle real work autonomously have a shorter wait than they might expect. The models are not perfect, but the direction of travel is clear. The question is no longer whether AI can help, but which workflows to hand off first.


The Money Behind the Models

The investment figures this week are hard to ignore. Anthropic announced a deal with Amazon for up to 5 gigawatts of compute capacity, with Amazon committing up to an additional $20B on top of its previous $8B investment. Anthropic’s run-rate revenue has now surpassed $30B, up from roughly $9B at the end of 2025. Bloomberg reported Google plans to invest up to $40B. These are not speculative bets on future technology. They are infrastructure commitments made because current demand is already straining capacity, with Anthropic explicitly noting reliability issues for paying customers during peak hours.

For anyone making vendor decisions, this matters. The AI companies you are evaluating are not startups hoping to find product-market fit. They are scaling to meet real demand with some of the largest compute investments ever made. That said, Anthropic’s own postmortem on Claude Code quality issues this week was a useful reminder that growth at this speed creates operational risk. Three separate engineering changes degraded Claude Code’s performance for weeks before the root cause was identified. The company was transparent about it and reset usage limits for affected subscribers, but it illustrates that reliability remains a genuine challenge at this scale.

The practical question for operations and IT leaders: as AI tools become load-bearing infrastructure inside your organization, do you have visibility into when they degrade? The gap between “works great in demos” and “reliable enough to run a business process” is still real.


Open Models Close the Gap (Mostly)

DeepSeek released V4 Pro and V4 Flash, the most significant update in months to the open-weight landscape (models whose trained weights are publicly released, so organizations can run them privately on their own infrastructure). The headline capability is a one-million-token context window, meaning the model can process roughly 750,000 words of text in a single session. That’s enough to analyze an entire company’s contracts, a year of email, or a large codebase at once. DeepSeek achieved this while dramatically reducing the memory required, using about 10x less storage per conversation than its predecessor, per AINews.
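
To put the one-million-token figure in perspective, here is a back-of-the-envelope sketch of checking whether a document set fits in a single context window. The folder name and the words-per-token ratio are illustrative assumptions, not details from the DeepSeek release; exact counts depend on the model’s tokenizer.

```python
# Rough check: does a set of documents fit in a 1M-token context window?
# Uses the common ~1.33 tokens-per-word heuristic for English text.
from pathlib import Path

CONTEXT_WINDOW = 1_000_000      # tokens
TOKENS_PER_WORD = 1.33          # heuristic, not exact

def estimate_tokens(paths):
    words = sum(len(Path(p).read_text(errors="ignore").split()) for p in paths)
    return int(words * TOKENS_PER_WORD)

docs = sorted(Path("contracts/").glob("*.txt"))   # hypothetical folder of contracts
total = estimate_tokens(docs)
print(f"~{total:,} estimated tokens; fits in one window: {total < CONTEXT_WINDOW}")
```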

Perhaps more geopolitically interesting: DeepSeek V4 is explicitly designed to run on Huawei’s Ascend chips, reducing Chinese AI development’s dependence on NVIDIA hardware that the US has restricted for export. As analyst Nathan Lambert noted in Interconnects, open models from Chinese labs are genuinely competitive on many tasks, though they still lag behind the US frontier on the hardest agentic and long-horizon problems, and show measurable safety differences. An independent safety evaluation of Kimi K2.5, currently the leading Chinese open model, found it had “similar dual-use capabilities to GPT 5.2 and Claude Opus 4.5, but with significantly fewer refusals” on requests related to dangerous materials, per Import AI.

For businesses: open models are increasingly viable for use cases where data privacy requires keeping AI on your own servers. But the safety gap is real and worth evaluating seriously before deploying them in customer-facing or high-stakes contexts.


AI Starts Researching Itself

Anthropic published results from an experiment where Claude agents were tasked with conducting AI safety research autonomously. The agents proposed hypotheses, ran experiments, and iterated, spending the equivalent of 800 hours of work over five days, per Import AI. They dramatically outperformed a team of human researchers on the specific problem tested, recovering nearly all of the performance gap on a key metric where the human team recovered only 23%. Total cost: $18,000, or roughly $22 per AI-hour of research.

Caveats apply: the method did not generalize to a different model and dataset, and the research direction still required human input to keep the agents from converging on the same ideas. But the implication is significant. Structured research, data analysis, and iterative experimentation (tasks that currently require expensive specialist time) are increasingly tractable for AI agents to execute autonomously. This is not just a coding story. Knowledge work that follows a clear loop of hypothesis, test, and evaluate is becoming automatable.
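
For readers who want that loop made concrete, here is a minimal sketch of the hypothesis-test-evaluate pattern. The function names and the `llm` and `run_experiment` callables are illustrative assumptions, not Anthropic’s actual harness.

```python
# A bare-bones hypothesis -> test -> evaluate loop, the shape of knowledge
# work the Anthropic experiment automated. `llm` is any text-in/text-out
# model call; `run_experiment` is whatever concrete test the domain allows.
def research_loop(problem, budget, llm, run_experiment):
    findings = []
    for _ in range(budget):
        # 1. Propose a new, testable hypothesis given what has been tried so far.
        hypothesis = llm(
            f"Problem: {problem}\nPrior findings: {findings}\n"
            "Propose one new, testable hypothesis."
        )
        # 2. Test it with a concrete experiment (code run, query, simulation, ...).
        result = run_experiment(hypothesis)
        # 3. Evaluate the outcome and record it so the next iteration builds on it.
        verdict = llm(
            f"Hypothesis: {hypothesis}\nResult: {result}\n"
            "Was the hypothesis supported? Answer briefly."
        )
        findings.append({"hypothesis": hypothesis, "result": result, "verdict": verdict})
    return findings
```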

Separately, Microsoft Research released AutoAdapt, an open-source framework that automates fine-tuning: the process of training an existing general-purpose model on your own data so it specializes for your industry or domain. The tool turned what typically takes weeks of expert iteration into a roughly 30-minute, $4 process. If your organization has been told “we could build a custom AI model for your industry, but it would take months,” the timeline is compressing fast.
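
For a sense of what that fine-tuning workflow involves, here is a minimal sketch using the open-source Hugging Face transformers, datasets, and peft libraries. This illustrates the general technique, not AutoAdapt’s API; the base model name and data file are placeholders.

```python
# Minimal LoRA fine-tuning sketch (generic technique, not AutoAdapt).
# LoRA trains a small adapter on top of a frozen base model, which is what
# keeps cost and time low compared with full fine-tuning.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "Qwen/Qwen2.5-0.5B"                        # placeholder open-weight base model
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.pad_token or tok.eos_token    # make sure padding is defined
model = AutoModelForCausalLM.from_pretrained(base)
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

# Your domain data: one JSON object per line with a "text" field.
data = (load_dataset("json", data_files="your_domain_docs.jsonl")["train"]
        .map(lambda b: tok(b["text"], truncation=True, max_length=512),
             batched=True, remove_columns=["text"]))

Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-adapter", num_train_epochs=1,
                           per_device_train_batch_size=4, learning_rate=2e-4),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
).train()

model.save_pretrained("domain-adapter")           # the small adapter is what you deploy
```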


What Workers Are Actually Experiencing

Anthropic surveyed 81,000 Claude users about AI’s economic impact, and the results are worth sharing with your leadership team. The average productivity rating was 5.1 on a 7-point scale (“substantially more productive”). The highest gains were reported by management and technical workers. But early-career workers were significantly more worried about job displacement than senior professionals, and only 60% of early-career workers felt they personally benefited from AI, versus 80% of senior professionals.

The survey also found that people in roles more exposed to AI report higher concerns about displacement, and those experiencing the largest speed gains also express higher displacement anxiety. The data suggests that productivity gains and job insecurity can coexist within the same person. For managers: if you are introducing AI tools to your team, acknowledging this tension explicitly is likely more effective than leading only with efficiency arguments.


Quick Hits

  • Google announced 8th-generation TPUs (its custom AI chips) at Cloud Next, with a training chip delivering nearly 3x the compute of its predecessor and capable of scaling to a million chips in a single cluster. Google
  • Anthropic and NEC partnered to deploy Claude to 30,000 NEC employees globally and co-develop AI tools for Japan’s finance, manufacturing, and government sectors. Anthropic
  • OpenAI launched GPT-Image-2, a significantly improved image generation model that can reliably render readable text within images, making it genuinely useful for slides, mockups, and product visuals. OpenAI
  • Anthropic updated its election safeguards ahead of US midterms, reporting Claude responds appropriately to election-related harmful requests 99.8-100% of the time in testing. Anthropic
  • A GitHub star fraud investigation by CMU researchers found 6 million fake stars across 18,000+ repositories, with AI/LLM projects as the largest non-malicious category, meaning some of the open-source AI tools your teams are evaluating may have inflated apparent popularity. Awesome Agents
  • Noetik, an AI biotech startup, signed a $50M deal with GSK for its TARIO-2 model, which predicts detailed tumor biology from standard pathology images that most patients already have, potentially improving clinical trial matching. Latent Space

What to Watch

  • AI agents completing multi-day autonomous work tasks are moving from demos to real deployments. Watch for your software, research, and operations teams to start experimenting with overnight agent runs. The question worth asking now: what review and approval processes do you need before you trust work an agent did while no one was watching?
  • The open/closed model gap is narrowing on common tasks but persisting on harder ones. If your organization is considering switching from a commercial API to a self-hosted open model to save money or protect data, the next 3-6 months should show whether that gap closes further on agentic and complex reasoning tasks.
  • Customizing AI for your specific industry is about to get much faster and cheaper. Microsoft’s AutoAdapt and similar tools are reducing the cost and time to build domain-specific AI from months to hours. Budget conversations about specialized AI tooling may need to be revisited.
  • Claude Code’s pricing and access are in flux, with reports of possible removal from the $20/month plan and ongoing reliability improvements. If your team has built workflows around it, monitor for plan changes in the coming weeks.