Note: This post was generated by AI. Each week, I use an automated pipeline to collect and synthesize the latest AI news from blogs, newsletters, and podcasts into a single digest. The goal is to keep up with the most important AI developments from the past week. For my own writing, see my other posts.
TL;DR
- AI solved a real math problem, not a practice one. GPT-5.4 Pro cracked an open research problem in combinatorics that stumped earlier models, and the mathematician who posed it plans to publish the result. AI is beginning to contribute to the actual frontier of knowledge.
- Anthropic’s usage data reveals a clear pattern: experience pays off. Users with 6+ months on Claude are 10% more successful in their conversations and tackle higher-value work. Getting good at AI tools is a skill that compounds.
- GitHub will train AI models on your private repositories starting April 24 unless you opt out. There’s a single settings page to stop this. Check it before the deadline.
- A compromised AI developer tool stole credentials from thousands of systems. Two versions of LiteLLM, a widely used library for connecting to AI APIs, contained malware that harvested API keys and passwords. If your team uses LiteLLM, check your versions now.
- Anthropic launched a science blog and demonstrated AI completing a theoretical physics paper in two weeks instead of a year. The research community is moving from “AI helps me write” to “AI does the experiment.”
Story of the Week: AI Crosses Into Real Research
This week produced the clearest evidence yet that AI is moving beyond assistance into genuine knowledge creation. Research tracker Epoch AI confirmed that GPT-5.4 Pro solved an open problem in combinatorics (the mathematics of counting and arrangement) that had resisted human solution. The problem’s author, a mathematics professor at UNC Charlotte, reviewed the solution and plans to publish it. He noted that the AI’s approach “eliminates an inefficiency in our lower-bound construction” in a way he had suspected might work but couldn’t figure out. The result will become a peer-reviewed paper, with the researchers who elicited the solution listed as potential co-authors.
This isn’t a model passing an exam or reproducing known results. It’s a model generating new mathematics that experts consider publication-worthy. Subsequent testing showed Claude Opus 4.6 and Gemini 3.1 Pro could also solve the problem, while earlier models including Claude Opus 4.5 could not, suggesting a capability threshold was recently crossed rather than this being a fluke.
Separately, Anthropic launched a science blog and published a case study: Harvard physics professor Matthew Schwartz supervised Claude through a theoretical physics calculation that would normally take a graduate student about a year. It took two weeks, produced 110 drafts and 36 million tokens of work, and resulted in a paper he describes as potentially the most important of his career “not for the physics, but for the method.” He was emphatic that domain expertise remained essential: Claude made enough errors that a non-expert supervisor would have missed critical mistakes. The implication for knowledge workers: AI can now dramatically compress timelines on complex intellectual projects, but it still needs a qualified human in the loop.
Who’s Getting the Most Out of AI (and Why It Matters for You)
Anthropic’s latest Economic Index report tracks how Claude is actually being used across the economy, and the most actionable finding is about experience. Users who have been on the platform for six months or more show a 10% higher success rate in their conversations compared to newer users, even after controlling for what tasks they’re attempting. They also gravitate toward higher-value work and spend less time on personal queries.
The report can’t fully separate “people who were already sophisticated got on the platform early” from “using AI makes you better at using AI.” But either way, the gap is real and growing. If you started using AI tools seriously in the last six months, you’re likely leaving significant capability on the table compared to colleagues who have been iterating longer. The practical move: treat prompt-writing and task decomposition as skills worth deliberate practice, not just intuition.
The broader usage picture shows AI spreading into more everyday tasks (sports scores, product comparisons, home maintenance questions now make up a growing share of activity), while the serious professional use is quietly migrating from consumer chat interfaces into automated workflows. About 49% of jobs have now had at least a quarter of their tasks touched by Claude, a figure that has barely moved in three months, suggesting the initial wave of adoption has saturated and what’s changing is the depth of use rather than the breadth.
A Security Alert Your IT Team May Have Missed
Two versions of LiteLLM, 1.82.7 and 1.82.8, were found to contain malicious code that automatically harvested credentials from any system where they were installed. LiteLLM is a widely used open-source library (a software package that developers use to connect applications to multiple AI providers like OpenAI, Anthropic, and Google at once). The malware executed the moment the Python interpreter started, before any application code ran, and collected API keys, passwords, SSH keys, environment variables, and system information, then sent them to an external server.
This is a supply chain attack: malicious code hidden inside a legitimate, trusted tool. It’s the software equivalent of a compromised component in a product your vendor ships you. If anyone on your engineering or data team uses LiteLLM, confirm they are not on versions 1.82.7 or 1.82.8, rotate any API keys that were present on affected machines, and audit what credentials may have been exposed. The discovery triggered over 900 comments on GitHub and was one of the most-discussed security incidents in the developer community this week.
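If you want to script that check across machines, a minimal sketch is below. The two version numbers come from this report; the helper name `litellm_status` and the script structure are illustrative, not from any advisory tooling.

```python
# Sketch: report whether the locally installed litellm release is one of
# the two compromised versions named in this report (1.82.7, 1.82.8).
from importlib import metadata

COMPROMISED = {"1.82.7", "1.82.8"}

def litellm_status() -> str:
    """Return 'not installed', 'compromised', or 'ok' for litellm."""
    try:
        version = metadata.version("litellm")
    except metadata.PackageNotFoundError:
        return "not installed"
    return "compromised" if version in COMPROMISED else "ok"

if __name__ == "__main__":
    # A 'compromised' result means: rotate every credential that was
    # present on this machine, not just the AI provider API keys.
    print(f"litellm: {litellm_status()}")
```

Running this in each environment (CI images included) is faster than asking every engineer to check by hand.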
The broader lesson: AI infrastructure is becoming a target. The tools your teams use to build and run AI applications carry real security risk, and version pinning and package auditing are no longer optional hygiene.
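As a minimal form of that pinning, a pip constraints file can refuse the flagged releases outright. The file name and layout here are illustrative; only the two version numbers come from this report:

```
# constraints.txt (illustrative) -- apply with: pip install -c constraints.txt litellm
# Refuse the two compromised releases; allow anything else your lockfile permits.
litellm != 1.82.7, != 1.82.8
```

A full lockfile with hashes is stronger still, but even a one-line exclusion like this stops a known-bad version from being reinstalled by accident.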
Quick Hits
GitHub training opt-out deadline: April 24. If you have a GitHub account with private repositories and don’t want GitHub using them to train AI models, opt out here before the deadline. Training is enabled by default, meaning inaction means consent. Hacker News discussion
A 400-billion parameter AI model ran on an iPhone 17 Pro. A model that size would have required a server rack two years ago. On-device AI of serious capability is arriving faster than most roadmaps predicted. Source
A court blocked the Pentagon from labeling Anthropic a supply chain risk. The Defense Department had attempted to restrict Anthropic through a national security designation; a federal judge issued an injunction blocking it. The case signals that AI companies are becoming entangled in geopolitical regulatory battles beyond standard commercial oversight.
OpenAI shut down the standalone app for Sora, its video generation tool. The official account announced the closure this week, with video generation functionality folding into the main ChatGPT product. Consolidation of AI products into unified platforms is accelerating.
The European Parliament voted to end Chat Control 1.0. Starting April 6, major tech platforms including Gmail and LinkedIn must stop automatically scanning private messages in the EU. Relevant if your organization handles European communications and has been uncertain about message privacy obligations.
What to Watch
The “AI as researcher” question is moving from hypothetical to operational. Anthropic’s science blog will publish practical workflows for using AI in research. If your organization does any form of knowledge work (market research, policy analysis, competitive intelligence, scientific R&D), the techniques being developed in academic labs right now will reach you within 12-24 months. Start thinking about what “a qualified human in the loop” means for your domain.
On-device AI will change your assumptions about cloud dependence and data privacy. A 400-billion parameter model on a phone means enterprise AI that never touches an external server is coming. Watch for this to reshape procurement conversations about data residency and vendor lock-in.
The experience gap in AI adoption will become a competitive differentiator. Anthropic’s data shows a measurable skill curve in AI use. Organizations that have been experimenting seriously for a year will have meaningfully more capable teams than those starting now, independent of what tools they use. If you haven’t already, ask your leadership team: who in this organization is building genuine AI fluency, and how are we measuring it?