<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Posts on Orlando O'Neill</title><link>https://oneillo.com/posts/</link><description>Recent content in Posts on Orlando O'Neill</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sun, 29 Mar 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://oneillo.com/posts/index.xml" rel="self" type="application/rss+xml"/><item><title>AI Weekly Digest -- March 22-March 29, 2026</title><link>https://oneillo.com/posts/ai-digest-2026-03-29/</link><pubDate>Sun, 29 Mar 2026 00:00:00 +0000</pubDate><guid>https://oneillo.com/posts/ai-digest-2026-03-29/</guid><description>AI Crosses Into Real Research</description><content:encoded><![CDATA[<blockquote>
<p><strong>Note:</strong> This post was generated by AI. Each week, I use an automated pipeline to collect and synthesize the latest AI news from blogs, newsletters, and podcasts into a single digest. The goal is to keep up with the most important AI developments from the past week. For my own writing, see my other posts.</p>
</blockquote>
<h2 id="tldr">TL;DR</h2>
<ul>
<li><strong>AI solved a real math problem, not a practice one.</strong> GPT-5.4 Pro cracked an open research problem in combinatorics that stumped earlier models, and the mathematician who posed it plans to publish the result. AI is beginning to contribute to the actual frontier of knowledge.</li>
<li><strong>Anthropic&rsquo;s usage data reveals a clear pattern: experience pays off.</strong> Users with 6+ months on Claude are 10% more successful in their conversations and tackle higher-value work. Getting good at AI tools is a skill that compounds.</li>
<li><strong>GitHub will train on your private repositories starting April 24 unless you opt out.</strong> There&rsquo;s a single settings page to stop this. Check it before the deadline.</li>
<li><strong>A compromised AI developer tool stole credentials from thousands of systems.</strong> Two versions of LiteLLM, a widely used library for connecting to AI APIs, contained malware that harvested API keys and passwords. If your team uses LiteLLM, check your versions now.</li>
<li><strong>Anthropic launched a science blog and demonstrated AI completing a theoretical physics paper in two weeks instead of a year.</strong> The research community is moving from &ldquo;AI helps me write&rdquo; to &ldquo;AI does the experiment.&rdquo;</li>
</ul>
<hr>
<h2 id="story-of-the-week-ai-crosses-into-real-research">Story of the Week: AI Crosses Into Real Research</h2>
<p>This week produced the clearest evidence yet that AI is moving beyond assistance into genuine knowledge creation. Research tracker Epoch AI <a href="https://epoch.ai/frontiermath/open-problems/ramsey-hypergraphs" target="_blank" rel="noopener">confirmed</a>
 that GPT-5.4 Pro solved an open problem in combinatorics (the mathematics of counting and arrangement) that had resisted human solution. The problem&rsquo;s author, a mathematics professor at UNC Charlotte, reviewed the solution and plans to publish it. He noted that the AI&rsquo;s approach &ldquo;eliminates an inefficiency in our lower-bound construction&rdquo; in a way he had suspected might work but couldn&rsquo;t figure out. The result will become a peer-reviewed paper, with the researchers who elicited the solution listed as potential co-authors.</p>
<p>This isn&rsquo;t a model passing an exam or reproducing known results. It&rsquo;s a model generating new mathematics that experts consider publication-worthy. Subsequent testing showed Claude Opus 4.6 and Gemini 3.1 Pro could also solve the problem, while earlier models including Claude Opus 4.5 could not, suggesting a capability threshold was recently crossed rather than this being a fluke.</p>
<p>Separately, Anthropic <a href="https://www.anthropic.com/research/introducing-anthropic-science" target="_blank" rel="noopener">launched a science blog</a>
 and published a case study: Harvard physics professor Matthew Schwartz <a href="https://www.anthropic.com/research/vibe-physics" target="_blank" rel="noopener">supervised Claude</a>
 through a theoretical physics calculation that would normally take a graduate student about a year. It took two weeks, produced 110 drafts and 36 million tokens of work, and resulted in a paper he describes as potentially the most important of his career &ldquo;not for the physics, but for the method.&rdquo; He was emphatic that domain expertise remained essential &ndash; Claude made enough errors that a non-expert supervisor would have missed critical mistakes. The implication for knowledge workers: AI can now dramatically compress timelines on complex intellectual projects, but it still needs a qualified human in the loop.</p>
<hr>
<h2 id="whos-getting-the-most-out-of-ai-and-why-it-matters-for-you">Who&rsquo;s Getting the Most Out of AI (and Why It Matters for You)</h2>
<p>Anthropic&rsquo;s <a href="https://www.anthropic.com/research/economic-index-march-2026-report" target="_blank" rel="noopener">latest Economic Index report</a>
 tracks how Claude is actually being used across the economy, and the most actionable finding is about experience. Users who have been on the platform for six months or more show a 10% higher success rate in their conversations compared to newer users, even after controlling for what tasks they&rsquo;re attempting. They also gravitate toward higher-value work and spend less time on personal queries.</p>
<p>The report can&rsquo;t fully separate &ldquo;people who were already sophisticated got on the platform early&rdquo; from &ldquo;using AI makes you better at using AI.&rdquo; But either way, the gap is real and growing. If you started using AI tools seriously in the last six months, you&rsquo;re likely leaving significant capability on the table compared to colleagues who have been iterating longer. The practical move: treat prompt-writing and task decomposition as skills worth deliberate practice, not just intuition.</p>
<p>The broader usage picture shows AI spreading into more everyday tasks (sports scores, product comparisons, home maintenance questions now make up a growing share of activity), while the serious professional use is quietly migrating from consumer chat interfaces into automated workflows. About 49% of jobs have now had at least a quarter of their tasks touched by Claude, a figure that has barely moved in three months, suggesting the initial wave of adoption has saturated and what&rsquo;s changing is the depth of use rather than the breadth.</p>
<hr>
<h2 id="a-security-alert-your-it-team-may-have-missed">A Security Alert Your IT Team May Have Missed</h2>
<p>Two versions of LiteLLM, versions 1.82.7 and 1.82.8, were found to contain malicious code that <a href="https://github.com/BerriAI/litellm/issues/24512" target="_blank" rel="noopener">automatically harvested credentials</a>
 from any system where they were installed. LiteLLM is a widely used open-source library (a software package that developers use to connect applications to multiple AI providers like OpenAI, Anthropic, and Google at once). The malware ran automatically as soon as Python started, before any of the application&rsquo;s own code executed, and collected API keys, passwords, SSH keys, environment variables, and system information, then sent them to an external server.</p>
<p>This is a supply chain attack: malicious code hidden inside a legitimate, trusted tool. It&rsquo;s the software equivalent of a compromised component in a product your vendor ships you. If anyone on your engineering or data team uses LiteLLM, confirm they are not on versions 1.82.7 or 1.82.8, rotate any API keys that were present on affected machines, and audit what credentials may have been exposed. The discovery triggered over 900 comments on GitHub and was one of the most-discussed security incidents in the developer community this week.</p>
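<p>If your stack touches LiteLLM, the version check is easy to script. Below is a minimal sketch, not an official remediation tool; the compromised version numbers come from the advisory above, and the helper name is my own:</p>

```python
# Sketch: flag the LiteLLM releases reported to contain malware.
from importlib.metadata import PackageNotFoundError, version

COMPROMISED = {"1.82.7", "1.82.8"}  # versions named in the advisory

def is_compromised(installed: str) -> bool:
    """Return True if the installed version is a known-bad release."""
    return installed in COMPROMISED

try:
    installed = version("litellm")
    verdict = "COMPROMISED - rotate all credentials" if is_compromised(installed) else "ok"
    print(f"litellm {installed}: {verdict}")
except PackageNotFoundError:
    print("litellm is not installed in this environment")
```

<p>Excluding the bad releases in your requirements file (for example, <code>litellm!=1.82.7,!=1.82.8</code>) guards against an accidental reinstall, though pinning an exact known-good version is stricter still.</p>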
<p>The broader lesson: AI infrastructure is becoming a target. The tools your teams use to build and run AI applications carry real security risk, and version pinning and package auditing are no longer optional hygiene.</p>
<hr>
<h2 id="quick-hits">Quick Hits</h2>
<ul>
<li>
<p><strong>GitHub training opt-out deadline: April 24.</strong> If you have a GitHub account with private repositories and don&rsquo;t want GitHub using them to train AI models, <a href="https://github.com/settings/copilot/features" target="_blank" rel="noopener">opt out here</a>
 before the deadline. Training is enabled by default, meaning inaction counts as consent. <a href="https://news.ycombinator.com/item?id=47548243" target="_blank" rel="noopener">Hacker News discussion</a>
</p>
</li>
<li>
<p><strong>A 400-billion parameter AI model ran on an iPhone 17 Pro.</strong> A model that size would have required a server rack two years ago. On-device AI of serious capability is arriving faster than most roadmaps predicted. <a href="https://twitter.com/anemll/status/2035901335984611412" target="_blank" rel="noopener">Source</a>
</p>
</li>
<li>
<p><strong>A court blocked the Pentagon from labeling Anthropic a supply chain risk.</strong> The Defense Department had attempted to restrict Anthropic through a national security designation; a federal judge <a href="https://www.cnn.com/2026/03/26/business/anthropic-pentagon-injunction-supply-chain-risk" target="_blank" rel="noopener">issued an injunction</a>
 blocking it. The case signals that AI companies are becoming entangled in geopolitical regulatory battles beyond standard commercial oversight.</p>
</li>
<li>
<p><strong>Sora, OpenAI&rsquo;s video generation tool, shut down its standalone app.</strong> The <a href="https://twitter.com/soraofficialapp/status/2036532795984715896" target="_blank" rel="noopener">official account announced the closure</a>
 this week, with video generation functionality folding into the main ChatGPT product. Consolidation of AI products into unified platforms is accelerating.</p>
</li>
<li>
<p><strong>The European Parliament voted to end Chat Control 1.0.</strong> Starting April 6, <a href="https://bsky.app/profile/tuta.com/post/3mhxkfowv322c" target="_blank" rel="noopener">major tech platforms</a>
 including Gmail and LinkedIn must stop automatically scanning private messages in the EU. Relevant if your organization handles European communications and has been uncertain about message privacy obligations.</p>
</li>
</ul>
<hr>
<h2 id="what-to-watch">What to Watch</h2>
<ul>
<li>
<p><strong>The &ldquo;AI as researcher&rdquo; question is moving from hypothetical to operational.</strong> Anthropic&rsquo;s science blog will publish practical workflows for using AI in research. If your organization does any form of knowledge work (market research, policy analysis, competitive intelligence, scientific R&amp;D), the techniques being developed in academic labs right now will reach you within 12-24 months. Start thinking about what &ldquo;a qualified human in the loop&rdquo; means for your domain.</p>
</li>
<li>
<p><strong>On-device AI will change your assumptions about cloud dependence and data privacy.</strong> A 400-billion parameter model on a phone means enterprise AI that never touches an external server is coming. Watch for this to reshape procurement conversations about data residency and vendor lock-in.</p>
</li>
<li>
<p><strong>The experience gap in AI adoption will become a competitive differentiator.</strong> Anthropic&rsquo;s data shows a measurable skill curve in AI use. Organizations that have been experimenting seriously for a year will have meaningfully more capable teams than those starting now, independent of what tools they use. If you haven&rsquo;t already, ask your leadership team: who in this organization is building genuine AI fluency, and how are we measuring it?</p>
</li>
</ul>
]]></content:encoded></item><item><title>Building the Knowledge Base: Fixing the First Gaps in Your AI Team</title><link>https://oneillo.com/posts/managed-ai-framework-knowledge-base/</link><pubDate>Fri, 27 Mar 2026 00:00:00 +0000</pubDate><guid>https://oneillo.com/posts/managed-ai-framework-knowledge-base/</guid><description>The first gaps you&amp;#39;ll notice on your AI team are knowledge gaps. Here&amp;#39;s how to build a knowledge base that fixes them without overwhelming your agents.</description><content:encoded><![CDATA[<p>In the last post, you <a href="https://oneillo.com/posts/managed-ai-framework-build-your-team/">built your team</a>
, and now you can start managing your team. Pick the agent that makes the most sense for the task at hand, and work with them on it. When you move on to a new task, open a new session with the appropriate agent for that task.</p>
<p>As you work with your team, you&rsquo;ll quickly notice that you&rsquo;re giving them the same facts and details over and over again. They&rsquo;re good in their roles, but they don&rsquo;t know anything about your specific job. Your Marketing Strategist doesn&rsquo;t know what products you&rsquo;re working on. Your Data Wizard doesn&rsquo;t know what the metrics mean in the data they&rsquo;re analyzing. Your Copywriter doesn&rsquo;t know which value props to highlight in your ad copy. They&rsquo;re missing information they need to do the work. You&rsquo;ve run into the first type of gap: a knowledge gap. Now you need to fix it.</p>
<p><img alt="Managing your team" loading="lazy" src="/posts/managed-ai-framework-knowledge-base/managed-ai-framework-manage-your-team.jpg"></p>
<p>To close a knowledge gap, you need to start building a knowledge base that has the type of information a new hire would get during onboarding.</p>
<p>Think of the knowledge base as the wiki you&rsquo;re building for your AI team. It contains relevant and useful content that&rsquo;s easy to find whenever they need it. That includes information they&rsquo;ll refer to all the time, like a style guide, and information they&rsquo;ll need for specific projects, like a project brief.</p>
<p>In practice, a knowledge base is just a collection of files your agents can access, organized so the right information is easy to find.</p>
<p><img alt="Knowledge base" loading="lazy" src="/posts/managed-ai-framework-knowledge-base/2-ai-framework-knowledge-base.jpeg"></p>
<h2 id="no-skimming-no-stamina">No skimming, no stamina</h2>
<p>This will be incredibly useful for your team as long as you keep in mind two constraints.</p>
<p>AI agents don&rsquo;t skim or skip. They have to read everything in order to find anything. It doesn&rsquo;t matter if what they need is in the first sentence of a document. They still have to read the entire document, which is not ideal.</p>
<p>This is a problem because AI agents have limited mental energy. The more they read, the worse they get. You don&rsquo;t want to give them a book and ask them to find the three facts they need for a task. You want to give them a one-pager with those three facts.</p>
<h2 id="pages-not-books">Pages, not books</h2>
<p>Instead of using one file that has everything in it, create a knowledge base that has a lot of files in it. Every file should cover a distinct and unique topic. Some of these will contain information that is more permanent and broadly applicable (role-specific). Others will contain information that is for specific projects or tasks (project-specific). With separate files, your agents can mix and match to get exactly what they need.</p>
<p>For example, I work with my Marketing Strategist across multiple products I support. They always need to know what marketing channels and tactics are available regardless of what product we&rsquo;re working on. But if we&rsquo;re working on Product A, they don&rsquo;t need to know about the value props, target audience, or positioning of Product B.</p>
<h2 id="signposts-not-search-bars">Signposts, not search bars</h2>
<p>When it comes to setting up your knowledge base, the overall principle is that if it would help you find a piece of information, it would also help your agents. This should guide how you organize, name, and store your information.</p>
<p>Imagine if you kept all of your files in one folder, and you had to find a specific file without being able to search for it. That would be a nightmare (or at least really time-consuming). Instead, you probably organize your files across different folders that have descriptive names. If you go into a folder named &ldquo;Product A,&rdquo; you know that you are going to find more files and folders in it that are related to that product.</p>
<p>You should do the same for your knowledge base: create a folder hierarchy that makes it easier for an agent to browse through all of the available files. For example, I keep my role-specific and project-specific information in separate folders. Every project gets its own folder where I can keep the project-specific files.</p>
<p>This is a simplified version of my knowledge base folder at work.</p>
<pre tabindex="0"><code>~/ai/
├── context/                                    # Persistent reference knowledge — rarely changes
│   ├── products/                               # One folder per product
│   │   ├── product-a/
│   │   │   ├── product-a-product-overview.md
│   │   │   ├── product-a-messaging-framework.md
│   │   │   └── product-a-key-metric-definition.md
│   │   └── product-b/
│   ├── marketing/                              # Domain knowledge (not product-specific)
│   │   └── channels/
│   │       └── marketing-channel-overview.md
│   └── document-examples/                      # Few-shot examples for document generation
│       ├── business-requirements-documents/
│       │   └── product-a-brd.md
│       └── strategy-documents/
│
└── projects/                                   # Active and completed work — organized by product
    ├── product-a/
    │   ├── reports/
    │   │   ├── key-metrics/
    │   │   │   ├── key-metrics-report-2026-01.md
    │   │   │   └── key-metrics-report-2026-02.md
    │   │   └── business-reviews/
    │   └── project-1/                           # Time-scoped project folders
    └── misc/                                    # Cross-product or exploratory work
        └── project-1/
</code></pre><p>The names of the files in your knowledge base are another important clue your agents can use to determine if they&rsquo;re relevant. You are less likely to know what&rsquo;s in a file called &ldquo;document&rdquo; than you are with a file called &ldquo;product-a-overview-and-positioning.&rdquo; It&rsquo;s the same for an agent considering whether to read that file.</p>
<p>Once you open a file, you probably don&rsquo;t want to read all of it to know if it&rsquo;s useful. A summary at the top is helpful to get the gist of the content and determine if you should continue reading. You can do the same in your knowledge base files with a few lines at the top that tell the agent what this file is about and whether it&rsquo;s worth reading, like:</p>
<pre tabindex="0"><code>---
title: &#34;Project Orion Launch Brief&#34;
product: &#34;Orion Analytics Dashboard&#34;
status: active
date_updated: 2026-03-10
summary: Redesign of the analytics dashboard to support real-time data streaming. Goal is reducing time-to-insight for enterprise customers by 40%. Use this file when working on any Orion-related marketing, messaging, or launch planning tasks.
---
</code></pre><p>You can see a full example of a knowledge base file at the bottom of this post.</p>
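<p>Those few lines of frontmatter are also machine-readable, which means you can build a quick index of your whole knowledge base &ndash; the same skim an agent does when deciding what to read in full. Here is a hypothetical sketch; the field names match the example above, but the parser is deliberately naive (use a real YAML library for anything beyond simple key: value pairs) and the function names are my own:</p>

```python
# Sketch: index knowledge base files by their YAML frontmatter.
from pathlib import Path

def parse_frontmatter(text: str) -> dict:
    """Extract simple key: value pairs from a leading frontmatter block."""
    fields = {}
    if not text.startswith("---"):
        return fields
    # Frontmatter is everything between the first two "---" markers.
    block = text.split("---", 2)[1]
    for line in block.strip().splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip().strip('"')
    return fields

def index_knowledge_base(root: str) -> list[dict]:
    """Return the frontmatter of every markdown file under root."""
    entries = []
    for path in sorted(Path(root).rglob("*.md")):
        meta = parse_frontmatter(path.read_text())
        meta["path"] = str(path)
        entries.append(meta)
    return entries
```

<p>Printing each file&rsquo;s <code>title</code> and <code>summary</code> from that index gives you a one-screen overview of everything your agents can reach.</p>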
<p>You used LLMs to help you create your persona prompts, and you can also use them to set up your knowledge base. Whether you&rsquo;re a type-A person who has all of your MBA notes from 15 years ago scanned and searchable, like me, or more of a go-with-the-flow type, this is another place to let an LLM take the first pass. Don&rsquo;t get too hung up on the details or strive for perfection. That&rsquo;s why there is a feedback loop in the process: so you can move quickly and make improvements that address real issues.</p>
<p>It&rsquo;s the same approach as in the last post: you let the LLM take the first pass and then refine as you go. The earlier you build that habit, the faster the whole framework is going to pay off.</p>
<p>Your team is more knowledgeable, but that isn&rsquo;t enough to guarantee you get consistent, high-quality work. You still need to give them an employee handbook and a set of standard operating procedures to fix the second type of gap: behavioral gaps.</p>
<hr>
<h2 id="resource-prompt-you-can-use-to-design-a-knowledge-base-for-your-ai-agents">Resource: prompt you can use to design a knowledge base for your AI agents</h2>
<pre tabindex="0"><code>I&#39;m building a knowledge base for a team of AI agents. The knowledge base is a collection of markdown and CSV files that any agent on the team can access. Not every agent needs every file — they&#39;ll pull in what&#39;s relevant to each task.

Help me figure out what files I need. Interview me with the following questions, one at a time. Ask each question, wait for my response, then move to the next.

1. What agents did you set up? For each one, briefly describe the role and the kinds of tasks they&#39;ll handle.
2. What information comes up repeatedly across your agents&#39; work, regardless of which agent is doing it? Think about context that any of them might need.
3. Are there specific products, projects, or workstreams that your agents support?
4. Are there reference documents or standards that multiple agents would need access to?

After the interview, propose a knowledge base structure:
- A folder layout with descriptive names, separating role-specific files (broadly useful across projects) from project-specific files (tied to a particular product or workstream)
- A list of recommended files, each with a one-line description of what it should contain and whether it&#39;s role-specific or project-specific

Based on what you learn about my setup, propose a YAML frontmatter format for my knowledge base files. Every file should have a title, status, date updated, and a short summary describing what the file contains and when an agent should use it. Beyond those basics, add fields that make sense for my situation — for example, a product field if I support multiple products, or a domain field if my agents span different areas of expertise. Explain why you chose the fields you did.

Include the proposed frontmatter in each recommended file.

Keep every file focused on a single topic. Aim for files that are 1-2 pages, not 10. If a topic is too broad for one file, split it.
</code></pre><h2 id="example-what-a-knowledge-base-file-looks-like">Example: what a knowledge base file looks like</h2>
<pre tabindex="0"><code>---
title: &#34;Project Orion Launch Brief&#34;
product: &#34;Orion Analytics Dashboard&#34;
status: active
date_updated: 2026-03-10
summary: Redesign of the analytics dashboard to support real-time data streaming. Goal is reducing time-to-insight for enterprise customers by 40%. Use this file when working on any Orion-related marketing, messaging, or launch planning tasks.
---

## What is Project Orion?

Orion is a redesign of the existing analytics dashboard for enterprise customers. The current dashboard refreshes data every 15 minutes. Orion introduces real-time streaming so customers see their data as it happens.

This is not a new product. It&#39;s a major upgrade to an existing product that enterprise customers already use daily.

## Business goal

Reduce time-to-insight for enterprise customers by 40%. The current delay between data generation and dashboard visibility is the #1 support complaint and the #1 reason prospects cite for choosing competitors.

## Target audience

- Primary: existing enterprise customers (upgrade path)
- Secondary: mid-market prospects evaluating analytics platforms for the first time
- Not targeting: SMB or self-serve customers (Orion is enterprise-tier only)

## Key messaging pillars

1. **Real-time, not near-time.** Competitors claim &#34;real-time&#34; but deliver 5-minute delays. Orion streams data in under 10 seconds.
2. **Zero migration effort.** Existing dashboards carry over. No rebuilding, no re-learning.
3. **Built for the analysts, not just the admins.** The redesign focuses on the daily experience of the people who actually use the dashboard, not just the people who set it up.

## Competitive context

- Competitor A offers real-time but requires a full dashboard rebuild on migration
- Competitor B has a faster refresh rate (5 min) but no true streaming
- Our advantage is real-time streaming with zero migration friction

## Launch timeline

- Beta: April 2026 (50 enterprise customers)
- GA: June 2026
- Marketing launch campaign begins two weeks before GA

## What this file does not cover

- Pricing and packaging (see `orion-pricing-and-tiers.md`)
- Technical architecture (see `orion-technical-specs.md`)
- Full competitive analysis (see `competitive-landscape.md`)
</code></pre>]]></content:encoded></item><item><title>Your First AI Hire: Building Agents That Know Their Job</title><link>https://oneillo.com/posts/managed-ai-framework-build-your-team/</link><pubDate>Fri, 20 Mar 2026 00:00:00 +0000</pubDate><guid>https://oneillo.com/posts/managed-ai-framework-build-your-team/</guid><description>A practical guide to scoping roles and creating persona prompts that make your AI agents more useful.</description><content:encoded><![CDATA[<p>I remember when I realized I was starting to use AI at work as if I were <a href="https://oneillo.com/posts/managed-ai-framework-overview/">managing a team of AI employees</a>
. I got so excited, I immediately sketched the idea on a sheet of paper so I could share it with my teammates.</p>
<p><img alt="First framework sketch" loading="lazy" src="/posts/managed-ai-framework-build-your-team/0-ai-framework-rough-sketch.jpeg"></p>
<p>What started as a sketch is now core to how I use AI agents to do things faster and better at work and at home. It&rsquo;s an approach that naturally guides you toward the <a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents" target="_blank" rel="noopener">context engineering best practices</a>
 that improve LLM output.</p>
<p>The reason this approach works is that it uses one of the two available levers to improve how well a best-in-class large language model (LLM) works for you.</p>
<ol>
<li>Fine-tuning: this is where you take an LLM and train it further using your own data so it becomes more specialized for your needs.</li>
<li>In-context learning: giving the LLM the right expertise (persona), knowledge (context files), workflows (skills), and rules (team rules) in each session.</li>
</ol>
<p>For most people, fine-tuning is going to be out of reach. Even if you could fine-tune a model, you&rsquo;d have to retrain it repeatedly to keep up with changes in your work. Otherwise, the model would grow stale. In-context learning is how you keep the model relevant between retraining cycles, and for most people, it&rsquo;s the only lever available.</p>
<p>It all starts with building your team.</p>
<p><img alt="Build your team process" loading="lazy" src="/posts/managed-ai-framework-build-your-team/managed-ai-framework-build-your-team.jpg"></p>
<h2 id="define-the-roles">Define the roles</h2>
<p>The first step is to define the roles for your team by identifying the groups of similar tasks you do over and over again. It might help to start with pen and paper like I did.</p>
<p>List out the things you do at work or at home on your computer. Don&rsquo;t overthink it; just write them down. Then group the ones that are similar in terms of how you approach them (required behavior) and the information you need to do them (required context).</p>
<p>The groups of items you do most often and that take the most time are the best candidates for roles because they&rsquo;ll benefit the most from ongoing improvement. On those tasks, you work with an agent frequently enough to spot gaps, make improvements, and keep benefiting from them.</p>
<p>As you build your team, keep in mind that the <a href="https://www.quantumworkplace.com/future-of-work/whats-the-optimal-span-of-control-for-people-managers" target="_blank" rel="noopener">ideal number of direct reports for a manager tends to be 8-9</a>
. This principle also applies to AI agents. The more you have, the more complex it gets to keep up with the feedback and improvement loop for each one. Remember, you&rsquo;re not building a department. You&rsquo;re building a team.</p>
<p>In my role as a Sr Product Marketing Manager, I&rsquo;ve landed on five agents that I work with daily:</p>
<p><img alt="My AI team at work" loading="lazy" src="/posts/managed-ai-framework-build-your-team/1-ai-framework-team.jpeg"></p>
<p>I&rsquo;m setting up a different team at home: an editor, financial advisor, and personal trainer.</p>
<h2 id="create-the-personas">Create the personas</h2>
<p>Creating the personas will be quicker than you think, because you&rsquo;re going to use AI to help create them.</p>
<p>Start with the role that you feel the most comfortable defining. Spend a little time thinking about how you&rsquo;d want the agent in that role to behave. What should it do? What should it never do? Don&rsquo;t overthink it. You&rsquo;re not going for perfection. You&rsquo;re going for something that you can provide an LLM, like ChatGPT or Claude, to help it create a persona prompt for you. Keep it simple so you don&rsquo;t get hung up on this step. The feedback loop will improve it over time.</p>
<p>Next, start a chat with the best-performing model you have access to. Regardless of what you are using, if you have the option to select a model, select the latest frontier model from that provider. Starting with a better-quality model means you&rsquo;re more likely to start with a good persona prompt. That means the feedback loop has less distance to close before the agent starts materially improving the work it was created for.</p>
<p>In the chat, ask it to help you create a persona prompt. Let it know the role you have in mind, the type of work you&rsquo;re going to use it for, and how you want the agent to behave. I&rsquo;ve included a prompt at the end of this post that you can copy into Claude, ChatGPT, Gemini, or your tool of choice to walk you through creating your persona prompt.</p>
<p>Review what the model writes for you, and iterate on it as needed. If something doesn&rsquo;t sound right, let the model know what the issue is and ask it to update the prompt. You don&rsquo;t need to use any kind of special language to get this done. Treat it like a conversation you&rsquo;re having with a coworker to improve a document. And remember what I mentioned before: there&rsquo;s no need to be precious about this. This is a starting point that you&rsquo;re going to refine through the feedback loop.</p>
<p>A good prompt is going to define the agent&rsquo;s identity briefly (1-2 sentences) and focus primarily on behavioral guidance for the agent. This includes how to approach tasks, standards to enforce, and what to prioritize. It&rsquo;s also helpful to include specific things the agent shouldn&rsquo;t do in this type of role. For example, I don&rsquo;t want my data wizard to ignore a sudden spike or decrease in a metric, because I&rsquo;ve learned that generally doesn&rsquo;t happen without some external factor causing it.</p>
<p>After you create your persona prompts, take a step back and think about how you created them. You delegated the persona draft to an LLM. That&rsquo;s not a shortcut. You&rsquo;re not cheating. That&rsquo;s the whole point of creating your AI team. You&rsquo;re going to be delegating more and more work to them, and this is the first point in the framework where you do that.</p>
<p>As you build trust with your agents, you&rsquo;re going to start to delegate more to them: bigger tasks, more autonomy, more trust. This is exactly what it&rsquo;s like to be a manager when you&rsquo;re working with a new employee. You&rsquo;re initially close to what they&rsquo;re doing, you build trust, and then you start to give them the room to run. That&rsquo;s when you start to really see the benefits of adding that employee to your team. It&rsquo;s the same thing here. The earlier you get comfortable delegating work to the AI agents, the faster everything in the framework will start to pay off.</p>
<h2 id="set-up-the-agents">Set up the agents</h2>
<p>The last thing you need to do to build your team is set up the agents by loading the persona prompt into whatever tool you&rsquo;re using. The specifics vary by tool, e.g. Claude Code versus Kiro CLI, and I&rsquo;ll cover them in more detail in an upcoming post in this series. For now, just remember that the persona prompt is the foundation for each agent on your team.</p>
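<p>To give one small, hedged illustration ahead of that post: Claude Code reads project-level instructions from a <code>CLAUDE.md</code> file, so the most minimal setup is simply saving the persona prompt under that name in a project directory. The directory name and persona text below are hypothetical:</p>

```shell
# Hypothetical project folder; CLAUDE.md is the file Claude Code loads as
# instructions at the start of each session in this directory.
mkdir -p copywriter-project
cat > copywriter-project/CLAUDE.md <<'EOF'
You are a copywriter for our consumer tech product line. Before drafting,
ask for the target audience and any character constraints.
EOF
```

Other tools have their own equivalents for persistent instructions, but the idea is the same: the persona lives in a file the agent reads every time it starts.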
<p>Building your AI team is straightforward. You&rsquo;re the expert at what you do and how to do it well. Use your experience and expertise to guide an LLM to build persona prompts for AI agents to fill your open roles. That gets them hired. The knowledge base you&rsquo;ll create is what gets them up to speed and delivering high-quality work for you.</p>
<hr>
<h2 id="resource-prompt-you-can-use-with-an-llm-to-help-create-your-persona-prompts">Resource: prompt you can use with an LLM to help create your persona prompts</h2>
<pre tabindex="0"><code>I need you to help me write a persona prompt — a set of instructions that will shape how an AI agent behaves every time it runs. Think of it as a job description the AI reads before every conversation.

Before writing anything, interview me. Ask these three questions one at a time, waiting for my response before moving on:

1. **What role does this agent play?** What&#39;s the domain and who does it serve? (If you know what platform or tools the agent will use, mention them — but don&#39;t worry if you&#39;re not sure.)
2. **What kinds of work will it do?** Describe the typical tasks or situations the agent will help with. Think about what a good day looks like — what does the agent do well?
3. **What behaviors matter most?** How should the agent approach its work? What should it do when it&#39;s unsure? Are there things it should always or never do?

After the interview, generate the persona prompt. Use what I told you as the foundation, but add your own recommendations — behaviors or guidelines that would make this agent more effective for the role, even if I didn&#39;t mention them. Call out anything you added so I can review it.

Follow these rules when writing the prompt:

### Focus on behaviors
- Describe what the agent should *do*, not what it *is*. &#34;Start by understanding the full situation before proposing solutions&#34; is a behavior the agent can act on. &#34;You are thorough and thoughtful&#34; is not — it&#39;s a personality trait, and the agent won&#39;t know how to translate that into action.
- Frame instructions as conditional guidance: &#34;When X, do Y.&#34; This gives the agent concrete decision points rather than abstract qualities to live up to.
- If a behavior only applies sometimes, state the condition.

### Hit the right altitude
- Write at the level of a clear team lead briefing a competent new hire — not a legal contract, not a vague mission statement.
- Be specific enough to prevent the mistakes that actually happen, but flexible enough to let the agent use judgment in novel situations.
- Prefer &#34;when X, prefer Y&#34; over rigid step-by-step procedures. The agent needs guidance it can apply across situations, not a script that breaks the moment something unexpected comes up.

### Structure for the role
- Let the role dictate the structure. A coding agent needs different sections than a research agent or a writing coach. Don&#39;t force a template.
- Always lead with identity and scope — one or two sentences that establish who this agent is and what it does.
- After that, organize the remaining instructions into whatever sections make sense for this specific role. Use headers and bullets so the instructions are easy to scan.

### Keep it short
- A persona prompt competes with the user&#39;s actual questions and content for the agent&#39;s attention. The longer the prompt, the less room the agent has to focus on the real work.
- Aim for the shortest prompt that fully captures the desired behavior. If a line doesn&#39;t change how the agent acts, cut it.
- Leave out anything the agent would already know or can figure out from context.
- When in doubt, leave it out. A lean starting point that the user can build on is far more useful than a bloated prompt full of rules that haven&#39;t been tested. The user will discover what&#39;s missing by working with the agent and can add rules as needed.

### Output format
- Output only the final persona prompt, ready to use.
- After the prompt, add a short &#34;Additions&#34; section listing anything you added beyond what I described, with a one-line rationale for each. This section is for my review — it&#39;s not part of the persona prompt itself.
</code></pre>]]></content:encoded></item><item><title>How I Manage a Team of AI Agents at Work</title><link>https://oneillo.com/posts/managed-ai-framework-overview/</link><pubDate>Mon, 16 Mar 2026 00:00:00 +0000</pubDate><guid>https://oneillo.com/posts/managed-ai-framework-overview/</guid><description>A six-step framework for treating AI agents like a team you manage, not a tool you use.</description><content:encoded><![CDATA[<p>I used to think of AI as a tool I used. Now I think of it as a team I manage. This perspective evolved gradually as I used it daily and found myself rewriting the same persona prompts over and over again for the same types of tasks. I started systematically improving what I was doing until I found myself managing a team, which happened to be made up of AI agents, at the end of the <a href="https://oneillo.com/posts/my-llm-journey/">7-month journey that made me an AI enthusiast</a>
.</p>
<p>When you are managing a team, you have to scope the roles for your team and fill them with people who can be successful in those roles. To do that, you hire folks with the right backgrounds and experience, both of which inform how they&rsquo;ll do the work. Every member of my AI team has a persona prompt with specialized behavioral guidelines I want for the role they&rsquo;re filling. For example, my Data Wizard prompt has guidance around digging into irregularities in data, like when a metric suddenly spikes up or down.</p>
<p>You want to ensure your team has the information they need to do their work, like a wiki with product information, target audience insights, document templates, and standard operating procedures. You&rsquo;re giving them the context they lack when they step into that role, and I do the same with a structured knowledge base, project-specific context files, and reusable skills that describe how to do specific tasks.</p>
<p>Lastly, you want to develop your team with feedback and guidance tailored to them. I&rsquo;m using feedback loops to capture issues and improve their work via the persona prompts, team rules, skills, and knowledge base.</p>
<p>I built my team through trial and error, but I now have a framework for how to do this, which I&rsquo;m breaking down into two phases: build your team and manage your team. Each phase contains three steps:</p>
<p><strong>Build your team</strong></p>
<ol>
<li>Define the roles</li>
<li>Create the personas</li>
<li>Set up the agents</li>
</ol>
<p><strong>Manage your team</strong></p>
<ol>
<li>Work with your team</li>
<li>Spot the gaps</li>
<li>Apply fixes at the right level</li>
</ol>
<p>To build the right team, you first need to figure out what roles to fill. Start by identifying similar types of tasks that you do often in your role; those groups of tasks represent job openings you could fill with an AI agent.</p>
<p>I&rsquo;ve found that my most useful agents are the ones I work with often, since frequent use supports the ongoing cycle of improvement, so I avoid creating a custom agent with a scope so narrow that I&rsquo;d rarely use it. I also don&rsquo;t want to juggle a team of 30 agents every day; the best practice of keeping a manager&rsquo;s span of control to ~8 or fewer employees makes sense in this context too.</p>
<p>You&rsquo;ll then build a persona prompt for each job opening that defines the ideal candidate&rsquo;s identity and how they work. Once you have the persona, that serves as the foundation for the AI agent you&rsquo;ll set up in a tool like Kiro or Claude Code to be on your team.</p>
<p>To manage your team, you have to understand their strengths and weaknesses, and that means working closely with them. You&rsquo;ll want to work with the right agent for the task at hand.</p>
<p>As you work with them, you&rsquo;ll start noticing recurring gaps that you need to address to improve your team. Some of those will be knowledge gaps, where the team needs more information, and others will be behavioral gaps, where your agents aren&rsquo;t doing something the way you&rsquo;d like them to or expect them to.</p>
<p>Based on the gap, you&rsquo;ll want to address the situation at the right level. That might entail adding a new context file to the team&rsquo;s knowledge base or updating an agent&rsquo;s persona. These small tweaks will start to lead to big improvements, but this isn&rsquo;t a set-it-and-forget-it kind of deal. It&rsquo;s a continuous management process that never really ends.</p>
<p><img alt="Managed AI framework overview" loading="lazy" src="/posts/managed-ai-framework-overview/managed-ai-framework-overview.jpg"></p>
<p>I&rsquo;ve noticed that my team is producing better work in less time with this approach, but I don&rsquo;t have an objective way to measure or validate that. I want to learn more about LLM evaluation techniques so I can get to objective measurement, but in the meantime, I have some validation from others at work.</p>
<p>First, I&rsquo;ve started to get compliments from copywriters on the draft marketing copy I&rsquo;m writing with my Copywriter AI agent. Second, I shared the first draft of a monthly business review document my team wrote with my counterpart on the product side. I let her know it was all AI-generated (the analysis and the write-up), and I asked her to review and check if there was potential there. She was so impressed with the quality of the MBR, she started asking me questions about how I&rsquo;d put it together and what my setup was. Lastly, I was able to write a good business requirements document from scratch in one day because the infrastructure was already in place.</p>
<p>Building and managing an AI team is much harder than just using a chatbot or the default agent that you get with something like Kiro or Claude Code. It takes upfront work to scope the roles, build the personas, and create the infrastructure to support the ongoing improvement.</p>
<p>A lot of that work will happen before you start to see the results, but it will start to compound. Pieces will build on top of other pieces, and things will get faster, both because you&rsquo;ll start to optimize the process to your work style and because your agents will get better. I don&rsquo;t have the data yet to prove this is better, but I&rsquo;ve seen enough to think the effort is worth it. So I&rsquo;m currently setting up the same approach at home with Claude Code.</p>
<p>That&rsquo;s a high-level overview of my managed AI framework. I&rsquo;m going to dive deep into each area of the framework with separate posts on building your team, setting up the knowledge base, working with skills and team rules to set expectations and requirements, and creating a feedback loop to drive the ongoing improvement. I&rsquo;ll then cover how I&rsquo;ve implemented this in Kiro CLI at work and Claude Code at home. I&rsquo;ll end the series with a post on the learnings and best practices I&rsquo;ve picked up along the way. By the end, you should have a good roadmap with explicit examples to allow you to set up your own team.</p>
<p>I want to develop my skills, deliver better results, and spend more time with my family. This framework is how I&rsquo;m doing that. It&rsquo;s a tool-agnostic approach that can help move you away from using one-size-fits-all tools to building an AI team that&rsquo;s tailored to your needs and able to deliver better results for you.</p>
]]></content:encoded></item><item><title>I've Been AI-Pilled: My Journey From Chatbots to Custom Agents</title><link>https://oneillo.com/posts/my-llm-journey/</link><pubDate>Fri, 13 Mar 2026 00:00:00 +0000</pubDate><guid>https://oneillo.com/posts/my-llm-journey/</guid><description>How I went from occasionally using chatbots to managing a team of five custom AI agents — and why the benefits are compounding.</description><content:encoded><![CDATA[<p>I was slow to start using generative AI, but over the last 7 months, AI has fundamentally changed how I work. I&rsquo;ve gone from occasionally using AI to write text, to using it to create Python scripts, to now having a team of five custom AI agents that I collaborate with daily. I&rsquo;m seeing how quickly the benefits are compounding, and as a result, I&rsquo;ve been AI-pilled.</p>
<p>I began learning about LLM-based gen AI in earnest in 2024. I read all the most popular books at the time, but my exposure remained primarily theoretical. I learned how LLMs work fundamentally, but the biggest practical takeaway was the idea of assigning a persona to chatbots to improve their output. That&rsquo;s basic prompt engineering, e.g. &ldquo;You are a copywriter with 15+ years of experience in consumer tech. Help me write a marketing email about this product.&rdquo; On the rare occasion I used a chatbot, I always remembered to assign it a persona.</p>
<p>Last August, I joined a project at work that was the turning point for my AI enthusiasm. In that project, I had to manually build a large JSON file that would require a lot of ongoing updates. On a whim, I decided to see if I could use a chatbot to write a Python script to go from JSON to Excel and vice versa. That would allow me to make updates in Excel, which would be faster, and then generate the JSON programmatically, reducing the risk of errors. Within 30 minutes, I had a working prototype that ultimately saved me countless hours over the coming months.</p>
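<p>That kind of round trip really is a short script. Here&rsquo;s a minimal sketch, assuming the JSON is a flat array of records (anything nested would need flattening first); the file names are illustrative, and it relies on <code>pandas</code> with an Excel engine like <code>openpyxl</code>:</p>

```python
import json

import pandas as pd  # plus openpyxl for the .xlsx engine


def json_to_excel(json_path: str, xlsx_path: str) -> None:
    """Load a JSON array of records and write it as an Excel sheet for editing."""
    with open(json_path) as f:
        records = json.load(f)
    pd.DataFrame(records).to_excel(xlsx_path, index=False)


def excel_to_json(xlsx_path: str, json_path: str) -> None:
    """Read the edited sheet back and regenerate the JSON programmatically."""
    df = pd.read_excel(xlsx_path)
    # DataFrame.to_json handles pandas' numeric types, which json.dump would
    # reject as non-serializable.
    df.to_json(json_path, orient="records", indent=2)
```

Editing happens in Excel between the two calls; regenerating the JSON from the sheet is what removes the hand-editing errors.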
<p>I&rsquo;m OK at Python, but I realized LLMs are much better. So, I began to write a lot of Python scripts this way to automate repetitive or time-consuming tasks, like resizing images or creating Word docs from copy I had in Excel files, in order to stay on top of the workload for the project.</p>
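<p>The image-resizing case is a similarly small script. A hedged sketch using Pillow, with an illustrative directory layout and size limit:</p>

```python
from pathlib import Path

from PIL import Image  # Pillow


def shrink_images(src_dir: str, dest_dir: str, max_width: int = 800) -> None:
    """Downscale every JPEG/PNG in src_dir to at most max_width pixels wide,
    preserving aspect ratio, and save the copies into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).iterdir()):
        if path.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
            continue
        with Image.open(path) as img:
            if img.width > max_width:
                new_height = round(img.height * max_width / img.width)
                img = img.resize((max_width, new_height))
            img.save(dest / path.name)
```

Images already narrower than the limit are copied through unchanged, so the script is safe to rerun on the same folder.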
<p>I was soon using chatbots weekly for other things. I began paying attention to which model the chatbot was using, switching to the latest frontier models whenever possible. That had a noticeable impact on the quality of the copy and ideas I was getting from the chatbots, especially when I paired better models with a well-crafted persona and a collaborative approach.</p>
<p>I got tired of typing different versions of the same persona prompts whenever I started a new chat. Often I was too busy and moving too quickly to write something better than &ldquo;You&rsquo;re a [blank] with X years of experience.&rdquo; It happened enough times that I realized I could save some time without sacrificing quality by creating reusable persona prompts for different types of tasks. At first I wrote them myself, and they were OK, but not great. By asking the chatbot to help me craft the persona, I was able to take them to the next level. I kept the prompts in Word docs so I could copy and paste them into the start of my chat sessions depending on what I was working on. To save a little more time, I&rsquo;d pin the chat in the sidebar and rename it to something like &ldquo;Copywriter&rdquo; or &ldquo;Data Wizard,&rdquo; so I could quickly return to the right chat for the task at hand.</p>
<p>I&rsquo;d work with the same chat for up to a week because I wasn&rsquo;t aware of context rot (where long, ongoing conversations with LLMs start to produce worse results). That turned out to be OK, though, because it led to another breakthrough for me. I started to ask the chatbots at the end of each week how we could improve the persona prompt I&rsquo;d initially started the conversation with based on our interactions. The chatbot would suggest some ideas, and after a few revisions and back-and-forths, it would write an updated version that I would use in the next week&rsquo;s chat. That created a feedback loop to improve my personas on an ongoing basis.</p>
<p>For example, I learned that LLMs guesstimate how long copy is after noticing that their character counts for marketing copy were often wrong. That&rsquo;s not great when you&rsquo;re writing ad copy that has specific character constraints. I updated my Copywriter persona prompt with instructions to count each individual character when writing against copy constraints. After that, I no longer had to worry about getting copy options that were too long for the character constraints I&rsquo;d provided. It was like giving an employee feedback, except the chatbot immediately incorporated that feedback into how it worked.</p>
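<p>If you want a deterministic backstop for that kind of constraint, a few lines of Python can verify lengths after the fact rather than trusting the model&rsquo;s own count. A tiny sketch, with illustrative names:</p>

```python
def over_limit(copy_options: list[str], max_chars: int) -> list[tuple[str, int]]:
    """Return (option, length) pairs for any copy that exceeds the limit,
    counting every character, including spaces and punctuation."""
    return [(opt, len(opt)) for opt in copy_options if len(opt) > max_chars]
```

Running the model&rsquo;s drafts through a check like this catches overruns immediately, and the failures make concrete feedback for the persona prompt.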
<p>That&rsquo;s the core idea in the framework that is now guiding my AI usage: that I&rsquo;m the manager of a team of AI agents. They&rsquo;re incredibly smart, but also kind of dumb. They have a lot of expertise, but they&rsquo;re also clueless about the specifics of where I work and what I&rsquo;m working on.</p>
<p>The more effort I put into developing my team and providing what they need, the better the quality of work I get from them. And the benefits are compounding over time. I spend less time correcting simple issues and more time refining and improving what we&rsquo;re working on. I spend less time providing the same context over and over again to my agents. Instead, they have a growing knowledge base to inform their work. I spend less time tweaking the documents they create because they have actual examples to refer to of the various types of documents I have to write. I&rsquo;m capturing feedback and improving every aspect of my setup daily, and that makes it even better the next day and miles ahead of using a run-of-the-mill chatbot.</p>
<p>I&rsquo;m going to go into more detail about this framework in coming posts and explain how I&rsquo;ve implemented it in <a href="https://kiro.dev/cli/" target="_blank" rel="noopener">Kiro CLI</a>
, an AI coding tool I use at work primarily for non-coding tasks, and how I&rsquo;m now implementing it in <a href="https://code.claude.com/docs/en/overview" target="_blank" rel="noopener">Claude Code</a>
 at home.</p>
]]></content:encoded></item></channel></rss>