How AI Agents Coordinate And How They Remember

Two questions decide whether an AI agent project ships or stalls: how the agents coordinate, and how they remember. A small-business and local-government playbook.

Two questions decide whether an AI agent project ships or quietly stalls. Not "which model" and not "which vendor." Those are the louder questions, and they get answered first because the answers are easy. The two harder questions are these: when more than one agent is involved, how do they coordinate? And after the work is done, how does each agent remember what it learned? Pick badly on either and you end up with the kind of demo that works on a clean prompt and falls apart in the second week.

The choices get treated as plumbing. They aren't. They shape what you can build, what it costs, who has to babysit it, and whether the same agent gets smarter over a quarter or restarts from zero every Monday morning.

This post walks through both decisions, three patterns each, in the same shape, so you can hold them up against your own situation. The three-and-three structure borrows from two essays I recommend reading first if you want the longer treatment; both are linked at the end. The concrete examples here, especially the small-business and local-government scenarios, are mine.

Pattern set 1: how the agents coordinate

Pattern A: one supervisor, several specialists

A single supervisor agent receives the request, decides which specialist handles it, calls that specialist, reads the result, and either replies or calls the next one. This is the classic shape and the cheapest pattern to reason about. You always know who is holding the request. Logs read like one transcript per task, not a multi-party group chat. When something goes wrong, the supervisor's prompt is where you go to fix it. OpenAI's Agents SDK ships this pattern as the default, and Anthropic's Claude Agent SDK supports it through subagents that the parent can spawn with their own context.
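If it helps to see the shape rather than read about it, here is a minimal Python sketch of the supervisor loop, independent of any particular SDK. The call_llm callable and the three specialist instructions are illustrative stand-ins, not a vendor API; the real SDKs wrap this loop for you.

    from typing import Callable

    def supervisor(request: str, call_llm: Callable[[str], str]) -> str:
        # The supervisor decides which specialist runs and holds all state;
        # each "specialist" here is just a focused instruction plus a model call.
        specialists = {
            "categorize": "Categorize this request:",
            "lookup": "Find the records relevant to this request:",
            "draft": "Draft a reply to this request:",
        }
        transcript = [f"REQUEST: {request}"]     # one transcript per task
        context = request
        for name, instruction in specialists.items():
            result = call_llm(f"{instruction}\n{context}")
            transcript.append(f"{name.upper()}: {result}")
            context += f"\n{name}: {result}"     # everything flows back through the supervisor
        print("\n".join(transcript))             # the log reads like one conversation
        return transcript[-1]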

The tradeoff: the supervisor becomes a bottleneck for every step, and its context window has to hold the whole task. Long workflows blow that window. Splitting work into stages helps, but each stage starts with a re-summarized brief, which is exactly where information loss happens.

Pattern B: peer agents that hand off work

Several agents share the work directly. One reads the incoming email, hands the parsed message to another that decides what category it is, which hands it to a third that drafts the reply. CrewAI is the most-cited framework that ships peer-style coordination as the primary pattern; LangGraph supports it through graph-based handoffs. The advantages are parallelism and specialization. When three customer messages arrive at once, three drafting agents can work at the same time. Each agent's prompt also stays short because it only does one job.

The tradeoff: every handoff is a place state can drop. If the email-reader extracts the customer's name and the categorizer doesn't pass it through, the drafter writes "Hi there." Peer patterns reward strict shared schemas (everyone reads and writes the same shape) and punish improvisation.
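A strict shared schema can be as plain as a dataclass that every peer reads and writes. The sketch below is not any framework's handoff type; the field names are made up for the email example above.

    from dataclasses import dataclass, field

    @dataclass
    class Handoff:
        raw_email: str
        customer_name: str | None = None      # extracted by the reader
        category: str | None = None           # added by the categorizer
        draft_reply: str | None = None        # added by the drafter
        notes: list[str] = field(default_factory=list)

    def categorize(msg: Handoff) -> Handoff:
        # Each peer fills in its own field and passes everything else
        # through untouched -- it never rebuilds the record from scratch.
        msg.category = "billing" if "invoice" in msg.raw_email.lower() else "general"
        return msg

The value is less in the code than in the agreement: when the drafter receives a Handoff with customer_name still populated, "Hi there" stops happening.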

Pattern C: vendor-managed handoffs

A growing list of platforms handles the orchestration for you. You define the agents and the tools they can call; the runtime takes care of checkpointing, retries, and routing between them. Anthropic, OpenAI, Vercel's AI SDK, and several specialized vendors offer their own version. The pattern is the youngest of the three and changes most often, so anything I write about specific guarantees will be out of date quickly. Check current docs before committing.

The tradeoff is honest: you trade flexibility for not having to maintain orchestrator code yourself. If your team ships agent projects every quarter, that trade is good. If you're shipping one agent system and treating it as infrastructure for the next ten years, owning the orchestrator may matter more.

Pattern set 2: how the agent remembers

Pattern A: re-fetch every time (RAG)

Retrieval-augmented generation is the pattern almost every "talk to your docs" demo uses. Documents get chunked, each chunk gets a vector embedding, and at query time the agent searches its vector store for the most similar chunks, drops them into the prompt, and answers from that context. The agent remembers nothing between sessions. It just reads the same library every time. The original RAG paper from 2020 (Lewis et al.) framed it that way and the framing has held.
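The loop is simple enough to sketch end to end. The Python below is a deliberately toy version: the "embedding" is a bag-of-words counter so the example runs with no dependencies, where a real system would use an embedding model and a vector store.

    from collections import Counter

    def embed(text: str) -> Counter:
        return Counter(text.lower().split())      # stand-in for a real embedding

    def retrieve(question: str, chunks: list[str], k: int = 3) -> list[str]:
        q = embed(question)
        score = lambda chunk: sum((q & embed(chunk)).values())   # crude similarity
        return sorted(chunks, key=score, reverse=True)[:k]

    def answer(question: str, chunks: list[str], call_llm) -> str:
        context = "\n---\n".join(retrieve(question, chunks))
        # Nothing gets written back: the agent re-reads the library every time.
        return call_llm(f"Answer using only this context:\n{context}\n\nQ: {question}")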

RAG works well when the source material is large, changes often, and the agent's job is to ground answers in it rather than build new knowledge from it. Where it strains: anything that needs to compound. Two months in, a RAG-only agent answers a question the same way it did on day one, even though it has now seen that question fifty times and a human would have written down the canonical answer.

Pattern B: a growing wiki the agent maintains

In this pattern the agent does not just retrieve, it writes. After each task it distills what it learned into structured notes (a glossary entry, an updated runbook, a corrected fact) and links the new note into a graph of existing notes. Andrej Karpathy described this shape on X in early 2026 as a model "wiki" assembled from many sources, and a small open-source community has been building toward it since. The agent's knowledge becomes a living document, not a static index.

The tradeoff is operational. Someone (or some process) has to keep the wiki sane. Without quality gates, wiki entries drift, contradict each other, and the agent starts citing its own old mistakes as fact. This pattern pays off when paired with a human reviewer or a second agent whose only job is to vet new entries.
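A minimal version of the write-back step, with the quality gate in front of it, might look like the sketch below. The vet_entry callable stands in for the human reviewer or the second vetting agent; none of these names come from a real library.

    from dataclasses import dataclass, field
    from datetime import date

    @dataclass
    class WikiEntry:
        title: str
        body: str
        sources: list[str]
        links: list[str] = field(default_factory=list)   # edges to related notes
        added: date = field(default_factory=date.today)

    def record_learning(entry: WikiEntry, wiki: dict[str, WikiEntry], vet_entry) -> bool:
        if not vet_entry(entry):       # the gate: reject drift and contradictions here
            return False
        wiki[entry.title] = entry
        for other in entry.links:      # link the new note into the existing graph
            if other in wiki:
                wiki[other].links.append(entry.title)
        return True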

Pattern C: skills and scheduled jobs

The third pattern goes further still. The agent's memory isn't just notes; it's a library of named skills it can invoke, plus a calendar of jobs it runs on its own. Garry Tan described his personal version of this, "Gbrain," as a working brain spanning two dozen skills and twenty-something cron jobs across thousands of pages of accumulated context. The shape generalizes: instead of a wiki the agent reads, the agent has procedures it executes. A "summarize-yesterday-and-flag-anything-weird" job at 6am. A "draft-a-vendor-reply" skill it calls when an invoice email arrives.

This pattern is the most capable and the most dangerous, in that order. Capable, because the agent acts without prompting. Dangerous, because if a skill is wrong, it does the wrong thing fifty times before anyone notices. The pattern requires real observability: every scheduled run logged, every skill invocation reviewable, and a kill switch that doesn't depend on the agent agreeing.
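To make the observability requirement concrete, here is one way to sketch the skills-plus-jobs shape in Python. The file-based kill switch, the JSONL run log, and the skill name are illustrative assumptions, not a description of any product.

    import json
    import os
    from datetime import datetime, timezone

    KILL_SWITCH_FILE = "agent.stop"    # if this file exists, nothing runs
    SKILLS = {}                        # name -> callable

    def skill(name):
        def register(fn):
            SKILLS[name] = fn
            return fn
        return register

    @skill("summarize-yesterday-and-flag-anything-weird")
    def summarize_yesterday():
        return "placeholder summary"   # the real body would read yesterday's logs

    def run_scheduled(name: str, log_path: str = "run_log.jsonl"):
        if os.path.exists(KILL_SWITCH_FILE):
            return None                # the kill switch doesn't ask the agent's opinion
        result = SKILLS[name]()
        with open(log_path, "a") as log:   # every run leaves a reviewable record
            log.write(json.dumps({
                "skill": name,
                "ran_at": datetime.now(timezone.utc).isoformat(),
                "result": str(result)[:500],
            }) + "\n")
        return result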

What this looks like at a small business

The combinations matter more than the patterns in isolation. Three illustrative scenarios:

An auto repair shop. Customer email arrives across three addresses (the website form, Google Business, and the personal address one of the techs uses for regulars). The shop wires up an orchestrator with three specialist agents: one reads and categorizes, one looks up the customer's car history in the shop-management system, one drafts a reply. Coordination: supervisor with specialists. Memory: RAG against the invoice history plus a small wiki the lead tech maintains for "things we've learned about this car model" notes. The wiki is the part that grows. Six months in, the agent reminds the tech that this make of vehicle has had three of the same brake-light failures that turned out to be a wiring harness, not the bulb.
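Written down as explicit configuration, the shop's combination is small. The structure below is illustrative and not tied to any framework; the point is that the coordination and memory choices are stated rather than implied by whichever example notebook got copied.

    REPAIR_SHOP_AGENT = {
        "coordination": {
            "pattern": "supervisor",
            "specialists": ["read_and_categorize", "lookup_car_history", "draft_reply"],
        },
        "memory": {
            "rag": {"source": "shop_management_invoices"},
            "wiki": {"path": "model_quirks/", "reviewer": "lead_tech"},
            "skills": None,   # deliberately absent: nothing runs unattended
        },
        "inboxes": ["website_form", "google_business", "tech_personal_address"],
    }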

A solo law firm. Intake forms arrive from the firm's website. The lawyer wants the form parsed into the matter-management system, a conflict check run against the existing client list, and a first-draft engagement letter ready when she sits down to review. Coordination: peer agents passing a structured intake record, because parallelism matters when several intakes arrive on the same morning. Memory: a wiki of the firm's house-style language and a small set of skills the agent has practiced on, like "draft a flat-fee engagement letter for an estate-planning matter in this state." RAG sits underneath as the boring layer that pulls in the most recent statutes and rules.

An independent retailer. Vendor invoices arrive in PDF; the bookkeeper has been keying them into the ledger by hand for years. The retailer wants invoices read, matched against purchase orders, and any mismatch routed to her for human review with the discrepancy already written up. Coordination: supervisor with specialists, because the steps are sequential and the supervisor's logs are what the bookkeeper needs to trust the system. Memory: skills (the agent has a "match line items" skill, a "flag a price discrepancy" skill, a "write the email to the vendor" skill) backed by a wiki of vendor quirks like "this vendor's invoices use a different SKU format than their packing slips."

What this looks like at a local government

Local governments are not small businesses with a different logo. They have stricter records requirements, more public scrutiny, and far less tolerance for an agent that does the wrong thing. Three illustrative scenarios:

A city public-works department triaging 311 requests. Residents file requests through a portal, the phone, and increasingly through social media DMs. A supervisor agent receives the request and routes it to a specialist that classifies it (pothole, streetlight, illegal dumping) and a separate specialist that estimates a service-level commitment based on the request type and the department's historical response times. Coordination: supervisor with specialists, because the audit trail has to be one transcript per ticket. Memory: RAG against the city's published service-level policy documents, plus a wiki the department maintains of "which contractor handles which neighborhood for which request type" because that data lives in three different spreadsheets right now. No autonomous skills, because acting on a 311 ticket means dispatching a crew.
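The "one transcript per ticket" requirement is worth sketching, because it is the part an auditor will ask about. The classify and estimate_sla callables and the JSONL file below are hypothetical; the shape, writing the full record before anything else happens, is the point.

    import json
    from datetime import datetime, timezone

    def handle_311_ticket(ticket: dict, classify, estimate_sla,
                          audit_path: str = "311_audit.jsonl"):
        transcript = {
            "ticket_id": ticket["id"],
            "received": datetime.now(timezone.utc).isoformat(),
            "steps": [],
        }
        category = classify(ticket["text"])
        transcript["steps"].append({"agent": "classifier", "output": category})
        commitment = estimate_sla(category)
        transcript["steps"].append({"agent": "sla_estimator", "output": commitment})
        with open(audit_path, "a") as f:        # the record exists before any action
            f.write(json.dumps(transcript) + "\n")
        return category, commitment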

A county clerk responding to public records requests. Public records requests are bursty and recurring; the same documents get requested again and again. The clerk's office wires up an agent that receives a request, checks whether the responsive records have been produced before, and either points the requester at the prior response or starts the routing for a new fulfillment. Coordination: peer agents, with one handling intake validation and one handling the response lookup, because the two streams can run independently. Memory: a heavily-curated wiki of prior responses by request type, with redaction notes attached. RAG underneath against the live records system. No autonomous skills here either; every response goes through the clerk before it leaves.

A small school district handling parent communications. Parents send messages across email, the parent portal, and three different messaging systems the district has accumulated over the last decade. The district wants every message classified (transportation, lunch, IEP, attendance, other) and routed to the right office, with the parent's preferred contact channel attached. Coordination: supervisor with specialists. Memory: a small wiki of "this family communicates in Spanish on the weekends and English during the week" notes that the front-office staff have been holding in their heads. The wiki replaces institutional knowledge that walks out the door when a long-tenured admin retires.

Picking, in plain language

A short, opinionated guide: if the steps are sequential and the audit trail matters, start with a supervisor and specialists. If independent streams arrive at once and parallelism is the point, use peers, and make every agent read and write the same schema. Let a vendor run the handoffs if you ship agent projects every quarter; own the orchestrator if this one system has to last a decade. Use RAG when the source material is large and changes often, add a wiki when the answers should compound (and give it a reviewer), and reach for skills and scheduled jobs only after the logging and the kill switch exist.

The one combination to avoid before you've shipped a simpler version: peer agents plus autonomous skills plus scheduled jobs all at once. That configuration is where weird state becomes hard to trace and the agent does five wrong things before anyone notices.

Closing

Both architecture decisions get easier after you've sat with them once. Most agent projects pick a pattern by accident, by following whatever the example notebook in the framework's documentation happened to use, and that's how you end up with a peer-agent system trying to do work that wanted a supervisor, or a RAG-only memory layer for a job that needed a wiki. The point isn't to declare a winner. It's to make the choice explicit before the codebase locks it in.

If you're a small-business owner trying to put one of these together without an agency-sized budget, The $20 Dollar Agency walks through the AI-tool stack you can actually run yourself, and the same architecture choices apply whether you're a repair shop or a one-person law firm.

Fact-check notes and sources

The small-business and local-government examples in this post are illustrative scenarios used for explanation. They are not case studies of specific organizations, and no real-world performance metrics are claimed.

Related reading

This post is informational, not legal, IT-procurement, or compliance advice. Mentions of Anthropic, OpenAI, CrewAI, LangChain, Vercel, and any other vendors are nominative fair use; no affiliation is implied. Framework behavior changes frequently. Check the vendor's current documentation before committing to an architecture.
