This Week's Focus ⤵️

Good morning. We are nearly midway through 2026.

It’s been almost five weeks since the last edition. The gap was not a lack of material. Federal AI RFIs have been moving fast, and we shipped a meaningful Knowledge Spaces release that tightened our continuous integration and delivery (CI/CD) loop for model behavior, synthetic tests, configuration, and customer-specific context.

That work points to today’s topic: memory, context, and handoff.

The next bottleneck in enterprise AI is not access to models. It’s the gap between what humans know, how that knowledge gets structured, and whether multiple people and multiple agents can use it without starting from zero.

Contrary to the vibe-coding narrative, these systems do not run themselves. You cannot tell Claude Code to do everything and assume the work will converge. Just last night I had a Claude agent drift so far off the rails that I called it a night and put myself and all the agents to bed.

The real work is not “AI replaces the team.”

It’s “AI changes how the team thinks, documents, reviews, and hands work off.”

Memory. Context. Handoff.

Old becomes new and the timeless classics endure. Memory and context are the new operating layer.

The question I keep hearing from executives, founders, and program leads is no longer “which model should we use?”

It’s: “how do we make AI work repeatably across people, tools, and workflows?”

That’s a memory and handoff problem. The best AI systems I’m seeing are not one-off prompts. They are operating systems for context: multiple humans, multiple agents, shared memory, clear handoffs, and review loops tight enough to catch plausible wrong answers before they become business decisions.

The thesis is simple: as models get better, bad answers get harder to spot.

Brain-Sync

Diving Deeper 🤿

The wrong answer now reads like the right answer. If you’re not a domain expert, you won’t catch it. If you are a domain expert but you’re not paying close attention, you still might not catch it.

I’ve been watching this gap widen for about eighteen months, and the pattern has stayed consistent. The people pulling real value out of AI share three traits.

  1. They think clearly about the problem before they touch a tool.

  2. They write context down in a structured, reusable form.

  3. They orchestrate AI agents the way a senior operator orchestrates a team, with goals, constraints, handoffs, and review.

The most leverage right now is going to systems-oriented thinkers who understand how LLMs behave and know how to push them toward accurate output. Not the loudest people in the room. Not the people with the biggest tool budgets. The ones who think in systems.

A year ago everyone was talking about hallucinations. The models would make things up; you could spot it, correct it, and move on. The models have gotten meaningfully better since then, and that is where most coverage stops.

The part that matters is that hallucinations are now harder to detect. Outputs are more plausible, more fluent, better structured, and better cited.

This is exactly when domain expertise gets more valuable, not less. AI does not replace judgment. It raises the cost of weak judgment.

Some people call what everyone else is producing “confident slop.” The phrase fits.

Plausible, well-formatted, and wrong in ways nobody is checking.

Core Principles in Practice 💡

A few principles I would put in front of any team trying to scale AI work past the demo phase.

  1. Treat Your AI Context as a Brain, Not a Chat History

A chat history starts cold. A brain accumulates.

Every serious project should load from and write back to a persistent context store, not get re-explained from scratch in a new conversation. The context store is the asset. The chats are the work surface.
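
To make “load from and write back” concrete, here is a minimal sketch of that loop. The file path, field names, and helper functions are mine for illustration, not a prescribed layout.

```python
# Minimal sketch of the load-then-write-back loop. The path, field names,
# and helpers are illustrative, not a prescribed layout.
import json
from datetime import datetime, timezone
from pathlib import Path

BRAIN = Path("brain/projects/example-rfi.json")  # the persistent context store

def load_context() -> dict:
    """Start every session warm: read the accumulated project state."""
    if BRAIN.exists():
        return json.loads(BRAIN.read_text())
    return {"decisions": [], "open_questions": []}

def write_back(context: dict, session_notes: dict) -> None:
    """End every session by folding what was learned back into the store."""
    context["decisions"].extend(session_notes.get("decisions", []))
    context["open_questions"] = session_notes.get("open_questions", context["open_questions"])
    context["updated_at"] = datetime.now(timezone.utc).isoformat()
    BRAIN.parent.mkdir(parents=True, exist_ok=True)
    BRAIN.write_text(json.dumps(context, indent=2))

context = load_context()   # the agent starts from accumulated state, not zero
# ... run the session against `context` ...
write_back(context, {"decisions": ["Chose structured summaries over raw transcripts"]})
```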

  2. Read More, Not Less

AI is not a substitute for reading. It is an amplifier of whatever foundation you already have.

The people getting the most out of AI today read constantly, in their domain and outside it. If your reading habit collapses because you are using AI, your judgment collapses with it.

  3. Pick the Operating Model Before You Pick the Model

Who is the conductor?

What shared brain are they conducting from?

How does that brain stay in sync as more people, more data, and more agents come online?

Solve the operating model first. The model questions will keep changing in the background.

  4. Test Models Against Your Actual Workload, Not Against Benchmarks

Public benchmarks tell you what other people’s tasks look like. Your tasks are not other people’s tasks.

Run synthetic data tests across the model families you have access to. Score them against the outputs you actually need. Make your choices based on evidence from your own workflows.
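
Here is what a bare-bones version of that harness can look like. The model names, the stubbed model call, and the scoring rubric are placeholders for whatever your stack actually uses.

```python
# Bare-bones workload evaluation: the same tasks, several model families,
# scored against the outputs you actually need. Names and the stubbed call
# are placeholders for your own stack.
from statistics import mean

MODELS = ["vendor-a-large", "vendor-b-medium", "local-open-model"]  # hypothetical

def call_model(model: str, prompt: str) -> str:
    # Replace with your SDK or gateway call; stubbed so the sketch runs.
    return f"[{model}] draft answer to: {prompt}"

def score(output: str, expected: str) -> float:
    # Your domain-specific rubric: required fields, citations, format, tone.
    return 1.0 if expected.lower() in output.lower() else 0.0

def evaluate(tasks: list[dict]) -> dict[str, float]:
    return {
        model: mean(score(call_model(model, t["prompt"]), t["expected"]) for t in tasks)
        for model in MODELS
    }

# evaluate([{"prompt": "Summarize clause 7.2", "expected": "termination for convenience"}])
```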

  5. Plan for Two-Brain Workflows from Day One

Single-user copilots are the starter pattern. The real work in any organization is multi-person.

If your AI architecture cannot accommodate two humans plus AI in a single workflow, it will not survive contact with how teams actually operate.

  6. Engineer for the Multi-LLM Reality

Nobody serious is running on one foundation model family alone.

Different models are better at different jobs. Some workloads cannot leave your environment and need to run on an open-source model or a homegrown derivative. A year from now the leaderboard will look different again.

Build the integration layer so a model swap takes hours (maybe only minutes), not weeks or months.
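
One way to keep the swap that cheap is a thin adapter plus a routing table keyed by workload, so no agent ever talks to a vendor SDK directly. This is a sketch with invented provider names, not a recommendation of any particular stack.

```python
# Thin integration layer: agents call route(), never a vendor SDK directly.
# Provider names and the routing table are invented for illustration.
from typing import Protocol

class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class HostedModel:
    def __init__(self, provider: str, model: str):
        self.provider, self.model = provider, model
    def complete(self, prompt: str) -> str:
        # Replace with the provider's API call; stubbed so the sketch runs.
        return f"[{self.provider}/{self.model}] {prompt[:40]}..."

class LocalModel:
    def __init__(self, model_path: str):
        self.model_path = model_path
    def complete(self, prompt: str) -> str:
        # Run inference inside your own environment for work that cannot leave it.
        return f"[local:{self.model_path}] {prompt[:40]}..."

# Swapping a model is a one-line change per workload, not a refactor.
ROUTES: dict[str, ChatModel] = {
    "proposal_drafting": HostedModel("vendor-a", "large-2026"),
    "code_review":       HostedModel("vendor-b", "coder"),
    "regulated_data":    LocalModel("/models/homegrown-derivative"),
}

def route(workload: str, prompt: str) -> str:
    return ROUTES[workload].complete(prompt)
```

The providers and model names will keep changing; route() is what stays put.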

The frontier labs will keep moving. Your context, your operating model, and your control layer are what stay constant.

The last few weeks at the foundation layer show why. The California trial is now in its third week, with Sam Altman expected to testify Tuesday and Wednesday. Reuters also reported that Ilya Sutskever testified Monday that he spent about a year gathering evidence for OpenAI’s board that Altman had displayed a “consistent pattern of lying.”

A related wrinkle from the trial: Musk testified that xAI had “partly” used OpenAI models through distillation to train Grok. Whether you see that as standard industry behavior, legal gray area, or competitive contradiction, the takeaway is the same. The model layer is full of strategic, legal, and trust dependencies that enterprise buyers cannot ignore.

Anthropic announced an agreement with SpaceX to use all compute capacity at Colossus 1, giving it more than 300 megawatts of new capacity and over 220,000 NVIDIA GPUs within the month. The operational takeaway is simple: even frontier labs are designing around capacity constraints, vendor concentration, and infrastructure optionality.

Anthropic and the Pentagon Remain on Separate Tracks

A federal judge temporarily blocked the Pentagon’s supply-chain-risk designation against Anthropic. Weeks later, Reuters reported that the Pentagon reached agreements with seven AI companies to deploy advanced capabilities on classified Defense Department networks, with Anthropic excluded.

The enterprise lesson is straightforward: policy, procurement, and model governance are now inseparable from AI architecture.

If this much can change at the foundation layer in two weeks, the architectural lesson is obvious: do not let one model, one vendor, or one policy regime become your operating system.

Legacy Spotlight 🔧

Most enterprises do not have a context problem because they lack data.

They have a context problem because their best context lives in their people, not their systems.

Decades of tribal knowledge, undocumented decisions, half-finished playbooks, and institutional memory sit in the heads of employees who have been there ten or twenty years. None of that is in a document the AI can read. It is in the brain.

That is the part of legacy modernization nobody puts on the slide.

You can give an AI agent every PDF, every Confluence page, every Slack history, and every codebase, and you will still be missing a large share of the operating context that lives in human heads.

The work I see paying off in legacy environments is exactly this: sit a domain expert next to an interviewer, extract what they actually do, why they do it, what they avoid and why, and write it into a structured context store the AI can use.

Then watch which questions the AI gets wrong, and go back to the human. Iterate.
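
What “structured” means will vary by shop. As one hypothetical shape, each interview can land in the context store as a record like this, so tacit knowledge becomes something an agent can query and a human can review:

```python
# One hypothetical shape for a captured-knowledge record. The fields are
# illustrative; the point is that tacit knowledge becomes reviewable data.
from dataclasses import dataclass, field

@dataclass
class KnowledgeEntry:
    topic: str                       # e.g. "month-end close for the EMEA ledger"
    owner: str                       # the expert who was interviewed
    what_they_do: str                # the actual procedure, in their words
    why: str                         # the reasoning behind it
    what_they_avoid: list[str]       # known failure modes and workarounds
    open_questions: list[str] = field(default_factory=list)  # where the AI got it wrong
    last_reviewed: str = ""          # when a human last confirmed the entry

entry = KnowledgeEntry(
    topic="month-end close for the EMEA ledger",
    owner="R. Alvarez",
    what_they_do="Reconcile intercompany balances before running the FX revaluation.",
    why="Revaluing first double-counts unsettled transfers.",
    what_they_avoid=["Never rerun the revaluation job after a partial posting."],
)
```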

The teams that do this for six months end up with a context base more valuable than any LLM upgrade.

We are not only extracting from Oracle, IBM, or SAP, although sometimes we are. We are capturing operating knowledge from people.

Treat the human brain as the unindexed primary source it actually is. Then build the integration layer between that brain and the AI.

Closer to Alignment 🧭

If you’re an enterprise leader, the alignment question right now is not “which AI vendor do we standardize on?”

It is: “who in this organization is going to be the conductor, and what is the brain they conduct from?”

My recommendation: start with your strongest systems thinkers.

The people who naturally write things down. The people who map dependencies. The people who already operate as one-person process improvement engines.

Give them the tools. Give them the time to build the brain. Let them become the public example for the rest of the organization.

Top-down mandates to “use AI more” are noise.

Watching one of your best people quietly outperform the room because their AI brain is wired correctly is signal.

The rest of the organization will follow signal.

From Personal to Enterprise Architecture 🏗️

It’s one thing to make this work in your own personal setup. It is harder than it sounds, and it took me weeks of iteration to get right. In all likelihood, it never really stops.

It’s much harder to make it work inside a business or agency where multiple people, multiple data sources, multiple AI vendors, and multiple risk profiles all have to coexist in the same system.

That second problem is what we’re solving with Knowledge Spaces.

Knowledge Spaces is the control layer between your organization’s working memory and the models you use. It manages context, agent handoffs, model routing, audit trails, and deployment constraints so teams can run different models for different workloads without rebuilding the operating model every time the leaderboard changes.

The goal is simple: your context stays stable while the models, vendors, and deployment environments keep changing.
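
To make “control layer” less abstract, here is a generic illustration of the kind of policy it holds. This is not Knowledge Spaces’ actual configuration format, just a sketch of the idea that the policy, not any particular model, is the stable artifact:

```python
# A generic illustration of a control-layer policy. This is NOT Knowledge
# Spaces' actual configuration format; it only shows that the policy, not
# any particular model, is the stable artifact.
POLICY = {
    "workloads": {
        "client_deliverables": {
            "allowed_models": ["vendor-a-large", "vendor-b-medium"],
            "data_boundary": "cloud-ok",
            "review": "human-before-send",
        },
        "regulated_or_classified": {
            "allowed_models": ["local-open-model"],
            "data_boundary": "on-prem-only",   # this work never leaves the environment
            "review": "two-person",
        },
    },
    "audit": {"log_prompts": True, "log_model_version": True, "retention_days": 365},
}
```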

I’ll make this concrete with my own practice.

I call my AI context “the brain.” Obvious, right?

The brain is a structured, continuously updated context store that every agent I run reads from and writes back to. RFIs, proposals, client projects, external systems, internal systems, product work, partners, and operations all live there.

When I sit down to drive an agent, the agent should not be starting from zero. It should already know the relevant state of the work. All the foundation model companies provide this through projects or similarly named features, but I like to use all the models, and I like to run systems both locally and in the cloud, usually in parallel.

My job is to conduct: set objectives, review outputs, catch drift, and keep the system aligned with reality.

Without that shared memory, every session starts cold, every agent is a stranger, and every output has to be rebuilt from human memory.

That does not scale.

Terminal is still home base. From the brain to the systems. Fast and fluid.

A typical morning for me looks like this.

Brain to Terminal is bae.

I have a Terminal window open with about five tabs, each running a different agent. One is doing proposal research. One is deep in a Knowledge Spaces feature branch code review. One is grinding through a long document review for a client. Several have sub-agents running underneath them, fanning out searches across hundreds of files and coming back with synthesis I would never have time to do myself. One is running Caffeine so my local system never goes to sleep.

I’m holding the conductor’s baton. They’re doing the work. I review it regularly.

But the real limit is still what I can manage in my own brain: context, objectives, priorities, and review. This might be what it feels like to graduate from senior AI operator to senior AI executive. The job becomes systems, policies, procedures, and pruning.

It took me a while to get the context and memory architecture right. The hard parts are not the parts most people talk about.

Token-window limits. Synchronization between agents running in parallel. Stale views of shared context. Handoff between sessions when a long-running task has to resume on a different machine, in the cloud, or after a shutdown. Avoiding the constant tax of restating rules and conventions every time a new chat starts. Keeping rules from drifting across surfaces.

None of this is solved by picking a better model.

All of it is solved by treating the operating model as a real engineering problem and the context store as production infrastructure.
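
For the session-handoff piece specifically, the pattern is a checkpoint that the next session, machine, or agent can resume from. The field names here are hypothetical; the shape is what matters:

```python
# Sketch of a session checkpoint so a long-running task can resume on another
# machine, in the cloud, or after a shutdown. Field names are hypothetical.
import json
from datetime import datetime, timezone
from pathlib import Path

CHECKPOINT = Path("brain/checkpoints/doc-review.json")

def save_checkpoint(task_id: str, state: dict) -> None:
    """Write everything the next session needs, so nothing gets restated from memory."""
    CHECKPOINT.parent.mkdir(parents=True, exist_ok=True)
    CHECKPOINT.write_text(json.dumps({
        "task_id": task_id,
        "saved_at": datetime.now(timezone.utc).isoformat(),
        "rules_version": state["rules_version"],   # pin conventions so they don't drift
        "progress": state["progress"],             # e.g. "section 4 of 9 reviewed"
        "open_items": state["open_items"],
        "next_action": state["next_action"],
    }, indent=2))

def resume(task_id: str) -> dict:
    """A new session, machine, or agent starts here instead of starting cold."""
    data = json.loads(CHECKPOINT.read_text())
    assert data["task_id"] == task_id, "wrong checkpoint for this task"
    return data
```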

When I notice the brain has drifted, or when an agent gives me an answer that is out of date, that is a signal that the integration between my real brain and the AI brain has slipped.

I fix it before I do anything else.

If your team is trying to move from individual copilots to multi-person, multi-agent workflows, reply to this email. I read every response.

Cross-agent handoff complete. Time for coffee. ☕️

Weekly Spotify Mix 🎧

This week’s playlist is called Conducting Complexity.

Some of these songs I like more than others, but I wanted to mix it up from the usual country, Latin, and pop vibes.

I’m probably going back to my favorites next week.

🎺 Note: Web Edition Only
