Chapter 3: Architectural forces in agentic design
Every design pattern in this book, and most of the patterns in canonical sources such as Gulli (2025), Anthropic’s Building Effective Agents, and the CSIRO catalog, exists to balance a competing pair of structural forces. The patterns are not arbitrary; they are responses to tensions that arise whenever a probabilistic reasoning component sits on the control-flow path of a system that has to be reliable, auditable, and economical to run.
This chapter names those forces. The list is short by intent. A small number of well-named forces is more useful than a long catalog of dimensions, because architects reason about systems pairwise: in any given design decision, two of these forces dominate and the rest fall out as consequences.
These seven are a working vocabulary, not an orthogonal basis, and it is worth saying so plainly. Some overlap. Exploration versus cost and deliberation versus latency are close to the same tradeoff in two currencies, money and time, kept separate here because which currency binds depends on whether the system is batch or interactive. Autonomy versus control and generality versus governability both pull between a system’s reach and the ease of constraining it. The point of the list is the clarity it buys, not a clean decomposition: naming the dominant force pair for a decision surfaces the tradeoff being made, even when a third force is also in play.
The forces named here function the way forces function in Christopher Alexander’s A Pattern Language and in Gamma et al.’s Design Patterns. They are not solutions; they are the structural tensions a pattern resolves. A pattern is a deliberate balance among forces. Reading later chapters with the forces in mind clarifies what each pattern is giving up to obtain what it provides.
Each is treated below. The chapter closes with a short note on how forces compose, and how to read the rest of the book as a sequence of deliberate balances among them.
Autonomy vs. control
The tension. Unconstrained ability to handle novel tasks versus the deterministic enforcement of safety boundaries.
This is the first force, and the load-bearing one. Greater autonomy increases an agent’s ability to handle novel tasks, recover from unexpected failures, and choose among alternatives without prescriptive instructions. It also increases the variance of outcomes, the cost of testing, and the surface for unsafe action.
Greater control, expressed through explicit constraints, validators, narrowed tool surfaces, and deterministic orchestration, reduces variance and makes the system tractable to test and audit. It also reduces the system’s ability to handle situations the designer did not anticipate.
There is no universal balance. There is, however, a default for production systems: less autonomy than the team is tempted to grant, and bounded enforcement of the rest (Chapter 5). Systems frequently fail because autonomy was granted on the assumption that the model would behave like a careful junior engineer. Models, even strong ones, are better understood as components with unbounded curiosity, no fear of irreversible action, and no internal sense of cost.
The autonomy/control tradeoff is the single force that most strongly determines the shape of an agentic system. It is the reason this book devotes Chapter 5 to bounded autonomy as a discipline rather than a feature. A system designed around the “right” autonomy point with the wrong control mechanism is brittle in production; a system designed with conservative autonomy and strong control mechanisms can be loosened later as the model and operating discipline prove themselves.
A common mistake is to imagine the choice as a single dial running from “fully manual” to “fully autonomous.” It is closer to a set of independent dials, one for each axis of bounded autonomy (Chapter 5). A system can have high autonomy on which sequence of tools to invoke while having low autonomy on which data the tools may touch; high autonomy on iteration count while having low autonomy on cost spend. Treating the dials as independent prevents the failure mode in which all dials are turned up together.
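One way to make the independent dials concrete is to record each axis as its own setting. The following is a minimal sketch, not a definitive design: the class names, the four axis names, and the three-level scale are assumptions chosen for illustration, not the axes Chapter 5 defines.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


@dataclass(frozen=True)
class AutonomyProfile:
    """One dial per axis. Axis names are hypothetical examples."""
    tool_sequencing: Level
    data_access: Level
    iteration_count: Level
    cost_spend: Level

    def all_dials_high(self) -> bool:
        # Detects the failure mode in which every dial is turned up together.
        return all(level == Level.HIGH for level in vars(self).values())


# High autonomy on tool order and iteration count, low on data and spend.
profile = AutonomyProfile(
    tool_sequencing=Level.HIGH,
    data_access=Level.LOW,
    iteration_count=Level.HIGH,
    cost_spend=Level.LOW,
)
```

Keeping the dials as separate fields means a design review can question each one on its own, and a simple check such as `all_dials_high` can flag a configuration that grants everything at once.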
Exploration vs. cost
The tension. Better answers from more exploration versus the tokens, time, and money each additional pass consumes.
The reasoning component can almost always improve its output by exploring more: generating more candidates, reflecting more times, calling more tools, retrieving more context. Each increment of exploration carries an increment of cost in tokens, time, external API spend, and downstream resource use.
The architectural question is not “should we explore” but “what budget governs exploration, and where is the budget enforced?” The budget can be:
- Implicit in the reasoning loop’s stop conditions (poor; the model decides when it is done).
- Bound to a single dimension like iteration count (acceptable for some patterns; misses cost spikes from expensive tools).
- Multi-dimensional and externally enforced: iterations, tokens, wall-clock, spend, retries, with hard limits that abort the loop (the form recommended in Chapter 5).
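The third form in the list above might be sketched as follows. This is an illustrative sketch under stated assumptions: `LoopBudget`, `BudgetExceeded`, and the `charge` method are hypothetical names, and the orchestration code (not the model) is assumed to call `charge` after every loop step.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised by the enforcement layer, never by the model itself."""


@dataclass
class LoopBudget:
    # Hard limits, one per dimension (all names are hypothetical).
    max_iterations: int
    max_tokens: int
    max_seconds: float
    max_spend_usd: float
    max_retries: int
    # Running totals.
    iterations: int = 0
    tokens: int = 0
    spend_usd: float = 0.0
    retries: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, *, tokens: int = 0, spend_usd: float = 0.0,
               retry: bool = False) -> None:
        """Record one loop step; abort if any dimension is over its limit."""
        self.iterations += 1
        self.tokens += tokens
        self.spend_usd += spend_usd
        self.retries += int(retry)
        elapsed = time.monotonic() - self.started_at
        usage = {
            "iterations": (self.iterations, self.max_iterations),
            "tokens": (self.tokens, self.max_tokens),
            "seconds": (elapsed, self.max_seconds),
            "spend_usd": (self.spend_usd, self.max_spend_usd),
            "retries": (self.retries, self.max_retries),
        }
        for name, (used, limit) in usage.items():
            if used > limit:
                raise BudgetExceeded(f"{name}: {used} > {limit}")
```

Because every dimension is checked on every step, a single expensive tool call trips the spend limit even when the iteration count is still low, which is the case a one-dimensional budget misses.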
The tradeoff has a non-obvious shape: the marginal return on exploration is steep and then flat. Two iterations are often dramatically better than one; ten are rarely much better than eight. Most exploration budgets in production systems are set too high because the architect tested with the model converging in three iterations on easy cases and did not measure the long tail. The percentile distribution of useful exploration matters far more than the average.
Self-consistency, debate, and tree-of-thought patterns (cataloged in Chapter 4) are explicit explorations of this force. Each commits resources to additional reasoning passes in exchange for variance reduction. The architectural question is when the variance reduction justifies the cost. For a $0.05 query asked thousands of times per day, a self-consistency multiplier of five may be unjustifiable; for a once-per-week strategic analysis worth thousands of dollars in human time, a multiplier of ten may be cheap.
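The arithmetic behind that comparison can be worked through in a few lines. The $0.05 per-query cost and the multipliers of five and ten come from the text; the daily volume of 5,000 queries and the $2.00 base cost of the weekly analysis are assumptions made up for illustration, as is the function name.

```python
def added_cost_usd(base_cost_usd: float, multiplier: int, runs: int) -> float:
    """Extra spend from running `multiplier` passes instead of one."""
    return round(base_cost_usd * (multiplier - 1) * runs, 2)


# High-volume query: $0.05 each, assumed 5,000 per day, multiplier of five.
daily_extra = added_cost_usd(0.05, multiplier=5, runs=5_000)

# Weekly strategic analysis: assumed $2.00 base cost, multiplier of ten.
weekly_extra = added_cost_usd(2.00, multiplier=10, runs=1)

print(f"high-volume query: +${daily_extra:,.2f} per day")
print(f"weekly analysis:   +${weekly_extra:,.2f} per week")
```

Under these assumptions the multiplier of five adds $1,000 per day to the high-volume query, while the multiplier of ten adds $18 per week to the strategic analysis, a small amount against human time valued in the thousands of dollars.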
The deeper architectural point is that the cost dimension to optimize depends on what the system is for. Interactive systems optimize wall-clock; batch systems optimize spend; high-stakes one-shot systems optimize quality and accept high cost. Naming this explicitly in the design (“this system optimizes X subject to constraints on Y and Z”) prevents the failure in which the team applies exploration patterns suited to one optimization profile to a system that belongs on another.
Adaptability vs. predictability
The tension. Adapting to novel situations at runtime versus behavior that is predictable, testable, and certifiable across runs.
An agentic system is valuable in part because it can adapt: replan after a failure, switch tools, change strategy. The same adaptability makes the system’s behavior harder to predict from one run to the next, harder to test, and harder to certify against compliance requirements.
The architectural moves that mitigate the tradeoff are:
- Replay testing (Chapter 12): record traces and assert on them; if the trace stays within the envelope of an approved replay, behavior is treated as acceptable.
- Bounded plan revision: allow the agent to replan, but only a bounded number of times and only along approved branches.
- Deterministic substrate: keep the deterministic infrastructure deterministic; the only variability is in the reasoning component, which is itself bounded.
- Behavioral envelopes: describe the agent’s space of acceptable behaviors rather than its exact behavior, and assert at deployment that the envelope holds.
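A behavioral envelope and a replay-style assertion against it could look roughly like this. It is a sketch, not the mechanism Chapter 12 develops: the trace format (a list of dicts with a `kind` key), the `Envelope` fields, and the tool names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Describes acceptable behavior, not exact behavior."""
    allowed_tools: frozenset[str]
    max_replans: int
    max_steps: int


def violations(trace: list[dict], envelope: Envelope) -> list[str]:
    """Return human-readable envelope violations for one recorded trace."""
    problems = []
    if len(trace) > envelope.max_steps:
        problems.append(f"too many steps: {len(trace)}")
    replans = sum(1 for event in trace if event.get("kind") == "replan")
    if replans > envelope.max_replans:
        problems.append(f"too many replans: {replans}")
    for event in trace:
        if event.get("kind") == "tool_call" and event["tool"] not in envelope.allowed_tools:
            problems.append(f"tool outside envelope: {event['tool']}")
    return problems


envelope = Envelope(
    allowed_tools=frozenset({"search_docs", "read_ticket"}),
    max_replans=1,
    max_steps=10,
)
recorded_trace = [
    {"kind": "tool_call", "tool": "search_docs"},
    {"kind": "replan"},
    {"kind": "tool_call", "tool": "delete_records"},
]
print(violations(recorded_trace, envelope))
```

The check passes any trace that stays inside the boundary, however the agent ordered its steps, and fails only when the boundary itself is crossed. Here the single replan is within bounds, and the call to `delete_records` is the one reported violation.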
The deepest cost of unmanaged adaptability is organizational. Stakeholders cannot certify a system whose behavior they cannot describe. The architecture’s job is not to eliminate adaptability (that would defeat the purpose) but to make the envelope of adaptability describable.
Regulated industries pay this cost most visibly: a financial-services agent that adapts unpredictably cannot be deployed at all in jurisdictions that require behavioral certification. The architectural answer is the envelope: the agent has wide latitude inside a tightly described boundary, and the boundary is what is certified. The certification is then stable across model versions, prompt changes, and small architectural revisions; only changes to the envelope require re-certification.
For non-regulated systems, the same principle applies for different reasons. A customer-facing system whose behavior is unpredictable from week to week loses user trust independently of whether any individual run was correct. Users build mental models of systems and rely on those models. An envelope that holds across runs lets users build a working model; behavior without an envelope prevents them from doing so.
Centralization vs. emergence
The tension. Centralized orchestration that is easy to reason about and govern versus decentralized coordination that scales and partitions responsibility.
Single-agent systems with centralized orchestration are easier to reason about, debug, and govern. Multi-agent systems with decentralized coordination scale better, partition responsibility, and can exhibit useful emergent behavior.
The honest reading of the 2024–2026 literature is that multi-agent is over-prescribed. A surprising fraction of production agentic systems are single agents with tools, plus possibly an orchestrator that delegates to specialized sub-agents under bounded authority. True multi-agent systems with peer-to-peer coordination, voting, debate, or auction-based allocation remain rare in production and disproportionately common in demos.
The pragmatic default: start with a single agent and tools, then add an orchestrator with workers if responsibility decomposes cleanly, then, and only with strong justification, move to peer multi-agent shapes. The CSIRO catalog and Gulli’s chapter on multi-agent collaboration cover the shapes thoroughly. The architectural question is which shape the problem actually requires, not how to build the most interesting one.
The tradeoff also has an observability cost. Multi-agent traces are harder to read, debug, and replay than single-agent traces. Chapter 12 develops trace discipline; Chapter 9 covers coordination shapes in compressed form. An incident in a single-agent system that an on-call engineer can debug in 20 minutes can take hours in a poorly instrumented multi-agent system, because the engineer must reason about cross-agent interactions, inter-agent message channels, and timing artifacts.
The architectural commitment in multi-agent systems is to make the coordination layer first-class. Inter-agent channels are governed (Chapter 6); messages are structured; the orchestrator’s view of fleet state is observable; the trace links events across agents through correlation identifiers. Without this, multi-agent systems decay into hard-to-debug systems whose value depends on no two agents disagreeing in a way the architecture cannot resolve.
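Linking events across agents through correlation identifiers can be illustrated with a small sketch. The `TraceEvent` fields, the agent names, and the grouping helper are assumptions for illustration; a real system would more likely rely on its tracing infrastructure than on an in-memory list.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    correlation_id: str  # shared by every event belonging to one task
    agent: str
    kind: str
    detail: str


def group_by_task(events: list[TraceEvent]) -> dict[str, list[TraceEvent]]:
    """Reassemble the cross-agent story of each task from a flat event log."""
    grouped: dict[str, list[TraceEvent]] = defaultdict(list)
    for event in events:
        grouped[event.correlation_id].append(event)
    return dict(grouped)


task_a = str(uuid.uuid4())
task_b = str(uuid.uuid4())
log = [
    TraceEvent(task_a, "orchestrator", "delegate", "to sql-worker"),
    TraceEvent(task_b, "orchestrator", "delegate", "to code-worker"),
    TraceEvent(task_a, "sql-worker", "tool_call", "run_query"),
    TraceEvent(task_a, "orchestrator", "result", "accepted"),
]
by_task = group_by_task(log)
```

With the identifier attached at the point of delegation, an on-call engineer can pull every event for one task in order, regardless of which agent emitted it.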
Single-agent systems are dull architectures. They are also the ones that ship reliably and survive contact with production, which is the test that matters, and the reason to resist a more interesting shape when a simpler one will do.
Deliberation vs. latency
The tension. Deeper reasoning for better answers versus the response-time budget the system must meet.
Deeper reasoning produces better answers and slower responses. For interactive systems (chat, IDE assistants, voice) latency is a hard constraint; for batch systems (research agents, code migrations, document processing) latency is a budget, not a cliff.
The architectural moves here are:
- Two-tier reasoning: a fast tier handles common cases; a deliberate tier handles flagged ones. Anthropic’s evaluator-optimizer pattern is one shape; Andrew Ng’s reflection pattern is another.
- Asynchronous deliberation: for batch work, decouple the user interaction from the reasoning, with structured progress reporting (Chapter 18).
- Streaming: for interactive cases, surface intermediate reasoning to the user while the loop continues. Streaming improves perceived responsiveness without reducing actual computation.
- Progressive disclosure: load context (and skills, Chapter 10) only when the task warrants the deeper reasoning, and keep the fast tier light.
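The first move in the list above, two-tier reasoning, reduces to a routing decision. The sketch below assumes the fast tier returns a confidence score alongside its answer and that low confidence is what flags a request; both are illustrative assumptions, as are the function names and the 0.8 threshold.

```python
from typing import Callable

FastTier = Callable[[str], tuple[str, float]]  # returns (answer, confidence)
DeliberateTier = Callable[[str], str]


def route(request: str, fast: FastTier, deliberate: DeliberateTier,
          min_confidence: float = 0.8) -> tuple[str, str]:
    """Answer from the fast tier unless it flags the request as uncertain."""
    answer, confidence = fast(request)
    if confidence >= min_confidence:
        return "fast", answer
    # Only flagged requests pay the latency of the deliberate tier.
    return "deliberate", deliberate(request)


# Stand-in tiers for demonstration; real tiers would call models.
def fast_tier(request: str) -> tuple[str, float]:
    if "reset password" in request:
        return "Use the account settings page.", 0.95
    return "Not sure.", 0.30


def deliberate_tier(request: str) -> str:
    return f"Detailed analysis of: {request}"


print(route("how do I reset password", fast_tier, deliberate_tier))
print(route("why did last night's migration fail", fast_tier, deliberate_tier))
```

The common case returns after one cheap call, and the slow path is taken only when the fast tier's own signal says it should be.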
It helps to separate two latencies. Information latency, time to first token, is what a user feels, and streaming addresses it directly. Action latency, time until the agent actually acts on the environment, such as calling a database or committing a change, is untouched by streaming: if the agent must reason for 15 seconds before invoking the tool, the side effect still lands 15 seconds late. For systems that chat with users, information latency dominates; for systems that act on the world, action latency is the real constraint, and streaming is no remedy for it.
Note that reasoning models in 2026 reduce the visible latency penalty for deeper reasoning, because more of the reasoning is internal to a single model call. This shifts some patterns (reflection, plan-execute) from architectural to model-internal (see Chapter 4). The architectural question, however, persists: the reasoning model’s internal deliberation still consumes time and tokens, and a system that calls the reasoning model on every interaction pays the cost on every interaction.
A neglected aspect of the latency tradeoff is latency variance. Users tolerate consistent latency better than variable latency. A system whose responses come in two seconds 99% of the time and 12 seconds 1% of the time feels less reliable than one whose responses always take five. The architectural commitment is to monitor percentiles, not averages, and to bound the long tail at the time-budget axis (Chapter 5).
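The two systems in that comparison can be put side by side numerically. The latency figures come from the text; the sample size of 1,000 requests and the nearest-rank percentile helper are assumptions of this sketch.

```python
import statistics


def nearest_rank(samples: list[float], per_mille: int) -> float:
    """Nearest-rank percentile; per_mille=999 means p99.9."""
    ordered = sorted(samples)
    rank = max(1, (per_mille * len(ordered) + 999) // 1000)  # integer ceiling
    return ordered[rank - 1]


spiky = [2.0] * 990 + [12.0] * 10   # 2 s for 99% of requests, 12 s for 1%
steady = [5.0] * 1000               # always 5 s

for name, samples in (("spiky", spiky), ("steady", steady)):
    print(
        name,
        "mean:", statistics.fmean(samples),
        "p50:", nearest_rank(samples, 500),
        "p99.9:", nearest_rank(samples, 999),
    )
```

The mean favors the spiky system (2.1 s against 5.0 s), while its p99.9 is 12 s against 5 s for the steady one. With exactly 1% of requests slow, a nearest-rank p99 still lands on 2 s, so the monitored percentile has to sit beyond the fraction of slow requests to expose the tail.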
Memory depth vs. context constraints
The tension. Richer memory for competence versus the finite, attention-limited context window the model reasons over each turn.
Agents benefit from richer memory: past tasks, learned strategies, domain knowledge, user preferences. Richer memory introduces retrieval complexity, increases the risk of leaking irrelevant context into the loop, and consumes the bounded window available to the reasoning model on every turn.
The force breaks into two sub-tensions:
- Within-loop memory. The reasoning model has a finite context window (even at the multi-million-token sizes of 2026 models, finite). Filling it with retrieval results, prior turns, and tool documentation reduces the space available for reasoning. The Skills layer (Chapter 10) and Anthropic’s progressive disclosure mechanism explicitly target this: load only what is needed, when it is needed.
- Cross-loop memory. Episodic and semantic memory (Chapter 7) accumulate across runs. They improve the agent’s competence over time and degrade the predictability of the system, because behavior now depends on history that may not be obvious from the current input.
The pragmatic balance: prefer rich tool-mediated retrieval to large static context. Prefer compact, summarized memory to verbatim memory. Treat memory as a managed resource with policies for what is captured, when it is consulted, and how it is invalidated. Chapter 7 develops this in depth.
The dominant cost of a bloated context is not financial; it is attentional. Models attend non-uniformly across long contexts: relevant content buried in the middle of a long window is processed less effectively than the same content in a short one. A 100-token context with the right content beats a 100,000-token context with the right content buried in the middle. Even a multi-million-token window that is cheap to fill degrades the model’s reasoning about the immediate task when most of it is noise. Memory architecture is, first, a signal-to-noise problem.
The financial argument is weaker than it once was. Provider-side prompt caching, introduced across the major APIs in late 2024, lets a large static prefix (system instructions, tool documentation, stable semantic memory) be resent each turn at a fraction of its first-call cost. Caching does nothing for content that changes every turn, and nothing for the attention problem above, but it removes the simplest reason to keep static context small. Cost discipline now matters most for the dynamic, per-turn portion of the context and at high scale, where the ratio between disciplined and undisciplined memory architectures can still reach an order of magnitude.
Generality vs. governability
The tension. One agent that can do anything versus a narrowly scoped agent that is straightforward to bound and own.
A general agent (one capable of arbitrary tool use, arbitrary tasks, arbitrary domains) is impressive in demos and hard to govern. A narrowly scoped agent (one tool surface, one task class, one domain) is dull to demonstrate and straightforward to govern.
The force shows up as a structural choice in nearly every project:
- Specialist agents with narrow tool surfaces (one for code, one for SQL, one for incident response). Easier to bound. Often combined under an orchestrator.
- Generalist agents with broad tool surfaces and a reasoning loop that decides which tool to use. Harder to bound. Failure modes are harder to enumerate.
The book’s recommendation, consistent throughout, is the specialist direction: bounded scope first, generality only when the operating profile justifies it. Most production agentic systems work because they do something narrow well, with strong governance, rather than something broad poorly, with weak governance.
The decision has organizational implications too, and Conway’s Law is the right lens for them: systems come to mirror the communication structures of the organizations that build them. A specialist agent has a clear owner, the team for that domain, so the architecture and the org chart reinforce each other. A generalist agent has unclear ownership; everyone is responsible, which usually means no one is, and the system decays because no single team’s structure maps to it. Specialist orchestrator-worker architectures align with Conway’s Law; monolithic generalist agents work against it, and tend to be worse-maintained over time as a result.
There is a counterargument: an organization may not have the budget for many specialist agents and may prefer one generalist. The architectural answer is to factor the agent’s responsibilities so that the specialist split is internal to the system even if it is not visible to the user. An orchestrator-worker architecture (Chapter 9) presents a single front to the user while internally being a small team of specialists; this preserves the governance benefits of specialization without multiplying user-facing complexity.
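The internal specialist split behind a single front can be sketched as a dispatch table. Everything named here is hypothetical: the `Specialist` fields, the task classes, the team names, and the stand-in handlers; Chapter 9 covers the actual coordination shapes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Specialist:
    name: str
    owner: str                 # the team accountable for this agent
    tools: frozenset[str]      # deliberately narrow tool surface
    handle: Callable[[str], str]


class Orchestrator:
    """Single user-facing front; specialists stay internal."""

    def __init__(self, specialists: dict[str, Specialist]) -> None:
        self._specialists = specialists

    def dispatch(self, task_class: str, request: str) -> str:
        specialist = self._specialists.get(task_class)
        if specialist is None:
            # Refuse rather than fall back to an unbounded generalist.
            raise LookupError(f"no specialist owns task class {task_class!r}")
        return specialist.handle(request)


front = Orchestrator({
    "sql": Specialist("sql-agent", "data-platform", frozenset({"run_query"}),
                      lambda request: f"[sql-agent] {request}"),
    "code": Specialist("code-agent", "dev-tools", frozenset({"read_repo", "open_pr"}),
                       lambda request: f"[code-agent] {request}"),
})
print(front.dispatch("sql", "count orders placed last week"))
```

The user sees one entry point, while each task class still has one owner and one bounded tool surface, which is the governance property the specialist direction is meant to preserve.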
How forces compose
In most design decisions, two of the seven forces dominate and the rest follow.
- Bounding autonomy (Chapter 5) is primarily about autonomy vs. control and exploration vs. cost.
- Governance architecture (Chapter 6) is primarily about adaptability vs. predictability and generality vs. governability.
- Memory design (Chapter 7) is primarily about memory depth vs. context constraints and adaptability vs. predictability.
- Coordination (Chapter 9) is primarily about centralization vs. emergence and deliberation vs. latency.
- Skills (Chapter 10) is primarily about memory depth vs. context constraints and generality vs. governability.
- Failure modes (Chapter 11) are the consequence of failing to balance any of these forces.
When reading later chapters, identify the pair of forces dominating the pattern. The pattern’s tradeoffs section is, in essence, a statement of which force was accepted as the cost and which was preserved as the benefit. A reader who can map any pattern in the book (or in the canonical sources) to a force pair has the analytical vocabulary the book is trying to provide; a reader who cannot has either missed something or has identified a pattern that does not deserve its place.
It is also useful to recognize that good patterns are clear about their force balance, while bad patterns are vague. A pattern that claims to optimize everything simultaneously is hiding the tradeoff it makes; the tradeoff is real and will appear in production. A clear statement of which force the pattern privileges is part of what makes a pattern usable.
Forces, patterns, and deliberate choice
Several existing pattern catalogs (Gulli 2025, the Augment Code 2026 guide) list forces alongside patterns. Where they differ, this book follows the principle that forces describe tensions and patterns describe responses to them. A pattern is good when it makes the architect’s choice explicit; it is bad when it disguises the choice as a default. A useful test for any proposed pattern (whether in this book, a canonical source, or a team’s design document) is to ask which forces it balances and at what cost. If the answer is unclear, the pattern is incompletely specified; if the answer is “all of them,” the pattern is incoherent, because the forces are not simultaneously optimizable.
The forces themselves are stable; the positions architects should take on them shift as the field evolves. In 2024 the right point on the autonomy/control axis was further toward control than many architects realized. In 2026, with stronger reasoning models, the same axis can be carried slightly toward autonomy in domains where the bounding layer is mature. In 2028 it may shift again.
The architectural commitment is invariant across these shifts: whatever the current position, it is a deliberate choice, expressible as a balance of forces, defended by the design’s bounding and governance, and revisable as the field evolves. A team that builds with this discipline can move along the axes as conditions change; a team that does not is locked into the position it implicitly chose at the start. Chapter 4 and onward use the forces named here as the vocabulary for that discussion.
Summary
Seven structural forces (autonomy/control, exploration/cost, adaptability/predictability, centralization/emergence, deliberation/latency, memory depth/context, and generality/governability) drive nearly every architectural decision in agentic systems. The patterns in this book and in canonical sources can be read as deliberate balances among these forces. The next chapter maps the cognitive layer briefly, with attention to which patterns have become model-internal in 2026. From Chapter 5 onward, each architectural chapter is explicitly framed by the forces it balances.