Context Engineering: What the Model Is Allowed to See

The model only knows what you put in front of it.

That sentence is obvious and routinely ignored. When an AI workflow misbehaves, the instinct is to rewrite the prompt: add an instruction, sharpen the wording, try a different phrasing. Often the prompt was never the problem. The model was working from the wrong information, too much of it, or the right information buried where it could not find it. The discipline of getting that part right now has a name, context engineering, and through 2026 it has quietly become the single highest-leverage skill for anyone building real work on top of AI.

It is worth understanding what it is, why it matters more than prompt wording, and what it changes for anyone using AI in a regulated Australian setting, where what the model sees is not just a performance question but a privacy and evidence one.

What context engineering is

The clearest articulation comes from Anthropic's engineering team, in a piece published in September 2025 that has aged into a reference point. Their framing: "At Anthropic, we view context engineering as the natural progression of prompt engineering." Where prompt engineering is about writing good instructions, context engineering is broader. They define it as "the set of strategies for curating and maintaining the optimal set of tokens (information) during LLM inference."

The context window is everything the model can see at the moment it generates a response: your instructions, the system prompt, the definitions of any tools it can call, the documents you have retrieved, the prior turns of the conversation, and the outputs of any actions it has taken. Prompt engineering is one slice of that, the instruction slice. Context engineering is the whole window, and the recognition that what fills it determines the quality of the answer at least as much as how the instruction is phrased.

The goal is not to cram in everything that might be relevant. It is the opposite. Anthropic put it as "finding the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome." More is not better. The skill is curation.

A single atmospheric scene of a limited pool of warm light illuminating a small set of clear documents at the centre while a vast flood of dim material sits unread in the surrounding shadow — The model spends a limited attention budget. Fill it with signal, not volume.

Why curation beats volume

The reason more context hurts is not a quirk of one model. It is structural. Anthropic describe it through a useful analogy: "Like humans, who have limited working memory capacity, LLMs have an 'attention budget' that they draw on when parsing large volumes of context." Spend that budget on noise and there is less of it for the thing that matters.

This is why the headline numbers about ever-larger context windows mislead. A model that can technically accept a million tokens is not a model that uses a million tokens well. As Anthropic state plainly, "as the number of tokens in the context window increases, the model's ability to accurately recall information from that context decreases." The phenomenon has a nickname, context rot: stuff the window full and the model's grip on any single fact inside it loosens. The practical consequence is that dumping an entire knowledge base, a whole case file, or a fifty-message conversation into the prompt and hoping the model finds the relevant part is a worse strategy than giving it the three paragraphs that actually bear on the question.

That reframes a lot of "the AI got it wrong" moments. A model that confidently used the wrong figure may have had the right figure available, drowned among twenty other numbers it could not distinguish. A model that ignored your instruction may have had it pushed out of effective attention by ten thousand tokens of pasted background. These read as model failures and are often context failures, and the fix is curation, not a better-worded plea.

A simple test makes the point. Ask a model a question with the single relevant paragraph in front of it, and it usually answers well. Bury that same paragraph in the middle of a forty-page document and ask again, and the answer often degrades, not because the model got worse but because the signal got harder to find. Nothing about the prompt changed. Only the context did.

The moves that make up the discipline

Context engineering is not abstract. Anthropic's guidance names concrete strategies, and they generalise well beyond any one vendor.

The first is writing system instructions at the right altitude: specific enough to guide behaviour, general enough not to be brittle, and free of the contradictory or redundant instruction that wastes attention. The second is keeping tools lean. When you give a model many tools with overlapping purposes, the definitions themselves consume context and the model wastes effort deciding between near-duplicates. A small set of clear, non-overlapping tools beats a sprawling toolbox.

The third, and the one that changes how people build, is just-in-time retrieval. Rather than loading everything the model might need up front, the system pulls information on demand, when the task actually calls for it. The fourth is compaction: when a conversation or an agent run grows long, summarise what matters and reinitialise, so the working context stays small and sharp instead of accreting every turn. The fifth is structured note-taking, giving the model a place to write down and retrieve what it learns, so knowledge persists without sitting in the live window the whole time.

That last one now has first-class tooling. Anthropic's memory tool, for example, "enables Claude to store and retrieve information across conversations through a memory file directory ... allowing it to build knowledge over time without keeping everything in the context window." The point is the same: keep the live context small, and reach for what you need when you need it.

There is a sixth move worth naming for anyone running more complex work: sub-agents. Instead of one model trying to hold an entire sprawling task in a single context, you split the work, let separate agents each tackle a piece in their own clean window, and have them return a distilled result rather than their whole working history. Each sub-agent spends its full attention budget on a narrow job, and the main agent never has to carry the noise of how each piece was done. It is the same principle as compaction, applied across a team of models rather than within one conversation: keep every context window focused, and pass forward conclusions, not transcripts.

A left-to-right path of four soft rounded pill nodes connected by a flowing line: curate the instructions, retrieve just in time, compact the long context, write to memory — The loop. Curate, retrieve on demand, compact, persist to memory, then keep the window lean.

What it means for regulated work

For an Australian audience working under privacy law, the SRC Act, or APRA's standards, context engineering is not only a performance discipline. It is a governance one, because what enters the context window is a decision about what data the model handles.

Every document, claim detail, customer record or control description you place in the context is information the model is now processing, possibly through a third-party service, and possibly into logs or future training depending on the tool and its settings. The curation question, what is the smallest high-signal set the model needs, is the same question privacy practice has always asked: collect and use only what is necessary for the purpose. Deciding what the model sees is therefore an opportunity to apply de-identification, minimisation and access discipline at exactly the right point, before the data is in front of the model rather than after.

There is a security dimension too. Anthropic's memory documentation is candid that persistent context introduces risk: "Malicious path inputs could attempt to access files outside the /memories directory. Your implementation MUST validate all paths." And it notes that while the model will usually refuse to write down sensitive information, builders should implement stricter validation themselves. The lesson generalises: the more context a system carries across sessions, the more deliberately you have to govern what goes into it and who can reach it.

Make it concrete. A case manager using AI to help draft a determination should not paste an entire unredacted file into the context and ask for a summary. The context-engineering move is also the compliance move: de-identify first, then give the model only the specific evidence that bears on the question at hand. A compliance team using AI to check a control against a standard should load that control and that clause, not the whole framework library. In both cases the curated, minimal context produces a better answer and a defensible data-handling position at the same time. The discipline that makes the model reliable is the same discipline that keeps the work safe.

This lines up with where Australian policy is heading. The Digital Transformation Agency's updated policy for the responsible use of AI in government took effect in December 2025, with its first mandatory requirement landing in June 2026 and the rest to follow, and the through-line is human oversight and assurance over how AI systems are used. Curating and recording what a model is given to work with is precisely the kind of disciplined, auditable practice that posture expects. What the model saw is, increasingly, part of the evidence trail.

The hype check

Two cautions keep this honest. First, context engineering is not a new product you buy. It is a practice, and the temptation will be to treat "bigger context window" or "longer memory" as the feature that solves it. They do not. A larger window without curation just gives context rot more room. The discipline is restraint, and restraint does not come in a release note.

Second, this does not make prompt wording irrelevant. Clear instructions still matter; they are simply one part of a larger picture. The shift is one of emphasis. If your AI workflow is unreliable and your first move is always to reword the prompt, you are working on the wrong layer most of the time. The higher-leverage move is to look at what the model is actually being given to work with.

A frame split into two contrasting halves divided by one thin gold line, the left half a chaotic flood of material overwhelming a dim centre, the right half a small ordered set of clear elements with a bright focused centre, labelled dump and curate — Same model, two strategies. Dump everything and it loses the thread. Curate the signal and it holds.

What to do this week

You do not need to be an engineer to apply this. Three moves work for anyone using AI seriously.

Audit what you are loading. The next time you build a custom project, a workspace or an agent, look at everything you have put in front of the model and ask what is genuinely high signal. Cut the rest. A focused project with the three reference documents that matter outperforms one stuffed with twenty.

Retrieve, do not dump. When you have a large body of material, give the model a way to pull the relevant part on demand, through a project, a connected source, or simply by pasting the specific section, rather than pasting the whole thing every time and hoping.

Treat a misbehaving workflow as a context problem first. Before you rewrite the prompt for the fifth time, ask whether the model has too much, too little, or the wrong information in front of it. More often than not, that is where the fix is. In 2026, the most useful AI skill is not writing the perfect prompt. It is deciding, carefully, what the model is allowed to see.

TheAICommand. Intelligence, At Your Command.

What context engineering is

Why curation beats volume

The moves that make up the discipline

What it means for regulated work

The hype check

What to do this week

Read next

OpenAI Built Its Own Chip. The Real Story Is the Cost of Intelligence.

Your AI Assistant Just Became a Shared Teammate. Govern the Channel.

Model Context Protocol: The Standard Wiring AI Into Your Tools