
Context files are probably hurting your coding agent.

A recent paper from ETH Zurich tested CLAUDE.md and AGENTS.md files across 438 real tasks. The results challenge a lot of the standard advice.

Gloaguen et al., 2026

I've been spending a lot of time with coding agents recently, and one of the things I kept hearing — from Anthropic's docs, from the AGENTS.md project, from various blog posts — is that you should give your agent a detailed context file. Describe your codebase, list your directory structure, explain your conventions. The general advice is: more context is better.

A team at ETH Zurich recently published a paper that actually tests this, and the results are interesting enough that I think they're worth summarising here. They ran Claude Code, Codex, and Qwen Code across 438 tasks from real GitHub repos, in three configurations: no context file, an auto-generated one (using each agent's /init command), and a hand-written one that developers had committed to the repo. The headline numbers:

  • −3%: fewer tasks solved with auto-generated files
  • +22%: higher cost per task
  • +4: extra steps per task

That's the average across all agents and both benchmarks (SWE-bench Lite and a new one they built called AgentBench, sourced from 12 repos that already had developer-written context files). I think there are a few things worth pulling out of the paper.

What the paper found.

1. Auto-generated context files make things slightly worse. In five of eight test configurations, agents solved fewer tasks when they had an auto-generated context file than when they had nothing at all. The performance drop is small (about 3% on average), but the cost increase is not: 22% more per task. My read is that the auto-generated files are mostly restating what's already in the README and docs. The agent can find those on its own.

2. Codebase overviews don't help agents navigate. This one surprised me. Nearly all generated context files included an overview of the repo structure, and 8 of 12 developer-written files did too. But when the researchers measured how quickly agents found the relevant files, overviews made no measurable difference. In some cases, agents spent extra steps re-reading the context file rather than just exploring the code. (I suspect this is related to how agents tend to treat anything in their context window as important, regardless of whether it's actually useful for the current task.)

3. Agents follow instructions in context files — and that's the problem. When a context file mentioned `uv`, agent usage of that tool jumped from near zero to 1.6 calls per task. Repo-specific tooling went from 0.05 to 2.5 calls when mentioned. The agents are obedient. They do what you tell them. But each additional instruction consumes reasoning capacity and adds steps, which is why accuracy doesn't improve even though the instructions are being followed. The paper showed a 22% increase in reasoning tokens for GPT-5.2 when context files were present.

4. Developer-written files help a bit, but also cost more. Human-written context files outperformed auto-generated ones across all four agents, and outperformed having no context file for three of the four. The gain was about 4% on average. But they also added steps and cost. The value seems to come specifically from information that isn't documented elsewhere in the repo — the things a new contributor would get wrong on their first day.

There's a nice control experiment in the paper: when they removed all documentation from the repos (READMEs, docs folders, example code), auto-generated context files suddenly started helping, improving performance by 2.7% on average. This confirms the obvious interpretation — the auto-generated files are just worse versions of documentation that already exists. The value of a context file is specifically in what isn't written anywhere else.

What I think this means in practice.

My takeaway is fairly straightforward: if you're using /init to generate your context file and leaving it mostly as-is, you're probably making things a bit worse and definitely making them more expensive. The auto-generated file is restating things the agent already has access to, and the additional instructions are consuming reasoning budget without improving outcomes.

The developer-written files that actually helped were short (about 640 words on average) and focused on things the agent couldn't infer from the codebase. Specific tooling requirements, non-obvious test commands, constraints that aren't enforced by linters. This makes sense to me — it's the same kind of information you'd put in an onboarding doc for a new engineer, not an architecture overview.
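If you want a mechanical guard against context-file bloat, a word-count check is easy to script. A minimal sketch, using the paper's ~640-word average as a soft budget — the budget value and the idea of enforcing it this way are my choice, not something the paper proposes:

```python
from pathlib import Path

# Soft budget: the helpful developer-written files in the
# paper averaged roughly 640 words.
WORD_BUDGET = 640

def check_context_file(path: str) -> bool:
    """Return True if the context file fits within the word budget."""
    words = len(Path(path).read_text().split())
    print(f"{path}: {words} words (budget {WORD_BUDGET})")
    return words <= WORD_BUDGET
```

You could run something like this in CI so the file doesn't quietly grow back into a wiki.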

Worth including
  • Build and test commands that aren't obvious from config files
  • Which tools to use (and which not to)
  • Rules the agent would break without being told
  • Constraints that aren't enforced by linters or CI

Probably not worth including
  • Codebase overviews and directory descriptions
  • Architecture explanations
  • Style guides (use a linter)
  • Anything already in your README or docs

Here's roughly the template I'd use, based on what the paper suggests works:

CLAUDE.md

```markdown
# CLAUDE.md

## Build & Test
- Run tests: `pytest tests/ -x --tb=short`
- Single test: `pytest tests/test_foo.py::test_name -x`
- Lint: `ruff check . --fix`
- Type check: `pyright`

## Tooling
- Use `uv` for deps, not pip
- Use `ruff` for formatting, not black

## Rules
# Things you'd get wrong without being told
- All API responses use `ApiResponse` from `src/utils/response.py`
- After schema changes: `alembic revision --autogenerate`
- Env vars load from `.env.local`, not `.env`

## Style
# Only what linters don't catch
- Error strings use the `t()` i18n wrapper
- New endpoints need an entry in `docs/api/`
```

That's roughly it. No codebase overview, no architecture walkthrough, no description of what each folder does. The agent will figure that out by exploring, and according to this paper, it'll do so faster than if you tried to explain it upfront.

A useful test for each line.

The way I'd think about it: for every line in your context file, ask whether the agent would get this wrong if the line weren't there. If the answer is no — if the agent could discover it from your code, your tests, your README, or your config files — then it's adding cost without adding value. The paper suggests the cost is real (22% more reasoning tokens, 4 extra steps per task on average), and the value of redundant information is zero or slightly negative.

The lines that earn their place are the ones where the answer is yes: the agent would default to `pip` when you need `uv`, it would miss the `ApiResponse` wrapper, it would run the wrong test command. Those are the things worth writing down.
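A crude mechanical proxy for this test is to flag context-file lines that already appear nearly verbatim in the README — redundant lines add cost without adding value. A sketch, not from the paper; the similarity threshold and minimum line length are arbitrary choices of mine:

```python
from difflib import SequenceMatcher

def redundant_lines(context_text: str, readme_text: str,
                    threshold: float = 0.8) -> list[str]:
    """Flag context-file lines that closely match a README line.

    A line the agent could find in the README on its own is a
    candidate for deletion from the context file.
    """
    readme_lines = [l.strip().lower()
                    for l in readme_text.splitlines() if l.strip()]
    flagged = []
    for line in context_text.splitlines():
        stripped = line.strip().lower()
        if len(stripped) < 15:  # skip headings and very short bullets
            continue
        if any(SequenceMatcher(None, stripped, rl).ratio() >= threshold
               for rl in readme_lines):
            flagged.append(line)
    return flagged
```

Fuzzy string matching obviously won't catch paraphrased duplication, but it's a cheap first pass before the manual line-by-line review.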

I suspect this advice will change as agents get better at processing long contexts and as the tooling around context files matures. But for now, based on this paper's evidence, less is more. Write the onboarding notes, not the wiki.