2026-04-07 · 6 min read · Francis Watson

Why diff-only review isn't enough: the bugs that live between files

Most AI code reviewers see the diff and guess. We show what they miss when they can't see the import target, the caller, or the config.

Tags: architecture, cross-file, security

A developer opens a pull request. They've added a new import, called a function from another module, and wired it into the existing flow. The diff looks clean. The tests pass. Most AI code reviewers read the patch and say "looks good."

Except the function they imported doesn't exist. Or it recurses into itself instead of delegating. Or the class they extended changed its constructor signature last week. The bug isn't in the diff. It's in the space between files.

The diff is a lie of omission

Here's what a typical AI code reviewer sees:

+ from .paginator import OptimizedCursorPaginator
+
+ class AuditLogView:
+     paginator_class = OptimizedCursorPaginator

Looks reasonable. New class, imported from a nearby module. But if the reviewer could see paginator.py, they'd know: OptimizedCursorPaginator doesn't exist. The import will fail with an ImportError as soon as the module loads. This was one of the 50 bugs in the Greptile benchmark: a High severity bug that only tools reading beyond the diff can catch.

Three patterns that break diff-only review

1. The phantom import

The diff adds import { Foo } from "./module" but Foo was never exported from that module. Or it was renamed. Or the module was deleted last sprint. A diff-only reviewer sees a valid import statement and moves on.

To catch this, you need to resolve the import to its target file and check that the symbol exists. Grapple PR does this through cross-file context injection — when the diff imports something, we look it up in the code graph and include the target's actual exports in the agent's prompt.
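The resolve-and-check step can be sketched in a few lines of Python. This is a minimal illustration using the standard ast module, not Grapple PR's implementation: the function names, the contents of paginator.py, and the single-module lookup are all assumptions made for the example.

```python
import ast


def exported_names(module_source: str) -> set[str]:
    """Collect the names bound at the top level of a Python module."""
    names: set[str] = set()
    for node in ast.parse(module_source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    names.add(target.id)
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            names.add(node.target.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            # Re-exported imports are also importable from this module.
            for alias in node.names:
                names.add(alias.asname or alias.name.split(".")[0])
    return names


def phantom_imports(new_code: str, module_name: str, target_source: str) -> list[str]:
    """Return names imported from module_name that the target never defines."""
    available = exported_names(target_source)
    missing: list[str] = []
    for node in ast.walk(ast.parse(new_code)):
        if isinstance(node, ast.ImportFrom) and node.module == module_name:
            for alias in node.names:
                if alias.name != "*" and alias.name not in available:
                    missing.append(alias.name)
    return missing


# Hypothetical contents of paginator.py: the class in the diff is not there.
PAGINATOR_PY = "class CursorPaginator:\n    pass\n"

NEW_CODE = (
    "from .paginator import OptimizedCursorPaginator\n"
    "\n"
    "class AuditLogView:\n"
    "    paginator_class = OptimizedCursorPaginator\n"
)

missing = phantom_imports(NEW_CODE, "paginator", PAGINATOR_PY)
```

Here missing contains OptimizedCursorPaginator, which is the fact a reviewer needs in its prompt. A real resolver also has to handle packages, star imports, and names created dynamically, which this sketch ignores.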

2. The recursive delegation bug

A caching layer is supposed to call delegate.getForLogin() to fetch data, then cache the result. But the developer wrote this.getForLogin() instead — calling itself. Infinite recursion. Stack overflow in production.

The diff shows this.getForLogin(), which looks like a perfectly normal method call. To spot the bug you'd need to see the class hierarchy: the class is a caching proxy, and delegate is the real implementation. This was a Critical bug from the Keycloak benchmark suite.
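To make the pattern concrete, here is a small Python rendering of it. The class names and the get_for_login spelling are hypothetical stand-ins for the Java code in the benchmark; only the shape of the bug is taken from the description above.

```python
class UserProvider:
    """Stand-in for the real implementation (hypothetical)."""

    def get_for_login(self, login: str) -> dict:
        return {"login": login}


class BuggyCachingProvider:
    """Caching proxy with the bug: it calls itself instead of the delegate."""

    def __init__(self, delegate: UserProvider) -> None:
        self.delegate = delegate
        self.cache: dict = {}

    def get_for_login(self, login: str) -> dict:
        if login not in self.cache:
            # BUG: self.get_for_login re-enters this method before the cache
            # is filled, so the call never reaches the delegate.
            self.cache[login] = self.get_for_login(login)
        return self.cache[login]


class CachingProvider:
    """Same proxy with the intended delegation."""

    def __init__(self, delegate: UserProvider) -> None:
        self.delegate = delegate
        self.cache: dict = {}

    def get_for_login(self, login: str) -> dict:
        if login not in self.cache:
            self.cache[login] = self.delegate.get_for_login(login)
        return self.cache[login]


try:
    BuggyCachingProvider(UserProvider()).get_for_login("alice")
    buggy_recursed = False
except RecursionError:
    buggy_recursed = True

fixed = CachingProvider(UserProvider())
result = fixed.get_for_login("alice")
```

The two get_for_login bodies differ by one token, which is why the changed line alone gives a reviewer nothing to object to.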

3. The stale reference

A function modifies a local copy of a config object, transforms some values, then returns... the original. Not the copy. The diff shows the transformation logic, which looks correct. The bug is that the return statement references the wrong variable — and you can only see this by understanding the data flow from the function's entry to its exit.

Our Logic Agent caught this in the Sentry benchmark (PR #3) because it traces variable assignments through the function body, not just the changed lines.
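A stripped-down version of the stale reference looks like this in Python. The function names and the config shape are invented for illustration and are not the code from the Sentry PR.

```python
import copy


def normalize_buggy(config: dict) -> dict:
    """Transform a copy of the config, then return the wrong variable."""
    updated = copy.deepcopy(config)
    updated["timeout"] = int(updated["timeout"])
    # BUG: the transformed copy is discarded; the caller gets the original.
    return config


def normalize_fixed(config: dict) -> dict:
    """Same transformation, returning the copy that was actually changed."""
    updated = copy.deepcopy(config)
    updated["timeout"] = int(updated["timeout"])
    return updated


raw = {"timeout": "30"}
buggy_out = normalize_buggy(raw)
fixed_out = normalize_fixed(raw)
```

buggy_out still holds the string "30", while fixed_out holds the integer 30. If the diff only touches the transformation line, the return statement sits outside the changed lines, so catching this means following updated from its assignment to the function's exit.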

What "full context" actually means

When we say Grapple PR reads the full picture, we mean it literally. Before any agent runs, we assemble a ReviewContext that includes:

* The diff patches (what changed)
* Import targets (what the changes reference)
* Code graph nodes (functions, classes, signatures)
* Dependency edges (who calls what)
* Blast radius (what downstream code is affected)
* Linked issues (what problem this is solving)
* Commit history (90-day churn for changed files)
* CI status (are tests passing?)
* Human reviews (what other reviewers already said)
* Team patterns (what this team accepts vs dismisses)

This context is what separates "the code looks fine" from "this import references a function that doesn't exist in the target module." The diff is necessary but not sufficient.
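As a rough picture of how such a context might be laid out, here is a hypothetical Python dataclass with one field per item in the list. The field names, types, and the exports helper are assumptions for illustration; the actual ReviewContext schema is not shown in this post.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReviewContext:
    """Hypothetical shape; field names are illustrative, not the real schema."""

    diff_patches: list[str]
    # module name -> symbols the module actually exports
    import_targets: dict[str, list[str]] = field(default_factory=dict)
    graph_nodes: list[str] = field(default_factory=list)
    # (caller, callee) pairs
    dependency_edges: list[tuple[str, str]] = field(default_factory=list)
    blast_radius: list[str] = field(default_factory=list)
    linked_issues: list[str] = field(default_factory=list)
    # file path -> number of commits in the last 90 days
    churn_90d: dict[str, int] = field(default_factory=dict)
    ci_passing: Optional[bool] = None
    human_reviews: list[str] = field(default_factory=list)
    team_patterns: dict[str, str] = field(default_factory=dict)

    def exports(self, module: str, symbol: str) -> Optional[bool]:
        """True or False if the module was resolved, None if it was not."""
        symbols = self.import_targets.get(module)
        return None if symbols is None else symbol in symbols


ctx = ReviewContext(
    diff_patches=["+ from .paginator import OptimizedCursorPaginator"],
    import_targets={"paginator": ["CursorPaginator"]},
)
phantom = ctx.exports("paginator", "OptimizedCursorPaginator")
```

With the import target resolved, phantom is False, and an agent can report the missing symbol instead of guessing. Returning None for unresolved modules keeps "not found" separate from "not looked up".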

The benchmark proved it

Of the 22 bugs we missed in the Greptile benchmark, 5 were cross-file bugs — the kind that are invisible if you only see the diff. After we built cross-file context injection (resolving imports from 16 language syntaxes against the code graph), these bugs became catchable.

We're not claiming we catch everything. Our benchmark score is 56%, and we published it. But the bugs we do catch include the ones that no diff-only tool can see — because we read the files around the diff, not just the diff itself.

Read the full benchmark results: "We ran the Greptile 50-PR benchmark. Here's what happened."

