This document analyzes how file editing works in three AI coding assistants, highlighting the unique approaches and tricks each uses.
| Feature | Codex | Gemini-CLI | OpenCode |
|---|---|---|---|
| Language | Rust | TypeScript | TypeScript |
| Patch Format | Custom AI-friendly | Literal string replacement | Custom patch + literal replacement |
| Fuzzy Matching | 4-tier (exact → unicode normalized) | 3-tier (exact → flexible → regex) | 9 replacer strategies |
| Self-Healing | No (validation only) | LLM-based correction | No (falls back through strategies) |
| Conflict Detection | Validation before apply | Hash-based change detection | File mtime + lock chains |
Location: codex-rs/apply-patch/
Codex uses a custom declarative patch format designed for AI reliability:
```
*** Begin Patch
*** Add File: path/to/new.txt
+line1
+line2
*** Update File: path/to/existing.txt
@@ context_marker
 context_line
-old_line
+new_line
*** Delete File: path/to/remove.txt
*** End Patch
```
Why custom? Traditional unified diffs have ambiguous line numbers that LLMs often get wrong. This format is position-independent—it finds context lines wherever they appear.
When locating where to apply a patch, Codex tries increasingly lenient matching:
| Tier | Strategy | Handles |
|---|---|---|
| 1 | Exact byte match | Perfect input |
| 2 | trim_end() | Trailing whitespace differences |
| 3 | trim() | Leading/trailing whitespace |
| 4 | Unicode normalization | Smart quotes, em-dashes, non-breaking spaces |
The Unicode trick is clever—it converts fancy punctuation to ASCII:
```rust
fn normalise(s: &str) -> String {
    s.trim().chars().map(|c| match c {
        '\u{2010}'..='\u{2015}' | '\u{2212}' => '-', // dashes
        '\u{2018}'..='\u{201F}' => '"',              // smart quotes
        '\u{00A0}' | '\u{2002}'..='\u{200A}' => ' ', // special spaces
        other => other,
    }).collect()
}
```

This handles copy-pasted code from rich text editors that silently convert ASCII punctuation (straight quotes, hyphens) into typographic characters (smart quotes, em-dashes).
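To make the tier ladder concrete, here is a minimal TypeScript sketch of the same four tiers (illustrative only; the real implementation is in Rust, and findLine/normalize are hypothetical names). Each tier normalizes both the target and the file lines, and the first tier that yields a match wins:

```typescript
// Tier-4 normalization, ported from the Rust `normalise` shown above.
function normalize(s: string): string {
  return [...s.trim()]
    .map((c) => {
      if ((c >= "\u2010" && c <= "\u2015") || c === "\u2212") return "-"; // dashes
      if (c >= "\u2018" && c <= "\u201F") return '"';                    // smart quotes
      if (c === "\u00A0" || (c >= "\u2002" && c <= "\u200A")) return " "; // special spaces
      return c;
    })
    .join("");
}

// Try each tier in order; return the index of the first matching file line, or -1.
function findLine(fileLines: string[], target: string): number {
  const tiers: Array<(s: string) => string> = [
    (s) => s,           // tier 1: exact match
    (s) => s.trimEnd(), // tier 2: ignore trailing whitespace
    (s) => s.trim(),    // tier 3: ignore surrounding whitespace
    normalize,          // tier 4: unicode normalization
  ];
  for (const tier of tiers) {
    const want = tier(target);
    const idx = fileLines.findIndex((line) => tier(line) === want);
    if (idx !== -1) return idx;
  }
  return -1;
}
```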
When a patch has multiple hunks, Codex applies them in reverse line order:
```rust
// Hunks are applied last-to-first so earlier indices remain valid.
for &(start_idx, old_len, ref new_segment) in replacements.iter().rev() {
    lines.splice(start_idx..start_idx + old_len, new_segment.iter().cloned());
}
```

Why? If you apply from top-to-bottom, each insertion shifts subsequent line numbers. Reverse order means earlier replacements don't invalidate later positions.
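The effect can be demonstrated with a small TypeScript sketch (illustrative names; applyReplacements is not from any of the three codebases):

```typescript
type Replacement = { startIdx: number; oldLen: number; newSegment: string[] };

// Apply multi-hunk replacements in reverse line order so that earlier
// replacements never shift the indices of later ones.
function applyReplacements(lines: string[], reps: Replacement[]): string[] {
  const out = [...lines];
  // Sort by start index, then walk backwards from the bottom of the file.
  const ordered = [...reps].sort((a, b) => a.startIdx - b.startIdx);
  for (const { startIdx, oldLen, newSegment } of ordered.reverse()) {
    out.splice(startIdx, oldLen, ...newSegment);
  }
  return out;
}
```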
Each hunk can include a change_context line (e.g., def function_name():):
```rust
pub struct UpdateFileChunk {
    pub change_context: Option<String>, // Narrows search to this location
    pub old_lines: Vec<String>,
    pub new_lines: Vec<String>,
}
```

Instead of searching the entire file, Codex first finds the context marker, then searches nearby. This prevents matching identical lines in the wrong function.
The parser can extract patches from bash heredocs:
```bash
cd /project && apply_patch <<'EOF'
*** Begin Patch
...
*** End Patch
EOF
```

Uses Tree-sitter for robust bash parsing—handles GPT's tendency to wrap patches in shell commands.
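For illustration, the simple quoted-heredoc case can be handled with a regex (Codex relies on Tree-sitter precisely because regexes break down on real shell syntax; extractPatchFromHeredoc is a hypothetical name):

```typescript
// Extract a patch body from a quoted bash heredoc like <<'EOF' ... EOF.
// Only handles the trivial case; real shell parsing needs a proper parser.
function extractPatchFromHeredoc(command: string): string | null {
  const m = command.match(/<<'(\w+)'\n([\s\S]*?)\n\1/);
  if (!m) return null;
  const body = m[2];
  return body.startsWith("*** Begin Patch") ? body : null;
}
```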
Strengths:
- Very robust against LLM quirks (unicode, whitespace, shell wrapping)
- Declarative format prevents line-number errors
- Fast Rust implementation
Weaknesses:
- No self-correction—fails if context can't be found
- Non-standard format requires LLM training
Location: packages/core/src/tools/
Gemini uses literal string replacement as the primary mechanism:
```typescript
interface EditToolParams {
  file_path: string;
  old_string: string;             // Exact text to find
  new_string: string;             // Replacement text
  expected_replacements?: number; // Validate occurrence count
}
```

No patch format—the LLM specifies exactly what to find and replace.
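A sketch of how the expected_replacements guard might work (assumed behavior based on the parameter description; validateEdit is illustrative):

```typescript
// Count non-overlapping occurrences of `needle` in `content`.
function countOccurrences(content: string, needle: string): number {
  if (needle === "") return 0;
  let count = 0;
  for (let i = content.indexOf(needle); i !== -1; i = content.indexOf(needle, i + needle.length)) {
    count++;
  }
  return count;
}

// Reject edits whose occurrence count doesn't match what the LLM expected,
// which catches accidental multi-replacement of ambiguous snippets.
function validateEdit(content: string, oldString: string, expected = 1): void {
  const found = countOccurrences(content, oldString);
  if (found === 0) throw new Error("old_string not found in file");
  if (found !== expected) {
    throw new Error(`Expected ${expected} occurrence(s), found ${found}`);
  }
}
```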
If exact match fails, Gemini tries progressively fuzzier matching:
| Strategy | How It Works |
|---|---|
| Exact | content.indexOf(oldString) |
| Flexible | Line-by-line matching ignoring indent, applies original indentation to replacement |
| Regex | Tokenizes both strings, builds regex with \s* between tokens |
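The regex strategy can be sketched as follows (an assumption based on the table's description, not gemini-cli's actual code; here tokens are joined with \s* as described):

```typescript
// Split old_string into whitespace-delimited tokens, escape regex
// metacharacters, and join with \s* so whitespace differences still match.
function buildFlexibleRegex(oldString: string): RegExp {
  const tokens = oldString.split(/\s+/).filter(Boolean);
  const escaped = tokens.map((t) => t.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  return new RegExp(escaped.join("\\s*"));
}
```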
The flexible replacer is particularly clever:
```typescript
// Original file has:    "  const x = 1;"
// LLM sends old_string: "const x = 1;" (no indent)
// Flexible matcher finds it, preserves the "  " prefix in the replacement
```

The killer feature: when edits fail, Gemini asks the LLM to fix them:
```typescript
// llm-edit-fixer.ts
const prompt = `Your task is to analyze a failed edit attempt and provide
a corrected search string that will match the text in the file precisely.
The correction should be as minimal as possible...`;

const result = await generateJson({
  schema: { search: string, replace: string, noChangesRequired: boolean },
  prompt: `File content:\n${fileContent}\n\nFailed search: "${oldString}"`
});
```

This handles cases where the LLM hallucinated slightly wrong code—the fixer finds the actual matching text.
Caching: Uses LRU cache (50 entries) to avoid re-calling the LLM for the same failed edit.
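A minimal LRU sketch using a Map's insertion-order iteration (illustrative; gemini-cli's actual cache implementation may differ):

```typescript
// Least-recently-used cache: Map preserves insertion order, so the first
// key in iteration order is always the least recently used entry.
class LruCache<K, V> {
  private map = new Map<K, V>();
  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key)!;
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      this.map.delete(this.map.keys().next().value!); // evict oldest
    }
  }
}
```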
JavaScript's string.replace() treats sequences like $&, $$, and $1 in the replacement string as substitution patterns, not literal text:

```javascript
"foo".replace("foo", "$&"); // produces "foo": "$&" expands to the match instead of being inserted literally
```

Gemini escapes these:
```typescript
function safeLiteralReplace(str: string, oldString: string, newString: string): string {
  if (!newString.includes('$')) {
    return str.replaceAll(oldString, newString);
  }
  const escapedNewString = newString.replaceAll('$', '$$$$');
  return str.replaceAll(oldString, escapedNewString);
}
```

Essential for editing JavaScript/TypeScript code with template literals.
Before writing, Gemini checks if the file changed since it was read:
```typescript
const originalHash = sha256(originalContent);
// ... time passes ...
const currentHash = sha256(fs.readFileSync(filePath));
if (originalHash !== currentHash) {
  throw new Error("File modified externally");
}
```

Prevents clobbering changes from other editors or parallel LLM requests.
Strengths:
- Self-healing via LLM correction handles hallucinated code
- Simple find/replace is easy for LLMs to generate
- Good caching reduces redundant LLM calls
Weaknesses:
- No native multi-hunk support (each edit is independent)
- Self-correction adds latency (40s timeout)
- Requires LLM call even for simple whitespace mismatches
Location: packages/opencode/src/tool/ and packages/opencode/src/patch/
OpenCode supports both literal replacement AND a custom patch format similar to Codex.
The most comprehensive fuzzy matching of the three:
```typescript
const replacers = [
  SimpleReplacer,               // Exact match
  LineTrimmedReplacer,          // Ignore per-line leading whitespace
  BlockAnchorReplacer,          // First/last line anchor + Levenshtein on middle
  WhitespaceNormalizedReplacer, // All whitespace → single space
  IndentationFlexibleReplacer,  // Remove all common indent
  EscapeNormalizedReplacer,     // Handle \\n, \\t escape sequences
  TrimmedBoundaryReplacer,      // Trim outer boundaries
  ContextAwareReplacer,         // 50% similarity threshold on context
  MultiOccurrenceReplacer,      // Replace ALL matches when requested
];
```

Each yields candidates; the first match wins. This covers edge cases the other engines miss.
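The first-match-wins pipeline can be sketched in a few lines (illustrative replacer bodies, not OpenCode's actual implementations):

```typescript
// A replacer returns the actual text span to replace, or null if it can't match.
type Replacer = (content: string, find: string) => string | null;

const simpleReplacer: Replacer = (content, find) =>
  content.includes(find) ? find : null;

const lineTrimmedReplacer: Replacer = (content, find) => {
  const wanted = find.trim();
  for (const line of content.split("\n")) {
    if (line.trim() === wanted) return line; // return the file's actual text
  }
  return null;
};

// Walk the strategy list in order; the first replacer that matches wins.
function resolveMatch(content: string, find: string, replacers: Replacer[]): string | null {
  for (const r of replacers) {
    const match = r(content, find);
    if (match !== null) return match;
  }
  return null;
}
```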
The BlockAnchorReplacer uses edit distance:
```typescript
function levenshtein(a: string, b: string): number { /* standard impl */ }

const similarity = 1 - levenshtein(searchMiddle, candidateMiddle) /
  Math.max(searchMiddle.length, candidateMiddle.length);

if (candidates.length === 1) {
  threshold = 0.0; // Accept any anchor match
} else {
  threshold = 0.3; // Require 30% similarity for disambiguation
}
```

Clever: for a single candidate it's very permissive (trust the anchors); for multiple candidates it requires similarity to disambiguate.
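The `/* standard impl */` placeholder above corresponds to the usual dynamic-programming formulation, sketched here with a rolling row (a textbook implementation, not OpenCode's exact code):

```typescript
// Edit distance between strings a and b: minimum number of single-character
// insertions, deletions, and substitutions to turn a into b.
function levenshtein(a: string, b: string): number {
  const m = a.length, n = b.length;
  // prev[j] = distance between a[0..i-1] and b[0..j]
  let prev = Array.from({ length: n + 1 }, (_, j) => j);
  for (let i = 1; i <= m; i++) {
    const curr = [i];
    for (let j = 1; j <= n; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost, // substitution (free if chars match)
      );
    }
    prev = curr;
  }
  return prev[n];
}
```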
Serializes writes to the same file:
```typescript
const locks = new Map<string, Promise<void>>();

async function withLock<T>(filepath: string, fn: () => Promise<T>): Promise<T> {
  const currentLock = locks.get(filepath) ?? Promise.resolve();
  const { promise, resolve } = Promise.withResolvers<void>();
  locks.set(filepath, currentLock.then(() => promise));
  await currentLock; // Wait for previous write
  try {
    return await fn();
  } finally {
    resolve(); // Allow next writer
  }
}
```

No external lock files—uses JavaScript's event loop to serialize. Elegant for a single-process tool.
Combines with locking for full protection:
```typescript
const lastReadTime = fileTimeMap.get(filepath);
const currentMtime = (await fs.stat(filepath)).mtime;
if (currentMtime > lastReadTime) {
  throw new Error("File modified since last read. Please re-read file.");
}
```

Forces the LLM to re-read after external changes, ensuring it works with current content.
After every edit, OpenCode queries the Language Server:
```typescript
const diagnostics = await lsp.getDiagnostics(filePath);
return {
  ...result,
  errors: diagnostics.slice(0, 20) // Return up to 20 errors
};
```

Immediate syntax/type error feedback helps the LLM self-correct in the next turn.
Strengths:
- Most comprehensive fuzzy matching (9 strategies)
- Proper file locking for concurrent safety
- LSP integration catches errors immediately
Weaknesses:
- No LLM self-correction (relies on strategy fallback)
- Complex codebase with two parallel edit systems
- Levenshtein on every potential match could be slow on large files
| Engine | Approach | Trade-off |
|---|---|---|
| Codex | Custom declarative format | Robust but requires format training |
| Gemini | Pure find/replace | Simple but no multi-hunk atomicity |
| OpenCode | Both supported | Flexible but complex |
| Engine | Strategy |
|---|---|
| Codex | Fail with clear error message |
| Gemini | LLM self-correction (expensive but works) |
| OpenCode | Fall through 9 strategies (fast but limited) |
| Engine | Mechanism |
|---|---|
| Codex | Pre-apply validation only (single-threaded assumption) |
| Gemini | SHA256 content hashing |
| OpenCode | Promise chains + mtime checks |
Based on this audit:
- Adopt Gemini's safe dollar-sign replacement—essential for JS/TS editing
- Consider OpenCode's mtime checking—prevents overwriting external changes
- Codex's unicode normalization is valuable—handles smart quotes from copy-paste
- Skip LLM self-correction initially—adds complexity and latency; try fuzzy matching first
- Use reverse-order application for any multi-hunk patch support
The ideal hybrid would use:
- Literal find/replace as primary (like Gemini)
- 4-5 fuzzy matching tiers (subset of OpenCode's 9)
- Codex-style unicode normalization as a fallback tier
- File mtime checking before writes