@bgauryy
Created January 1, 2026 20:41
Comprehensive test framework (380+ test cases) comparing Octocode MCP local tools against Claude Code and Cursor built-in tools across 9 test suites. Covers four tool categories: code search, directory listing, file reading, and file finding. Tests span 10 dimensions: context quality, efficiency, output quality, token safety, security, error han…

Local Tools Test Plan: AI Coding Assistants vs Octocode MCP

Comprehensive test plan comparing AI coding assistant internal tools with Octocode MCP local tools

Objective: Validate that Octocode local tools provide superior context, efficiency, output quality, token safety, and security compared to built-in tools in Claude Code and Cursor.


Tool Mapping Overview

| # | Claude Code Tool | Cursor Tool | Octocode MCP Tool | Primary Use Case |
| --- | --- | --- | --- | --- |
| 1 | Grep | grep | localSearchCode | Pattern search in code files |
| 2 | Bash(ls) / Glob | list_dir | localViewStructure | Directory listing and exploration |
| 3 | Read | read_file | localGetFileContent | Reading file contents |
| 4 | Glob | glob_file_search | localFindFiles | Finding files by name/pattern/metadata |

Test Dimensions

| Dimension | Description | Weight |
| --- | --- | --- |
| Context Quality | Does the tool provide actionable, structured context for AI agents? | Critical |
| Efficiency | Speed, bulk operations, resource usage | High |
| Output Quality | Structured responses, metadata richness, usability | High |
| Token Safety | Output size control, pagination, LLM budget awareness | Critical |
| Security | Path validation, secret detection, access control | Critical |
| Error Handling | Graceful failures, helpful hints, recovery guidance | Medium |
| Research Context | Goal tracking, reasoning preservation, workflow continuity | Medium |
| Large File Handling | Character/line pagination, context-aware extraction, memory efficiency | Critical |
| Large Repository Scale | Search speed at scale, quality results in monorepos, resource management | Critical |
| Monorepo Awareness | Package detection, scoped operations, cross-package search | High |

Test Suite A: Search Tools

Claude Code Tool: Grep

Cursor Tool: grep

Octocode Tool: localSearchCode


A1. Context Quality Tests

Test ID Test Name Motivation Claude Code Grep Behavior Cursor grep Behavior Octocode Expected Behavior Success Criteria
A1.1 Basic pattern search Verify structured results vs raw output Returns file paths or content with line numbers Returns file:line:content plain text Returns structured JSON with file path, line number, column, byte offset, match content Octocode provides richer metadata
A1.2 Context lines display Verify surrounding context quality -A/-B/-C flags show context lines -C N flag shows raw lines, no grouping contextLines param provides smart grouping with omission markers Context is organized and scannable
A1.3 Multi-file search results Verify results organization across files List of matches per file Flat list of matches, no grouping Grouped by file with match counts, file statistics Results are navigable
A1.4 Empty results handling Verify guidance on no matches Empty results returned Exit code 1, empty output, no guidance status: empty with semantic hints for alternatives Agent receives actionable guidance
A1.5 Match location precision Verify byte-level accuracy Line number with -n flag Line number only Line, column, byte offset, char offset Enables precise navigation
A1.6 Regex pattern support Verify complex pattern handling Full regex via ripgrep Basic regex support Full PCRE/Perl regex with multiline support Complex patterns work
A1.7 Case sensitivity control Verify case handling options -i flag for insensitive -i flag for insensitive caseSensitive, caseInsensitive, smartCase options Flexible case handling
A1.8 File type filtering Verify extension-based filtering glob and type params --include flag type, include, exclude params Easy file type targeting
A1.9 Research context tracking Verify goal/reasoning preservation No concept of research context No concept of research context mainResearchGoal, researchGoal, reasoning in output Research continuity maintained
A1.10 Match statistics Verify count and distribution info output_mode: count for counts Count requires separate -c flag totalMatches, distribution by file included Statistics built-in
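
Test A1.5 expects each match to carry a line, column, byte offset, and char offset. The sketch below shows one way a test harness could compute the expected values for a fixture file so they can be compared against tool output. The `MatchLocation` type and `locate_matches` helper are hypothetical names for illustration, not part of Octocode; the 1-based line/column and 0-based offset conventions are assumptions.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class MatchLocation:
    line: int         # 1-based line number (assumed convention)
    column: int       # 1-based column, counted in characters
    char_offset: int  # 0-based offset in characters from start of text
    byte_offset: int  # 0-based offset in UTF-8 bytes from start of text


def locate_matches(text: str, pattern: str) -> List[MatchLocation]:
    """Compute reference locations for every regex match in text."""
    locations = []
    for m in re.finditer(pattern, text):
        start = m.start()
        # Index of the first character of the line containing the match.
        line_start = text.rfind("\n", 0, start) + 1
        locations.append(
            MatchLocation(
                line=text.count("\n", 0, start) + 1,
                column=start - line_start + 1,
                char_offset=start,
                # Byte and char offsets differ once multi-byte characters appear.
                byte_offset=len(text[:start].encode("utf-8")),
            )
        )
    return locations
```

A fixture containing non-ASCII text before the match is the interesting case, since that is where byte and char offsets diverge.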

A2. Efficiency Tests

Test ID Test Name Motivation Test Scenario Success Criteria
A2.1 Single pattern performance Baseline search speed Search pattern in 10,000 files Response time < 500ms
A2.2 Bulk query efficiency Validate parallel execution 5 different patterns vs 5 sequential calls Bulk >= 3x faster than sequential
A2.3 Large directory handling Memory efficiency under load Search in 100MB directory Peak memory < 50MB
A2.4 Ripgrep to grep fallback Graceful degradation Force ripgrep unavailability Falls back to grep without crash
A2.5 Incremental results Early termination capability Stop after N matches maxMatchesPerFile, maxFiles respected
A2.6 Pattern complexity scaling Performance with complex regex Simple vs complex regex patterns Linear degradation, no timeout
A2.7 Concurrent bulk queries Parallelization efficiency 5 queries executing simultaneously CPU utilization balanced
A2.8 Cold vs warm cache Subsequent query speed Same query twice Second query >= 2x faster

A3. Output Quality Tests

Test ID Test Name Motivation Claude Code Grep Output Cursor grep Output Octocode Expected Output Success Criteria
A3.1 Response structure Verify consistent format Text output with file paths/content Plain text lines Structured YAML/JSON with fields Parseable, consistent schema
A3.2 Metadata richness Verify useful metadata Basic file/line info None File stats, match metadata, hints Rich context provided
A3.3 Hint generation Verify agent guidance No hints No hints Dynamic hints based on results Actionable next steps
A3.4 Status indication Verify clear status Tool success/failure Exit code only status: hasResults|empty|error Clear success/failure
A3.5 Pagination info Verify navigation data head_limit/offset params None pagination object with page/total/hasMore Enables continuation
A3.6 Warning messages Verify edge case alerts None None warnings array for truncation, fallback Agent aware of limitations
A3.7 Error detail quality Verify error helpfulness Generic error messages Generic error messages Specific error with errorCode and hints Debuggable errors
A3.8 Match highlighting Verify match visibility No highlighting No highlighting Match boundaries indicated Easy to locate matches

A4. Token Safety Tests

| Test ID | Test Name | Motivation | Risk Without Control | Octocode Mitigation | Success Criteria |
| --- | --- | --- | --- | --- | --- |
| A4.1 | Large result set | Prevent token overflow | 10K matches returned | matchesPerPage pagination | Output bounded |
| A4.2 | Long line handling | Prevent single-line overflow | 10KB line returned fully | matchContentLength truncation | Lines truncated |
| A4.3 | Binary file exclusion | Prevent garbage output | Binary content included | binaryFiles: without-match default | Clean text only |
| A4.4 | Many files matched | Prevent file count overflow | 1000 files in response | filesPerPage pagination | Files paginated |
| A4.5 | Deep context expansion | Prevent context bloat | Unlimited context lines | contextLines max limit | Context bounded |
| A4.6 | Output size estimation | Proactive limit warning | No warning before overflow | Size estimation + hints before large output | Early warning |
| A4.7 | Minified file handling | Prevent single-line megafiles | Huge minified JS searched | Detect and warn about minified content | Appropriate handling |
| A4.8 | Total response size | Global output limit | Unbounded response | Response size cap with continuation | Response bounded |
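
Tests A4.1 and A4.2 rely on two bounds working together: a cap on matches per page and a cap on the length of each match line. The following is a minimal sketch of that behavior, useful as a reference model when asserting on tool output. The function names, the `...` truncation marker, and the exact response field names are assumptions; only the `matchesPerPage` and `matchContentLength` parameter names come from the plan.

```python
from typing import Dict, List


def truncate_line(content: str, max_len: int) -> str:
    """Cut a match line to max_len characters and mark the cut."""
    if len(content) <= max_len:
        return content
    return content[:max_len] + "..."


def paginate_matches(
    matches: List[str],
    page: int,
    matches_per_page: int,
    match_content_length: int,
) -> Dict:
    """Return one bounded page of matches plus pagination metadata."""
    total = len(matches)
    # Ceiling division; an empty result still reports one (empty) page.
    total_pages = max(1, -(-total // matches_per_page))
    start = (page - 1) * matches_per_page
    window = matches[start:start + matches_per_page]
    return {
        "matches": [truncate_line(m, match_content_length) for m in window],
        "pagination": {
            "page": page,
            "totalPages": total_pages,
            "totalMatches": total,
            "hasMore": page < total_pages,
        },
    }
```

With both caps in place, the worst-case page size is roughly `matches_per_page * (match_content_length + 3)` characters, which gives the test a concrete upper bound to assert against.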

A5. Security Tests

| Test ID | Test Name | Motivation | Attack Vector | Expected Behavior | Success Criteria |
| --- | --- | --- | --- | --- | --- |
| A5.1 | Path traversal - basic | Prevent escape to parent | path: "../../etc/passwd" | Rejected with error | Path blocked |
| A5.2 | Path traversal - encoded | Prevent encoded escape | path: "..%2F..%2Fetc" | Rejected with error | Encoded path blocked |
| A5.3 | Path traversal - absolute | Prevent absolute escape | path: "/etc/passwd" | Rejected with error | Absolute outside workspace blocked |
| A5.4 | Symlink resolution | Prevent symlink escape | Symlink pointing to /etc | Resolved and blocked | Symlink target validated |
| A5.5 | Command injection - pattern | Prevent shell injection | pattern: "; rm -rf /" | Pattern escaped safely | No command execution |
| A5.6 | Command injection - path | Prevent path injection | path: "file; cat /etc/passwd" | Path sanitized | No command execution |
| A5.7 | Null byte injection | Prevent null truncation | path: "file\x00/etc/passwd" | Rejected | Null byte blocked |
| A5.8 | Ignored path access | Prevent node_modules access | path: "node_modules" | Blocked by default | Ignored paths respected |
| A5.9 | .git directory access | Prevent git data leak | path: ".git/config" | Blocked | Sensitive directories blocked |
| A5.10 | Secret in pattern | Prevent secret exposure | Search result contains AWS key | Secret redacted in output | Secrets masked |
| A5.11 | Unicode path manipulation | Prevent unicode tricks | Unicode lookalike characters | Normalized and validated | Unicode handled safely |
| A5.12 | Very long path | Prevent buffer overflow | 10KB path string | Rejected with limit error | Path length limited |
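
Several of the A5 cases (A5.1, A5.3, A5.4, A5.7, A5.12) reduce to one check: resolve the requested path, including symlinks, and confirm it still sits under the workspace root. The sketch below is a reference model for what the tests expect, not Octocode's implementation. `validate_path`, `is_allowed`, `PathRejected`, and the `MAX_PATH_LENGTH` value are all hypothetical, and the code assumes a POSIX filesystem. It does not decode URL-encoded input (A5.2) or normalize Unicode (A5.11); those would need separate handling.

```python
import os

MAX_PATH_LENGTH = 4096  # assumed limit for illustration only


class PathRejected(ValueError):
    """Raised when a requested path fails validation."""


def validate_path(workspace_root: str, requested: str) -> str:
    """Return the resolved path if it stays inside the workspace, else raise."""
    if "\x00" in requested:
        raise PathRejected("null byte in path")
    if len(requested) > MAX_PATH_LENGTH:
        raise PathRejected("path exceeds length limit")
    root = os.path.realpath(workspace_root)
    # os.path.join discards root when requested is absolute, so absolute
    # paths outside the workspace fail the containment check below.
    # realpath also follows symlinks, so a link pointing outside is caught.
    candidate = os.path.realpath(os.path.join(root, requested))
    if os.path.commonpath([root, candidate]) != root:
        raise PathRejected("path escapes workspace")
    return candidate


def is_allowed(workspace_root: str, requested: str) -> bool:
    """Boolean wrapper, convenient for table-driven tests."""
    try:
        validate_path(workspace_root, requested)
        return True
    except PathRejected:
        return False
```

Resolving before comparing is the important design choice: a prefix check on the raw string would pass `workspace/../..` style inputs and any symlink that points outside the root.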

A6. Error Handling Tests

Test ID Test Name Motivation Error Scenario Expected Behavior Success Criteria
A6.1 Non-existent path Graceful missing path Path does not exist Clear error message + suggestions Helpful error
A6.2 Permission denied Handle access errors Read-protected file Skip with warning, continue others Graceful skip
A6.3 Invalid regex Handle bad patterns Malformed regex pattern Parse error with position indicated Debuggable error
A6.4 Timeout handling Prevent hung queries Search takes > 30s Timeout with partial results Graceful timeout
A6.5 Bulk partial failure Isolate query failures 3/5 queries succeed Successful queries return, failures isolated Partial success
A6.6 Empty workspace Handle empty directory No files in path Empty result with hint Clear empty state
A6.7 Circular symlinks Handle symlink loops Symlink loop detected Warning and skip No infinite loop
A6.8 Encoding issues Handle non-UTF8 Binary/unknown encoding Skip or warn Clean handling

Test Suite B: Directory Listing Tools

Claude Code Tool: Bash(ls) / Glob

Cursor Tool: list_dir

Octocode Tool: localViewStructure


B1. Context Quality Tests

Test ID Test Name Motivation Claude Code Bash(ls)/Glob Behavior Cursor list_dir Behavior Octocode Expected Behavior Success Criteria
B1.1 Basic listing Verify output richness ls output or glob patterns Array of filenames only Entries with type, size, extension, permissions Rich metadata
B1.2 Recursive listing Verify depth support ls -R or Glob with ** Requires multiple calls depth parameter for tree view Single call for tree
B1.3 Type filtering Verify filter capability Manual filtering No filtering filesOnly, directoriesOnly params Easy type filtering
B1.4 Extension filtering Verify extension filter Glob patterns (e.g., *.ts) Manual post-filtering extension, extensions params Built-in extension filter
B1.5 Sorting options Verify sort capability ls flags (-t, -S) Alphabetical only sortBy: name, size, time, extension Flexible sorting
B1.6 Size display Verify human-readable sizes ls -lh for sizes No size info humanReadable size formatting 4.2KB instead of 4301
B1.7 Modified time display Verify timestamp access ls -l shows timestamps No timestamp showFileLastModified option Timestamps available
B1.8 Summary statistics Verify aggregate info Requires wc or counting None totalFiles, totalDirectories, summary Quick overview
B1.9 Hidden file handling Verify dotfile access ls -a for dotfiles May hide dotfiles hidden flag to include/exclude Controllable
B1.10 Pattern filtering Verify glob support Glob tool supports patterns No pattern matching pattern param for glob filter Built-in glob

B2. Efficiency Tests

Test ID Test Name Motivation Test Scenario Success Criteria
B2.1 Large directory Performance under scale 1000 files in directory Response < 1s
B2.2 Deep recursion Recursive performance depth=3 on large tree Response < 5s
B2.3 Bulk listing Multiple directories at once 5 directories vs sequential Bulk >= 2x faster
B2.4 Stats-only mode Lightweight overview Summary without full listing Response < 100ms
B2.5 Filtered vs unfiltered Filter performance With vs without extension filter Filtered same or faster
B2.6 Sort overhead Sorting cost Different sort options Sorting < 10% overhead

B3. Output Quality Tests

Test ID Test Name Motivation Claude Code Bash(ls)/Glob Output Cursor list_dir Output Octocode Expected Output Success Criteria
B3.1 Tree visualization Verify readable structure Flat list from ls/glob Flat list Tree-like indented output Visual hierarchy
B3.2 Entry annotations Verify type markers ls -F adds markers None [FILE], [DIR], [LINK] markers Clear type indication
B3.3 Pagination info Verify navigation None None Page number, total pages, hasMore Continuation enabled
B3.4 Hints generation Verify next steps None None Hints for deeper exploration Agent guidance
B3.5 Empty directory Verify empty handling Empty output Empty array status: empty + hints Clear empty state
B3.6 Truncation warning Verify limit indication No indication No indication Warning when truncated Awareness of limits

B4. Token Safety Tests

Test ID Test Name Motivation Risk Without Control Octocode Mitigation Success Criteria
B4.1 Directory with 10K files Prevent massive output All 10K entries returned entriesPerPage pagination Output bounded
B4.2 Very long filenames Handle edge case Long names bloat output Filename truncation with ellipsis Names bounded
B4.3 Deep recursion output Prevent tree explosion Full tree dumped Depth limits + pagination Tree bounded
B4.4 Detailed mode scaling Control metadata size All metadata included details flag toggles verbosity Controllable detail
B4.5 Pre-generation check Proactive limit Large output generated then rejected Estimate size, error before generating Early rejection
B4.6 Summary-only mode Minimal token option Full listing required summary: true without entries Ultra-light option

B5. Security Tests

Test ID Test Name Motivation Attack Vector Expected Behavior Success Criteria
B5.1 Path traversal listing Prevent escape path: "../../../" Rejected Traversal blocked
B5.2 Symlink directory Prevent link escape Symlink to /etc Resolved and blocked Symlink validated
B5.3 Hidden sensitive dirs Prevent leaks List .aws, .ssh directories Blocked by default Sensitive dirs hidden
B5.4 Absolute path outside Prevent arbitrary access /etc or /root Rejected Outside workspace blocked
B5.5 Filename with secrets Prevent credential leaks File named password.txt Only name shown, no content No content exposed
B5.6 Large depth DoS Prevent resource exhaustion depth: 100 Max depth enforced Depth limited

B6. Error Handling Tests

Test ID Test Name Motivation Error Scenario Expected Behavior Success Criteria
B6.1 Non-existent directory Handle missing path Directory doesn't exist Clear error + suggestion Helpful error
B6.2 Permission denied Handle access errors Read-protected directory Error with explanation Clear access error
B6.3 File path given Handle wrong type File path instead of directory Error suggesting file read tool Appropriate guidance
B6.4 Empty pattern match Handle no glob matches Pattern matches nothing Empty result + hint Clear empty state
B6.5 Bulk partial failure Isolate failures Some directories inaccessible Successful ones return Partial success

Test Suite C: File Content Tools

Claude Code Tool: Read

Cursor Tool: read_file

Octocode Tool: localGetFileContent


C1. Context Quality Tests

Test ID Test Name Motivation Claude Code Read Behavior Cursor read_file Behavior Octocode Expected Behavior Success Criteria
C1.1 Full file read Verify metadata inclusion Content with line numbers (cat -n format) Raw content only Content + totalLines + fileSize + pagination Rich context
C1.2 Pattern extraction Verify targeted reading Manual line calculation Manual line calculation matchString with context lines Easy targeting
C1.3 Line range reading Verify range support offset/limit params offset/limit params startLine/endLine with bounds checking Clear line ranges
C1.4 Multiple patterns Verify multi-match Multiple read calls Multiple read calls Single call with multiple matchStrings Efficient multi-match
C1.5 File metadata Verify file info None (content only) None path, contentLength, encoding, mimeType File info included
C1.6 Partial read indication Verify completeness status No indication No indication isPartial flag with bounds Clear partial status
C1.7 Line number preservation Verify line reference Line numbers included (cat -n) No line numbers Content with line numbers annotated Lines referenceable
C1.8 Match highlighting Verify match visibility No highlighting No highlighting Match positions indicated Matches locatable
C1.9 Context line control Verify context flexibility Fixed via offset/limit Fixed context matchStringContextLines adjustable Flexible context
C1.10 Regex pattern support Verify regex matching Not supported Not supported matchStringIsRegex option Regex enabled

C2. Efficiency Tests

Test ID Test Name Motivation Test Scenario Success Criteria
C2.1 Large file read Single file performance 1MB file Response < 200ms
C2.2 Minification savings Token efficiency JSON/YAML minification 20-50% size reduction
C2.3 Bulk file reads Multiple files at once 5 files vs sequential Bulk >= 3x faster
C2.4 Partial read efficiency Range read speed Read 100 lines from 10K file Response < 50ms
C2.5 Pattern search speed matchString performance Find pattern in 1MB file Response < 300ms
C2.6 Cache effectiveness Repeated read speed Same file twice Second read >= 2x faster

C3. Output Quality Tests

Test ID Test Name Motivation Claude Code Read Output Cursor read_file Output Octocode Expected Output Success Criteria
C3.1 Structured response Verify consistent format Content with line numbers Raw content string Structured object with fields Parseable format
C3.2 Content minification Verify token efficiency Full formatting preserved Full formatting preserved Optional minification for JSON/YAML Smaller output
C3.3 Pagination info Verify continuation Via offset/limit params None charOffset, charLength, hasMore Continuable
C3.4 Line bounds Verify range reporting Based on offset/limit None actualStartLine, actualEndLine Range clarity
C3.5 Encoding info Verify charset Assumed UTF-8 Assumed UTF-8 Detected encoding reported Encoding known
C3.6 Warning messages Verify edge alerts Truncation at 2000 lines None Warnings for truncation, bounds adjustment Issues flagged
C3.7 Hints for next steps Verify guidance None None Hints for deeper reading, search Agent guidance

C4. Token Safety Tests

Test ID Test Name Motivation Risk Without Control Octocode Mitigation Success Criteria
C4.1 Large file full read Prevent massive dump 10MB file returned Size limit + error with guidance Output bounded
C4.2 Character pagination Incremental reading Full content required charLength/charOffset pagination Paginated reading
C4.3 Many pattern matches Prevent match explosion Pattern matches 1000 times Max matches + truncation Matches bounded
C4.4 Long line handling Single line overflow 100KB single line Line truncation with indicator Lines bounded
C4.5 Binary file content Prevent garbage Binary file requested Detection + warning or rejection Clean handling
C4.6 Context expansion limit Prevent context bloat Large context requested Max context lines enforced Context bounded
C4.7 Pre-read size check Early rejection Check size before reading File size check + appropriate error Early detection
C4.8 Minified file detection Handle special files 1MB single-line JS Minified detection + appropriate handling Minified handled

C5. Security Tests

| Test ID | Test Name | Motivation | Attack Vector | Expected Behavior | Success Criteria |
| --- | --- | --- | --- | --- | --- |
| C5.1 | Path traversal | Prevent escape | ../../../etc/shadow | Rejected | Traversal blocked |
| C5.2 | Absolute path outside | Prevent arbitrary read | /etc/passwd | Rejected | Outside blocked |
| C5.3 | Symlink to sensitive | Prevent link escape | Symlink to /etc/passwd | Resolved and blocked | Symlink validated |
| C5.4 | .env file access | Prevent credential leak | .env or .env.local | Blocked or heavily redacted | Env files protected |
| C5.5 | AWS credentials | Redact cloud creds | File with AWS keys | AKIAXXXXXXXX redacted | AWS keys masked |
| C5.6 | API keys | Redact API credentials | File with API keys | Keys redacted | API keys masked |
| C5.7 | Passwords in code | Redact hardcoded creds | password = "secret" | Value redacted | Passwords masked |
| C5.8 | JWT tokens | Redact tokens | Bearer token in file | Token redacted | JWTs masked |
| C5.9 | Private keys | Redact crypto keys | RSA/EC private key content | Key content redacted | Private keys masked |
| C5.10 | GitHub tokens | Redact VCS tokens | ghp_xxxx or gho_xxxx | Token redacted | GitHub tokens masked |
| C5.11 | Database URLs | Redact connection strings | postgresql://user:pass@host | Password portion redacted | DB creds masked |
| C5.12 | Config files | Handle sensitive configs | .npmrc, .gitconfig with tokens | Tokens redacted | Config tokens masked |
| C5.13 | Null byte in path | Prevent truncation | file.txt\x00/etc/passwd | Rejected | Null byte blocked |
| C5.14 | Protocol injection | Prevent scheme abuse | file:///etc/passwd | Rejected | Protocol blocked |
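
The redaction cases (C5.5, C5.9, C5.10, C5.11) can be exercised with small fixture strings and a reference redactor. The sketch below covers four of them with regular expressions. The rule names, the `[REDACTED:...]` marker format, and the patterns themselves are assumptions for illustration; in particular the GitHub token pattern only assumes a `ghp_`/`gho_` prefix followed by at least 20 alphanumerics, and real detectors are typically broader.

```python
import re

# Hypothetical rule set; each entry is (label, compiled pattern).
REDACTION_RULES = [
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    ("aws_access_key", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    # GitHub personal/OAuth tokens, matched loosely by prefix (assumption).
    ("github_token", re.compile(r"\bgh[po]_[A-Za-z0-9]{20,}\b")),
    # PEM private key blocks, header through footer.
    (
        "private_key",
        re.compile(
            r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
            re.DOTALL,
        ),
    ),
]

# Connection strings: keep scheme, user, and host; mask only the password.
DB_URL = re.compile(r"(\b[a-z][a-z0-9+]*://[^:/\s@]+:)[^@\s]+(@)")


def redact(text: str) -> str:
    """Return text with known secret shapes replaced by markers."""
    for name, pattern in REDACTION_RULES:
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return DB_URL.sub(r"\1[REDACTED]\2", text)
```

For C5.11 the password is masked while the rest of the URL survives, which matches the "Password portion redacted" expectation and keeps the output useful for debugging connection settings.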

C6. Error Handling Tests

| Test ID | Test Name | Motivation | Error Scenario | Expected Behavior | Success Criteria |
| --- | --- | --- | --- | --- | --- |
| C6.1 | File not found | Handle missing file | Non-existent path | Clear error + suggestions | Helpful error |
| C6.2 | Directory path | Handle wrong type | Directory instead of file | Error suggesting list tool | Appropriate guidance |
| C6.3 | Permission denied | Handle access error | Read-protected file | Clear permission error | Access error clear |
| C6.4 | Line range overflow | Handle invalid range | endLine > totalLines | Adjusted range + warning | Graceful adjustment |
| C6.5 | Pattern not found | Handle no match | matchString not in file | Empty match result + hints | Clear no-match |
| C6.6 | Invalid encoding | Handle charset issues | Non-UTF8 file | Warning + best-effort decode | Graceful handling |
| C6.7 | Bulk partial failure | Isolate failures | Some files inaccessible | Successful reads return | Partial success |
| C6.8 | Empty file | Handle edge case | 0-byte file | Clear empty indication | Empty handled |
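
Test C6.4 expects an out-of-range `endLine` to be adjusted with a warning rather than rejected. A minimal sketch of that clamping logic follows, using the `actualStartLine` and `actualEndLine` field names from C3.4. The function name, the warning wording, and the choice to raise when the start lies past the end of the file are assumptions.

```python
from typing import Dict


def clamp_line_range(start_line: int, end_line: int, total_lines: int) -> Dict:
    """Clamp a 1-based inclusive line range to the file and record adjustments."""
    warnings = []
    actual_start = start_line
    actual_end = end_line
    if actual_start < 1:
        warnings.append(f"startLine {start_line} adjusted to 1")
        actual_start = 1
    if actual_end > total_lines:
        warnings.append(f"endLine {end_line} adjusted to {total_lines}")
        actual_end = total_lines
    if actual_start > actual_end:
        # Nothing left to read after clamping; treated as an error here.
        raise ValueError(
            f"range {start_line}-{end_line} is outside a {total_lines}-line file"
        )
    return {
        "actualStartLine": actual_start,
        "actualEndLine": actual_end,
        "warnings": warnings,
    }
```

For example, requesting lines 90 to 150 of a 100-line file yields the range 90 to 100 with one warning, which is the "Adjusted range + warning" outcome the test looks for.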

Test Suite D: File Finding Tools

Claude Code Tool: Glob

Cursor Tool: glob_file_search

Octocode Tool: localFindFiles


D1. Context Quality Tests

Test ID Test Name Motivation Claude Code Glob Behavior Cursor glob_file_search Behavior Octocode Expected Behavior Success Criteria
D1.1 Basic file find Verify output richness Array of file paths sorted by mtime Array of file paths only Entries with type, size, permissions, modified Rich metadata
D1.2 Modified time filter Verify time-based search Not supported Not supported modifiedWithin, modifiedBefore params Time filtering works
D1.3 Size filter Verify size-based search Not supported Not supported sizeGreater, sizeLess params Size filtering works
D1.4 Multiple patterns Verify OR logic Multiple calls required Multiple calls required names array with OR logic Single call for multi
D1.5 Depth control Verify recursion limit Implicit via pattern Unlimited or fixed maxDepth, minDepth params Depth controllable
D1.6 Type filtering Verify type-based search Files only Files only type: f (file), d (dir), l (link) Type filtering works
D1.7 Permission filter Verify access-based search Not supported Not supported executable, readable, writable Permission filtering works
D1.8 Regex pattern support Verify regex names Glob patterns only Glob only regex param for regex names Regex enabled
D1.9 Path pattern support Verify path matching Glob patterns (e.g., **/*.ts) Basic glob pathPattern for full path matching Path patterns work
D1.10 Empty detection Verify empty file find Not supported Not supported empty flag for zero-byte files Empty files findable

D2. Efficiency Tests

Test ID Test Name Motivation Test Scenario Success Criteria
D2.1 Large filesystem Performance at scale 50,000 files in tree Response < 5s
D2.2 Bulk find queries Multiple finds at once 5 patterns vs sequential Bulk >= 2x faster
D2.3 Filtered vs unfiltered Filter performance With vs without filters Filtered faster
D2.4 Deep vs shallow Depth impact maxDepth=1 vs maxDepth=10 Linear scaling
D2.5 Pattern complexity Regex cost Simple vs complex regex Reasonable overhead

D3. Output Quality Tests

Test ID Test Name Motivation Claude Code Glob Output Cursor glob_file_search Output Octocode Expected Output Success Criteria
D3.1 Structured response Verify format Array of paths sorted by mtime Array of paths Structured entries with fields Rich structure
D3.2 Metadata inclusion Verify file info Paths only None Size, type, modified, permissions Metadata included
D3.3 Sorting support Verify ordering Modification time only Modification time only sortBy option Flexible sorting
D3.4 Pagination info Verify continuation None None Page, total, hasMore Continuable
D3.5 Hints generation Verify guidance None None Hints for refinement Agent guidance
D3.6 Match count Verify totals Array length only Array length only totalFiles, distribution Stats included

D4. Token Safety Tests

Test ID Test Name Motivation Risk Without Control Octocode Mitigation Success Criteria
D4.1 Many matches Prevent overflow 10K files returned filesPerPage pagination Output bounded
D4.2 Detail mode scaling Control verbosity All metadata for all files details flag toggles Controllable detail
D4.3 Pre-find estimation Early rejection Large result generated Estimate count first Early detection
D4.4 Limit enforcement Hard cap on results Unlimited finds limit param enforced Results capped
D4.5 Summary mode Minimal output Full listing required Count-only option Ultra-light option

D5. Security Tests

Test ID Test Name Motivation Attack Vector Expected Behavior Success Criteria
D5.1 Path traversal Prevent escape path: "../../" Rejected Traversal blocked
D5.2 Absolute outside Prevent arbitrary /etc or /home Rejected Outside blocked
D5.3 Symlink escape Prevent link attack Symlink directory Resolved and blocked Symlink validated
D5.4 Sensitive patterns Prevent leaks name: "*.env*" Excluded from results Sensitive excluded
D5.5 Hidden directories Prevent exposure .ssh, .aws Excluded by default Hidden protected
D5.6 Regex DoS Prevent ReDoS Catastrophic backtracking regex Timeout or rejection ReDoS prevented

D6. Error Handling Tests

Test ID Test Name Motivation Error Scenario Expected Behavior Success Criteria
D6.1 Non-existent path Handle missing Path doesn't exist Clear error + suggestion Helpful error
D6.2 Invalid pattern Handle bad glob Malformed glob pattern Parse error message Debuggable error
D6.3 Permission denied Handle access Unreadable directory Skip with warning Graceful skip
D6.4 No matches Handle empty Pattern matches nothing Empty result + hints Clear empty state
D6.5 Bulk partial failure Isolate failures Some paths inaccessible Successful queries return Partial success
D6.6 Invalid time format Handle bad input modifiedWithin: "invalid" Parse error with format hint Helpful parse error

Test Suite E: Integration & Workflow Tests


E1. End-to-End Workflows

Test ID Test Name Motivation Workflow Steps Success Criteria
E1.1 Code exploration flow Validate full workflow Find files -> Search patterns -> Read matches All steps succeed with context
E1.2 Bug investigation flow Validate debugging Search error -> Find related files -> Read context Investigation enabled
E1.3 Refactoring preparation Validate refactor support Find usages -> List structure -> Read implementations Refactor info complete
E1.4 Security audit flow Validate audit Find sensitive files -> Search patterns -> Read for secrets Audit enabled with redaction
E1.5 Documentation discovery Validate doc search Find markdown -> Search headings -> Read sections Doc discovery works
E1.6 Dependency analysis Validate import tracing Search imports -> Find packages -> Read configs Dep analysis enabled
E1.7 Test coverage flow Validate test search Find test files -> Search assertions -> Read test bodies Test analysis works
E1.8 Configuration audit Validate config discovery Find configs -> Read settings -> Search for values Config audit enabled

E2. Token Budget Compliance

| Test ID | Test Name | Motivation | Budget Constraint | Workflow | Success Criteria |
| --- | --- | --- | --- | --- | --- |
| E2.1 | Single tool budget | One tool fits budget | 10K tokens | Complex search | Output < 10K tokens |
| E2.2 | Session budget | Full session fits | 25K tokens | Find + Search + Read | Total < 25K tokens |
| E2.3 | Bulk operation budget | Bulk within limits | 15K tokens | 5 concurrent searches | Output < 15K tokens |
| E2.4 | Iterative exploration | Multiple rounds fit | 50K tokens | 5 exploration rounds | Total < 50K tokens |
| E2.5 | Pagination continuation | Continued exploration | 10K per page | Multi-page results | Each page < 10K tokens |
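
The E2 budgets need some way to count tokens in tool output. The plan does not say which tokenizer to use, so the sketch below assumes a rough heuristic of four characters per token; swap in a real tokenizer for anything stricter. `estimate_tokens`, `check_budget`, and the result field names are hypothetical.

```python
import math
from typing import Dict, List

CHARS_PER_TOKEN = 4  # rough heuristic, an assumption and not a measured value


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a tool response from its length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def check_budget(outputs: List[str], budget_tokens: int) -> Dict:
    """Sum estimated tokens across a workflow and compare to the budget."""
    used = sum(estimate_tokens(o) for o in outputs)
    return {
        "usedTokens": used,
        "budgetTokens": budget_tokens,
        # The success criteria use a strict "<", so hitting the budget fails.
        "withinBudget": used < budget_tokens,
    }
```

For E2.2, the outputs of the Find, Search, and Read steps would be passed together with `budget_tokens=25000`.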

E3. Research Context Preservation

Test ID Test Name Motivation Context Element Success Criteria
E3.1 Goal tracking Verify goal persists mainResearchGoal Goal in all responses
E3.2 Sub-goal tracking Verify sub-goal persists researchGoal Sub-goal in all responses
E3.3 Reasoning tracking Verify reasoning persists reasoning Reasoning in all responses
E3.4 Cross-tool context Context across tools Research context Context maintained across tool switches
E3.5 Hint relevance Verify contextual hints Hints match research goal Hints aligned with goals

Test Suite F: Performance Benchmarks


F1. Response Time Targets

| Test ID | Operation | Tool | Input Scale | Target P50 | Target P95 | Target Max |
| --- | --- | --- | --- | --- | --- | --- |
| F1.1 | Pattern search | localSearchCode | 10K files | 300ms | 800ms | 2s |
| F1.2 | Directory list | localViewStructure | 1K entries | 200ms | 400ms | 1s |
| F1.3 | File read | localGetFileContent | 1MB file | 100ms | 200ms | 500ms |
| F1.4 | File find | localFindFiles | 50K files | 1s | 3s | 5s |
| F1.5 | Bulk search (5) | localSearchCode | 5 queries | 500ms | 1.5s | 3s |
| F1.6 | Bulk read (5) | localGetFileContent | 5 files | 200ms | 500ms | 1s |
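
Checking the P50/P95/Max targets requires a percentile definition, which the plan leaves open. The sketch below assumes the nearest-rank method over a list of latency samples in milliseconds; other definitions (for example linear interpolation) give slightly different values on small sample sets. The function names and result fields are hypothetical.

```python
from typing import Dict, List


def percentile(samples_ms: List[float], p: int) -> float:
    """Nearest-rank percentile: smallest sample with at least p% at or below it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    # Integer ceiling of p * n / 100 avoids floating-point rank errors.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def meets_targets(
    samples_ms: List[float], p50_ms: float, p95_ms: float, max_ms: float
) -> Dict:
    """Compare measured latencies against one row of the F1 targets."""
    measured = {
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "max": max(samples_ms),
    }
    passed = (
        measured["p50"] <= p50_ms
        and measured["p95"] <= p95_ms
        and measured["max"] <= max_ms
    )
    return {"measured": measured, "passed": passed}
```

For F1.3, the call would be `meets_targets(samples, 100, 200, 500)` over repeated reads of the 1MB fixture.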

F2. Memory Usage Targets

| Test ID | Operation | Tool | Input Scale | Target Peak | Target Average |
| --- | --- | --- | --- | --- | --- |
| F2.1 | Large search | localSearchCode | 100MB directory | 100MB | 50MB |
| F2.2 | Deep recursion | localViewStructure | depth=5 tree | 50MB | 25MB |
| F2.3 | Large file | localGetFileContent | 10MB file | 30MB | 15MB |
| F2.4 | Many files | localFindFiles | 100K files | 100MB | 50MB |
| F2.5 | Bulk operations | All tools | 5 concurrent | 200MB | 100MB |

F3. Throughput Targets

| Test ID | Operation | Tool | Target Throughput |
| --- | --- | --- | --- |
| F3.1 | Sequential searches | localSearchCode | 20 queries/second |
| F3.2 | File reads | localGetFileContent | 50 files/second |
| F3.3 | Directory listings | localViewStructure | 30 directories/second |
| F3.4 | File finds | localFindFiles | 10 queries/second |

Test Suite G: Edge Cases & Stress Tests


G1. Edge Cases

Test ID Test Name Motivation Edge Case Expected Behavior Success Criteria
G1.1 Empty file Zero-byte file handling 0 bytes file Read succeeds, empty content No error
G1.2 Empty directory Empty directory handling 0 files/dirs List succeeds, empty results No error
G1.3 Single character Minimal content 1 byte file Read succeeds No error
G1.4 Very long filename Filename edge 255 character name Handled correctly No truncation error
G1.5 Unicode filename International names Chinese filename Handled correctly Unicode works
G1.6 Spaces in path Special characters path with spaces/file.txt Handled correctly Spaces work
G1.7 Special characters Path edge cases file[1](2){3}.txt Escaped correctly Special chars work
G1.8 No extension Extension-less file Makefile, Dockerfile Typed correctly No extension works
G1.9 Multiple extensions Complex extension file.test.spec.ts Correct extension detected Multi-ext works
G1.10 Dotfile Hidden files .gitignore Correctly typed Dotfiles work

G2. Stress Tests

| Test ID | Test Name | Motivation | Stress Factor | Success Criteria |
|---|---|---|---|---|
| G2.1 | Maximum file count | Scale limits | 100K files in search path | Completes without crash |
| G2.2 | Maximum file size | Large file handling | 100MB file | Handled with pagination |
| G2.3 | Maximum directory depth | Deep nesting | 50 levels deep | Completes with depth limit |
| G2.4 | Maximum concurrent queries | Parallelism limits | 10 concurrent bulk queries | All complete |
| G2.5 | Rapid sequential calls | Rate handling | 100 calls in 10 seconds | All succeed |
| G2.6 | Mixed operations | Combined load | Search + List + Read + Find concurrent | All succeed |

Test Suite H: Large File Handling

Focus: Character/line pagination, context-aware reading, memory efficiency for large files

Applicable Tools: localGetFileContent (Octocode) vs Read (Claude Code) vs read_file (Cursor)


H1. Large File Detection & Awareness

| Test ID | Test Name | Motivation | File Size | Claude Code Read | Cursor read_file | Octocode Expected | Success Criteria |
|---|---|---|---|---|---|---|---|
| H1.1 | File size detection | Awareness before reading | 10MB file | Reads up to 2000 lines | Attempts full read | Reports `fileSize` before content | Size known upfront |
| H1.2 | Line count detection | Know total lines | 100K lines | No line count | No line count | Reports `totalLines` | Line count available |
| H1.3 | Character count detection | Know total chars | 5M chars | No char count | No char count | Reports `totalChars` | Char count available |
| H1.4 | Large file warning | Proactive guidance | >1MB file | Truncates at 2000 lines | No warning | Warning + pagination hints | User informed |
| H1.5 | Giant file rejection | Prevent catastrophic read | 500MB file | Truncates output | May crash/timeout | Clear error + guidance | Safe rejection |
| H1.6 | Memory estimation | Predict resource usage | 50MB file | No estimation | No estimation | Estimated memory in metadata | Resource awareness |

H2. Character-Based Pagination

| Test ID | Test Name | Motivation | Scenario | Claude Code Behavior | Cursor Behavior | Octocode Expected | Success Criteria |
|---|---|---|---|---|---|---|---|
| H2.1 | First page read | Initial chunk | 10MB file, `charLength=10000` | Reads 2000 lines (no char control) | Full file or error | First 10K chars + pagination | Bounded first page |
| H2.2 | Middle page read | Continue reading | `charOffset=50000`, `charLength=10000` | Uses offset/limit (line-based) | Not supported | Chars 50K-60K returned | Mid-file access |
| H2.3 | Last page detection | Know when done | Near end of file | No indication | No indication | `hasMore: false` when complete | End detection |
| H2.4 | Page boundary alignment | Clean breaks | `charOffset=9999` | Line-based, not char-based | May break mid-word | Attempts word boundary | Clean breaks |
| H2.5 | UTF-8 boundary safety | Handle multi-byte | Offset lands mid-character | N/A (line-based) | May corrupt | Adjusts to char boundary | Safe UTF-8 |
| H2.6 | Continuation hint | Guide next page | After any page | No hint | No hint | `nextCharOffset` provided | Easy continuation |
| H2.7 | Page size limits | Enforce max page | `charLength=1000000` | 2000 line limit | Allowed | Max limit enforced | Page bounded |
| H2.8 | Zero offset handling | Start from beginning | `charOffset=0` | Works (offset=0) | Works | Works with metadata | Clean start |
| H2.9 | Negative offset rejection | Invalid input | `charOffset=-100` | Undefined | Undefined | Clear error | Invalid rejected |
| H2.10 | Beyond-file offset | Past EOF | `charOffset > fileSize` | Undefined | Undefined | Empty with warning | EOF handled |
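
The behaviours expected in H2.5 through H2.10 can be sketched as a single page-reading function. This is a minimal illustration, not Octocode's implementation: `read_page`, `Page`, and `MAX_CHAR_LENGTH` are hypothetical names, the cap value is invented, and the sketch assumes offsets are byte positions into valid UTF-8 (the only reading under which an offset can land mid-character). Word-boundary alignment (H2.4) is not attempted.

```python
from dataclasses import dataclass
from typing import Optional

MAX_CHAR_LENGTH = 50_000  # hypothetical cap; the plan does not state the real limit


@dataclass
class Page:
    content: str
    has_more: bool
    next_char_offset: Optional[int] = None  # None once the file is exhausted
    warning: Optional[str] = None


def _is_continuation(byte: int) -> bool:
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return (byte & 0xC0) == 0x80


def read_page(data: bytes, char_offset: int, char_length: int) -> Page:
    """Return one bounded page of text from UTF-8 encoded `data`."""
    if char_offset < 0 or char_length <= 0:
        raise ValueError("charOffset must be >= 0 and charLength must be > 0")  # H2.9
    if char_offset >= len(data):
        return Page("", False, warning="charOffset is at or past end of file")  # H2.10
    # Move the start back to the first byte of the character it landed in (H2.5).
    start = char_offset
    while start > 0 and _is_continuation(data[start]):
        start -= 1
    # Clamp the requested length to the cap (H2.7), then trim a split character.
    end = min(start + min(char_length, MAX_CHAR_LENGTH), len(data))
    while start < end < len(data) and _is_continuation(data[end]):
        end -= 1
    if end == start:
        # The page was smaller than one character: return that character whole.
        end = start + 1
        while end < len(data) and _is_continuation(data[end]):
            end += 1
    has_more = end < len(data)  # H2.3
    return Page(data[start:end].decode("utf-8"), has_more,
                end if has_more else None)  # H2.6


if __name__ == "__main__":
    raw = "héllo wörld".encode("utf-8")
    first = read_page(raw, 0, 2)  # the cut would split "é", so the page ends early
    print(first)
    print(read_page(raw, first.next_char_offset, 100))
```

A caller pages through a file by feeding each `next_char_offset` back in until `has_more` is false.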

H3. Line-Based Pagination

| Test ID | Test Name | Motivation | Scenario | Claude Code Behavior | Cursor Behavior | Octocode Expected | Success Criteria |
|---|---|---|---|---|---|---|---|
| H3.1 | Line range read | Specific lines | `startLine=100`, `endLine=200` | offset/limit params | offset/limit params | Lines 100-200 with numbers | Exact line range |
| H3.2 | First N lines | File header | `startLine=1`, `endLine=50` | offset=0, limit=50 | offset=0, limit=50 | First 50 lines | Header access |
| H3.3 | Last N lines | File footer | Last 100 lines | Not directly supported | Not supported | Tail functionality | Footer access |
| H3.4 | Single line read | Precise line | `startLine=500`, `endLine=500` | offset + limit=1 | Manual calculation | Exactly line 500 | Single line |
| H3.5 | Line overflow handling | Beyond total | `endLine=1000000` | Reads up to EOF | May error | Adjusts to `totalLines` + warning | Graceful adjustment |
| H3.6 | Line number annotation | Reference lines | Any range | `cat -n` format includes numbers | No numbers | Lines prefixed with numbers | Numbers present |
| H3.7 | Empty line handling | Preserve blanks | Range with empty lines | Included | Included | Empty lines preserved | Blanks kept |
| H3.8 | Very long lines | Single huge line | 1MB single line | Truncated at 2000 chars | Full line | Line truncated with marker | Line bounded |
| H3.9 | Mixed line endings | CR/LF/CRLF | Mixed endings | May miscalculate | May miscalculate | Correct line count | Endings handled |
| H3.10 | Continuation metadata | Next range info | After any range | None | None | `nextStartLine` provided | Easy continuation |
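
The line-range expectations above (clamping in H3.5, numbering in H3.6, blank lines in H3.7, mixed endings in H3.9, continuation in H3.10) fit in one small function. This is a sketch under assumptions: `read_lines` is a hypothetical helper, the `N: ` prefix format is invented, and the result keys merely echo the field names used in the table.

```python
import re


def read_lines(text, start_line, end_line):
    """Return the 1-based inclusive range [start_line, end_line] with line numbers."""
    if start_line < 1 or end_line < start_line:
        raise ValueError("expected 1 <= startLine <= endLine")
    # Split on CRLF first, then lone CR or LF, so mixed endings count once each (H3.9).
    lines = re.split(r"\r\n|\r|\n", text)
    if lines and lines[-1] == "":
        lines.pop()  # a trailing newline does not start an extra line
    total = len(lines)
    warning = None
    if end_line > total:  # H3.5: clamp instead of failing
        warning = f"endLine {end_line} exceeds totalLines {total}; adjusted"
        end_line = total
    selected = lines[start_line - 1:end_line]  # blank lines are kept as-is (H3.7)
    numbered = "\n".join(f"{n}: {line}" for n, line in enumerate(selected, start_line))
    return {
        "content": numbered,
        "totalLines": total,
        "nextStartLine": end_line + 1 if end_line < total else None,  # H3.10
        "warning": warning,
    }


if __name__ == "__main__":
    sample = "a\r\nb\rc\n\nd\n"  # five lines using three different endings
    print(read_lines(sample, 2, 4))
    print(read_lines(sample, 4, 1_000_000))
```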

H4. Context-Aware Large File Reading

| Test ID | Test Name | Motivation | Scenario | Claude Code Behavior | Cursor Behavior | Octocode Expected | Success Criteria |
|---|---|---|---|---|---|---|---|
| H4.1 | Pattern in large file | Find specific content | `matchString` in 50MB file | Use Grep tool instead | Not supported | Finds pattern + context | Pattern found |
| H4.2 | Pattern context lines | Surrounding context | `matchStringContextLines=10` | Grep `-C` flag | Not supported | 10 lines before/after | Context included |
| H4.3 | Multiple pattern matches | All occurrences | Pattern appears 100 times | Grep returns all | N/A | First N matches + count | Matches bounded |
| H4.4 | Pattern near file start | Beginning match | Match in first 100 lines | Grep handles | N/A | Correct context, no negative lines | Start bounded |
| H4.5 | Pattern near file end | Ending match | Match in last 100 lines | Grep handles | N/A | Correct context, no overflow | End bounded |
| H4.6 | No match in large file | Handle missing | Pattern not in 50MB file | Grep returns empty | N/A | `status: empty` + hints | Clear no-match |
| H4.7 | Regex in large file | Complex pattern | Regex `matchString` | Grep supports regex | Not supported | Regex matching works | Regex enabled |
| H4.8 | Case-insensitive match | Flexible matching | `matchStringCaseSensitive=false` | Grep `-i` flag | N/A | Case-insensitive search | Case flexible |
| H4.9 | Context overlap handling | Adjacent matches | Two matches 5 lines apart | Grep shows all | N/A | Merged context, no duplication | Context merged |
| H4.10 | Match distribution info | Where are matches | Pattern with many matches | Grep count mode | N/A | Distribution by line ranges | Distribution shown |
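
Tests H4.4, H4.5, and H4.9 all concern how context windows around matches are computed: clamped at the start and end of the file, and merged when they overlap. One way to express that is shown below; `context_windows` is a hypothetical helper, and treating directly adjacent windows as mergeable is an assumption of this sketch.

```python
def context_windows(match_lines, context, total_lines):
    """Turn 1-based match line numbers into merged, clamped (start, end) windows."""
    windows = []
    for line in sorted(match_lines):
        start = max(1, line - context)          # H4.4: never below line 1
        end = min(total_lines, line + context)  # H4.5: never past the last line
        if windows and start <= windows[-1][1] + 1:
            # Overlapping or touching the previous window: extend it (H4.9).
            prev_start, prev_end = windows[-1]
            windows[-1] = (prev_start, max(prev_end, end))
        else:
            windows.append((start, end))
    return windows


if __name__ == "__main__":
    # Matches near the start, two matches 5 lines apart, and one near the end.
    print(context_windows([3, 50, 55, 998], context=10, total_lines=1000))
    # prints [(1, 13), (40, 65), (988, 1000)]
```

Merging before extraction means each line is emitted at most once, which is what keeps the output free of duplicated context.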

H5. Large File Token Efficiency

| Test ID | Test Name | Motivation | Scenario | Token Risk | Octocode Mitigation | Success Criteria |
|---|---|---|---|---|---|---|
| H5.1 | JSON minification | Reduce JSON size | 5MB JSON file | 5M tokens | Minified output | 30-50% reduction |
| H5.2 | YAML minification | Reduce YAML size | 2MB YAML file | 2M tokens | Minified output | 20-40% reduction |
| H5.3 | Code minification | Reduce code size | 1MB TypeScript | 1M tokens | Whitespace normalized | 10-20% reduction |
| H5.4 | Minification toggle | Control minification | `minified=false` | Full output | Full formatting preserved | Toggle works |
| H5.5 | Selective minification | File type aware | Mixed file types | Varies | Type-appropriate minification | Type-aware |
| H5.6 | Token estimation | Budget awareness | Any large file | Unknown | Estimated tokens in metadata | Estimation provided |
| H5.7 | Budget warning | Proactive alert | Output > 10K tokens | No warning | Warning before output | Early warning |
| H5.8 | Chunked JSON | Split large JSON | 50MB JSON | Massive | Split by structure | Structure-aware split |
| H5.9 | Log file handling | Handle log patterns | 100MB log file | Massive | Intelligent sampling | Sampled output |
| H5.10 | Repeated content detection | Optimize redundancy | Highly repetitive | Bloated | Repetition collapsed | Redundancy removed |
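
JSON minification (H5.1), the toggle (H5.4), token estimation (H5.6), and the budget warning (H5.7) can be sketched with the standard `json` module. Everything named here is an assumption for illustration: `prepare_json`, `estimate_tokens`, the result keys, and especially the four-characters-per-token ratio, which is a rough rule of thumb rather than a tokenizer and is not specified by this plan.

```python
import json

CHARS_PER_TOKEN = 4  # assumed heuristic; real token counts depend on the tokenizer


def minify_json(text):
    """Re-serialize JSON without insignificant whitespace."""
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


def estimate_tokens(text):
    """Rough token estimate by ceiling division of the character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def prepare_json(text, minified=True, budget_tokens=10_000):
    """Return content plus size metadata, warning when the budget is exceeded."""
    content = minify_json(text) if minified else text
    tokens = estimate_tokens(content)
    warning = None
    if tokens > budget_tokens:
        warning = f"estimated {tokens} tokens exceeds budget of {budget_tokens}"
    return {"content": content, "estimatedTokens": tokens, "warning": warning}


if __name__ == "__main__":
    pretty = json.dumps({"items": [{"id": i, "tags": ["a", "b"]} for i in range(50)],
                        "meta": {"source": "example"}}, indent=4)
    result = prepare_json(pretty)
    saved = 1 - len(result["content"]) / len(pretty)
    print(f"{len(pretty)} -> {len(result['content'])} chars ({saved:.0%} smaller)")
```

The saving depends heavily on how the input was indented, so the percentage bands in the table are best read as targets for typical pretty-printed files.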

H6. Large File Performance

| Test ID | Test Name | Motivation | File Size | Operation | Target Time | Target Memory |
|---|---|---|---|---|---|---|
| H6.1 | Size detection only | Fast metadata | 100MB file | Get size/lines | < 50ms | < 5MB |
| H6.2 | First page read | Initial access | 100MB file | First 10K chars | < 100ms | < 20MB |
| H6.3 | Pattern search | Find in large | 50MB file | `matchString` | < 500ms | < 100MB |
| H6.4 | Line range extraction | Specific lines | 100MB file | 100 lines mid-file | < 200ms | < 30MB |
| H6.5 | Full minification | Compress output | 10MB JSON | Read + minify | < 1s | < 50MB |
| H6.6 | Multiple patterns | Bulk match | 50MB file, 5 patterns | All patterns | < 2s | < 150MB |
| H6.7 | Streaming read | Continuous pages | 100MB file, 10 pages | Sequential reads | < 3s total | < 50MB peak |
| H6.8 | Concurrent large files | Parallel access | 5x 20MB files | Bulk read | < 2s | < 200MB |

H7. Large File Edge Cases

| Test ID | Test Name | Motivation | Edge Case | Expected Behavior | Success Criteria |
|---|---|---|---|---|---|
| H7.1 | Single line file | No newlines | 10MB single line | Handled with truncation | No hang |
| H7.2 | Binary-like content | Non-text patterns | File with binary segments | Detection + warning | Clean handling |
| H7.3 | Empty large file | Sparse file | Reported 1GB, actually empty | Correct detection | Size accurate |
| H7.4 | Unicode heavy file | Multi-byte heavy | 10MB emoji file | Correct char count | Chars accurate |
| H7.5 | Growing file | File being written | File changing during read | Warning + snapshot | Consistent read |
| H7.6 | Compressed content | Gzip/encoded | `.gz` or base64 content | Warning, no decode | No expansion |
| H7.7 | Null bytes | Binary markers | Text with null bytes | Filtered or warned | Clean output |
| H7.8 | Extremely long lines | Code dumps | 100K char single line | Line truncated | Line bounded |
| H7.9 | Millions of lines | Line count scale | 10M lines, 500MB | Line count correct | Count accurate |
| H7.10 | Nested JSON depth | Deep structure | 100 levels nested | Handles gracefully | No stack overflow |

Test Suite I: Large Repository Handling

Focus: Fast searching at scale, quality results in monorepos, resource efficiency

Applicable Tools: All local tools (localSearchCode, localViewStructure, localFindFiles, localGetFileContent)


I1. Large Repository Detection & Awareness

| Test ID | Test Name | Motivation | Repository Scale | Claude Code Behavior | Cursor Behavior | Octocode Expected | Success Criteria |
|---|---|---|---|---|---|---|---|
| I1.1 | Repository size detection | Know scale upfront | 100K files | No awareness | No awareness | File count in metadata | Scale known |
| I1.2 | Directory depth detection | Know nesting | 50 levels deep | No awareness | No awareness | Max depth reported | Depth known |
| I1.3 | Total size estimation | Know data volume | 5GB total | No awareness | No awareness | Total size in metadata | Volume known |
| I1.4 | Monorepo detection | Identify structure | `packages/`, `apps/` | No detection | No detection | Monorepo pattern detected | Structure detected |
| I1.5 | Language distribution | Know tech stack | Mixed languages | No info | No info | Language breakdown | Stack known |
| I1.6 | Hot path identification | Frequently used | Build artifacts heavy | No info | No info | Size distribution by path | Hot paths known |

I2. Large Repository Search Performance

| Test ID | Test Name | Motivation | Scale | Pattern | Target Time | Success Criteria |
|---|---|---|---|---|---|---|
| I2.1 | Simple pattern search | Baseline speed | 50K files | `function` | < 2s | Fast baseline |
| I2.2 | Complex regex search | Regex overhead | 50K files | Complex regex | < 5s | Acceptable regex |
| I2.3 | Rare pattern search | Needle in haystack | 100K files | Unique string | < 3s | Fast sparse match |
| I2.4 | Common pattern search | High match count | 50K files | `import` | < 3s | Fast common match |
| I2.5 | Multi-pattern bulk | Parallel patterns | 50K files, 5 patterns | Varied | < 5s total | Efficient bulk |
| I2.6 | Incremental search | Early termination | 100K files | First 10 matches | < 500ms | Fast early stop |
| I2.7 | Filtered search | With type filter | 50K files, `*.ts` only | Any pattern | < 1.5s | Filter speedup |
| I2.8 | Path-scoped search | Subdirectory only | 100K files, `src/` only | Any pattern | < 1s | Scope speedup |
| I2.9 | Exclusion patterns | Skip directories | 100K files, skip `node_modules` | Any pattern | < 2s | Exclusion works |
| I2.10 | Cold vs warm search | Cache benefit | 50K files | Same pattern twice | 2nd < 50% of 1st | Cache effective |

I3. Large Repository Search Quality

| Test ID | Test Name | Motivation | Scenario | Claude Code Grep | Cursor grep | Octocode Expected | Success Criteria |
|---|---|---|---|---|---|---|---|
| I3.1 | Result relevance ranking | Most useful first | Common pattern | File order | Random order | Relevance-based hints | Ranked results |
| I3.2 | Match distribution info | Where are matches | Pattern across repo | File list | Flat list | Distribution by path/file | Distribution shown |
| I3.3 | File importance hints | Guide exploration | Many matches | No guidance | No guidance | Hints for key files | Key files highlighted |
| I3.4 | Duplicate detection | Avoid redundancy | Copied files | All shown | All shown | Duplicates noted | Duplicates flagged |
| I3.5 | Test vs source distinction | Code vs test | Pattern in both | Mixed together | Mixed together | Source/test separation hint | Distinction clear |
| I3.6 | Generated file detection | Skip generated | Build output matches | Included | Included | Generated files flagged | Generated noted |
| I3.7 | Stale file detection | Freshness info | Old vs new matches | No dates | No dates | Modified dates included | Freshness visible |
| I3.8 | Context completeness | Useful context | Matches need context | Context via `-A`/`-B`/`-C` | Raw lines | Smart context boundaries | Context useful |
| I3.9 | Cross-reference hints | Related searches | Pattern found | No suggestions | No suggestions | Related pattern hints | Cross-refs provided |
| I3.10 | Confidence scoring | Match quality | Varied matches | All equal | All equal | Match confidence indicated | Confidence shown |

I4. Large Repository Directory Listing

| Test ID | Test Name | Motivation | Scale | Claude Code Bash(ls)/Glob | Cursor list_dir | Octocode Expected | Success Criteria |
|---|---|---|---|---|---|---|---|
| I4.1 | Root overview | Quick orientation | 50K file repo | `ls` output or glob | Full flat list | Summary + top entries | Quick overview |
| I4.2 | Deep tree exploration | Navigate structure | `depth=3` | Multiple calls or `ls -R` | Multiple calls | Single call, paginated | Single call works |
| I4.3 | Monorepo packages | List packages | `packages/*` | Manual navigation | Manual navigation | Package detection + list | Packages listed |
| I4.4 | Size-sorted listing | Find large paths | Large repo | `ls -lS` | No sorting | Sorted by size desc | Large paths first |
| I4.5 | Recently modified | Find active areas | Large repo | `ls -lt` | No time sort | Sorted by modified | Recent first |
| I4.6 | Empty directory skip | Hide empty | Sparse structure | All shown | All shown | Empty dirs noted/hidden | Empty handled |
| I4.7 | Symlink handling | Handle links | Many symlinks | May follow | May follow | Links noted, not followed deep | Links safe |
| I4.8 | Permission issues | Handle restricted | Mixed permissions | May error | May error | Skipped with warning | Restricted handled |
| I4.9 | Very deep paths | Handle nesting | 100 levels | May timeout | May timeout | Depth-limited with hint | Depth bounded |
| I4.10 | Wide directories | Many children | 10K files in one dir | All returned | All returned | Paginated | Wide paginated |

I5. Large Repository File Finding

| Test ID | Test Name | Motivation | Scale | Pattern | Target Time | Success Criteria |
|---|---|---|---|---|---|---|
| I5.1 | Find by name | Specific file | 100K files | `package.json` | < 1s | Fast name find |
| I5.2 | Find by extension | Type search | 100K files | `*.ts` | < 2s | Fast type find |
| I5.3 | Find by size | Large files | 100K files | > 1MB | < 3s | Fast size find |
| I5.4 | Find by date | Recent files | 100K files | modified < 7d | < 3s | Fast date find |
| I5.5 | Combined filters | Multi-criteria | 100K files | `*.ts`, > 10KB, recent | < 3s | Fast combined |
| I5.6 | Find in subtree | Scoped find | 100K files, `src/` | Any pattern | < 1s | Scope speedup |
| I5.7 | Find with exclusions | Skip paths | 100K files | Skip `node_modules`, `dist` | < 2s | Exclusion works |
| I5.8 | Regex filename | Complex names | 50K files | Regex pattern | < 3s | Regex works |
| I5.9 | Bulk find patterns | Multiple patterns | 100K files, 5 patterns | Varied | < 5s | Bulk efficient |
| I5.10 | Find empty files | Zero-byte | 100K files | `empty=true` | < 2s | Empty found |

I6. Large Repository Resource Management

| Test ID | Test Name | Motivation | Scale | Operation | Target Memory | Target CPU |
|---|---|---|---|---|---|---|
| I6.1 | Memory during search | RAM efficiency | 100K files | Full search | < 200MB peak | < 50% avg |
| I6.2 | Memory during listing | RAM for tree | 50K entries | `depth=3` list | < 100MB peak | < 30% avg |
| I6.3 | Memory during find | RAM for find | 100K files | Pattern find | < 150MB peak | < 40% avg |
| I6.4 | CPU during search | CPU efficiency | 100K files | Complex search | N/A | < 80% peak |
| I6.5 | Concurrent operations | Parallel efficiency | 50K files | 5 concurrent ops | < 300MB total | Balanced |
| I6.6 | Idle resource release | Cleanup after ops | Any | After completion | < 20MB retained | Cleanup works |
| I6.7 | Rate limiting | Prevent overload | Rapid requests | 100 ops/minute | Stable | Rate controlled |
| I6.8 | Graceful degradation | Under load | High concurrent | 20 parallel | All complete | No crashes |

I7. Large Repository Token Budget

| Test ID | Test Name | Motivation | Scale | Operation | Token Risk | Mitigation | Success Criteria |
|---|---|---|---|---|---|---|---|
| I7.1 | Search result cap | Prevent overflow | 10K matches | Pattern search | 100K tokens | Pagination | Output < 10K tokens |
| I7.2 | Directory tree cap | Prevent tree dump | 50K entries | `depth=3` list | 200K tokens | Pagination | Output < 10K tokens |
| I7.3 | Find results cap | Prevent file dump | 10K matches | File find | 50K tokens | Pagination | Output < 10K tokens |
| I7.4 | Bulk operation cap | Combined limit | Mixed | 5 operations | 500K tokens | Per-op limits | Total < 25K tokens |
| I7.5 | Summary mode | Minimal output | Large repo | Any operation | High | Summary only | Output < 2K tokens |
| I7.6 | Count-only mode | Stats without data | Large repo | Search/find | High | Counts only | Output < 500 tokens |
| I7.7 | Progressive disclosure | Expand on demand | Large results | Any | Varies | Start minimal | Controllable expansion |
| I7.8 | Hint-based exploration | Guide not dump | Large repo | Any | Varies | Hints over data | Hints actionable |

I8. Large Repository Monorepo Support

| Test ID | Test Name | Motivation | Monorepo Type | Scenario | Expected Behavior | Success Criteria |
|---|---|---|---|---|---|---|
| I8.1 | Package detection | Identify packages | npm workspaces | List packages | Packages enumerated | Packages found |
| I8.2 | Cross-package search | Search all packages | Lerna/Yarn | Pattern search | Results grouped by package | Grouped results |
| I8.3 | Package-scoped search | Single package | Any | Search in `packages/foo` | Scoped correctly | Scope works |
| I8.4 | Shared dependencies | Common `node_modules` | npm workspaces | Find shared | Hoisted deps detected | Hoisting aware |
| I8.5 | Package relationships | Dep graph hints | Any | List packages | Inter-package deps noted | Deps shown |
| I8.6 | Root vs package config | Config hierarchy | Any | Find configs | Root + package configs | Hierarchy shown |
| I8.7 | Build output detection | Skip dist/build | Any | Search code | Generated excluded | Generated skipped |
| I8.8 | Test file organization | Test location | Various | Find tests | Tests grouped by package | Tests organized |
| I8.9 | Changelog navigation | Version history | Any | Find changelogs | Package changelogs listed | Changelogs found |
| I8.10 | Documentation search | Doc discovery | Any | Find docs | Package docs listed | Docs found |

I9. Large Repository Error Resilience

| Test ID | Test Name | Motivation | Error Scenario | Expected Behavior | Success Criteria |
|---|---|---|---|---|---|
| I9.1 | Partial permission | Some files restricted | Mixed permissions | Accessible files returned | Partial success |
| I9.2 | Corrupted files | Bad files in repo | Unreadable files | Skipped with warning | Corruption handled |
| I9.3 | Missing symlinks | Broken symlinks | Dangling links | Skipped with warning | Broken links handled |
| I9.4 | Very long paths | Path length limits | 500+ char paths | Truncated or skipped | Long paths handled |
| I9.5 | Special characters | Unusual filenames | Unicode/special chars | Escaped correctly | Special chars handled |
| I9.6 | Concurrent modification | Files changing | Active development | Warning + snapshot | Concurrent handled |
| I9.7 | Disk space issues | Low disk | During operation | Graceful error | Disk errors handled |
| I9.8 | Network paths | Mounted drives | NFS/SMB paths | Warning about latency | Network paths noted |
| I9.9 | Large binary files | Non-text in repo | Binary blobs | Skipped in search | Binaries handled |
| I9.10 | Recursive symlinks | Symlink loops | Circular links | Loop detection | Loops prevented |
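
Loop detection for recursive symlinks (I9.10) is commonly done by remembering the device and inode of every directory already visited. The sketch below shows that idea on POSIX systems and also skips dangling links and unreadable directories (I9.1, I9.3). `walk_without_loops` is a hypothetical helper, not a description of how Octocode traverses a tree; unlike the expected behaviour in the table, it skips silently rather than emitting warnings.

```python
import os


def walk_without_loops(root):
    """Yield regular file paths under root, visiting each real directory once."""
    seen = set()
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            info = os.stat(current)  # follows symlinks to the real directory
        except OSError:
            continue  # dangling link or permission problem: skip it
        key = (info.st_dev, info.st_ino)
        if key in seen:
            continue  # already visited via another path, so this is a loop
        seen.add(key)
        try:
            entries = list(os.scandir(current))
        except OSError:
            continue  # unreadable directory
        for entry in entries:
            try:
                if entry.is_dir():       # True for symlinks that point at directories
                    stack.append(entry.path)
                elif entry.is_file():    # False for dangling symlinks
                    yield entry.path
            except OSError:
                continue


if __name__ == "__main__":
    for path in walk_without_loops("."):
        print(path)
```

Because each real directory is visited once, a file reachable through several symlinked paths is reported only under the first path encountered.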

Acceptance Criteria Summary


P0 - Must Pass (Critical)

| Category | Test IDs | Requirement |
|---|---|---|
| Security - Path Validation | A5.1-A5.8, B5.1-B5.4, C5.1-C5.3, D5.1-D5.3 | All path traversal attacks blocked |
| Security - Secret Detection | C5.4-C5.14 | All credential types redacted |
| Token Safety - Pagination | A4.1, A4.4, B4.1, C4.1, D4.1, H2.1-H2.6, I7.1-I7.4 | Large outputs paginated |
| Token Safety - Limits | A4.2, A4.5, B4.3, C4.2-C4.4, D4.4, H5.6-H5.7 | Output size bounded |
| Large File Safety | H1.4-H1.5, H2.7, H3.8, H4.3 | Large file access controlled |
| Large Repo Safety | I6.1-I6.4, I7.1-I7.8 | Large repo resource bounded |
| Error Handling - Basic | A6.1-A6.3, B6.1-B6.3, C6.1-C6.3, D6.1-D6.3, I9.1-I9.5 | Graceful error messages |

P1 - Should Pass (High Priority)

| Category | Test IDs | Requirement |
|---|---|---|
| Context Quality | A1.1-A1.5, B1.1-B1.4, C1.1-C1.6, D1.1-D1.4 | Structured, rich context |
| Output Quality | A3.1-A3.5, B3.1-B3.4, C3.1-C3.5, D3.1-D3.4 | Consistent, useful output |
| Efficiency - Bulk | A2.2, B2.3, C2.3, D2.2 | Bulk faster than sequential |
| Performance | F1.1-F1.6, H6.1-H6.6, I2.1-I2.7 | Response times within targets |
| Hint Generation | A1.4, A3.3, B3.4, C3.7, D3.5 | Actionable hints provided |
| Large File Pagination | H2.1-H2.10, H3.1-H3.10 | Char/line pagination works |
| Large File Context | H4.1-H4.10 | Context-aware reading |
| Large Repo Search | I2.1-I2.10, I3.1-I3.10 | Fast, quality search |

P2 - Nice to Have (Medium Priority)

| Category | Test IDs | Requirement |
|---|---|---|
| Research Context | A1.9, E3.1-E3.5 | Research goals preserved |
| Advanced Filtering | A1.7, B1.3-B1.5, C1.9-C1.10, D1.2-D1.7 | Advanced filter options |
| Integration Workflows | E1.1-E1.8 | End-to-end workflows succeed |
| Edge Cases | G1.1-G1.10, H7.1-H7.10 | Edge cases handled |
| Memory Efficiency | F2.1-F2.5, I6.1-I6.8 | Memory within targets |
| Large File Efficiency | H5.1-H5.10, H6.1-H6.8 | Token-efficient file reading |
| Monorepo Support | I8.1-I8.10 | Monorepo patterns detected |

P3 - Bonus (Low Priority)

| Category | Test IDs | Requirement |
|---|---|---|
| Stress Tests | G2.1-G2.6 | Stress tests pass |
| Throughput | F3.1-F3.4 | Throughput targets met |
| Fallback Mechanisms | A2.4 | Graceful degradation |
| Large Repo Resilience | I9.1-I9.10 | Error resilience at scale |
| Advanced Large File | H1.1-H1.6 | Proactive large file detection |
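
The Test IDs columns use a compact range notation such as `A5.1-A5.8`. A test runner would presumably need to expand these into individual IDs, for instance to see which priority tiers a given test belongs to; some ranges appear in more than one tier (I9.1-I9.5 under P0 and I9.1-I9.10 under P3). The helpers below, `expand_ids` and `tiers_for`, are hypothetical and only sketch that expansion.

```python
import re

# One ID ("A2.4") or one range within a single group ("H2.1-H2.6").
_ID_SPEC = re.compile(r"([A-Z]\d+)\.(\d+)(?:-([A-Z]\d+)\.(\d+))?")


def expand_ids(spec):
    """Expand 'A4.1, A4.4, H2.1-H2.3' into a flat list of test IDs."""
    ids = []
    for part in spec.split(","):
        match = _ID_SPEC.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"unrecognized test ID spec: {part!r}")
        group, first, end_group, last = match.groups()
        if end_group is not None and end_group != group:
            raise ValueError(f"range crosses groups: {part!r}")
        stop = int(last) if last is not None else int(first)
        ids.extend(f"{group}.{n}" for n in range(int(first), stop + 1))
    return ids


def tiers_for(test_id, tiers):
    """Return the names of every tier whose spec includes test_id."""
    return [name for name, spec in tiers.items() if test_id in expand_ids(spec)]


if __name__ == "__main__":
    print(expand_ids("A4.1, A4.4, H2.1-H2.3"))
    print(tiers_for("I9.3", {"P0": "I9.1-I9.5", "P3": "I9.1-I9.10"}))
```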

Test Execution Plan


Phase 1: Security & Critical

  • Execute all P0 tests
  • All security tests must pass
  • All token safety tests must pass
  • Large file/repo safety controls validated

Phase 2: Quality & Performance

  • Execute all P1 tests
  • Context quality validated
  • Performance benchmarks met
  • Large file pagination working
  • Large repo search quality confirmed

Phase 3: Integration & Edge Cases

  • Execute all P2 tests
  • Integration workflows validated
  • Edge cases handled
  • Monorepo support validated
  • Large file efficiency optimized

Phase 4: Stress & Optimization

  • Execute all P3 tests
  • Stress tests validated
  • Large repo resilience confirmed
  • Final optimization

Expected Outcomes

| Dimension | Claude Code Built-in Tools | Cursor Built-in Tools | Octocode MCP Tools | Winner |
|---|---|---|---|---|
| Context Quality | Text output with line numbers | Raw text output | Structured JSON/YAML with metadata | Octocode |
| Efficiency | Sequential calls | Sequential only | Bulk parallel operations | Octocode |
| Output Quality | Text with basic formatting | Plain text | Structured with hints | Octocode |
| Token Safety | 2000 line limit, truncation | Unbounded output | Paginated with limits | Octocode |
| Security | Basic OS-level | Basic OS-level | Multi-layer validation + secret redaction | Octocode |
| Error Handling | Generic errors | Generic errors | Contextual errors with guidance | Octocode |
| Research Context | None | None | Goal/reasoning tracking | Octocode |
| Large File Handling | Line-based offset/limit | Full dump or manual ranges | Char/line pagination + context search | Octocode |
| Large Repo Performance | Linear slowdown | Linear slowdown | Optimized search + smart pagination | Octocode |
| Monorepo Support | No awareness | No awareness | Package detection + scoped operations | Octocode |

Test Suite Summary

| Suite | Focus Area | Test Count | Tools Compared |
|---|---|---|---|
| A | Search Tools | 50+ | Grep/grep vs localSearchCode |
| B | Directory Listing | 32 | Bash(ls)/Glob/list_dir vs localViewStructure |
| C | File Content | 50+ | Read/read_file vs localGetFileContent |
| D | File Finding | 36 | Glob/glob_file_search vs localFindFiles |
| E | Integration Workflows | 16 | All tools combined |
| F | Performance Benchmarks | 20 | All tools benchmarked |
| G | Edge Cases & Stress | 16 | All tools stress tested |
| H | Large File Handling | 64 | Read/read_file vs localGetFileContent |
| I | Large Repository | 82 | All tools at scale |

ACTUAL TEST RESULTS: Claude Code vs Octocode MCP Local Tools

Execution Date: 2026-01-01
Repository: Linux Kernel (100K+ files)
Model: Claude Opus 4.5


Test 1: Code Search Comparison

Query: Find mutex_lock in kernel/sched/ (C files only, with context)

| Metric | Octocode localSearchCode | Claude Code Grep | Winner |
|---|---|---|---|
| Files Found | 8 files | 8 files (via `-C` context) | Tie |
| Total Matches | 31 matches | 31 matches (verified) | Tie |
| Response Structure | Structured JSON with byte/char offsets | Plain text with grep-style output | Octocode |
| Metadata Richness | Line, column, `byteOffset`, `charOffset`, `matchCount` per file | Line numbers only | Octocode |
| Pagination | Built-in (page 1/1, `filesPerPage`, `hasMore`) | `head_limit`/`offset` params | Octocode |
| Context Display | Truncated match values with byte boundaries | Full context with `-C` flag | Claude Code |
| Hints/Guidance | 18 actionable hints for next steps | None | Octocode |
| Research Context | `researchGoal`, `reasoning` preserved | None | Octocode |
Research Context researchGoal, reasoning preserved None Octocode

Winner: Octocode - Richer metadata, pagination info, and actionable hints. Claude Code has simpler output but lacks structured data.


Test 2: Directory Structure Exploration

Query: Explore drivers/net/ at depth 2

| Metric | Octocode localViewStructure | Claude Code Glob | Winner |
|---|---|---|---|
| Total Files | 595 files (reported) | ~100 files (truncated) | Octocode |
| Total Directories | 191 directories (reported) | Not provided | Octocode |
| Total Size | 11,164,674 bytes (reported) | Not provided | Octocode |
| Output Format | Tree-style with [FILE] and [DIR] markers | Flat file path list | Octocode |
| File Sizes | Per-file sizes (e.g., "88.9KB") | Not provided | Octocode |
| Pagination | Page 1/40, `entriesPerPage=20` | Truncated with warning | Octocode |
| Sorting | Controllable (name, size, time) | Modification time only | Octocode |
| Depth Control | `depth` param (1-5) | Implicit via glob pattern | Octocode |

Winner: Octocode - Comprehensive structure view with statistics. Claude Code Glob is simpler but lacks metadata and truncates output.


Test 3: File Content Reading

Query: Read include/linux/sched.h focusing on struct task_struct

| Metric | Octocode localGetFileContent | Claude Code Read | Winner |
|---|---|---|---|
| Pattern Matching | `matchString` with context extraction | Not supported (line offset only) | Octocode |
| Content Returned | 11,889 chars with omission markers | First 100 lines (as requested) | Octocode |
| Total Lines | 2,444 (reported) | Not reported | Octocode |
| Match Ranges | 13 match ranges identified | Not applicable | Octocode |
| Token Efficiency | Smart extraction with `...N lines omitted...` | Full lines returned | Octocode |
| Warnings | "Pattern matched 94 lines. Truncated to first 50" | None | Octocode |
| Line Numbers | Included with content | Cat-style `N->` format | Tie |
| Multimodal | Text only | Images, PDFs, Notebooks | Claude Code |

Winner: Octocode - Smart pattern extraction with context is far more efficient. Claude Code wins for multimodal files (images, PDFs).


Test 4: File Finding

Query: Find .c files in net/ directory

| Metric | Octocode localFindFiles | Claude Code Glob | Winner |
|---|---|---|---|
| Files Found | 1,000 files (reported total) | ~100 files (truncated) | Octocode |
| Metadata Per File | path, type, size, permissions, modified date | path only | Octocode |
| Pagination | Page 1/67, `filesPerPage=15` | Truncated with warning | Octocode |
| Time Filtering | `modifiedWithin`, `modifiedBefore` | Not supported | Octocode |
| Size Filtering | `sizeGreater`, `sizeLess` | Not supported | Octocode |
| Sorting | Modification time (with options) | Modification time | Tie |
| Permission Filtering | executable, readable, writable | Not supported | Octocode |
| Details Toggle | `details` flag | Not applicable | Octocode |

Winner: Octocode - Rich metadata and filtering options. Claude Code Glob is simpler but lacks forensic capabilities.
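
The page counts reported in Tests 2 and 4 are consistent with plain ceiling division: 1,000 files at 15 per page gives 67 pages, and 40 pages at 20 per page matches 786 entries if the 595 files and 191 directories are counted together (an assumption; the results do not say how entries are counted). A minimal sketch, with `total_pages` and `page_of` as hypothetical helpers whose result keys echo the reported field names:

```python
import math


def total_pages(total_items, per_page):
    """Number of pages needed to show total_items at per_page items each."""
    if per_page <= 0:
        raise ValueError("per_page must be positive")
    return math.ceil(total_items / per_page)


def page_of(items, page, per_page):
    """Return one 1-based page of items with pagination metadata."""
    pages = total_pages(len(items), per_page)
    start = (page - 1) * per_page
    return {
        "items": items[start:start + per_page],
        "page": page,
        "totalPages": pages,
        "hasMore": page < pages,
    }


if __name__ == "__main__":
    print(total_pages(1000, 15))       # 67, as in Test 4
    print(total_pages(595 + 191, 20))  # 40, as in Test 2
    print(page_of(list(range(1000)), page=67, per_page=15)["items"])
```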


Overall Summary

| Test Category | Claude Code Tool | Octocode Tool | Winner | Margin |
|---|---|---|---|---|
| Code Search | Grep | localSearchCode | Octocode | Large |
| Directory Exploration | Glob/Bash | localViewStructure | Octocode | Large |
| File Content Reading | Read | localGetFileContent | Octocode | Large |
| File Finding | Glob | localFindFiles | Octocode | Large |

Claude Code Exclusive Advantages

| Feature | Advantage |
|---|---|
| Multimodal Files | Native support for images, PDFs, Jupyter notebooks |
| Agent Delegation | Task tool with Explore, Plan, and General-purpose agents |
| Shell Access | Full Bash for git, build, arbitrary commands |
| Multiline Regex | Cross-line pattern matching with `multiline: true` |
| Simplicity | Lower learning curve for simple queries |

Octocode Exclusive Advantages

| Feature | Advantage |
|---|---|
| Structured Output | JSON with byte/char offsets, match metadata |
| Pagination | Built-in pagination with page info and `hasMore` |
| Research Context | `researchGoal`, `reasoning`, `mainResearchGoal` tracking |
| Actionable Hints | Dynamic hints for next exploration steps |
| Token Efficiency | Smart content extraction with omission markers |
| Rich Metadata | File sizes, permissions, timestamps, match counts |
| Forensic Filtering | Time-based, size-based, permission-based file discovery |
| Bulk Operations | Multiple queries in single call with parallel execution |

Recommendations

| Use Case | Recommended Tool | Reason |
|---|---|---|
| Quick code search | Claude Code Grep | Simple, fast, familiar output |
| Deep code research | Octocode localSearchCode | Rich metadata, hints, pagination |
| Codebase structure overview | Octocode localViewStructure | Summary stats, tree view, sizes |
| Reading specific file sections | Octocode localGetFileContent | Smart pattern extraction |
| Reading images/PDFs | Claude Code Read | Native multimodal support |
| Finding files by metadata | Octocode localFindFiles | Time/size/permission filters |
| Finding files by pattern | Either | Both effective for glob patterns |
| Complex multi-step research | Claude Code Task (Explore) | Agent delegation is powerful |
| Shell operations | Claude Code Bash | Full shell access |
| Token-constrained contexts | Octocode tools | Better pagination and efficiency |

Test Plan Version: 3.0 (Unified Edition)
Last Updated: 2026-01-01
Total Test Cases: 380+
