Skip to content

Instantly share code, notes, and snippets.

@jwmatthews
Created January 16, 2026 15:44
Show Gist options
  • Select an option

  • Save jwmatthews/c53816176821bede57b3ce3ffbaeb731 to your computer and use it in GitHub Desktop.

Select an option

Save jwmatthews/c53816176821bede57b3ce3ffbaeb731 to your computer and use it in GitHub Desktop.
Catastrophic Backtracking in Regex Pattern
## Performance Issue: Catastrophic Backtracking in Regex Pattern
* Seen in * https://github.com/konveyor/editor-extensions/issues/1182
### Problem
The current regex pattern is experiencing severe performance degradation (minutes to hours) when processing large minified files (~10MB):
```perl
perl -ne 'if (/import[\s\S]*ToolbarChipGroupContent[\s\S]*from[\s\S]*['\'']@patternfly\/react-core['\'']/) { print "$ARGV:$.\n" } close ARGV if eof;'
```
### Root Cause
**Catastrophic backtracking** caused by multiple greedy `[\s\S]*` wildcards in the pattern.
When processing minified files (typically 1-2 giant lines), the regex engine attempts exponentially many match combinations when the pattern doesn't match. With three greedy wildcards, the complexity becomes approximately O(n³) or worse on failure cases.
#### Why Minified Files Amplify This Issue
- Few/no newlines means individual lines can be megabytes long
- Each `[\s\S]*` tries millions of match positions across the entire line
- Multiple wildcards multiply the backtracking attempts exponentially
### Recommended Solutions
#### Option 1: Non-Greedy Quantifiers (Quick Fix)
Replace `*` with `*?` to reduce backtracking:
```perl
perl -ne 'if (/import[\s\S]*?ToolbarChipGroupContent[\s\S]*?from[\s\S]*?['\'']@patternfly\/react-core['\'']/) { print "$ARGV:$.\n" } close ARGV if eof;'
```
**Expected improvement:** Minutes → Seconds
#### Option 2: Use ripgrep or grep (Recommended)
These tools are optimized for large file searching:
```bash
rg -n 'import.*ToolbarChipGroupContent.*from.*["\x27]@patternfly/react-core["\x27]' app.js
```
**Expected improvement:** Completes in <1 second
#### Option 3: More Specific Pattern (Best Practice)
Use bounded character classes instead of wildcards:
```perl
perl -ne 'if (/import\s+(?:\{[^}]*)?ToolbarChipGroupContent(?:[^}]*\})?\s+from\s+['\'']@patternfly\/react-core['\'']/) { print "$ARGV:$.\n" } close ARGV if eof;'
```
### Additional Considerations
- Consider using AST-based tools (`ast-grep`, `jscodeshift`) for reliable JavaScript parsing
- If this pattern is used in CI/CD, timeout mechanisms should be implemented
- Alternative: Pre-process files with source maps to work with unminified versions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment