Created
January 16, 2026 15:44
-
-
Save jwmatthews/c53816176821bede57b3ce3ffbaeb731 to your computer and use it in GitHub Desktop.
Catastrophic Backtracking in Regex Pattern
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| ## Performance Issue: Catastrophic Backtracking in Regex Pattern | |
| * Seen in * https://github.com/konveyor/editor-extensions/issues/1182 | |
| ### Problem | |
| The current regex pattern is experiencing severe performance degradation (minutes to hours) when processing large minified files (~10MB): | |
| ```perl | |
| perl -ne 'if (/import[\s\S]*ToolbarChipGroupContent[\s\S]*from[\s\S]*['\'']@patternfly\/react-core['\'']/) { print "$ARGV:$.\n" } close ARGV if eof;' | |
| ``` | |
| ### Root Cause | |
| **Catastrophic backtracking** caused by multiple greedy `[\s\S]*` wildcards in the pattern. | |
| When processing minified files (typically 1-2 giant lines), the regex engine attempts exponentially many match combinations when the pattern doesn't match. With three greedy wildcards, the complexity becomes approximately O(n³) or worse on failure cases. | |
| #### Why Minified Files Amplify This Issue | |
| - Few/no newlines means individual lines can be megabytes long | |
| - Each `[\s\S]*` tries millions of match positions across the entire line | |
| - Multiple wildcards multiply the backtracking attempts exponentially | |
| ### Recommended Solutions | |
| #### Option 1: Non-Greedy Quantifiers (Quick Fix) | |
| Replace `*` with `*?` to reduce backtracking: | |
| ```perl | |
| perl -ne 'if (/import[\s\S]*?ToolbarChipGroupContent[\s\S]*?from[\s\S]*?['\'']@patternfly\/react-core['\'']/) { print "$ARGV:$.\n" } close ARGV if eof;' | |
| ``` | |
| **Expected improvement:** Minutes → Seconds | |
| #### Option 2: Use ripgrep or grep (Recommended) | |
| These tools are optimized for large file searching: | |
| ```bash | |
| rg -n 'import.*ToolbarChipGroupContent.*from.*["\x27]@patternfly/react-core["\x27]' app.js | |
| ``` | |
| **Expected improvement:** Completes in <1 second | |
| #### Option 3: More Specific Pattern (Best Practice) | |
| Use bounded character classes instead of wildcards: | |
| ```perl | |
| perl -ne 'if (/import\s+(?:\{[^}]*)?ToolbarChipGroupContent(?:[^}]*\})?\s+from\s+['\'']@patternfly\/react-core['\'']/) { print "$ARGV:$.\n" } close ARGV if eof;' | |
| ``` | |
| ### Additional Considerations | |
| - Consider using AST-based tools (`ast-grep`, `jscodeshift`) for reliable JavaScript parsing | |
| - If this pattern is used in CI/CD, timeout mechanisms should be implemented | |
| - Alternative: Pre-process files with source maps to work with unminified versions |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment