Fix polynomial ReDoS in regexify alternation matching#3276
Open
augustocbx wants to merge 1 commit into
Open
Conversation
The `/\((.*?)\)/` pattern used by `Faker::Base#regexify` to expand alternation groups (e.g. `(this|that)`) runs in polynomial time on uncontrolled locale input: because `.*?` can also match `(`, an input such as `(a(a(a...` forces super-linear backtracking. This is the remaining "Polynomial regular expression used on uncontrolled data" code-scanning alert. Restrict the captured content to non-parenthesis characters (`[^()]*`), which cannot backtrack across a delimiter and so matches in linear time. This mirrors the possessive-quantifier hardening applied to the sibling regexes in the same method in faker-ruby#3196, and matches `regexify`'s documented contract that it does not handle nested parentheses. Output is unchanged for all supported patterns. [Fix faker-ruby#3183]
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation / Background
This Pull Request fixes the remaining "Polynomial regular expression used on uncontrolled data" code-scanning alert. Fixes #3183.
In
Faker::Base#regexify(lib/faker.rb), the pattern that expands alternation groups such as(this|that)was:\(.*?\)runs in polynomial time on uncontrolled (locale) input: because the.*?content can itself match(, an input like(a(a(a…forces super-linear backtracking. #3196 hardened every sibling regex in this method (possessive quantifiers) but this one was left unchanged, so the alert remained.Solution
Restrict the captured group content to non-parenthesis characters:
[^()]*cannot backtrack across a(or)delimiter, so matching is linear. This is the standard remedy for this CodeQL pattern and mirrors the backtracking-elimination approach already taken in #3196 for the neighbouring regexes. It is also consistent withregexify's own documented contract — its header comment states it "does not handle … nested parentheses" — so behaviour is unchanged for every supported pattern.I confirmed the fix with the same CodeQL query GitHub runs (
rb/polynomial-redos, bundle 2.25.6): the alert on this regex is present before the change and gone after it, while the unrelated alerts elsewhere are unaffected.Behaviour is byte-for-byte identical on all supported inputs (alternations, empty
(), no-group strings, multiple/quantified groups); only nested parentheses — whichregexifyexplicitly does not support and no locale data uses — match differently.Additional information
sample/splitlogic is untouched, output under a fixed seed is unchanged.(foo|bar|baz)) was added totest_regexifyto guard the alternation path.bundle exec rakeis green: 2179 tests, 0 failures, 0 errors; RuboCop clean.Checklist
[Fix #3183]).bundle exec rake(tests + RuboCop) passes locally.