Skip to content

Fix polynomial ReDoS in regexify alternation matching#3276

Open
augustocbx wants to merge 1 commit into
faker-ruby:mainfrom
augustocbx:fix-regexify-polynomial-redos
Open

Fix polynomial ReDoS in regexify alternation matching#3276
augustocbx wants to merge 1 commit into
faker-ruby:mainfrom
augustocbx:fix-regexify-polynomial-redos

Conversation

@augustocbx

Copy link
Copy Markdown

Motivation / Background

This Pull Request fixes the remaining "Polynomial regular expression used on uncontrolled data" code-scanning alert. Fixes #3183.

In Faker::Base#regexify (lib/faker.rb), the pattern that expands alternation groups such as (this|that) was:

.gsub(/\((.*?)\)/) { |match| sample(match.gsub(/[()]/, '').split('|')) }

\(.*?\) runs in polynomial time on uncontrolled (locale) input: because the .*? content can itself match (, an input like (a(a(a… forces super-linear backtracking. #3196 hardened every sibling regex in this method (possessive quantifiers) but this one was left unchanged, so the alert remained.

Solution

Restrict the captured group content to non-parenthesis characters:

.gsub(/\(([^()]*)\)/) { |match| sample(match.gsub(/[()]/, '').split('|')) }

[^()]* cannot backtrack across a ( or ) delimiter, so matching is linear. This is the standard remedy for this CodeQL pattern and mirrors the backtracking-elimination approach already taken in #3196 for the neighbouring regexes. It is also consistent with regexify's own documented contract — its header comment states it "does not handle … nested parentheses" — so behaviour is unchanged for every supported pattern.

I confirmed the fix with the same CodeQL query GitHub runs (rb/polynomial-redos, bundle 2.25.6): the alert on this regex is present before the change and gone after it, while the unrelated alerts elsewhere are unaffected.

Behaviour is byte-for-byte identical on all supported inputs (alternations, empty (), no-group strings, multiple/quantified groups); only nested parentheses — which regexify explicitly does not support and no locale data uses — match differently.

Additional information

  • Determinism preserved: sample / split logic is untouched, output under a fixed seed is unchanged.
  • Backward compatibility: no observable change for supported patterns; no public API change.
  • A regression case ((foo|bar|baz)) was added to test_regexify to guard the alternation path.
  • bundle exec rake is green: 2179 tests, 0 failures, 0 errors; RuboCop clean.

Checklist

  • This Pull Request is related to one change only.
  • Commit message describes what changed and why, and references the issue ([Fix #3183]).
  • Tests are added or updated.
  • bundle exec rake (tests + RuboCop) passes locally.
  • No new generator/locale/feature; no CHANGELOG edits; determinism preserved.

The `/\((.*?)\)/` pattern used by `Faker::Base#regexify` to expand
alternation groups (e.g. `(this|that)`) runs in polynomial time on
uncontrolled locale input: because `.*?` can also match `(`, an input
such as `(a(a(a...` forces super-linear backtracking. This is the
remaining "Polynomial regular expression used on uncontrolled data"
code-scanning alert.

Restrict the captured content to non-parenthesis characters
(`[^()]*`), which cannot backtrack across a delimiter and so matches
in linear time. This mirrors the possessive-quantifier hardening
applied to the sibling regexes in the same method in faker-ruby#3196, and
matches `regexify`'s documented contract that it does not handle
nested parentheses. Output is unchanged for all supported patterns.

[Fix faker-ruby#3183]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix 'Polynomial regular expression used on uncontrolled data code' scanning alert

1 participant