Encode disallowed characters around already percent-encoded triples by gaoflow · Pull Request #151 · python-hyper/uritemplate

gaoflow · 2026-06-29T08:14:13Z

Problem

Reserved ({+var}) and fragment ({#var}) expansion is supposed to preserve already valid percent-encoded triples while still percent-encoding any other disallowed characters (RFC 6570 Section 3.2.3).

Operator._only_quote_unquoted_characters does this with an all-or-nothing check: if urllib.parse.unquote(value) != value (i.e. any triple is present) it returns the value untouched, which also skips encoding every other disallowed character.

>>> from uritemplate import URITemplate
>>> URITemplate("{+v}").expand(v="a b%20c")
'a b%20c'        # space left raw; expected 'a%20b%20c'
>>> URITemplate("{#v}").expand(v="x y%20z")
'#x y%20z'       # expected '#x%20y%20z'

This is a regression introduced by the fix for #99, which addressed the opposite problem (valid triples being double-encoded).

Fix

Quote the value segment by segment: pass valid percent-encoded triples through verbatim and percent-encode the text between them. Output is identical for values with no triples or with only triples, so existing behavior is preserved — all current tests still pass, and the new case is now correct:

>>> URITemplate("{+v}").expand(v="a b%20c")
'a%20b%20c'
>>> URITemplate("{+v}").expand(v="a%2Fb c")   # uppercase triple kept, space encoded
'a%2Fb%20c'

Tests

Added test_reserved_expansion_encodes_around_pct_triples. It fails on the current code (a b%20c → a%20b%20c) and passes with the fix; the full suite (59 tests) is green. black, isort, and mypy are clean.

sigmavirus24 · 2026-07-01T02:16:00Z

 _GEN_DELIMS: t.Final[str] = ":/?#[]@"
 _SUB_DELIMS: t.Final[str] = "!$&'()*+,;="
 _RESERVED_CHARACTERS: t.Final[str] = f"{_GEN_DELIMS}{_SUB_DELIMS}"
+_PERCENT_ENCODED: t.Final["re.Pattern[str]"] = re.compile("%[0-9A-Fa-f]{2}")


Suggested change

_PERCENT_ENCODED: t.Final["re.Pattern[str]"] = re.compile("%[0-9A-Fa-f]{2}")

_PERCENT_ENCODED: t.Final[re.Pattern[str]] = re.compile("%[0-9A-Fa-f]{2}")

Furthermore, this does not check for whether or not the percent encoding is a valid percent encoding. This just checks if it looks like one. Text that doesn't have a valid percent encoding could still be confused here.

Applied the annotation suggestion, thanks.

On the validity point: you're right that the regex only matches things shaped like a pct-triple rather than verifying intent, but that look-alike pass-through is what RFC 6570 specifies for reserved ({+}) and fragment ({#}) expansion. §3.2.3 copies pct-encoded triples through unchanged, and the pct-encoded = "%" HEXDIG HEXDIG production is purely syntactic, so an already-encoded %41 is meant to be passed as-is. The library already did this for a bare "%41"; the bug here is that a following disallowed character caused the whole tail (triple included) to be re-encoded, which contradicts that same rule. So the change narrows behavior back to the RFC rather than adding new look-alike matching.

Strict validation (encode %XX that isn't a 'real' encoding) would be a deliberate divergence from reserved-expansion semantics and its own change - happy to open a separate issue if you'd want to explore that.

Reserved ("+") and fragment ("#") expansion must preserve already valid percent-encoded triples (RFC 6570 Section 3.2.3). The current implementation does this by returning the value untouched as soon as `urllib.parse.unquote` detects any triple, which also skips encoding every other disallowed character in the value. As a result `URITemplate("{+v}").expand(v="a b%20c")` produced "a b%20c" (space left raw) instead of "a%20b%20c". This is a regression from the fix for issue python-hyper#99, which addressed the opposite problem of double-encoding. Quote the value segment by segment instead: pass valid percent-encoded triples through verbatim and percent-encode everything between them. Output is unchanged for values with no triples or with only triples, so existing behavior (and all current tests) is preserved.

sigmavirus24 requested changes Jul 1, 2026

View reviewed changes

gaoflow force-pushed the fix-reserved-expansion-pct-triples branch from c37149c to 7ece9a1 Compare July 4, 2026 20:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Encode disallowed characters around already percent-encoded triples#151

Encode disallowed characters around already percent-encoded triples#151
gaoflow wants to merge 1 commit into
python-hyper:mainfrom
gaoflow:fix-reserved-expansion-pct-triples

gaoflow commented Jun 29, 2026

Uh oh!

sigmavirus24 Jul 1, 2026

Uh oh!

gaoflow Jul 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	_PERCENT_ENCODED: t.Final["re.Pattern[str]"] = re.compile("%[0-9A-Fa-f]{2}")
	_PERCENT_ENCODED: t.Final[re.Pattern[str]] = re.compile("%[0-9A-Fa-f]{2}")

Uh oh!

Conversation

gaoflow commented Jun 29, 2026

Problem

Fix

Tests

Uh oh!

sigmavirus24 Jul 1, 2026

Choose a reason for hiding this comment

Uh oh!

gaoflow Jul 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants