Skip to content

Encode disallowed characters around already percent-encoded triples#151

Open
gaoflow wants to merge 1 commit into
python-hyper:mainfrom
gaoflow:fix-reserved-expansion-pct-triples
Open

Encode disallowed characters around already percent-encoded triples#151
gaoflow wants to merge 1 commit into
python-hyper:mainfrom
gaoflow:fix-reserved-expansion-pct-triples

Conversation

@gaoflow

@gaoflow gaoflow commented Jun 29, 2026

Copy link
Copy Markdown

Problem

Reserved ({+var}) and fragment ({#var}) expansion is supposed to preserve already valid percent-encoded triples while still percent-encoding any other disallowed characters (RFC 6570 Section 3.2.3).

Operator._only_quote_unquoted_characters does this with an all-or-nothing check: if urllib.parse.unquote(value) != value (i.e. any triple is present) it returns the value untouched, which also skips encoding every other disallowed character.

>>> from uritemplate import URITemplate
>>> URITemplate("{+v}").expand(v="a b%20c")
'a b%20c'        # space left raw; expected 'a%20b%20c'
>>> URITemplate("{#v}").expand(v="x y%20z")
'#x y%20z'       # expected '#x%20y%20z'

This is a regression introduced by the fix for #99, which addressed the opposite problem (valid triples being double-encoded).

Fix

Quote the value segment by segment: pass valid percent-encoded triples through verbatim and percent-encode the text between them. Output is identical for values with no triples or with only triples, so existing behavior is preserved — all current tests still pass, and the new case is now correct:

>>> URITemplate("{+v}").expand(v="a b%20c")
'a%20b%20c'
>>> URITemplate("{+v}").expand(v="a%2Fb c")   # uppercase triple kept, space encoded
'a%2Fb%20c'

Tests

Added test_reserved_expansion_encodes_around_pct_triples. It fails on the current code (a b%20ca%20b%20c) and passes with the fix; the full suite (59 tests) is green. black, isort, and mypy are clean.

Comment thread uritemplate/variable.py Outdated
_GEN_DELIMS: t.Final[str] = ":/?#[]@"
_SUB_DELIMS: t.Final[str] = "!$&'()*+,;="
_RESERVED_CHARACTERS: t.Final[str] = f"{_GEN_DELIMS}{_SUB_DELIMS}"
_PERCENT_ENCODED: t.Final["re.Pattern[str]"] = re.compile("%[0-9A-Fa-f]{2}")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_PERCENT_ENCODED: t.Final["re.Pattern[str]"] = re.compile("%[0-9A-Fa-f]{2}")
_PERCENT_ENCODED: t.Final[re.Pattern[str]] = re.compile("%[0-9A-Fa-f]{2}")

Furthermore, this does not check for whether or not the percent encoding is a valid percent encoding. This just checks if it looks like one. Text that doesn't have a valid percent encoding could still be confused here.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied the annotation suggestion, thanks.

On the validity point: you're right that the regex only matches things shaped like a pct-triple rather than verifying intent, but that look-alike pass-through is what RFC 6570 specifies for reserved ({+}) and fragment ({#}) expansion. §3.2.3 copies pct-encoded triples through unchanged, and the pct-encoded = "%" HEXDIG HEXDIG production is purely syntactic, so an already-encoded %41 is meant to be passed as-is. The library already did this for a bare "%41"; the bug here is that a following disallowed character caused the whole tail (triple included) to be re-encoded, which contradicts that same rule. So the change narrows behavior back to the RFC rather than adding new look-alike matching.

Strict validation (encode %XX that isn't a 'real' encoding) would be a deliberate divergence from reserved-expansion semantics and its own change - happy to open a separate issue if you'd want to explore that.

Reserved ("+") and fragment ("#") expansion must preserve already valid
percent-encoded triples (RFC 6570 Section 3.2.3). The current implementation
does this by returning the value untouched as soon as `urllib.parse.unquote`
detects any triple, which also skips encoding every other disallowed
character in the value.

As a result `URITemplate("{+v}").expand(v="a b%20c")` produced "a b%20c"
(space left raw) instead of "a%20b%20c". This is a regression from the fix
for issue python-hyper#99, which addressed the opposite problem of double-encoding.

Quote the value segment by segment instead: pass valid percent-encoded
triples through verbatim and percent-encode everything between them. Output
is unchanged for values with no triples or with only triples, so existing
behavior (and all current tests) is preserved.
@gaoflow gaoflow force-pushed the fix-reserved-expansion-pct-triples branch from c37149c to 7ece9a1 Compare July 4, 2026 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants