Implement "Followup improvements for ext/uri" RFC - WHATWG URL building#22268
Implement "Followup improvements for ext/uri" RFC - WHATWG URL building#22268kocsismate wants to merge 2 commits into
Conversation
97cb0bf to
0655d8c
Compare
| goto failure; | ||
| } | ||
|
|
||
| if (lexbor_base_url != NULL) { |
There was a problem hiding this comment.
What's a big shame is that apparently it's not possible to properly use the builder with a base URL :(
- if we try to add the base URL after the input URL is built, then legitimate relative URLs are rejected (e.g.
/foo+https://example.com), because/foois not a valid URL on its own - If we try to build the input URL with the base URL in the same time, then the parsing algorithm must be used (currently, only setters are used with a hack on line 821). Then the complication is to find some URL component that is suitable for parsing:
- for special, full URLs: scheme + host is needed at least (e.g. https://example.com)
- for non-special full URLs: scheme is needed at least (e.g. https://)
- for relative URLs: the path is needed at least (e.g.
/foo/bar)
And in the 2nd case, the question arises if it's ok to parse only the minimally required components and then set the rest of the components, or the whole input URL must be built and parsed all at once.
0655d8c to
af73d4a
Compare
|
May I have a review soon so that this can potentially be included into alpha 2 at least? :) |
| return c >= '0' && c <= '9'; | ||
| } | ||
|
|
||
| ZEND_ATTRIBUTE_NONNULL zend_result php_uri_parser_whatwg_validate_scheme(const zend_string *scheme) |
There was a problem hiding this comment.
it's possible to get rid of this validation if we don't want to reimplement the related WHATWG URL checks. The rest of the components surely cannot be validated early during the setter calls because they are mostly context-sensitive.
| * The URL is initialized as LXB_URL_SCHEMEL_TYPE__UNDEF but this would prevent the scheme to be updated | ||
| * in case of non-special schemes due to https://github.com/php/php-src/blob/27d7b799c0a13578ee0506b428b8ddc209ffb010/ext/lexbor/lexbor/url/url.c#L1402 | ||
| */ | ||
| if (!php_uri_parser_whatwg_is_special_scheme(scheme)) { |
There was a problem hiding this comment.
I'll try to check if it's something that we can omit somehow...
|
|
||
| const zend_string *str = Z_STR_P(scheme); | ||
|
|
||
| switch (ZSTR_LEN(str)) { |
There was a problem hiding this comment.
zend_string_equals_literal_ci already has a length check built-in. The compiler can factor this out for the common cases. Therefore, you can skip checking this length explicitly, and just do one giant return zend_string_equals_literal_ci(str, "ws") || zend_string_equals_literal_ci(str, "ftp") || ...;
The compiler is smart enough to optimise this, and then the code becomes (imo) a bit cleaner
There was a problem hiding this comment.
The original logic remained, but I could no longer use zend_string_equals_literal_ci() because newline and tab characters had to be stripped from the scheme per the spec.
TimWolla
left a comment
There was a problem hiding this comment.
Had a very superficial first look.
| return true; | ||
| } | ||
|
|
||
| ZEND_ATTRIBUTE_NONNULL zend_result php_uri_parser_whatwg_validate_port(const zend_long port) |
There was a problem hiding this comment.
The port is disregarded by the port setter if the URL cannot have a port (e.g. there is no scheme, see the "ext/uri/tests/whatwg/builder/port_success_empty_opaque_host.phpt" test at https://github.com/php/php-src/pull/22268/changes#diff-38d974ee6050457f68d4650d100c80b641befeb781332f7a05822ec1d2d7b87d), so it's a question whether this validation can be implemented. :(
Related WHATWG behavior:
A URL cannot have a username/password/port if its host is null or the empty string, or its scheme is "file".
However, the above concept is only part of some setters, setting a username/password/port for missing/empty hosts is rejected during parsing. So I guess the Builder should adhere to parsing rules, and therefore invalid ports, as well as setting username/password/port should be rejected via our custom checks (similar to the ones that the RFC3986 Builder used).
RFC: https://wiki.php.net/rfc/uri_followup#uri_building