Bug #819
Router does not support non-ASCII in paths
Description
The router evaluates its regular expressions without Unicode support. Even with Unicode support enabled, matching fails for URLs that contain percent-escaped UTF-8 characters, because the raw UTF-8 bytes are passed to the regular expression matcher as-is instead of being decoded into Unicode first.
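The difference between matching on raw bytes and matching on decoded text can be reproduced with a minimal sketch. This is not the µWeb router code; the path, the pattern, and the variable names are made up for illustration, and the sketch uses Python 3, where byte patterns only ever match ASCII word characters.

```python
import re
from urllib.parse import unquote_to_bytes

# Hypothetical request path with a percent-escaped UTF-8 "é".
raw_path = "/caf%C3%A9"

# Undoing the percent-escapes yields raw UTF-8 bytes, not text.
path_bytes = unquote_to_bytes(raw_path)   # b'/caf\xc3\xa9'
path_text = path_bytes.decode("utf-8")    # '/café'

# A byte pattern: \w only covers ASCII letters, digits and underscore.
byte_route = re.compile(rb"^/(\w+)$")
# A text pattern: \w covers Unicode letters (re.UNICODE is the default
# for str patterns in Python 3 and is spelled out here for clarity).
text_route = re.compile(r"^/(\w+)$", re.UNICODE)

bytes_result = byte_route.match(path_bytes)            # None: \xc3\xa9 are not word bytes
text_result = text_route.match(path_text).group(1)     # 'café'
```

Only the second match succeeds, which is the behaviour the description asks for: decode first, then match with Unicode-aware character classes.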
Associated revisions
Revision 234:bb882bcfd5d2
Added page for Non-word/Unicode/Mojibake catchall with some explanation. This refs #819.
History
#1 Updated by Elmer de Looff over 12 years ago
- Subject changed from Router does not handle Unicode URLs to Router does not support non-ASCII in paths
- Description updated (diff)
- Target version set to µWeb alpha release
- Estimated time set to 1.00 h
#2 Updated by Elmer de Looff over 12 years ago
As a result, none of the expected routes match for paths that contain characters with ordinal > 127: either a NoRouteError is raised, or a catch-all route is picked where this is not appropriate.
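Both failure modes can be illustrated with a toy router. Everything here is hypothetical: `find_route`, the route tables, and the handler names are invented for the example, `NoRouteError` is a stand-in class rather than µWeb's own, and the `re.ASCII` flag is used only to imitate matching without Unicode support.

```python
import re


class NoRouteError(Exception):
    """Stand-in for the router's error; the real class may differ."""


def find_route(routes, path):
    """Return the handler name of the first pattern matching the whole path."""
    for pattern, handler in routes:
        # re.ASCII restricts \w to [a-zA-Z0-9_], imitating the reported bug.
        if re.fullmatch(pattern, path, re.ASCII):
            return handler
    raise NoRouteError(path)


specific_only = [(r"/user/(\w+)", "user_page")]
with_catch_all = specific_only + [(r"/(.*)", "catch_all")]

# An ASCII-only path reaches the intended handler.
print(find_route(with_catch_all, "/user/jose"))   # user_page

# A path with an accented letter skips the intended route ...
print(find_route(with_catch_all, "/user/josé"))   # catch_all

# ... and without a catch-all there is no route at all.
try:
    find_route(specific_only, "/user/josé")
except NoRouteError:
    print("no route")
```

In the second call the catch-all handler answers a request that was meant for the user page, which is the inappropriate match described above.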
#3 Updated by Elmer de Looff over 12 years ago
- Category set to Core
#4 Updated by Elmer de Looff over 12 years ago
- Status changed from New to Resolved
- % Done changed from 0 to 70
Applied in changeset 12361fbb3095.
#5 Updated by Elmer de Looff over 12 years ago
- Status changed from Resolved to Closed
- % Done changed from 70 to 100
The code has been fixed, the documentation has been updated, and an example has been added to the uweb_info project.
Updated request.Request to decode the path_info from UTF8 where available, and router regex matching is now done with Unicode support. This enables routers to work on the full Unicode range, and accented characters to be matched as letters. This resolves #819.
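The shape of this fix can be sketched as follows. This is an assumption-laden illustration, not the code from the changeset: `decode_path_info` and `USER_ROUTE` are hypothetical names, and the Latin-1 fallback is only one possible reading of "where available" for paths that are not valid UTF-8.

```python
import re
from urllib.parse import unquote_to_bytes


def decode_path_info(raw_path):
    """Undo percent-escapes and decode the path as UTF-8 where possible."""
    path_bytes = unquote_to_bytes(raw_path)
    try:
        return path_bytes.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback: Latin-1 maps every byte to a code point,
        # so this decode cannot fail.
        return path_bytes.decode("latin-1")


# Hypothetical route; \w matches accented letters on decoded text.
USER_ROUTE = re.compile(r"/user/(\w+)", re.UNICODE)

utf8_path = decode_path_info("/user/jos%C3%A9")    # valid UTF-8 escape
legacy_path = decode_path_info("/user/jos%E9")     # lone Latin-1 byte

utf8_name = USER_ROUTE.fullmatch(utf8_path).group(1)      # 'josé'
legacy_name = USER_ROUTE.fullmatch(legacy_path).group(1)  # 'josé'
```

Decoding once, at the request boundary, keeps the route patterns free of byte-level concerns: every pattern sees text and can rely on Unicode-aware character classes.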