Elmer de Looff, 2012-05-17 12:12
Updated documentation to reflect decoding of PATH_INFO to Unicode where available, and router evaluating regular expressions in Unicode context


The Request Router

When a request reaches µWeb, the first decision to make is where and how to handle it. This is the job of the request router. The µWeb router is based on regular expressions and delegates each request to the handler associated with the first matching expression.

Before explaining everything there is to know about routes and handlers, here is an example of what a router looks like. It is a heavily stripped-down version of the router from the uweb_info example project:

import uweb
from uweb.uweb_info import pages

PAGE_CLASS = pages.PageMaker
ROUTES = (
    ('/static/(.*)', 'Static'),
    ('/(broken.*)', 'FourOhFour'),
    ('/haltandcatchfire', 'MakeFail'),
    ('/json', 'Json'),
    ('/text', 'Text'),
    ('/redirect/(.*)', 'Redirect'),
    ('/OpenIDLogin', '_OpenIdInitiate'),
    ('/OpenIDValidate', '_OpenIdValidate'),
    ('/ULF-Challenge', '_ULF_Challenge'),
    ('/ULF-Login', '_ULF_Verify'),
    ('/(.*)', 'Index'))

uweb.ServerSetup()

In the example above, the following things happen:
  • First, we import the uweb package, which we need to configure and start the web server.
  • Second, we import the module where the class with our handling methods is defined (the uweb.uweb_info.pages module).
  • PAGE_CLASS is the global that holds a reference to the PageMaker subclass for your project, whose methods handle the requests.
  • ROUTES is the global that defines the requests the router understands and directs each of them to the named method of the PAGE_CLASS.
  • Finally, uweb.ServerSetup() starts the web server when it runs StandAlone (and does some initial setup when it runs under Apache).

When a request arrives, it is checked against the defined routes in the order they are listed. Each route is a 2-tuple of a regular expression and a method name (as a string). The request URL is matched against the regular expression; if the match succeeds, the method name is resolved on the PAGE_CLASS and that method is executed. The router does not search for any further matches.

To illustrate, here is what happens for the example request "/haltandcatchfire":

  1. The PAGE_CLASS is instantiated into a live PageMaker
  2. The first route tuple is inspected
    • Its regex is '/static/(.*)'
    • The request does not match the regex
  3. The second route tuple is inspected
    • Its regex is '/(broken.*)'
    • The request does not match the regex
  4. The third route tuple is inspected
    • Its regex is '/haltandcatchfire'
    • The request matches the regex
    • The associated handler method name is 'MakeFail'
    • The method MakeFail is resolved on our PageMaker instance, resulting in pages.PageMaker.MakeFail
    • This method is executed and its results will be sent to the client
  5. The router stops after the third route; the remaining routes are not inspected
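
The steps above can be sketched in a few lines of Python. This is only an illustration of first-match routing, not µWeb's actual implementation: the dispatch function and the stand-in PageMaker class are hypothetical, and the sketch assumes that a route has to match the whole request path.

```python
import re


class PageMaker:
    """Hypothetical stand-in for a project's PageMaker subclass."""

    def Static(self, path):
        return 'static file: ' + path

    def MakeFail(self):
        return 'MakeFail was called'

    def Index(self, path):
        return 'index page for: ' + path


ROUTES = (
    ('/static/(.*)', 'Static'),
    ('/haltandcatchfire', 'MakeFail'),
    ('/(.*)', 'Index'))


def dispatch(path, routes=ROUTES, page_class=PageMaker):
    """Run the handler of the first route whose regex matches the path."""
    page_maker = page_class()  # step 1: instantiate the PAGE_CLASS
    for pattern, method_name in routes:
        # Assumption: the pattern must cover the entire path.
        match = re.fullmatch(pattern, path)
        if match is None:
            continue  # no match, inspect the next route
        handler = getattr(page_maker, method_name)  # resolve name on instance
        return handler(*match.groups())  # first match wins, stop searching
    raise LookupError('no route matches %r' % path)


print(dispatch('/haltandcatchfire'))
```

Because the catch-all '/(.*)' route is listed last, it only handles requests that none of the more specific routes claimed.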

Arguments from the request string

Often, parts of the request string are needed further on in the process. While it is possible to extract these in the PageMaker methods, this is inconvenient, and the router can do it for you. As the router example above shows, some of the regular expressions define capture groups. The values captured by these groups are passed to the PageMaker method as positional arguments. For example:

Requesting "/user/elmer/edit/27" on the following router:

PAGE_CLASS = blog.BlogPages
ROUTES = (
    ('/user/(\w*)/edit/(\d*)', 'EditArticle'),
    ('/', 'Index'))

This ends up calling the method blog.BlogPages.EditArticle with the arguments ('elmer', '27'). Note that all arguments are provided as strings; type conversion is left to the developer.
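
To make the argument passing and the type conversion concrete, here is a minimal sketch using only the standard re module. The BlogPages class below is a hypothetical stand-in for blog.BlogPages, and matching the whole path with re.fullmatch is an assumption about how the router applies the pattern.

```python
import re

# Route pattern copied from the example above.
PATTERN = r'/user/(\w*)/edit/(\d*)'


class BlogPages:
    """Hypothetical stand-in for blog.BlogPages."""

    def EditArticle(self, username, article_id):
        # Captured values arrive as strings; convert where a number is needed.
        article_id = int(article_id)
        return username, article_id


match = re.fullmatch(PATTERN, '/user/elmer/edit/27')
captured = match.groups()                    # ('elmer', '27')
result = BlogPages().EditArticle(*captured)  # ('elmer', 27)
print(captured, result)
```

Note that '(\d*)' also matches an empty string, so a request for "/user/elmer/edit/" would pass '' to the handler and int('') would raise a ValueError. Using '(\d+)' in the route avoids that case.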

Typical regexes for request matching

Some patterns for other common URL formats:

  • To match one or more letters, numbers, dashes and spaces, use: '([\w\- ]+)'
  • To match an optional trailing slash (so that requests don't end up 404'ing because of an added slash), append: '/?'
  • To match an optional page number: 'article/([\w\- ]+)/?(\d+)?'
    • This matches /article/cookies_are_delicious
    • As well as /article/cookies_are_delicious/2
    • And also /article/cookies_are_delicious/, since trailing slashes are almost certain to occur
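
The article pattern can be checked against the three example URLs with the standard re module. In this sketch the leading '/' in the pattern, the article_args helper, and the use of full-path matching are assumptions made for illustration.

```python
import re

# The pattern from the list above, with a leading slash added.
ARTICLE = re.compile(r'/article/([\w\- ]+)/?(\d+)?')


def article_args(path):
    """Return the captured (title, page) tuple, or None if there is no match."""
    match = ARTICLE.fullmatch(path)
    return match.groups() if match else None


for path in ('/article/cookies_are_delicious',
             '/article/cookies_are_delicious/2',
             '/article/cookies_are_delicious/'):
    print(path, '->', article_args(path))
```

When the page number is absent, the second group is None rather than an empty string, so a handler for this route would need a default value for that argument.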

Unicode notice

Where available (and necessary), path strings are decoded from UTF-8 in the Request object. The router's regular expressions are also evaluated in a Unicode context. This means that a route regex '\w+' will match 'café' (French for 'coffee'), despite the non-ASCII character in that string.
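
The effect of Unicode-aware matching can be reproduced with the standard re module. This sketch is written for Python 3, where string patterns are Unicode-aware by default; the re.UNICODE flag is passed only to make the contrast with re.ASCII explicit. The raw_path value is an invented example of UTF-8 encoded PATH_INFO.

```python
import re

raw_path = b'/caf\xc3\xa9'        # UTF-8 bytes, as they might arrive in PATH_INFO
path = raw_path.decode('utf-8')   # '/café'

# With Unicode semantics, \w covers letters such as 'é'.
unicode_match = re.fullmatch(r'/(\w+)', path, re.UNICODE)

# With ASCII-only semantics, 'é' is not a word character and the match fails.
ascii_match = re.fullmatch(r'/(\w+)', path, re.ASCII)

print(unicode_match.group(1), ascii_match)
```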