TemplateParser

The µWeb TemplateParser is an in-house templating engine that provides tag replacement, tag functions and template control functions. This document describes how to use it within µWeb, the Template and Parser classes, and the syntax of the templating language.

First, though, to help with understanding the TemplateParser, here is a minimal template document:

Hello [title] [name]

The above document contains two simple template tags. These tags are delimited by square brackets, and they are replaced by the named arguments provided during parsing. If a name is not provided, the tag remains in the output unchanged.

Using TemplateParser inside µWeb

Within the default µWeb PageMaker, there is a parser property, which provides a Parser object. The class constant TEMPLATE_DIR provides the template search directory. The default template directory is 'templates'.

N.B.: This path is relative to the file that contains the PageMaker class.

An example of using the TemplateParser to create a complete response:

import uweb
import time

class PageMaker(uweb.PageMaker):
  def VersionPage(self):
    return self.parser.Parse(
      'version.utp', year=time.strftime('%Y'), version=uweb.__version__)

The template for the above example could look something like this:

<!DOCTYPE html>
<html>
  <head>
    <title>µWeb version info</title>
  </head>
  <body>
    <p>µWeb version [version] - Copyright 2010-[year] Underdark</p>
  </body>
</html>

And would result in the following output:

<!DOCTYPE html>
<html>
  <head>
    <title>µWeb version info</title>
  </head>
  <body>
    <p>µWeb version 0.12 - Copyright 2010-2012 Underdark</p>
  </body>
</html>

With these initial small demonstrations behind us, let's explore the TemplateParser further.

Template class

The Template class provides the interface for pre-parsing templates, loading them from files and parsing single templates to completion. During pre-parsing, constructs such as loops and conditional statements are converted to TemplateLoop and TemplateConditional objects, and their scopes nested appropriately in the Template. Tags are replaced by TemplateTag instances, and text is captured in TemplateText. All of these provide Parse methods, which together result in the combined parsed template output.
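As an illustration of this node structure, the following is a minimal sketch that pre-parses plain tags only (no loops, conditionals, indexing or tag functions). TextNode, TagNode, preparse and parse are hypothetical names standing in for TemplateText, TemplateTag and Template; this is not µWeb's actual implementation.

```python
import re


class TextNode:
    """Hypothetical stand-in for TemplateText: literal text."""

    def __init__(self, text):
        self.text = text

    def Parse(self, **kwds):
        return self.text


class TagNode:
    """Hypothetical stand-in for TemplateTag: a plain [name] tag."""

    def __init__(self, name):
        self.name = name

    def Parse(self, **kwds):
        # Unknown tags are written back verbatim.
        return str(kwds.get(self.name, '[%s]' % self.name))


def preparse(source):
    """Split source into nodes; re.split puts captured tag names at odd indexes."""
    nodes = []
    for index, part in enumerate(re.split(r'\[(\w+)\]', source)):
        if index % 2:
            nodes.append(TagNode(part))
        elif part:
            nodes.append(TextNode(part))
    return nodes


def parse(nodes, **kwds):
    """Combine the output of every node's Parse method."""
    return ''.join(node.Parse(**kwds) for node in nodes)


nodes = preparse('Hello [title] [name]')
print(parse(nodes, title='sir'))  # Hello sir [name]
```

The resulting node sequence (text, tag, text, tag) mirrors the Template representation shown in the next section.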

Creating a template

A template is created simply by providing a string to the Template's constructor. This returns a valid Template instance (or raises an error if there is a problem with the syntax):

>>> import templateparser
>>> template = templateparser.Template('Hello [title] [name]')
>>> template
Template([TemplateText('Hello '), TemplateTag('[title]'), TemplateText(' '), TemplateTag('[name]')])

The output above shows the various parts of the template, which are combined into the output once parsed.

Loading a template from file

The Template class provides a classmethod called FromFile, which loads the template from the given path.

Loading a template named example.utp from the current working directory:

>>> import templateparser
>>> template = templateparser.Template.FromFile('example.utp')
>>> template
Template([TemplateText('Hello '), TemplateTag('[title]'), TemplateText(' '), TemplateTag('[name]')])

Parsing a template

A template is parsed by calling the Template's Parse method. The keyword arguments provided to this call form the replacement mapping for the template. In the following example, we provide one such keyword and leave the other undefined, to show the basic behavior of the Template.Parse method.

>>> import templateparser
>>> template = templateparser.Template('Hello [title] [name]')
>>> template.Parse(title='sir')
'Hello sir [name]'

Parser class

The Parser class provides simple management of multiple Template objects. It is mainly used to load templates from disk. When instantiating a Parser, the first argument provides the search path from which templates are loaded (the default is the current working directory). An optional second argument can be provided to preload the template cache: a mapping of names to Template objects.

Loading templates

Creating a parser object, and loading the 'example.utp' template from the 'templates' directory works like this:

>>> import templateparser
>>> # This sets the 'templates' directory as the search path for AddTemplate
>>> parser = templateparser.Parser('templates')
>>> # Loads the 'templates/example.utp' and stores it as 'example.utp':
>>> parser.AddTemplate('example.utp')
>>> parser.Parse('example.utp', title='mister', name='Bob Dobalina')
'Hello mister Bob Dobalina'

The AddTemplate method takes a second optional argument, which allows us to give the template a different name in the cache:

>>> parser = templateparser.Parser('templates')
>>> parser.AddTemplate('example.utp', name='greeting')
>>> parser.Parse('greeting', title='mister', name='Bob Dobalina')
'Hello mister Bob Dobalina'

As you can see, the name of the template in the cache does not have to match the one on disk. Often, though, there is no need to rename it, so AddTemplate need only be called with one argument, or not at all, as the following section shows.

Template cache, reloading, and auto-loading

The Parser at heart is a dictionary that maps the names of templates to Template instances. When they are loaded from disk they are pre-parsed, checked and cached. Subsequent uses of the same template will therefore be faster, as the initial parsing will not have to be repeated.

Templates loaded from a file keep track of the modification time (mtime) of the originating template file. Upon each parse, the source file is checked, and if its modification time is newer than the one recorded when it was loaded, the template is reloaded from disk before parsing. This way, templates are never out of date.

Whenever the Parser is asked to parse or return a template that it hasn't loaded already, the auto-loading mechanism triggers. This searches for the given template name in the configured template directory. If a filename matches (exactly), it is automatically loaded and used to fulfill the Parse request. If no matching file is found, a TemplateReadError is raised.
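The caching, reloading and auto-loading behavior described above can be sketched as a dictionary subclass. This is a simplified model under stated assumptions: TemplateCache and get_template are hypothetical names, TemplateReadError is a local stand-in, and the cache stores raw template text rather than the pre-parsed Template objects the real Parser keeps.

```python
import os


class TemplateReadError(Exception):
    """Local stand-in for the TemplateReadError raised by the Parser."""


class TemplateCache(dict):
    """Hypothetical name -> (mtime, source) cache with auto-loading."""

    def __init__(self, path='.'):
        super().__init__()
        self.path = path

    def _load(self, name):
        filename = os.path.join(self.path, name)
        try:
            mtime = os.path.getmtime(filename)  # record mtime before reading
            with open(filename, encoding='utf-8') as handle:
                source = handle.read()
        except OSError:
            raise TemplateReadError('Could not load template %r' % filename)
        self[name] = (mtime, source)
        return source

    def get_template(self, name):
        """Return template source, loading or reloading it when needed."""
        if name not in self:
            return self._load(name)  # auto-load on first request
        mtime, source = self[name]
        if os.path.getmtime(os.path.join(self.path, name)) > mtime:
            return self._load(name)  # file changed on disk: reload it
        return source
```

Comparing mtimes costs one stat call per request, which is much cheaper than re-reading and re-parsing every template on each use.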

Below follows an example of auto-loading:

>>> import templateparser
>>> parser = templateparser.Parser('templates')
>>> 'example.utp' in parser
False       # Since we haven't loaded it, the template is not in the parser storage
>>> parser
Parser({})  # The parser is empty (has no cached templates)

Attempting to parse a template that doesn't exist in the parser cache triggers an automatic load:

>>> parser.Parse('example.utp', title='mister', name='Bob Dobalina')
'Hello mister Bob Dobalina'
>>> 'example.utp' in parser
True
>>> parser
Parser({'example.utp': Template([TemplateText('Hello '), TemplateTag('[title]'),
                                 TemplateText(' '), TemplateTag('[name]')])})

If the template cannot be found, TemplateReadError is raised:

>>> import templateparser
>>> parser = templateparser.Parser('templates')
>>> parser.Parse('bad_template.utp', failure='imminent')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/var/lib/underdark/libs/uweb/templateparser.py", line 147, in __getitem__
    self.AddTemplate(template)
  File "/var/lib/underdark/libs/uweb/templateparser.py", line 171, in AddTemplate
    raise TemplateReadError('Could not load template %r' % template_path)
underdark.libs.uweb.templateparser.TemplateReadError: Could not load template 'templates/bad_template.utp'

Parse and ParseString methods

For convenience and consistency, the Parser comes with two methods for parsing Template objects: Parse for templates from its cache, and ParseString for raw template strings. It is recommended to use these rather than direct key-based access on the Parser:

>>> import templateparser
>>> parser = templateparser.Parser('templates')
>>> parser.Parse('example.utp', title='mister', name='Bob Dobalina')
'Hello mister Bob Dobalina'
>>> parser.ParseString('Hello [title] [name]', title='mister', name='Bob Dobalina')
'Hello mister Bob Dobalina'

Templating language syntax

The templating syntax is relatively small, but it still provides a flexible and rich system for creating templates. These examples cover:
  • Simple tags (used in various examples above)
  • Tag indexing
  • Tag functions
  • Template language constructs
All examples will consist of three parts:
  1. The example template
  2. The Python invocation string (the template will be named 'example.utp')
  3. The resulting output (as source, not as parsed HTML)

Simple tags

This is the most basic form of template tag. The tag is enclosed in square brackets, like so: [tag]. Tags that match an argument provided to the Parse call are replaced. If no argument matches the tag name, the tag is returned in the output verbatim.

The example below repeats the one from 'Using TemplateParser inside µWeb', with an extra unmatched tag, and shows the resulting output:

<!DOCTYPE html>
<html>
  <head>
    <title>µWeb version info</title>
  </head>
  <body>
    <p>µWeb version [version] - Copyright 2010-[year] Underdark</p>
    <p>
      This [paragraph] is not replaced because there is no
      paragraph argument provided to the parser.
    </p>
  </body>
</html>
>>> parser.Parse('version.utp', year=time.strftime('%Y'), version=uweb.__version__)
<!DOCTYPE html>
<html>
  <head>
    <title>µWeb version info</title>
  </head>
  <body>
    <p>µWeb version 0.11 - Copyright 2010-2012 Underdark</p>
    <p>
      This [paragraph] is not replaced because there is no
      paragraph argument provided to the parser.
    </p>
  </body>
</html>

Valid tag name characters

Tag names are created from the same characters as valid Python variable names. This means they can contain upper and lower case letters, numbers and underscores. In regex terms, a tag should match \w+.

N.B.: Some names are illegal in Python as variable names but valid as tag names (tag names may start with a number). You can use these and pass the replacements as a dictionary using ** if you have a need for it.
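A quick way to see which bracketed strings count as simple tags is to apply the \w+ rule with Python's re module. The SIMPLE_TAG pattern and replace_tags helper below are hypothetical and only approximate the parser (no indexing, tag functions or HTML escaping); the last line shows the ** technique for a tag name that starts with a digit.

```python
import re

# Simple tags: a \w+ name (letters, digits, underscores) in square brackets.
SIMPLE_TAG = re.compile(r'\[(\w+)\]')


def replace_tags(template, **replacements):
    """Sketch: substitute known tags and leave unknown ones verbatim."""
    return SIMPLE_TAG.sub(
        lambda match: str(replacements.get(match.group(1), match.group(0))),
        template)


# '[two words]' and '[a.b]' do not match \w+, so they are never treated as tags.
print(SIMPLE_TAG.findall('[title] [1st] [two words] [a.b]'))  # ['title', '1st']
# '1st' is not a valid Python identifier, so it is passed by unpacking a dict.
print(replace_tags('Finished [1st]!', **{'1st': 'first'}))  # Finished first!
```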

Tag indexing

In addition to simple replacement of strings, the TemplateParser also accepts lists, dictionaries and other indexable objects, from which it can fetch indices, keys or attributes. The separator between the tag name and the index is the colon (":"):

List/tuple index addressing

This works for lists and tuples, but also for any other object that supports indexing, that is, every object whose __getitem__ method accepts integers.

This is [var:0] [var:1].
>>> parser.Parse('message.utp', var=('delicious', 'spam'))
This is delicious spam.

Dictionary key addressing

This works for dictionaries, but also for any other object that behaves like a key-value mapping, that is, every object whose __getitem__ method accepts strings.

This is [var:adjective] [var:noun].
>>> parser.Parse('message.utp', var={'adjective': 'delicious', 'noun': 'spam'})
This is delicious spam.

Attribute name addressing

This works for any object that has named attributes. If the attribute is a method, it is not called automatically; the returned value is simply the (un)bound method itself.

This is [var:adjective] [var:noun].
>>> class Struct(object):
...   pass
...
>>> var = Struct()
>>> var.adjective = 'delicious'
>>> var.noun = 'spam'
>>> parser.Parse('message.utp', var=var)
This is delicious spam.

Lookup order

For objects and constructs that provide multiple ways of looking up information, the lookup order can be very important. For any of the first three steps, if they are successful, the retrieved value is returned, and no further attempts are made:

  1. If the needle can be parsed as an integer, it is first used as an index. This also works for mappings with numeric keys;
  2. If the above fails, the needle is used as a string mapping key;
  3. If the above fails, the needle is used as an attribute name;
  4. If all of the above fail, TemplateKeyError is raised, as the needle could not be found on the object.
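The lookup order above can be sketched as a single function. TemplateKeyError is defined locally as a stand-in, and lookup is a hypothetical illustration of the documented order, not the TemplateParser source.

```python
from types import SimpleNamespace


class TemplateKeyError(Exception):
    """Local stand-in for the TemplateKeyError named above."""


def lookup(obj, needle):
    """Resolve one index `needle` on `obj` in the documented order."""
    # 1. Integer index (this also matches mappings with numeric keys).
    try:
        return obj[int(needle)]
    except (ValueError, TypeError, LookupError):
        pass
    # 2. String mapping key.
    try:
        return obj[needle]
    except (TypeError, LookupError):
        pass
    # 3. Attribute name (methods are returned, not called).
    try:
        return getattr(obj, needle)
    except AttributeError:
        pass
    # 4. Nothing matched.
    raise TemplateKeyError('%r not found on %r' % (needle, obj))


print(lookup(('delicious', 'spam'), '1'))            # spam
print(lookup({1: 'int key', '1': 'str key'}, '1'))   # int key
print(lookup(SimpleNamespace(noun='spam'), 'noun'))  # spam
```

The second call shows why the order matters: a mapping holding both 1 and '1' as keys yields the integer-keyed value.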

Nested indexes

There may be cases where the value you need is not at the top-level index of an object. This is not a problem, since TemplateParser supports arbitrary-depth nested structures in its index-lookup:

This is a variable from [some:levels:down:1].
>>> class Struct(object):
...   pass
...
>>> var = Struct()
>>> var.levels = {'down': ('the sky', 'the depths')}
>>> parser.Parse('message.utp', some=var)
This is a variable from the depths.
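Nested lookups amount to applying the single-step lookup once per colon-separated index. A sketch, assuming a compact hypothetical lookup helper that follows the documented order (integer index, mapping key, attribute) and a resolve helper that is likewise not part of µWeb:

```python
from functools import reduce
from types import SimpleNamespace


def lookup(obj, needle):
    """Compact single-step lookup: integer index, mapping key, then attribute."""
    try:
        return obj[int(needle)]
    except (ValueError, TypeError, LookupError):
        pass
    try:
        return obj[needle]
    except (TypeError, LookupError):
        return getattr(obj, needle)  # raises AttributeError if nothing matches


def resolve(replacements, tag):
    """Resolve a tag such as 'some:levels:down:1' one index at a time."""
    name, *indexes = tag.split(':')
    return reduce(lookup, indexes, replacements[name])


var = SimpleNamespace(levels={'down': ('the sky', 'the depths')})
print(resolve({'some': var}, 'some:levels:down:1'))  # the depths
```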

Valid index characters

Indexes may be constructed from upper and lower case letters, numbers, underscores and dashes. There are no restrictions on first character, only a minimum length of one. Regex-wise, they need to match [\w-]+

Checking for presence

The TemplateParser raises an error when it encounters an IndexError or KeyError while resolving requested tags. To avoid this, you can check whether all requested values exist by using the ifpresent statement.

{{ ifpresent [uweb:version] }}
  <p>µWeb version [uweb:version]</p>
{{ else }}
  <p>µWeb version unknown</p>
{{ endif }}

Tag functions

Once you arrive at the tag value you want, there are often some things that need to happen before the resulting template is sent to the requesting client (browser). HTML escaping is an obvious one, but URL quoting of single arguments may also be helpful, as well as uppercasing, printing the length of a list (instead of the raw list), and various other uses.

Default html escaping

Using a tag function is a fairly straightforward process, just add the name of the function after the tagname, separated by a pipe ( | ):

And he said: [message|html]
>>> parser.Parse('message.utp', message='"Hello"')
And he said: &quot;Hello&quot;

Using the html tag function makes the tag value safe for printing in an HTML document. Because we believe this is really important, the html escaping tag function is always applied when no other tag function is applied:

And he said: [message]
>>> parser.Parse('message.utp', message='"Hello"')
And he said: &quot;Hello&quot;

Only when you use another tag function, or specifically tell TemplateParser to push the raw tag value into the output, are the quotes allowed through unchanged:

And he said: [message|raw]
>>> parser.Parse('message.utp', message='"Hello"')
And he said: "Hello" 

Predefined tag functions

  • html – This tag function escapes content to be safe for inclusion in HTML pages. This means that the ampersand ( & ), single and double quotes ( ' and " ) and the angle brackets ( < and > ) are converted to their respective character entity references.
  • default – This is the tag function that will be executed when no other tag functions have been specified for a tag. By default, this will do the same as the html tag function. This can be adjusted by assigning another tag function to this name.
  • raw – This tag function passes the tag through without change. This is the function to use when you have no tag function to apply, but do not want the tag to be HTML-escaped.
  • url – This tag function prepares the tag for use in URLs. Spaces are converted to plus signs ( + ), and other characters that are considered unsafe for URLs are converted to percent-notation.
  • values – This tag function can be used in conjunction with TemplateLoops to provide the values instead of the keys when iterating over dictionaries.
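The predefined functions can be approximated with the standard library, as sketched below. This is an approximation, not µWeb's code: TAG_FUNCTIONS is a hypothetical name, and html.escape encodes the single quote as &#x27;, which may differ from the entity TemplateParser emits.

```python
import html
from urllib.parse import quote_plus

# Hypothetical approximations of the predefined tag functions.
TAG_FUNCTIONS = {
    'html': lambda value: html.escape(str(value), quote=True),  # & ' " < >
    'raw': lambda value: value,                                  # unchanged
    'url': lambda value: quote_plus(str(value)),                 # spaces -> '+'
    'values': lambda mapping: mapping.values(),                  # dict values
}
# 'default' behaves like 'html' unless another function is assigned to it.
TAG_FUNCTIONS['default'] = TAG_FUNCTIONS['html']

print(TAG_FUNCTIONS['default']('"Hello"'))   # &quot;Hello&quot;
print(TAG_FUNCTIONS['url']('spam & eggs'))   # spam+%26+eggs
```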

Adding custom functions

Custom functions can be added to a Parser object using the RegisterFunction method. This takes a name and a single-argument function. When this function is encountered in a tag, it is given the current tag value, and its result is output to the template, or passed into the next function:

>>> from uweb import templateparser
>>> parser = templateparser.Parser()
>>> parser.RegisterFunction('len', len)
>>> template = 'The number of people in this group: [people|len].'
>>> parser.ParseString(template, people=['Eric', 'Michael', 'John', 'Terry'])
'The number of people in this group: 4.'

N.B.: Using custom functions (or in fact any function other than html or no function) will suppress HTML escaping. If your content is still user-driven, or not otherwise made safe for output, it is strongly recommended you apply html escaping. This can be achieved by chaining functions, as explained below.

Function chaining

Multiple function calls can be chained after one another. The functions are processed left to right, and the result of each function is passed into the next, without any intermediate editing or changes:

Setting up the parser and registering our tag function:

>>> from uweb import templateparser
>>> parser = templateparser.Parser()
>>> parser.RegisterFunction('first', lambda x: x[0])

Applying just one tag function returns the first element from the list:

>>> template = 'The first element of list: [elements|first].'
>>> parser.ParseString(template, elements=['Eric', 'Michael', 'John', 'Terry'])
'The first element of list: Eric.'

Repeating the function on the string returns the first character from that string:

>>> template = 'The first element of the first element of list: [elements|first|first].'
>>> parser.ParseString(template, elements=['Eric', 'Michael', 'John', 'Terry'])
'The first element of the first element of list: E.'
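Function registration and left-to-right chaining can be modelled with a dictionary and functools.reduce. FunctionRegistry, register and apply are hypothetical names used only for this sketch; in µWeb you would call the Parser's RegisterFunction as shown above.

```python
from functools import reduce


class FunctionRegistry:
    """Hypothetical registry mapping names to single-argument functions."""

    def __init__(self):
        self.functions = {}

    def register(self, name, function):
        self.functions[name] = function

    def apply(self, value, chain):
        """Apply a chain like 'first|first' from left to right."""
        return reduce(lambda result, name: self.functions[name](result),
                      chain.split('|'), value)


registry = FunctionRegistry()
registry.register('first', lambda x: x[0])
registry.register('len', len)
elements = ['Eric', 'Michael', 'John', 'Terry']
print(registry.apply(elements, 'first'))        # Eric
print(registry.apply(elements, 'first|first'))  # E
print(registry.apply(elements, 'len'))          # 4
```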

Valid function name characters

Tag function names may be constructed from upper and lower case letters, numbers, underscores and dashes. There are no restrictions on first character, only a minimum length of one. Regex-wise, they need to match [\w-]+

Functions with arguments

You can support functions with arguments by registering a function that takes those arguments and returns a single-argument function (a closure), for example:

import uweb

class PageMaker(uweb.PageMaker):

  @staticmethod
  def Limit(length=80):
    """Returns a closure that limits input to a number of chars/elements."""
    return lambda string: string[:length]

  @staticmethod
  def LimitString(length=80, endchar='...'):
    """Limits input to `length` chars and appends `endchar` if it was longer."""
    def _Limit(string, length=length, endchar=endchar):
      if len(string) > length:
        return string[:length] + endchar
      return string
    return _Limit

  def __init__(self, *args, **kwds):
    """Overwrites the default init to add extra templateparser functions."""
    super(PageMaker, self).__init__(*args, **kwds)
    self.parser.RegisterFunction('limit', self.Limit)
    self.parser.RegisterFunction('strlimit', self.LimitString)

The syntax to be used in the templates is as follows:

[input|limit()] 
[input|limit(20)] 
[input|strlimit(20)] 
[input|strlimit(30, "&ndash;")] 

These functions can still be chained as you would expect.

Limitations

Tag functions cannot be used inside loops or conditionals.

Tag closures

Closures in TemplateParser could be succinctly described as 'functions with arguments'. In the tag syntax, they are very similar to functions, and they can be freely mixed with functions.

An example tag closure looks like this: [text|maxlen(20)]. Here, the template tag text is limited to 20 characters in length. The same functionality could be achieved with a plain function maxlen20, but that would require a separate maxlen function for each length you want to limit the input to.

The maxlen closure is achieved by registering a function 'maxlen' which takes a single argument (the maximum length) and returns a function (a closure) that performs this action. The template tag value is passed to that closure, and its return value is used as the tag value. This principle is explained in more detail below:

Setting up the parser and registering our maxlen function:

>>> from uweb import templateparser
>>> def MaxLength(length):
...   def _MaxLen(tag_value, length=int(length)):
...     return tag_value[:length]
...   return _MaxLen
...
>>> parser = templateparser.Parser()
>>> parser.RegisterFunction('maxlen', MaxLength)

Applying the tag closure on a simple template:

>>> template = 'Small blurb: "[text|maxlen(20)]".'
>>> parser.ParseString(template, text="Python is a general-purpose, high-level programming language.")
'Small blurb: "Python is a general-".'

Arguments

Closures accept multiple arguments. The example above could be extended with a 'suffix' argument to append to truncated strings. In that case the closure would check the length and, depending on the outcome, truncate the string and append the suffix. Another option is a closure that shows only the beginning and end of the string, and truncates the middle.
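Both variants described above can be written as closure factories in the same style as MaxLength. The suffix argument and the Ellipsize function are hypothetical examples; they would be registered with RegisterFunction like any other function and used as, for instance, [text|maxlen(20, "...")] or [text|ellipsize(6, 9)].

```python
def MaxLength(length, suffix=''):
    """Closure factory: truncate to `length`, appending `suffix` if truncated."""
    length = int(length)

    def _MaxLen(tag_value):
        if len(tag_value) > length:
            return tag_value[:length] + suffix
        return tag_value
    return _MaxLen


def Ellipsize(head, tail, filler='...'):
    """Closure factory: keep the first `head` and last `tail` characters."""
    head, tail = int(head), int(tail)

    def _Ellipsize(tag_value):
        if len(tag_value) <= head + tail:
            return tag_value
        # Slice by position so that tail=0 keeps no trailing characters.
        return tag_value[:head] + filler + tag_value[len(tag_value) - tail:]
    return _Ellipsize


text = 'Python is a general-purpose, high-level programming language.'
print(MaxLength(20, '...')(text))  # Python is a general-...
print(Ellipsize(6, 9)(text))       # Python...language.
```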

Valid closure name/argument characters

Tag closure names have the same restrictions as tag function names (alphanumeric, underscores, dashes; regex [\w-]+). Closure arguments should form a legal Python tuple, and may contain any characters except for parentheses. A closure may have zero arguments (an empty pair of parentheses). The regex for the arguments is simply: [^()]*.
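One plausible way to split a function specification into a name and an argument tuple, using the character rules above, is a regular expression plus ast.literal_eval. This is a sketch under that assumption: CLOSURE and parse_function are hypothetical, and TemplateParser may evaluate closure arguments differently.

```python
import ast
import re

# Function name: [\w-]+; optional argument list: anything except parentheses.
CLOSURE = re.compile(r'([\w-]+)(?:\(([^()]*)\))?')


def parse_function(spec):
    """Split 'maxlen(20)' into ('maxlen', (20,)); plain functions get None."""
    match = CLOSURE.fullmatch(spec)
    if match is None:
        raise ValueError('invalid tag function: %r' % spec)
    name, arguments = match.groups()
    if arguments is None:
        return name, None  # plain function such as 'html'
    if not arguments.strip():
        return name, ()    # closure with zero arguments: 'limit()'
    # The trailing comma makes a single argument a tuple as well.
    return name, ast.literal_eval('(%s,)' % arguments)


print(parse_function('strlimit(30, "&ndash;")'))  # ('strlimit', (30, '&ndash;'))
```

Using ast.literal_eval rather than eval keeps the sketch limited to literal arguments (numbers, strings and the like), so a template cannot call arbitrary code through a closure argument. The resulting tuple would then be unpacked into the registered factory, e.g. MaxLength(*arguments).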

Limitations

Currently, closure arguments can only be positional. That is, there is no support for keyword arguments in the tag closure. This limits the possibilities somewhat, but

TemplateLoop

As a language construct, TemplateParser has an understanding of iteration. The TemplateLoop can be compared to the Python for-loop, or the foreach construct in other languages (lazy iteration over the values of an iterable).

Syntax and properties

Syntax: {{ for local_var in [collection] }}
  • The double accolades (curly braces) indicate the beginning and end of the construct;
  • The for keyword indicates the structure to execute;
  • local_var is the name which references the loop variable;
  • [collection] is the tag that provides the iterable.
Properties
  • The local name is stated without brackets (as it is not a tag itself)
  • When it needs to be placed in the output, the local name should have brackets (like any other tag)
  • N.B. The local variable does not bleed into the outer scope after the loop has completed.
    It is therefore possible (though not recommended) to name the loop variable after the iterable: {{ for collection in [collection] }}.

Example of a TemplateLoop

<html>
  <body>
    <ul>
    {{ for name in [presidents] }}
      <li>President [name]</li>
    {{ endfor }}
    </ul>
  </body>
</html>
>>> parser.Parse('rushmore.utp', presidents=['Washington', 'Jefferson', 'Roosevelt', 'Lincoln'])
<html>
  <body>
    <ul>
      <li>President Washington</li>
      <li>President Jefferson</li>
      <li>President Roosevelt</li>
      <li>President Lincoln</li>
    </ul>
  </body>
</html>
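The scoping rule from the properties list (the loop variable does not leak out of the loop) can be sketched with collections.ChainMap, which layers a per-iteration scope over the outer replacements without modifying them. render and render_loop are hypothetical helpers that handle plain tags only.

```python
import re
from collections import ChainMap

TAG = re.compile(r'\[(\w+)\]')


def render(text, scope):
    """Replace [name] tags from `scope`; unknown tags stay verbatim."""
    return TAG.sub(lambda match: str(scope.get(match.group(1), match.group(0))), text)


def render_loop(body, local_var, iterable, scope):
    """Render `body` once per item, with `local_var` bound in a child scope."""
    output = []
    for item in iterable:
        # The child mapping shadows outer names; `scope` itself is never modified.
        output.append(render(body, ChainMap({local_var: item}, scope)))
    return ''.join(output)


scope = {'presidents': ['Washington', 'Jefferson'], 'name': 'outer value'}
print(render_loop('<li>President [name]</li>\n', 'name', scope['presidents'], scope), end='')
print(scope['name'])  # still 'outer value': the loop variable did not bleed out
```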

Inlining templates

Often, there will be snippets of a template that see a lot of reuse. Page headers and footers are often the same on many pages, and having several redundant copies means that changes have to be replicated to each of these occurrences. To reduce the need for this, TemplateParser has an inline statement. Using this, you can name a template that is available in the Parser instance, and the statement will be replaced by that template.

Of course, if the inlined template is not already in the Parser instance, the auto-loading mechanism triggers, and the named template is searched for in the Parser's template directory.

First, we will define our inline template, 'inline_hello.utp':

<p>Hello [name]</p>

Secondly, our main template, 'hello.utp':

<h1>Greetings</h1>
{{ inline inline_hello.utp }}

Then we parse the template:

>>> parser.Parse('hello.utp', name='Dr John')
<h1>Greetings</h1>
<p>Hello Dr John</p>

Conditional statements

Often, you'll want the output of your template to depend on the value, presence, or boolean value of another tag. For instance, we may want to print a list of attendees to a party. We start the if conditional by checking the length of the attendees list. If it contains more than one entry, we print the attendee names; if it contains only a single entry, or is empty, we tell the user in more intelligent ways than giving them a list with one or zero entries:

<h1>Party attendees</h1>
{{ if len([attendees]) > 1 }}
  <ol>
    {{ for attendee in [attendees] }}
    <li>[attendee:name]</li>
    {{ endfor }}
  </ol>
{{ elif [attendees] }}
  <p>Only [attendees:0:name] is attending.</p>
{{ else }}
  <p>There are no registered attendees yet.</p>
{{ endif }}

For the case where there are several attendees:

>>> parser.Parse('party.utp', attendees=[
...    {'name': 'Livingstone'},
...    {'name': 'Cook'},
...    {'name': 'Drake'}])
<h1>Party attendees</h1>
<ol>
  <li>Livingstone</li>
  <li>Cook</li>
  <li>Drake</li>
</ol>

For the case where there is one attendee:

>>> parser.Parse('party.utp', attendees=[{'name': 'Johnny'}])
<h1>Party attendees</h1>
<p>Only Johnny is attending.</p>

And in the case where there are no attendees:

>>> parser.Parse('party.utp', attendees=[])
<h1>Party attendees</h1>
<p>There are no registered attendees yet.</p>

Properties of conditional statements

  • All template keys must be referenced as proper tags
    This is to prevent mixing of the template variables with the functions and reserved names of Python itself. Conditional expressions are evaluated using eval(), and proper tags are replaced by temporary names, the values of which are stored in a retrieve-on-demand dictionary. This makes them perfectly safe with regard to the value of template replacements, but some care should be taken with the writing of the conditional expressions.
  • It is possible to index tags in conditional statements
    This allows for decisions based on the values in those indexes/keys. For instance, Person objects can be checked for gender, so that the correct gender-based icon can be displayed next to them.
  • Referencing a tag or index that doesn't exist raises TemplateNameError
    Unlike in regular template text, there is no suitable fallback value for a tag or index that cannot be retrieved. However, in most cases this can be prevented by making use of the following property:
  • Statement evaluation is lazy
    Template conditions are processed left to right, and short-circuited where possible. If the first member of an or group succeeds, the return value is already known. Similarly, if the first member of an and group fails, the second part need not be evaluated. This way TemplateNameErrors can often be prevented, as in most cases, presence of indexes can be confirmed before accessing.
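The evaluation strategy described in this list (tags swapped for temporary names, values fetched on demand, short-circuiting by Python's own and/or) can be sketched with eval() and a lazy mapping as its locals. Everything here is a hypothetical illustration: LazyTags, evaluate and the local TemplateNameError stand-in are not µWeb's code, and eval() is only applied to expressions written by the template author.

```python
import re
from collections.abc import Mapping

# A tag with optional colon-separated indexes, e.g. [attendees:0:name].
TAG = re.compile(r'\[(\w+(?::[\w-]+)*)\]')


class TemplateNameError(Exception):
    """Local stand-in for the error named in the text."""


def lookup(obj, needle):
    """Integer index, then mapping key, then attribute (the documented order)."""
    try:
        return obj[int(needle)]
    except (ValueError, TypeError, LookupError):
        pass
    try:
        return obj[needle]
    except (TypeError, LookupError):
        pass
    try:
        return getattr(obj, needle)
    except AttributeError:
        raise TemplateNameError(needle)


class LazyTags(Mapping):
    """Maps temporary names to tag paths and resolves them only on access."""

    def __init__(self, paths, replacements):
        self.paths = paths  # e.g. {'_tag1': 'attendees:0:name'}
        self.replacements = replacements

    def __getitem__(self, name):
        if name not in self.paths:
            raise KeyError(name)  # lets eval() fall back to builtins like len
        tag, *indexes = self.paths[name].split(':')
        if tag not in self.replacements:
            raise TemplateNameError(tag)
        value = self.replacements[tag]
        for index in indexes:
            value = lookup(value, index)
        return value

    def __iter__(self):
        return iter(self.paths)

    def __len__(self):
        return len(self.paths)


def evaluate(condition, **replacements):
    """Swap tags for temporary names, then eval() with lazy tag values."""
    paths = {}

    def substitute(match):
        name = '_tag%d' % len(paths)
        paths[name] = match.group(1)
        return name

    expression = TAG.sub(substitute, condition)
    return bool(eval(expression, {}, LazyTags(paths, replacements)))


# The second tag is never resolved, because the empty list short-circuits `and`.
print(evaluate('[attendees] and [attendees:0:name] == "Cook"', attendees=[]))  # False
```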

Template unicode handling

Any unicode object found while parsing will automatically be encoded to UTF-8:

>>> template = 'Underdark [love] [app]'
>>> output = parser.ParseString(template, love=u'\u2665', app=u'\N{micro sign}Web')
>>> output
'Underdark \xe2\x99\xa5 \xc2\xb5Web'  # The output in its raw UTF-8 representation
>>> output.decode('UTF8')
u'Underdark \u2665 \xb5Web'           # The output converted to a Unicode object
>>> print output
Underdark ♥ µWeb                     # And the printed UTF-8 as we desired it.