Monday, August 22, 2016

My own templates implementation

What do you do when you need HTML templates but you don't want to include any heavy library or framework? You experiment and write your own implementation... Welcome to a journey within DOM, regular expressions and function builders.

It's been a long time...

OK, I have to admit that the blog is not really active... I should probably write more often. It's not because of procrastination but rather a time management issue. Furthermore, writing takes me a lot of time as I always review the article several times before publishing. Still, I may have missed typos and other mistakes, so don't hesitate to give feedback...

A backlog of articles I would like to write is maintained and I also created a task on my Habitica as a reminder to fill the blog.

So you may wonder what I spend my time on.

A lot of personal - good and bad - events, a new framework to learn and a deep code refactoring to improve maintainability are taking most of it.

But in between everything I still have some rare occasions of fun and I started this micro project recently.

The need

In the GPF-JS library, the source and test files are organized with the help of one special configuration file: sources.json. It is the backbone of the project as all the tools related to building, testing or even documenting are based on it.

Documentation generation relies on JSDoc and a grunt plugin but the code base needs some additional cleanup. Consequently only a few files are currently considered.

This JSON storage lists files and associates properties to them:

  • Textual description of the source content
  • Flag to know if it has a test counterpart
  • Optional flag to allow documentation extraction
  • Optional documentation flags that highlight the most important parts of the source (such as class implementation, main method name...)

Because concepts are isolated as much as possible, this file quickly grew from 134 lines in April to 334 lines in June, all entered manually (with lots of errors leading to "what is going on?", "oh no, nothing works again...").

At some point its content deserved a little bit of control not only to enforce the syntax but also to have a better view on what it contains.

So I decided it would be nice to develop an HTML view on this one.

Building HTML pages

Long story short, it all started with a simple list formatting the file content. Updating will come later. Hence a basic HTML page was created to display a table (not really responsive but this is not required for now).

Loading the JSON file using an AJAX request and iterating over its content is easy but then...

Several solutions exist:

Build HTML nodes using script

Browsers now offer a complete (and standardized) API to manipulate the Document Object Model. It allows you to programmatically fill the page the same way you would do with static HTML code.

PROS
  • Fast
  • Full control on the generation
  • Can be debugged
CONS
  • Exhaustive but complex API
  • Takes more time to develop
  • Long code for simple output
  • Code is quite cryptic and hard to evolve
  • Not easily maintainable

Code sample:

    var data = {
        title: "Code sample",
        id: "test",
        checked: "checked",
        label: "it works"
    };
    var h1 = document.body.appendChild(document.createElement("h1"));
    h1.innerHTML = data.title;
    var input = document.body.appendChild(document.createElement("input"));
    input.setAttribute("id", data.id);
    input.setAttribute("type", "checkbox");
    input.setAttribute("checked", "");
    var label = document.body.appendChild(document.createElement("label"));
    label.setAttribute("for", data.id);
    label.setAttribute("title", data.label);
    label.innerHTML = data.label;

Further reading: Introduction to the DOM

Use a template engine

Template engines usually rely on a static description of the final output. The documented syntax proposes placeholders to represent substitution points. Depending on the engine, there might be several ways to inject the values. They are designed to be fast, offer common helpers (such as enumeration) and extensive bindings (with typing, transformation...).

PROS
  • Quite fast (depends on the engine)
  • Less code to develop
  • Easy to maintain
  • Rapid learning curve
CONS
  • Each engine has its conventions and API
  • Debugging

Mustache sample:

    var html = Mustache.to_html(document.getElementById("tpl").innerHTML, {
        title: "Mustache sample",
        id: "test",
        checked: "checked",
        label: "it works"
    });
    document.body.appendChild(document.createElement("div")).innerHTML = html;

where the template is defined as:

    <script id="tpl" type="text/template">
        <h1>{{title}}</h1>
        <input id="{{id}}" type="checkbox" {{checked}}>
        <label for="{{id}}" title="{{label}}">{{label}}</label>
    </script>

A quick note about the script tag with type="text/template": it is a trick that prevents the browser from executing the content of the script tag. However, the content remains available for any custom coding.

Sample reference: mustache.js

Use a framework

To put it in a nutshell, a framework will convert any web page into a web application: it encapsulates more than just UI definition and behaviors.

I recommend reading this manifesto against frameworks: it draws the line between libraries and frameworks and offers an interesting point of view on why we should avoid frameworks to push innovations as a standard.

This being said, each framework has its own specificities but, regarding UI building, I would distinguish 2 main types:

  • Widget based frameworks (ExtJS, Open UI5...): each UI element is wrapped inside a control class. Building the interface can be done either through static descriptions (such as XML) or code.
  • HTML based frameworks (AngularJS, EmberJS...): the interface is written in HTML, which is then augmented with bindings.
PROS
  • Codebase (samples, documentation...)
  • Application oriented (does more than templating)
CONS
  • Heavy
  • Long learning curve
  • May become a nightmare to debug if anything goes wrong
  • Design may look rigid

Angular sample:

    var myApp = angular.module('myApp', []);
    myApp.controller('SampleController', ['$scope', function($scope) {
        $scope.title = "Angular sample";
        $scope.id = "test";
        $scope.checked = true;
        $scope.label = "it works";
    }]);

where the body is defined as:

    <html ng-app="myApp">
        <!-- ... -->
        <body ng-controller="SampleController">
            <h1>{{title}}</h1>
            <input id="{{id}}" type="checkbox" ng-checked="checked">
            <label for="{{id}}" title="{{label}}">{{label}}</label>
        </body>
    </html>

Sample reference: Angular JS

Building a simple template engine

A framework could be used but it's just too much with regards to what has to be achieved. And, obviously, mustache is appropriate but I would have missed an opportunity to learn new things.

Regarding the requirements, the expected benefits of the simple template engine are:

  • Flexible and easy way to define valid HTML
  • Simple textual bindings
  • JavaScript injection

The engine must generate a function accepting at least two parameters:

  • An object providing values for substitution
  • An index that will distinguish objects when used in an enumeration

The result will be a DOM node that can be placed anywhere (for instance, using appendChild).

In terms of syntax, the following patterns will be accepted in the template definition:

  • {{fieldName}} to be replaced with the object's field named fieldName: it can be used inside any textual content (attributes or textual nodes).
  • {% JAVASCRIPT CODE %} to inject JavaScript (with some limitations, see below)

JavaScript helpers will be provided inside the injected code to condition / alter the output:

  • $write() to output any HTML code (to be used carefully)
  • $object gives the current object
  • $index gives the current index

The checkbox case

Most of the content to generate simply consists of replacing placeholders with text coming from the object (source name, description...). It can be either a textual node (in between elements) or an attribute value (such as ids...).

However, an unexpected challenge appeared when it came to render boolean options.

Indeed, the simplest way to represent a boolean is to use an input with type set to checkbox.

But the checkbox will be ticked or not depending on the presence of the checked attribute, whatever its value.

So the template engine must offer a way to alter an element definition by adding attributes.

Working on the syntax, I tried different approaches. The first attempt looked like this:

<input type="checkbox" {% JAVASCRIPT CODE %}>

This one is simple; however, the parsed result is a strange HTML string:

"<input type=\"checkbox\" {%=\"\" javascript=\"\" code=\"\" %}=\"\">"

One easy way to find out this parsed result is to open the debugger, grab the handle of the parent element and ask for the innerHTML property.

Indeed, each block of characters is recognized as a single attribute.

So, I tried the following one:

<input type="checkbox" {%%}="JAVASCRIPT CODE">

And this time, the string looked good:

"<input type=\"checkbox\" {%%}=\"JAVASCRIPT CODE\">"

This also requires the JavaScript code to be correctly escaped to fit an attribute value. For instance, it may use single quotes for strings instead of double quotes.

Re-reading this part, I realize I could also use an attribute named {isChecked} and set the field isChecked to "checked" or "anything" depending on whether I want the checkbox to be ticked or not. However, in that case, the value has to be pre-formatted, which is something I want to avoid.

The template tag

Let's say you want to define a configuration file that has to be used by a JavaScript application. How would you define its syntax and content? Some may invent a dedicated API and require the file to be a valid JavaScript program. Others may specify a syntax to set the configuration in a declarative way.

Each version has its advantages and drawbacks:

  • the programmatic approach maximizes the capacities when setting the configuration (environment detection, conditions, loops...) but with a cost in terms of maintenance, compatibility and migration
  • the declarative approach simplifies the file but also gives limits to what you can do with it

In my opinion, declaration has to be preferred over implementation; that's probably why I use grunt instead of gulp. The main reason is that less code means fewer bugs.

When the parser already exists (the browser in our case or the JSON format for the previous example), this enforces the syntax and makes the implementation even easier.

The template element is an HTML tag that accepts any HTML content. When you access it, this DOM element exposes a content property that can be used and altered almost like any other element.

Also, you can access the innerHTML member.

Please note that this element is not supported by Internet Explorer.

Actually, almost any HTML element could be used the same way. However the template one has two significant advantages:

  • It is parsed but not rendered: it speeds up the loading of the page and no special handling is required to hide it
  • It accepts any HTML content: try setting innerHTML to "<tr></tr>" on a DIV element, it won't accept it.

So - after a few modifications - here is the template content illustrating all features:

    <body>
        <template id="tpl_row">
            {%
                function check(a, b) {
                    if ($object[a] && (b === undefined || $object[b])) {
                        $write("checked");
                    }
                }
            %}
            <tr>
                <td>{{name}}</td>
                <td>{{description}}</td>
                <td><input type="checkbox" {%%}="check('load');"></td>
                <td><input type="checkbox" {%%}="check('load', 'test');"></td>
                <td><input type="checkbox" {%%}="check('doc');"></td>
            </tr>
        </template>
    </body>

Tokenizing

Now that we have content, let's see how we can isolate each part in order to distinguish the static text from the places where replacements are required. This process is called tokenizing.

Until recently I was not a big fan of regular expressions. I was under the impression that they were slow and of limited use because they only tell you whether a string respects a given pattern.

Then I read the book JavaScript: The Good Parts by Douglas Crockford. Chapter 7 was an eye opener. Indeed, on top of matching a pattern (and giving you information about what matched and where), a regular expression can also extract specific information using capturing groups (parentheses).
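To make the idea of capturing groups concrete, here is a minimal sketch. The pattern is an assumption written for this illustration (it is not the expression used in the actual implementation): it matches a {{fieldName}} placeholder and captures the field name in group 1.

```javascript
// Hypothetical pattern for this example only: matches {{fieldName}} and
// captures the field name (a JavaScript identifier) in group 1.
var placeholder = /\{\{([a-zA-Z_$][\w$]*)\}\}/;

var match = placeholder.exec("<td>{{name}}</td>");

// match[0] is the whole matched text, match[1] the first capturing group
// and match.index the position of the match inside the input string.
var whole = match[0];       // "{{name}}"
var field = match[1];       // "name"
var position = match.index; // 4
```

The same call therefore answers three questions at once: did it match, where, and what was inside the braces.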

I also strongly recommend reading this website that provides valuable information about the engine.

Regarding performance, they can be efficient or very slow depending on how you write them.

There are still some situations where JavaScript regexes are not appropriate. For instance, when the string you want to match is a stream you need a text-directed engine that can be interrupted. I started to implement such a mechanism (tested) in GPF-JS.

So, for each of the three patterns ({{fieldName}}, {% JAVASCRIPT CODE %} and {%%}="JAVASCRIPT CODE"), a regular expression is capable of finding it and extracting its content.

Once the three are combined with an additional default case, the result is the tokenizing pattern.

From there, the algorithm consists of matching the string against this regular expression and processing each token one by one.

Don't forget to reset the lastIndex property before matching a new string.
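The tokenizing loop can be sketched as follows. This is only an illustration under stated assumptions: the combined expression TOKEN_RE, the tokenize function and the token shape ({type, value}) are hypothetical names invented for this example, not the ones used in the actual implementation.

```javascript
// Hypothetical combined pattern (one alternative per syntax, plus a default):
//   group 1: {{fieldName}}
//   group 2: {%%}="JAVASCRIPT CODE"   (attribute form)
//   group 3: {% JAVASCRIPT CODE %}    (block form)
//   group 4: default case, static text (a run without "{" or a lone "{")
var TOKEN_RE = /\{\{([a-zA-Z_$][\w$]*)\}\}|\{%%\}="([^"]*)"|\{%([\s\S]*?)%\}|([^{]+|\{)/g;

function tokenize(html) {
    var tokens = [],
        match;
    // The expression is global and shared between calls: reset lastIndex
    // so that matching always starts at the beginning of the new string.
    TOKEN_RE.lastIndex = 0;
    while ((match = TOKEN_RE.exec(html)) !== null) {
        if (match[1] !== undefined) {
            tokens.push({type: "field", value: match[1]});
        } else if (match[2] !== undefined) {
            tokens.push({type: "code", value: match[2]});
        } else if (match[3] !== undefined) {
            tokens.push({type: "code", value: match[3].trim()});
        } else {
            tokens.push({type: "text", value: match[4]});
        }
    }
    return tokens;
}

var tokens = tokenize("<td>{{name}}</td><input type=\"checkbox\" {%%}=\"check('load');\">");
// tokens: text "<td>", field "name", text "</td><input type=\"checkbox\" ",
//         code "check('load');", text ">"
```

Every alternative consumes at least one character, so the loop always progresses until exec returns null at the end of the string.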

Code generation

If you were patient enough to read this article until this part: congratulations! You have reached the most fun part of it.

The first version of the helper returned a function that not only tokenized the HTML of the template but also substituted the tokens, all in the same call. Then I realized that it would be faster to tokenize the content once and then generate a function that only does the substitution job.

Hence I rewrote it to dynamically build the template function based on the template description.

I have a real fascination for the process of code generation: the outcome is most of the time faster than the traditional way of doing things because it produces highly specialized functions.

There are several ways to generate code in JavaScript; the two most common are eval and the Function constructor.

There are other, more elaborate ways. For instance, one may also load a script from a data URL. But let's keep things simple.

In general eval is a bad idea: it merges the content within the local scope and it is an open door for unwanted code injection or unexpected side effects. Strict mode brings some security mechanisms but most linters will reject it anyway. And I agree: eval is evil.

On the other hand, the Function constructor builds the new function in an isolated scope. This is an extreme opposite because, as a result, you can't access any symbols of your app. Still, it is an open door to code injection if you don't control what you put in the function body but, at least, the impact will be limited.
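The difference in scope can be observed with a few lines. This is a small sketch: compareScopes and secret are names made up for the example, and it assumes that no global variable named secret exists.

```javascript
function compareScopes() {
    var secret = "local value";
    // A direct call to eval runs inside the calling scope: it sees `secret`.
    var seenByEval = eval("typeof secret");
    // The Function constructor compiles its body in the global scope only:
    // the local variable `secret` is not visible from there.
    var seenByFunction = new Function("return typeof secret;")();
    return {viaEval: seenByEval, viaFunction: seenByFunction};
}

var scopes = compareScopes();
// scopes.viaEval is "string", scopes.viaFunction is "undefined"
```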

Most script engines offer an access point to the main contextual object (a.k.a. the top-level scope), i.e. window for browsers or global in NodeJS. You may also access it by calling a function with null as its this value in a non-strict environment. From there, you can access all global definitions.
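As an illustration of that last point, here is a short sketch. The global name appVersion is invented for the example; the rest relies on the fact that a function built with the Function constructor is non-strict unless its body opts in.

```javascript
// In non-strict code, calling a function with a null receiver makes `this`
// default to the global object (window in browsers, global in NodeJS).
var getGlobal = new Function("return this;");
var root = getGlobal.call(null);

// Illustrative global definition ("appVersion" is a made-up name).
root.appVersion = "1.0";

// A generated function cannot see local variables but it can see globals.
var readFromGenerated = new Function("return appVersion;")(); // "1.0"

// Clean up the illustrative global.
delete root.appVersion;
```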

I also recommend this interesting article Writing a JavaScript framework - Sandboxed code evaluation: it proposes an ES6 alternative to create real sandboxed environments.

The factory builder maintains an array of code lines (named code) that is finally used to create the function.
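The principle can be sketched in a few lines. This is a simplified illustration and not the actual implementation: buildTemplate and the token shape ({type, value}) are hypothetical, field names are assumed to be plain identifiers, and the generated function returns the HTML string instead of a DOM node so that the sketch also runs outside a browser.

```javascript
function buildTemplate(tokens) {
    // Preamble shared by every generated function.
    var code = [
        "var $object = arguments[0], $index = arguments[1], __r = [];",
        "function __s(v) { if (undefined === v) return \"\"; return v.toString(); }",
        "function $write(t) { __r.push(__s(t)); }"
    ];
    tokens.forEach(function (token) {
        if (token.type === "text") {
            // Static text: JSON.stringify produces a properly escaped literal.
            code.push("__r.push(" + JSON.stringify(token.value) + ");");
        } else if (token.type === "field") {
            // {{name}} becomes a read of the object's field.
            code.push("__r.push(__s($object." + token.value + "));");
        } else {
            // Injected JavaScript is copied verbatim.
            code.push(token.value);
        }
    });
    code.push("return __r.join(\"\");");
    return new Function(code.join("\n"));
}

var render = buildTemplate([
    {type: "code", value: "if ($object.condition) {"},
    {type: "text", value: "<span>"},
    {type: "field", value: "title"},
    {type: "text", value: "</span>"},
    {type: "code", value: "} else {"},
    {type: "text", value: "<div></div>"},
    {type: "code", value: "}"}
]);

var whenTrue = render({condition: true, title: "Hello"}, 0); // "<span>Hello</span>"
var whenFalse = render({condition: false}, 1);               // "<div></div>"
```

Because the tokens are consumed only once, at build time, each call to render just runs the specialized sequence of push instructions.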

The best way to explain how it works is to show the result on the samples used in the test scenarios.

The generated functions were extracted using chrome development tools and reformatted with the help of the pretty print button.

Let's start with sample 1:

    (function() {
        var __a = arguments
          , $object = __a[0]
          , $index = __a[1]
          , __r = []
          , __d = document
          , __t = __d.createElement("template");
        function __s(v) {
            if (undefined === v)
                return "";
            return v.toString();
        }
        function $write(t) {
            __r.push(__s(t));
        }
        __r.push("This is a static one");
        __t.innerHTML = __r.join("");
        return __d.importNode(__t.content, true);
    })

Because the template allows JavaScript injection, it is important to make sure that the injected code won't collide with the one that ensures the template mechanics (basically everything but the line containing "This is a static one"). I still remember this funny PHP warning about using magical __ in names so I decided to prefix the internal names with it.

So you have:

  • __a shortcut to arguments
  • __d shortcut to document
  • __r the result HTML array (array is used to speed up concatenation)
  • __t a new template element (that will receive the result HTML code)
  • __s a method that converts its parameter into a string (required for $write and basic binding)
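The behavior of __s on a few values can be checked in isolation. The function below is copied from the generated code; the observation about null is a consequence of how it is written and might or might not matter for your data.

```javascript
// Same helper as in the generated functions.
function __s(v) {
    if (undefined === v) return "";
    return v.toString();
}

var fromUndefined = __s(undefined); // "" (a missing field renders as nothing)
var fromNumber = __s(0);            // "0"
var fromBoolean = __s(false);       // "false"

// null is not caught by the undefined test: calling toString on it throws.
var nullThrows = false;
try {
    __s(null);
} catch (e) {
    nullThrows = e instanceof TypeError;
}
```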

Now if we take a look at sample 4: it introduces simple binding ($object.title and $object.content).

    (function() {
        var __a = arguments
          , $object = __a[0]
          , $index = __a[1]
          , __r = []
          , __d = document
          , __t = __d.createElement("template");
        function __s(v) {
            if (undefined === v)
                return "";
            return v.toString();
        }
        function $write(t) {
            __r.push(__s(t));
        }
        __r.push("<h1>");
        __r.push(__s($object.title));
        __r.push("</h1>");
        __r.push(__s($object.content));
        __t.innerHTML = __r.join("");
        return __d.importNode(__t.content, true);
    })

The pattern {{name}} is replaced with __r.push(__s($object.name));

Sample 7 illustrates the attribute version of code injection.

    (function() {
        var __a = arguments
          , $object = __a[0]
          , $index = __a[1]
          , __r = []
          , __d = document
          , __t = __d.createElement("template");
        function __s(v) {
            if (undefined === v)
                return "";
            return v.toString();
        }
        function $write(t) {
            __r.push(__s(t));
        }
        __r.push("<input type=\"checkbox\" ");
        if ($object.check)
            $write('checked=\'true\'');
        __r.push(">");
        __t.innerHTML = __r.join("");
        return __d.importNode(__t.content, true);
    })

The code is inserted 'as-is' in the result function.

Lastly, sample 8 shows JavaScript injection to condition generation:

    (function() {
        var __a = arguments
          , $object = __a[0]
          , $index = __a[1]
          , __r = []
          , __d = document
          , __t = __d.createElement("template");
        function __s(v) {
            if (undefined === v)
                return "";
            return v.toString();
        }
        function $write(t) {
            __r.push(__s(t));
        }
        if ($object.condition) {
            __r.push("<span>");
            $write("Hello");
            __r.push("</span>");
        } else {
            __r.push("<div></div>");
        }
        __t.innerHTML = __r.join("");
        return __d.importNode(__t.content, true);
    })

Same solution here: the code is copied verbatim to the function body.

Testing

Of course, as a convinced practitioner of TDD, I created test scenarios to validate the most common use cases of this template library.

Conclusion

The purpose of this exercise was to try to implement a minimalist template engine that would fit my needs; I would not claim it is perfect or fully functional. However, besides the implementation, I am more interested in the lessons I learned from it. My hope is that you will also learn from it.

As usual, any feedback is welcome.

Right now, I am happy with the result: the minified version of mustache.js is about 9 KB, mine would take only 1 KB (using jscompress.com).

But there would be lots to add to it, such as:

  • error management (relate to the original template line if something wrong occurs when building the function)
  • two way bindings (I'd like to try that one...)
  • enumeration helpers (such as for each object property)
  • conditional helpers
  • ...
