Friday, January 30, 2015

Code Analysis

Sometimes, you have to put tools in place to make sure that the code respects best practices and guidelines. Some very good ones already exist (in particular the code linters).
In this article, I will introduce you to esprima, which enables deep and precise code analysis.
### Linting is necessary...

In a previous post, I talked about the benefits of using [code linting tools](http://gpf-js.blogspot.ca/2013/11/why-lint-tool-can-reduce-development.html), as they really save developer time during the coding phase:

* It highlights common mistakes (undeclared variables, unused parameters, use of [hasOwnProperty](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/hasOwnProperty), == vs ===...)
* It maintains code consistency (formatting, naming conventions...)
* It checks code complexity, in particular the [cyclomatic complexity](http://en.wikipedia.org/wiki/Cyclomatic_complexity)

Regarding JavaScript, in my opinion, one of the best options is [JSHint](http://www.jshint.com/). In the following example, several errors are raised because of:

* quote inconsistencies
* undeclared functions (let's say they are declared in other sources)
* missing semicolons
* undeclared or improperly named variables
* ...

Move your mouse over the underlined errors to read them *(note that this is using live JSHint parsing)*.

```javascript
(function () {/*gpf:apply-jshint*/
    "use strict";

    // Handles the click on a specific button
    function on_click (ctrl) {
        freezeInterface("Please wait, updating information...")
        $.ajax ('getInformation.aspx', {
            success: function (data) {
                resultCode = data.resultCode;
                if (resultCode == 0)
                    updateInterfaceUsingData(data);
            }
        })
        unfreezeInterface();
    }

    document.getElementById('myButton')
        .addEventListener("click", on_click);

})();
```

### ...but not sufficient

However, this kind of tool has limits, as it tends to apply general validation rules without considering the meaning of the code that is executed.
To illustrate the problem, let me first fix the JSHint errors:

```javascript
(function () {/*gpf:apply-jshint*/
    "use strict";
    /*global $, freezeInterface, updateInterfaceUsingData, unfreezeInterface*/

    // Handles the click on a specific button
    function onClick () {
        freezeInterface("Please wait, updating information...");
        $.ajax("getInformation.aspx", {
            success: function (data) {
                var resultCode = data.resultCode;
                if (resultCode === 0) {
                    updateInterfaceUsingData(data);
                }
            }
        });
        unfreezeInterface();
    }

    document.getElementById("myButton")
        .addEventListener("click", onClick);

})();
```

Using the "global" comment, JSHint is told which global variables are defined elsewhere. With this declaration, the code successfully passes JSHint validation.

But do you see any problem here? ... no? ... really? ...

### Incorrect use of the API

The method **[$.ajax](http://api.jquery.com/jquery.ajax/)** comes from the [jQuery](http://jquery.com/) framework and it encapsulates the necessary code to handle AJAX calls with the server. There are two important things to remember about AJAX calls:

* They are asynchronous: this is why callbacks are used to be notified when the answer comes back from the server.
* Like any function call, they may generate errors: either because the API that was invoked can't provide the requested result or because a network failure prevents the call from completing.

Consequently, there are two problems with the way the method **onClick** is implemented:

* **unfreezeInterface** is called right after the call to **$.ajax**. It means that it does not even wait for the call to succeed.
* Furthermore, if the call fails, nothing will happen. The error is ignored.

There are some situations where it is acceptable to ignore errors but, as a general rule, all errors must be handled.

### Understanding the code

To detect such a misuse, one must first locate any function call to **$.ajax** and then analyze the parameters to verify that it has been used the right way.
One easy solution could be to consider the whole source as a big string, look for "$.ajax" and then check that the "success" as well as the "error" keywords appear just after. But how reliable is that?

```javascript
(function () {/*gpf:apply-jshint*/
    "use strict";
    /*global $, freezeInterface, updateInterfaceUsingData, unfreezeInterface*/
    /*global showError*/

    // Handles the click on a specific button
    function onClick () {
        freezeInterface("Please wait, updating information...");
        $.ajax("getInformation.aspx", {
            success: function (data) {
                var resultCode = data.resultCode;
                if (resultCode === 0) {
                    updateInterfaceUsingData(data);
                } else {
                    // The 'error' keyword is here but inside a string
                    showError("An error occurred");
                }
            }
        });
        unfreezeInterface();
    }

    document.getElementById("myButton")
        .addEventListener("click", onClick);

})();
```

In the above example, the "error" keyword appears in a comment and a string. Furthermore, how can you make sure that no instruction is executed after the call to **$.ajax**?

### Parsing the code

The only good way to understand the JavaScript code is to parse the source in order to build a structured representation of the instructions it contains. Consequently, the **$.ajax** would appear as a function call and it would be possible to enumerate and check its parameters. In particular, we could verify that the second one is an object with a member named "error".

JavaScript parsing is a wide topic... In my [library](https://github.com/ArnaudBuchholz/gpf-js), I created a [tokenizer](https://github.com/ArnaudBuchholz/gpf-js/blob/master/tokenizer.js) that is used in this website to apply syntax coloring, but it does not really provide the program structure: it just identifies the keywords, strings, comments and symbols.

### esprima: a free JavaScript parser

Fortunately, there are existing libraries that do the job for you. I will focus on [esprima](http://esprima.org/index.html).
To make a long story short, esprima is a JavaScript library that parses JavaScript sources. It generates a [sensible syntax tree](http://esprima.org/doc/index.html#ast) format, compatible with the [Mozilla Parser AST](https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Parser_API).

The previous example is converted into: `{ "type": "Program", "body": [ { "type": "ExpressionStatement", "expression": { "type": "CallExpression", "callee": { "type": "FunctionExpression", "id": null, "params": [], "defaults": [], "body": { "type": "BlockStatement", "body": [ { "type": "ExpressionStatement", "expression": { "type": "Literal", "value": "use strict", "raw": "\"use strict\"" } }, { "type": "FunctionDeclaration", "id": { "type": "Identifier", "name": "onClick" }, "params": [], "defaults": [], "body": { "type": "BlockStatement", "body": [ { "type": "ExpressionStatement", "expression": { "type": "CallExpression", "callee": { "type": "Identifier", "name": "freezeInterface" }, "arguments": [ { "type": "Literal", "value": "Please wait, updating information...", "raw": "\"Please wait, updating information...\"" } ] } }, { "type": "ExpressionStatement", "expression": { "type": "CallExpression", "callee": { "type": "MemberExpression", "computed": false, "object": { "type": "Identifier", "name": "$" }, "property": { "type": "Identifier", "name": "ajax" } }, "arguments": [ { "type": "Literal", "value": "getInformation.aspx", "raw": "\"getInformation.aspx\"" }, { "type": "ObjectExpression", "properties": [ { "type": "Property", "key": { "type": "Identifier", "name": "success" }, "value": { "type": "FunctionExpression", "id": null, "params": [ { "type": "Identifier", "name": "data" } ], "defaults": [], "body": { "type": "BlockStatement", "body": [ { "type": "VariableDeclaration", "declarations": [ { "type": "VariableDeclarator", "id": { "type": "Identifier", "name": "resultCode" }, "init": { "type": "MemberExpression", "computed": false, "object": { "type": 
"Identifier", "name": "data" }, "property": { "type": "Identifier", "name": "resultCode" } } } ], "kind": "var" }, { "type": "IfStatement", "test": { "type": "BinaryExpression", "operator": "===", "left": { "type": "Identifier", "name": "resultCode" }, "right": { "type": "Literal", "value": 0, "raw": "0" } }, "consequent": { "type": "BlockStatement", "body": [ { "type": "ExpressionStatement", "expression": { "type": "CallExpression", "callee": { "type": "Identifier", "name": "updateInterfaceUsingData" }, "arguments": [ { "type": "Identifier", "name": "data" } ] } } ] }, "alternate": { "type": "BlockStatement", "body": [ { "type": "ExpressionStatement", "expression": { "type": "CallExpression", "callee": { "type": "Identifier", "name": "showError" }, "arguments": [ { "type": "Literal", "value": "An error occurred", "raw": "\"An error occurred\"" } ] } } ] } } ] }, "rest": null, "generator": false, "expression": false }, "kind": "init", "method": false, "shorthand": false } ] } ] } }, { "type": "ExpressionStatement", "expression": { "type": "CallExpression", "callee": { "type": "Identifier", "name": "unfreezeInterface" }, "arguments": [] } } ] }, "rest": null, "generator": false, "expression": false }, { "type": "ExpressionStatement", "expression": { "type": "CallExpression", "callee": { "type": "MemberExpression", "computed": false, "object": { "type": "CallExpression", "callee": { "type": "MemberExpression", "computed": false, "object": { "type": "Identifier", "name": "document" }, "property": { "type": "Identifier", "name": "getElementById" } }, "arguments": [ { "type": "Literal", "value": "myButton", "raw": "\"myButton\"" } ] }, "property": { "type": "Identifier", "name": "addEventListener" } }, "arguments": [ { "type": "Literal", "value": "click", "raw": "\"click\"" }, { "type": "Identifier", "name": "onClick" } ] } } ] }, "rest": null, "generator": false, "expression": false }, "arguments": [] } } ] }` In particular, the **$.ajax** call is: `// [...] 
{ "type": "ExpressionStatement", "expression": { "type": "CallExpression", "callee": { "type": "MemberExpression", "computed": false, "object": { "type": "Identifier", "name": "$" }, "property": { "type": "Identifier", "name": "ajax" } }, "arguments": [ { "type": "Literal", "value": "getInformation.aspx", "raw": "\"getInformation.aspx\"" },`

And, at the same level, you find the call to **unfreezeInterface**:

```javascript
// [...]
{
    "type": "ExpressionStatement",
    "expression": {
        "type": "CallExpression",
        "callee": {
            "type": "Identifier",
            "name": "unfreezeInterface"
        },
        "arguments": []
    }
}
```

So it is possible to write a program that uses esprima to parse the JavaScript source and then analyzes the resulting structure to do the necessary checks.

### verify.js

So I created the program [verify.js](https://github.com/ArnaudBuchholz/ArnaudBuchholz.github.io/blob/master/blog/post/Code%20analysis/verify.js) that relies on esprima to parse a file and locate the two problems described above.

The main algorithm relies on the structure exploration done in the function [walk](https://github.com/ArnaudBuchholz/ArnaudBuchholz.github.io/blob/65e9bf2164a3081a90f92fe2ff4805479880ea6b/blog/post/Code%20analysis/verify.js#L71) and, once the **$.ajax** structure is detected, the function [checkAjaxCallbacks](https://github.com/ArnaudBuchholz/ArnaudBuchholz.github.io/blob/65e9bf2164a3081a90f92fe2ff4805479880ea6b/blog/post/Code%20analysis/verify.js#L36) takes care of checking the parameters.

To use it, first get it from the GitHub repository (use the [raw](https://raw.githubusercontent.com/ArnaudBuchholz/ArnaudBuchholz.github.io/master/blog/post/Code%20analysis/verify.js) button). Assuming you have [Node.js](http://nodejs.org/) installed, open a command prompt and type:

* npm install gpf-js
* npm install esprima

![Setup](https://arnaudbuchholz.github.io/blog/post/Code%20analysis/setup.png)

Then you are ready to go.
Type "node verify":

![Help](https://arnaudbuchholz.github.io/blog/post/Code%20analysis/help.png)

Or download the two samples and test them:

* [sample1.js](https://arnaudbuchholz.github.io/blog/post/Code%20analysis/sample1.js)
  ![Sample 1](https://arnaudbuchholz.github.io/blog/post/Code%20analysis/sample1.png)
* [sample2.js](https://arnaudbuchholz.github.io/blog/post/Code%20analysis/sample2.js)
  ![Sample 2](https://arnaudbuchholz.github.io/blog/post/Code%20analysis/sample2.png)

### To conclude

JavaScript linting with JSHint is necessary. However, you might need a more advanced tool that is capable of understanding the meaning of the algorithm to get a deeper validation of your code.

With this article, I tried to demonstrate only a fragment of what can be done with the AST structure generated by esprima. But once the door is opened:

* You can also check function signatures or dependencies, keep track of closures, handle variable types...
* You may also think of generating statistics, documentation...
* Why not modify the AST structure in order to manipulate the code and generate a modified version of it: have a look at [escodegen](https://github.com/estools/escodegen).
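To make the idea of exploring the AST more concrete, here is a minimal sketch of such a check. It is not the code of verify.js: the function names (`isAjaxCall`, `getOptionNames`, `findAjaxCallsWithoutError`) are made up for this illustration, and the tree is a hand-written fragment in the Mozilla Parser AST format, so that the example runs without esprima installed.

```javascript
// Hand-written AST fragment (assumption for illustration). It stands for:
//   $.ajax("getInformation.aspx", { success: function () {} });
var sampleAst = {
    type: "Program",
    body: [{
        type: "ExpressionStatement",
        expression: {
            type: "CallExpression",
            callee: {
                type: "MemberExpression",
                computed: false,
                object: { type: "Identifier", name: "$" },
                property: { type: "Identifier", name: "ajax" }
            },
            arguments: [
                { type: "Literal", value: "getInformation.aspx" },
                {
                    type: "ObjectExpression",
                    properties: [{
                        type: "Property",
                        key: { type: "Identifier", name: "success" },
                        value: {
                            type: "FunctionExpression",
                            params: [],
                            body: { type: "BlockStatement", body: [] }
                        }
                    }]
                }
            ]
        }
    }]
};

// True when the node is a call to $.ajax
function isAjaxCall(node) {
    return node.type === "CallExpression"
        && node.callee.type === "MemberExpression"
        && node.callee.object.type === "Identifier"
        && node.callee.object.name === "$"
        && node.callee.property.type === "Identifier"
        && node.callee.property.name === "ajax";
}

// Names of the members of the object literal passed as second argument
function getOptionNames(callNode) {
    var options = callNode.arguments[1];
    if (!options || options.type !== "ObjectExpression") {
        return [];
    }
    return options.properties.map(function (property) {
        // Keys are either Identifier (name) or Literal (value) nodes
        return property.key.name || property.key.value;
    });
}

// Recursively visits every node and collects $.ajax calls lacking "error"
function findAjaxCallsWithoutError(node, found) {
    found = found || [];
    if (Array.isArray(node)) {
        node.forEach(function (item) {
            findAjaxCallsWithoutError(item, found);
        });
    } else if (node && typeof node === "object") {
        if (isAjaxCall(node) && getOptionNames(node).indexOf("error") === -1) {
            found.push(node);
        }
        Object.keys(node).forEach(function (key) {
            findAjaxCallsWithoutError(node[key], found);
        });
    }
    return found;
}

var suspiciousCalls = findAjaxCallsWithoutError(sampleAst);
console.log(suspiciousCalls.length + " $.ajax call(s) without an error callback");
```

With esprima installed, the tree returned by `esprima.parse(source)` could be passed instead of `sampleAst`. Checking that no statement follows the call would require looking at the siblings of the enclosing statement, which this sketch does not attempt.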

Wednesday, January 7, 2015

Timeout and WebWorker

The context

I am currently working on a complex JavaScript application divided into several layers. On one hand, the communication with the server is centralized in a dedicated framework that extensively uses asynchronous executions sequenced by promises. On the other hand, the application is contained inside a single-page application (SPA) and is designed to provide the best user experience.

Promises

To make a long story short, a promise represents the result of an asynchronous operation. It can be:

  • pending (by default)
  • fulfilled (i.e. succeeded)
  • or rejected (i.e. the operation failed)

The most common usage is to wait for the operation completion and chain it with a callback. This is done by calling the method then with one parameter: the function to call on completion.

For instance:

```javascript
// An example of function that uses a promise to signal its completion
function myPromiseBasedFunction() {
    return new Promise(function (resolve) {
        /*
          Execute the content of the function and signal the completion
          using resolve().
          This can be asynchronous (for instance, using an AJAX call) or
          synchronous. The caller does not need to know.
        */
        resolve();
    });
}

// An example of use:
myPromiseBasedFunction().then(function () {
    // Triggered only when the promise is fulfilled (i.e. resolve was called)
});
```

This offers a convenient way to write asynchronous code that can be chained easily. After creating the promise, the execution starts and even if the result is available immediately (or synchronously), the promise allows you to register the callback function before it signals the completion.

Hence, to work appropriately, a promise must defer the execution of its completion to let the caller first build the chain of callbacks.

Promise asynchronous

Promise synchronous

A simple and reliable way for a library to implement such a code sequence is to use the setTimeout function, so that the implementation of resolve is executed after the registration of the callback function (i.e. after the calls to the method then).
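The deferral can be illustrated with a deliberately tiny promise-like object. This is only a sketch under simplifying assumptions: `TinyPromise` is a hypothetical name, it supports neither rejection nor value chaining, and it is not the implementation used in the application described here.

```javascript
// Hypothetical minimal promise: only "then" and "resolve", no rejection
function TinyPromise() {
    this._callbacks = [];
}

// Registers a callback to run on completion
TinyPromise.prototype.then = function (callback) {
    this._callbacks.push(callback);
    return this;
};

// Signals the completion: the callbacks are not run right away but deferred
// with setTimeout, so the caller has time to register them with then()
TinyPromise.prototype.resolve = function (value) {
    var callbacks = this._callbacks;
    setTimeout(function () {
        callbacks.forEach(function (callback) {
            callback(value);
        });
    }, 0);
};

var log = [];

function computeSynchronously() {
    var promise = new TinyPromise();
    promise.resolve(42); // The result is known immediately
    log.push("resolved");
    return promise;
}

computeSynchronously().then(function (value) {
    log.push("callback received " + value);
});
log.push("then registered");
// At this point, log only contains "resolved" and "then registered":
// the callback runs later, once the current script has returned
```

Without the setTimeout in `resolve`, the callbacks would be run before `then` had a chance to register anything, and the caller would never be notified.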

Responsive user interface

As with any user interface, the design is polished to provide the best user experience possible. It means that long JavaScript operations are split into chunks and sequenced using setTimeout to prevent any interface freeze and get rid of the annoying long-running script message.
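As an illustration of this chunking technique, here is a minimal sketch. The helper name `processByChunks` and its parameters are assumptions made for the example, not the code of the application.

```javascript
// Processes the items by chunks of chunkSize, yielding between two chunks so
// that the interface can handle pending events (hypothetical helper)
function processByChunks(items, chunkSize, processItem, onDone) {
    var index = 0;

    function processNextChunk() {
        var end = Math.min(index + chunkSize, items.length);
        while (index < end) {
            processItem(items[index]);
            ++index;
        }
        if (index < items.length) {
            setTimeout(processNextChunk, 0); // Schedule the next chunk
        } else if (onDone) {
            onDone();
        }
    }

    processNextChunk();
}

var squares = [];
processByChunks([1, 2, 3, 4, 5], 2, function (value) {
    squares.push(value * value);
}, function () {
    console.log("All squares computed: " + squares.join(", "));
});
// Only the first chunk has been processed when processByChunks returns
```

Since every chunk after the first one depends on setTimeout, any slowdown of the timers directly slows down the whole operation.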

Long running example in Chrome

This long-running sample will allow you to see this dialog. Enter a number of seconds to wait for (usually 100s) and click go!

setTimeout in an inactive tab

Chrome and Firefox share a particularity that I discovered when using several tabs (later, I found that Opera was doing the same but Internet Explorer and Safari are safe). At some point, the application appeared to be 'frozen' when the tab was not active.

For instance, have a look at the following example page. It was designed to print the current time every 100 milliseconds both in the page title and in the page content. If the tab is not active, you will notice that it seems to refresh more slowly (nearly every second). I also added a real-time monitor that displays red dots if the interval between two calls is greater than 120 ms.

You can find good explanations of the reasons why, as well as some possible workarounds, by searching the web.

As it seems that the setTimeout function works fine in a Web Worker thread, I decided to explore this possibility.

Timeout project

I created a new GitHub repository and started to work on a small library that would solve the issue.

Skeleton

This script is based on an immediately-invoked function expression. It has two advantages:

  • A private scope for variables and functions
  • When invoked, this is translated into the parameter self to provide the global context object
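The skeleton can be sketched as follows. To keep the example runnable anywhere, a plain object named `fakeGlobal` (an assumption for this illustration) is passed where the real script would pass `this`, and `allocateId` is a made-up public function.

```javascript
var fakeGlobal = {}; // Stand-in for the real global object (assumption)

(function (self) {
    "use strict";

    // Private: not reachable from outside the function
    var _lastId = 0;

    // Public: published on the context object received as "self"
    self.allocateId = function () {
        return ++_lastId;
    };

}(fakeGlobal)); // A real script would pass "this" here

var firstId = fakeGlobal.allocateId();
```

After this code runs, `allocateId` is available on the context object while `_lastId` stays invisible to the rest of the page.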

Hooking the APIs

First of all, the JavaScript language is really flexible as it relies on late binding. It means that every time you call a function using a name, the name is first resolved (as any variable) to get the function object.

For instance:

```javascript
/*
  In the following example, the JavaScript engine must:
  - first resolve the variable "gpf"
  - the member "bin" on the previous result
  - the member "toHexa" on the previous result
  This last result is considered as a function and called with the
  parameter 255
*/
var result = gpf.bin.toHexa(255);
// result is FF
```

In a browser, the main context object (the one that contains everything) is the window object. Keep that information in mind, this will be important for the next part.

Hence it is possible to redefine the setTimeout function by assigning a new function to the variable window.setTimeout. I decided to cover the whole timer API, which is why I redefined setTimeout and setInterval as well as their clearTimeout and clearInterval counterparts.
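Here is a minimal sketch of such a hook. To avoid touching the real timers, it works on a stand-in object named `fakeWindow` whose "native" setTimeout only returns a string; these names are assumptions made for the example.

```javascript
// Stand-in for the window object, with a trivial "native" setTimeout
var fakeWindow = {
    setTimeout: function (callback, delay) {
        return "native:" + delay;
    }
};

function scheduleRefresh(context) {
    // Late binding: "setTimeout" is looked up on the context at call time
    return context.setTimeout(function () {}, 100);
}

var before = scheduleRefresh(fakeWindow);

// Hook: keep the original function and assign a replacement to the same name
var nativeSetTimeout = fakeWindow.setTimeout;
var hookedCalls = [];
fakeWindow.setTimeout = function (callback, delay) {
    hookedCalls.push(delay);
    return nativeSetTimeout.call(fakeWindow, callback, delay);
};

var after = scheduleRefresh(fakeWindow);
```

`scheduleRefresh` was not modified, yet its second call goes through the replacement: this is what allows the fix to be applied without changing the existing code.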

Creating a Web Worker

To create an HTML5 Web Worker you need several things:

  • The URL of the JavaScript source to load
    WARNING: the same-origin policy applies to this URL, you should read the documentation
  • A JavaScript code to create it: var worker = new Worker("source.js");
  • A way to communicate with the worker (will be covered in the next part)

Regarding the URL to load, one challenge that I started with is that I wanted the same script to be used not only to redefine the APIs in the main context but also to implement the Web Worker (so that only one file must be distributed). But it is impossible for a script to know how it has been loaded as you don't have its 'current' URL. So I created the method _getTimeoutURL to extract this URL:

  • It checks if a script tag with the id "timeout" exists
  • Or it checks all script tags for the one finishing with "timeout.js"
  • Or it returns "timeout.js"
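The three steps above could look like the following sketch. It is not the actual _getTimeoutURL: the function name `findTimeoutURL` and the idea of receiving the list of scripts as a parameter are assumptions that make the example testable outside a browser.

```javascript
// scripts: array-like of objects exposing "id" and "src", for instance the
// result of document.getElementsByTagName("script") in a browser
function findTimeoutURL(scripts) {
    var suffix = "timeout.js",
        index,
        src;
    // 1. A script tag with the id "timeout"
    for (index = 0; index < scripts.length; ++index) {
        if (scripts[index].id === "timeout") {
            return scripts[index].src;
        }
    }
    // 2. Any script tag whose source ends with "timeout.js"
    for (index = 0; index < scripts.length; ++index) {
        src = scripts[index].src || "";
        if (src.length >= suffix.length
            && src.indexOf(suffix, src.length - suffix.length) !== -1) {
            return src;
        }
    }
    // 3. Default value
    return suffix;
}

var detectedURL = findTimeoutURL([
    { id: "", src: "http://example.com/js/jquery.js" },
    { id: "", src: "http://example.com/js/timeout.js" }
]);
```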

Regarding the worker creation, the same script is used for the main context as well as the web worker. So I needed a way to distinguish the two situations. This is where the window object can help. Indeed, a worker thread can't access it: the worker object itself is the global context of the thread. That explains why the distinction is made by checking the result of typeof window.
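A minimal sketch of this distinction follows. In the real script the test would simply be made on `typeof window`; here, the global context is received as a parameter and two stand-in objects are used (assumptions for this illustration) so that the example runs outside a browser.

```javascript
// scope is the global context object: window in a page, the worker global
// scope in a Web Worker (where no "window" member exists)
function isMainContext(scope) {
    return "undefined" !== typeof scope.window;
}

// Stand-ins for the two kinds of global objects (assumptions)
var fakePageScope = {};
fakePageScope.window = fakePageScope; // In a page, window.window === window
var fakeWorkerScope = { postMessage: function () {} };

function initialize(scope) {
    if (isMainContext(scope)) {
        return "main: redefine the timer API and create the worker";
    }
    return "worker: wait for messages and run the real timers";
}

var pageRole = initialize(fakePageScope);
var workerRole = initialize(fakeWorkerScope);
```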

Main thread / WebWorker communication

The communication between the main thread and the web worker is based on messages: they are asynchronous by nature.

Unless you start messing with the transferList parameter, you can only transmit types that are convertible to a JSON representation.

(This is a highly simplified truth. To be exact, HTML5 introduces the notion of structured clone algorithm used for serializing complex objects.)
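The practical consequence can be approximated with a JSON round trip, as in the sketch below. The record layout (`id`, `delay`, `callback`) is an assumption made for the example, and JSON is only a stand-in for the real serialization.

```javascript
var timeoutRecord = {
    id: 1,
    delay: 100,
    callback: function () {
        return "done";
    }
};

// Approximation of what can travel inside a message: data, not functions
var transmitted = JSON.parse(JSON.stringify(timeoutRecord));
// transmitted only contains id and delay, the callback is gone
```

JSON silently drops the function, whereas the structured clone algorithm refuses it with an error. Either way, a callback cannot be sent to the other thread: it has to stay where it was defined.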

To receive messages, you must register on the "message" event using addEventListener.

Other implementation details

To make a long story short, every time you call setTimeout or setInterval, a new record is created in the corresponding dictionary (_timeouts or _intervals) to store the parameters of the call.

Its key is a number that is allocated (incremented) from _timeoutID or _intervalID.

Then a message is sent to the worker thread to execute the timeout function: only the key and the delay are transmitted.

On timeout, the worker sends back a message with the key to the main thread which retrieves the parameters and executes the callback.
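The exchange described above can be simulated with the following sketch. `FakeWorker`, `workerSetTimeout` and `workerClearTimeout` are hypothetical names: the fake worker only records the messages it receives and lets the example decide when a timer "expires", whereas a real worker would call the native setTimeout and the main thread would register with addEventListener. The names `_timeouts` and `_timeoutID` are the ones mentioned in the text.

```javascript
// Hypothetical stand-in for the worker side: it records what it receives and
// lets the caller simulate the moment when a timer expires
function FakeWorker() {
    this.received = [];
    this.onmessage = null; // Set by the main side to receive the answers
}
FakeWorker.prototype.postMessage = function (message) {
    this.received.push(message);
};
FakeWorker.prototype.fire = function (id) {
    // A real worker would post this message when its own timer expires
    this.onmessage({ data: { id: id } });
};

var worker = new FakeWorker();
var _timeouts = {}; // Parameters of the pending calls, indexed by key
var _timeoutID = 0; // Last allocated key

// Replacement for setTimeout: stores the call, transmits key and delay only
function workerSetTimeout(callback, delay) {
    var id = ++_timeoutID;
    _timeouts[id] = {
        callback: callback,
        args: Array.prototype.slice.call(arguments, 2)
    };
    worker.postMessage({ id: id, delay: delay });
    return id;
}

// Replacement for clearTimeout: forgetting the record is enough here
function workerClearTimeout(id) {
    delete _timeouts[id];
}

// On timeout, the worker sends the key back: retrieve and run the callback
worker.onmessage = function (event) {
    var record = _timeouts[event.data.id];
    if (record) {
        delete _timeouts[event.data.id];
        record.callback.apply(null, record.args);
    }
};

var output = [];
var first = workerSetTimeout(function (name) {
    output.push("Hello " + name);
}, 100, "World");
var second = workerSetTimeout(function () {
    output.push("never called");
}, 50);
workerClearTimeout(second);
worker.fire(second); // Ignored: the record was removed
worker.fire(first);  // Runs the first callback
```

Keeping the callbacks in a dictionary on the main side is what makes the protocol work with plain data messages.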

setTimeout sequence

Possible improvements

Several aspects of this implementation can be improved:

  • Startup time: sometimes, the web worker requires several seconds to run. Because of that, all timeouts may be delayed more than necessary during this phase. An improvement would consist in switching to the new API only when the new thread is ready.
  • URL to load: digging on the net, I found a sample where the web worker was initialised using a data: URL. This clearly reduces the dependency with the source script but, then, we need a bootstrap to load the code inside the web worker.

Conclusion

It works and, more importantly, without modifying the original code! Please check the following example page with the fix (and don't forget to switch tabs).