April 2004 – Dimitri Glazkov

While coding away in JavaScript, reshaping or augmenting your HTML through the DOM, have you ever wondered why there is no built-in support for XPath? Actually, there is: Mozilla has pretty solid support for DOM Level 3 XPath right at your fingertips through the document.evaluate method:

document.evaluate(expression, contextNode, resolver, type, result);

You’ll find the details of the interface over at w3.org or mozilla.org, but for starters, expression is the XPath expression string and contextNode is the DOM node that you’d like to use as the root of the query. The rest can be (and most often will be) specified as zeros and nulls. For instance, this expression will get you all div nodes in your document that have the class attribute set to DateTime:

var iterator = document.evaluate("//div[@class='DateTime']", document, null, 0, null);

By default, the method returns an iterator, which can be worked through like so:

while (item = iterator.iterateNext()) {
    // do something with item
}

As you might’ve guessed, the iterator returns null once all items are exhausted. By modifying the type parameter, you can make the method return other types: a string, a boolean, a number, or a snapshot. A snapshot is much like an iterator, except that the DOM is free to change while the snapshot still exists. If the document changes while you are working through an iterator, the iterator will throw an exception.
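To make the snapshot idea concrete, here is a minimal sketch written against the standard DOM Level 3 XPath interface. The helper name selectNodes is made up for illustration, and the type is passed as the literal 7 (the value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE) so that the sketch does not depend on a global XPathResult object. Whether a given implementation supports snapshots is an assumption you would need to check.

```javascript
// Value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE in DOM Level 3 XPath.
var ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Hypothetical helper: evaluates an expression and copies the resulting
// snapshot into a plain array, which is safe to loop over while the
// document is being modified.
function selectNodes(doc, expression, contextNode) {
    var snapshot = doc.evaluate(expression, contextNode || doc, null,
                                ORDERED_NODE_SNAPSHOT_TYPE, null);
    var nodes = [];
    for (var i = 0; i < snapshot.snapshotLength; i++) {
        nodes[nodes.length] = snapshot.snapshotItem(i);
    }
    return nodes;
}

// Usage (runs only where a document with evaluate() is available):
// remove every matching div, something an iterator would not tolerate.
if (typeof document !== "undefined" && document.evaluate) {
    var stale = selectNodes(document, "//div[@class='DateTime']");
    for (var j = 0; j < stale.length; j++) {
        stale[j].parentNode.removeChild(stale[j]);
    }
}
```

Copying into an array is a convenience here, not a requirement: the snapshot itself can be indexed with snapshotItem for as long as you keep it around.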

Well, I thought it was mighty unfair that Internet Explorer does not support such functionality. I mean, you can very much do XPath in JavaScript there, but only in two cases (that I know of):

1) As a call to an Msxml.DOMDocument object, created using the new ActiveXObject() statement.

2) In an HTML document that was generated by a client-side XSL transformation of an XML file.

Neither case offers us a solution if we want to use XPath in plain-vanilla HTML. So, I decided to right the wrong. Here is the first stab at it: a JavaScript implementation of DOM Level 3 XPath for Microsoft Internet Explorer (all zipped up for your review). Here is the sample, which should run in exactly the same way in IE and Mozilla.

Now counting all links in your document is just one XPath query:

var linkCount = document.evaluate("count(//a[@href])", document, null, XPathResult.NUMBER_TYPE, null).getNumberValue();
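One thing to watch: the call above reads the count through a getNumberValue() method, while the DOM Level 3 XPath specification itself defines a numberValue property. Here is a minimal sketch of a reader that copes with either form. The helper name readNumberValue is made up for illustration, and the assumption is simply that a result object offers at least one of the two.

```javascript
// Hypothetical helper: reads the numeric value from an XPath result object,
// whichever binding style that object offers.
function readNumberValue(result) {
    if (typeof result.getNumberValue === "function") {
        // Java-style accessor method
        return result.getNumberValue();
    }
    // Property defined by the DOM Level 3 XPath specification
    return result.numberValue;
}

// Usage (runs only where a document with evaluate() is available).
// The literal 1 is the value of XPathResult.NUMBER_TYPE.
if (typeof document !== "undefined" && document.evaluate) {
    var linkResult = document.evaluate("count(//a[@href])", document, null, 1, null);
    var links = readNumberValue(linkResult);
}
```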

So is getting a list of all images without an alt attribute:

var imgIterator = document.evaluate("//img[not(@alt)]", document, null, XPathResult.ANY_TYPE, null);

So is finding the first LI element of every UL element:

var firstLiIterator = document.evaluate("//ul/li[1]", document, null, XPathResult.ANY_TYPE, null);
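As a rough illustration of what you might do with such an iterator, this sketch walks the images that lack an alt attribute and collects their src values. The function name listImagesMissingAlt is made up for the example; the type argument 0 is the value of XPathResult.ANY_TYPE, which yields an iterator for a node-set expression.

```javascript
// Hypothetical helper: returns the src of every image without an alt attribute.
function listImagesMissingAlt(doc) {
    // 0 is XPathResult.ANY_TYPE; a node-set expression gives back an iterator.
    var iterator = doc.evaluate("//img[not(@alt)]", doc, null, 0, null);
    var sources = [];
    var img;
    // iterateNext() returns null once the items are exhausted.
    while ((img = iterator.iterateNext())) {
        sources[sources.length] = img.getAttribute("src");
    }
    return sources;
}

// Usage (runs only where a document with evaluate() is available).
if (typeof document !== "undefined" && document.evaluate) {
    var missing = listImagesMissingAlt(document);
}
```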

In my opinion, having XPath in HTML DOM opens up a whole new level of flexibility and just plain coding convenience for JavaScript developers.

I must say, I haven’t been able to resolve all implementation issues yet. For example, I couldn’t find a pretty way to implement the properties of XPathResult. How do you make a property accessor that may throw an exception in JScript? As a result, I had to fall back to the Java model of binding to properties, that is, accessor methods such as getNumberValue() in place of plain properties.
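To show what the accessor-method workaround looks like, here is a minimal sketch. It is not the code from the zipped implementation: the SketchResult constructor and its _value field are invented for the example. The only facts it leans on are that NUMBER_TYPE is 1 and that the specification wants a type error when a number is read from a result of another type.

```javascript
// Value of XPathResult.NUMBER_TYPE in DOM Level 3 XPath.
var NUMBER_TYPE = 1;

// Hypothetical stand-in for an XPathResult object.
function SketchResult(resultType, value) {
    this.resultType = resultType;
    this._value = value;
}

// An accessor method can check the result type and throw, which a plain
// data property cannot do.
SketchResult.prototype.getNumberValue = function () {
    if (this.resultType !== NUMBER_TYPE) {
        throw new Error("TYPE_ERR: result is not a number");
    }
    return this._value;
};

// Usage: reading a number from a number-typed result succeeds.
var countResult = new SketchResult(NUMBER_TYPE, 12);
var count = countResult.getNumberValue();
```

The price is that callers write result.getNumberValue() where the specification says result.numberValue.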

So guys, take a look. I can post more details of the implementation, if you’d like. Just let me know.

XPath, unleashed — coming to Internet Explorer 5+ HTML DOM near you