A Short Path to XQuery

XQuery is a language for querying XML data or non-XML data that can be modeled as XML. XQuery is specified by the W3C.

Introduction

Unlike statement-based languages like Java and C++, the XQuery language is expression-based. The simplest XQuery expression is an XML node constructor. A lone node constructor is an XQuery expression that forms a complete query, and it is also a well-formed XML document, which shows that an XML node can be used as an XQuery expression. An XQuery expression can also be embedded in another XQuery expression by enclosing it in curly braces, for example a document expression embedded in a node expression.
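
As a minimal sketch of embedding, the query below places a document expression inside an element constructor. The element names wrapper and greeting are hypothetical and chosen only for illustration.

```xquery
(: A lone constructor such as <greeting/> is already a complete query. :)
(: Here a document expression is embedded in a node constructor; the   :)
(: document node is replaced by its children when it is copied.        :)
<wrapper>{ document { <greeting/> } }</wrapper>
```

The result should be a wrapper element with greeting as its only child.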

Using Path Expressions To Match & Select Items

In C++ and Java, we write nested for loops and recursive functions to traverse XML trees in search of elements of interest. In XQuery, we express iterative and recursive algorithms using path expressions.

A path expression looks somewhat like a typical file pathname for locating a file in a hierarchical file system. It is a sequence of one or more steps separated by a slash '/' or a double slash '//'. Although path expressions are used for traversing XML trees, not file systems, QtXmlPatterns can model a file system to look like an XML tree, so in QtXmlPatterns we can use XQuery to traverse a file system. See the file system example.

Think of a path expression as an algorithm for traversing an XML tree to find and collect items of interest. This algorithm is evaluated by evaluating each step (sub-expression) moving from left to right through the sequence. A sub-expression is evaluated with a set of input items (nodes and atomic values), sometimes called the focus. The sub-expression is evaluated for each item in the focus. These evaluations produce a new set of items, called the result, which then becomes the focus that is passed into the next step. Evaluation of the final step produces the final result, which is the result of the XQuery. The items in the result set are presented in document order and without duplicates.

With QtXmlPatterns, a standard way to present the initial focus to a query is to call QXmlQuery::setFocus(). Another common way is to let the XQuery itself create the initial focus by letting the first step of the path be a call to the XQuery doc() function, which loads an XML document and returns its document node. The document node is the top-level node that represents the entire XML document (i.e., the document node is not the outermost XML element in the document). The document node then becomes the only item in the initial focus set.

Consider the XQuery doc('index.html')//p. The doc() function loads index.html to get its document node. The document node becomes the focus for the next sub-expression, //p. The double slash says to select all p elements found below the document node, regardless of where they appear in the document tree. This query therefore selects all p elements in the document index.html.
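
To try the double slash without a file on disk, the sketch below builds a small document inline instead of calling doc(). The inline content is an assumption made for illustration; it stands in for index.html.

```xquery
(: $doc plays the role of the document node that doc('index.html') returns. :)
let $doc := document {
  <html><body><p>one</p><div><p>two</p></div></body></html>
}
(: Selects every p element below the document node, at any depth. :)
return $doc//p
```

Both p elements should be returned, in document order.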

Conceptually, evaluation of a path expression is similar to a set of nested for loops. Consider the XQuery doc('index.html')//p/span, which builds on the previous one. This XQuery is a single path expression composed of three steps. The first step is the filter step, which creates the initial focus set by calling the doc() function. We can paraphrase what the query engine does with the focus set at each step:

  1. for each node in the initial focus set...
  2. for each descendant node that is an element named p...
  3. collect the child nodes that are elements named span.

Again, the double slash means select all the p elements in the document. The single slash before the span element means select only those span elements that are child elements of a p element (i.e., not grandchildren, etc.). The XQuery evaluates to a final result set containing all the span elements in the document that have a p element as parent.
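
The nested-loop reading of the three steps can be sketched as follows. The inline document is a hypothetical stand-in for index.html, and the for expression is only a paraphrase: unlike the path expression, it does not sort its result or remove duplicates.

```xquery
let $doc := document {
  <html><body>
    <p>a <span>direct child</span></p>
    <p>b <em><span>grandchild</span></em></p>
  </body></html>
}
return (
  (: The path expression: span children of any p element. :)
  $doc//p/span,
  (: Roughly the same traversal written as an explicit loop. :)
  for $p in $doc//p
  return $p/span
)
```

Each form should select only the span that is a direct child of a p element, so that span appears twice in the combined result.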

Axis Steps

The most common kind of path step is called an axis step, which tells the query engine which way to navigate from the context node, and which test to perform when it encounters nodes along the way. An axis step has two parts, an axis specifier, and a node test. Conceptually, evaluation of an axis step proceeds as follows: For each node in the focus set, the query engine navigates out from the node along the specified axis and applies the node test to each node it encounters. The nodes selected by the node test are collected in the result set, which becomes the focus set for the next step.

The second and third steps in the example XQuery above are both axis steps. Both will apply the element() node test to nodes encountered along some axis. But in the example, the two axis steps are written using an abbreviated syntax, where the axis specifier and the node test are not written but implied. XQueries are normally written in this shorthand form. If we rewrite the XQuery using the unabbreviated syntax, it looks like this: doc('index.html')/descendant-or-self::element(p)/child::element(span). The two axis steps have been expanded. The first step (//p) has been rewritten as /descendant-or-self::element(p), where descendant-or-self:: is the axis specifier, and element(p) is the node test. The second step (/span) has been rewritten as /child::element(span), where child:: is the axis specifier, and element(span) is the node test. The output of the expanded XQuery will be exactly the same as the output of the shorthand form.
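
A small sketch comparing the two spellings on an inline document (hypothetical content, standing in for index.html). Note that the XQuery specification formally defines // as an abbreviation for /descendant-or-self::node()/, so the expansion used in this overview is a simplification that selects the same nodes here.

```xquery
let $doc := document {
  <html><body><p>text <span>inside</span></p></body></html>
}
return (
  (: Abbreviated form. :)
  count($doc//p/span),
  (: Unabbreviated form, as described in the text. :)
  count($doc/descendant-or-self::element(p)/child::element(span))
)
```

Both counts should be 1.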

To create an axis step, concatenate an axis specifier and a node test. The following sections list the axis specifiers and node tests that are supported in QtXmlPatterns.

Axis Specifiers

An axis specifier defines the direction you want the query engine to take when it navigates away from the context node. QtXmlPatterns supports the following axes.

Axis Specifier        Refers to the axis containing...
self::                the context node itself
attribute::           all attribute nodes of the context node
child::               all child nodes of the context node (not attributes)
descendant::          all descendants of the context node (children, grandchildren, etc.)
descendant-or-self::  all nodes in descendant + self
parent::              the parent node of the context node, or empty if there is no parent
ancestor::            all ancestors of the context node (parent, grandparent, etc.)
ancestor-or-self::    all nodes in ancestor + self
following::           all nodes in the tree containing the context node, not including descendant, and that follow the context node in the document
preceding::           all nodes in the tree containing the context node, not including ancestor, and that precede the context node in the document
following-sibling::   all children of the context node's parent that follow the context node in the document
preceding-sibling::   all children of the context node's parent that precede the context node in the document

Node Tests

A node test is a conditional expression that must be true for a node if the node is to be selected by the axis step. The conditional expression can test just the kind of the node, or it can test the kind of the node and the name of the node. The XQuery specification for node tests also defines a third condition, the node's Schema Type, but schema type tests are not yet supported in QtXmlPatterns.

QtXmlPatterns supports the following node tests. The tests that have a name parameter also test the node's name and are often called the Name Tests.

Node Test                       Matches any...
node()                          node
text()                          text node
comment()                       comment node
element()                       element (same as star: *)
element(name)                   element called name
attribute()                     attribute
attribute(name)                 attribute called name
processing-instruction()        processing-instruction
processing-instruction(name)    processing-instruction called name
document-node()                 document node
document-node(element(name))    document node with document element name

The name in a name test is resolved to its expanded form using the statically known namespaces in the expression context. Resolving a name to its expanded form means resolving its namespace prefix (if any) to a namespace URI. The expanded name then consists of the namespace URI and the local name. When QtXmlPatterns expands a name, it creates an instance of QXmlName, which retains the namespace prefix along with the namespace URI and the local name. Consider a simple example:
declare namespace s = "http://www.w3.org/2000/svg";
let $doc := doc('image.svg')
return $doc/s:svg
It declares the namespace prefix s to represent the namespace http://www.w3.org/2000/svg. This namespace is added to the list of statically known namespaces so it can be used when resolving the namespace prefix of s:svg. The query can then match the document element from the SVG file image.svg.

We could instead declare the same namespace to be the default element namespace, and rewrite the query this way:

declare default element namespace "http://www.w3.org/2000/svg";
let $doc := doc('image.svg')
return $doc/svg
It also returns the document element from file image.svg. But consider a third version:
let $doc := doc('image.svg')
return $doc/svg
This query fails. It does not return the file's document element because, as we saw from the examples above, the document element in an SVG file is in the http://www.w3.org/2000/svg namespace, and in this version, the query does not declare that namespace with either of the forms demonstrated.

Names and Wildcards

Names can be combined with wildcards in order to select, for instance, any element or attribute as long as it is in a particular namespace, or an attribute or element appearing in any namespace as long as it has a particular local name. This is achieved by using a wildcard as the prefix or as the local name. For instance, a name test with a wildcard local name, such as @xlink:*, selects all the attributes that are in the XLink namespace, and a name test with a wildcard prefix, such as *:html, selects an element whose local name is html, regardless of its namespace.
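
A sketch of both wildcard forms, using an inline SVG fragment. The prefix xlink and the element content are assumptions made for illustration.

```xquery
declare namespace xlink = "http://www.w3.org/1999/xlink";
let $doc := document {
  <svg xmlns="http://www.w3.org/2000/svg"
       xmlns:xlink="http://www.w3.org/1999/xlink">
    <a xlink:href="other.svg" xlink:title="Other" id="link"/>
  </svg>
}
return (
  (: Any attribute in the XLink namespace; the id attribute is skipped. :)
  (for $a in $doc//@xlink:* return local-name($a)),
  (: An element with local name svg, whatever its namespace. :)
  namespace-uri($doc/*:svg)
)
```

The first part should list the local names href and title (attribute order is up to the implementation), and the second should report the SVG namespace URI.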

The following are various kind tests.

Abbreviated syntax

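For many common tasks the full axis step syntax is verbose, and for that reason abbreviated alternatives exist, which typically combine an axis and a node test. For example, a bare name such as p abbreviates child::p, @id abbreviates attribute::id, and .. abbreviates parent::node().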
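
A short sketch of some abbreviations defined by the XQuery specification, applied to an inline document whose content is hypothetical:

```xquery
let $doc := document {
  <html><body><p id="intro">Hello <span>world</span></p></body></html>
}
return (
  (: p    is short for child::p                      :)
  (: @id  is short for attribute::id                 :)
  (: ..   is short for parent::node()                :)
  (: //   is short for /descendant-or-self::node()/  :)
  string($doc//p/@id),
  name($doc//span/..)
)
```

This should evaluate to the strings intro and p.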

More on Focus and Filtering: Predicates

In addition to steps as a way to filter content, XPath and XQuery have the predicate expression: an expression with a second expression to its right enclosed in brackets. For instance, a query such as doc('index.html')//p[@id = 'thatSpecialOne'] selects the paragraph whose id attribute has the value thatSpecialOne.

Like steps in path expressions, predicates also make use of the focus. The predicate is applied to each item in the source sequence, and if the item passes the filter, it is part of the result. Inside a predicate (and inside steps too) the current context item can be accessed by using the dot expression. Consider a predicate that uses the dot expression on a step that selects p elements: for each p element that the node test returns, the predicate is invoked with that element as its context item, and the element is returned only if the predicate expression evaluates to true, for instance when the predicate requires that the string value of its context item is zero.
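
The two kinds of predicate discussed above can be sketched like this. The inline document is hypothetical, and the second predicate, which tests the length of the context item's string value, is only an illustration of the dot expression, not necessarily the query the original text referred to.

```xquery
let $doc := document {
  <html><body>
    <p id="thatSpecialOne">special</p>
    <p>ordinary</p>
    <p/>
  </body></html>
}
return (
  (: Keeps the p element whose id attribute matches. :)
  $doc//p[@id = 'thatSpecialOne'],
  (: The dot is the p element currently being tested. :)
  count($doc//p[string-length(.) = 0])
)
```

The first expression should return the first paragraph, and the count should be 1, for the empty p element.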

There are two kinds of predicates: numeric predicates and truth predicates.

Select based on Positions and Numeric Ranges

While a predicate is applied to its focus, the current context position can be obtained by using the function position(). For instance, applying the predicate [position() > 5] to a step that selects paragraphs selects all the paragraphs except the first five.

In addition to position(), the function last() also returns a number related to the focus: the position of the last item. Used by itself inside a predicate, last() simply selects the last item in the input sequence, but it can also be combined with, for instance, an offset: the predicate [last() - 1] returns the next-to-last paragraph.

Positions inside a focus start from one, not zero.
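
A sketch of numeric predicates on seven sibling paragraphs. The inline body element is an assumption; the paragraphs are kept as siblings so that positions are counted within a single parent.

```xquery
let $body :=
  <body><p>1</p><p>2</p><p>3</p><p>4</p><p>5</p><p>6</p><p>7</p></body>
return (
  $body/p[position() > 5],   (: paragraphs 6 and 7 :)
  $body/p[last()],           (: paragraph 7 :)
  $body/p[last() - 1]        (: paragraph 6 :)
)
```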

Filtering based on Logic

If a predicate doesn't evaluate to a number, it is considered a truth predicate. A truth predicate takes the value the predicate expression evaluates to and computes its effective boolean value. In short, the rules are as follows: an empty sequence is false; a sequence whose first item is a node is true; a single boolean is its own value; a single string is false only if it has zero length; a single number is false only if it is zero or NaN; anything else is an error. For instance, the query //p[table] selects all paragraphs that have a table as a child, since the predicate evaluates to true if the contained step, table, matches any nodes. This is of course very different from //p/table, which returns the tables found inside paragraphs (which should be none, since they cannot appear there).
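
A sketch of a truth predicate next to the corresponding path, followed by a few direct calls to boolean(), which returns the effective boolean value of its argument. The inline document uses div rather than p as the container; that choice and the content are assumptions made for illustration.

```xquery
let $doc := document {
  <html><body>
    <div><table/></div>
    <div>no table here</div>
  </body></html>
}
return (
  count($doc//div[table]),   (: 1: div elements that have a table child :)
  count($doc//div/table),    (: 1: the table elements themselves :)
  boolean(()),               (: false: empty sequence :)
  boolean(""),               (: false: zero-length string :)
  boolean(0),                (: false: numeric zero :)
  boolean("text")            (: true: non-empty string :)
)
```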

Creating nodes

While the XQuery language has a lot of functions and expressions for selecting and filtering existing content, it can also create new content using its node constructors. A query that consists solely of an XML document looks like an XML document, and in fact is one, but it is also a valid XQuery query. Node constructors are by and large just like XML, so if one knows XML, one can simply continue to write XML inside queries whenever one needs to have nodes created. There are however two things that set direct node constructors apart from XML: one can embed XQuery expressions inside them, and they are expressions themselves. Let's first look at the former.

Computing values inside nodes

Creating a value inside a node at runtime is done by embedding expressions inside curly braces. For instance, an element constructor whose content is an enclosed arithmetic expression that evaluates to 6 constructs an element with the text node "6" inside it. Similarly, one can embed expressions inside attributes. For instance, a class attribute that combines literal text with an enclosed expression can produce an element whose class attribute has the value "important example obsolete", without quotes.
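
One way to write such constructors is sketched below. The element names sum and p and the exact expressions are assumptions; they were chosen to produce the values described in the text.

```xquery
(
  (: The enclosed expression is evaluated; the result is <sum>6</sum>. :)
  <sum>{ 1 + 2 + 3 }</sum>,
  (: Items of the enclosed sequence are joined with single spaces,  :)
  (: giving class="important example obsolete".                     :)
  <p class="important { 'example', 'obsolete' }"/>
)
```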

Node Constructors are Expressions

Because node constructors are expressions just like, for instance, function calls, paths and literals, they can appear anywhere an expression can appear. For instance, consider a query that checks whether maybeNotWellformed.xml can be loaded. If the document can be read successfully, the query creates a para element for each p element that appears anywhere in the document and copies p's child nodes into it. But if the document cannot be loaded, a single para element is created that contains a descriptive message.
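
A sketch of such a query, assuming the standard doc-available() function is used for the check and that maybeNotWellformed.xml sits where the query can reach it; the wording of the message is hypothetical.

```xquery
if (doc-available('maybeNotWellformed.xml'))
then
  (: One para element per p element, holding copies of p's children. :)
  for $p in doc('maybeNotWellformed.xml')//p
  return <para>{ $p/node() }</para>
else
  (: A single para element carrying a descriptive message. :)
  <para>maybeNotWellformed.xml could not be loaded.</para>
```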

Hence, in the above query node constructors appear in two places: in the part that handles a successfully loaded document, and in the part that reports the failure.

Computing Node Names at Runtime

Direct node constructors are fine, but what if one doesn't know the names of the nodes to construct when writing the query? For each direct node constructor there exists a computed node constructor that takes the name and the node value as arbitrary expressions. For instance, a query that produces a small XML document with direct constructors can also be written with computed constructors.
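
A minimal sketch of computed constructors, where the element name comes from a variable. The names greeting and lang are hypothetical.

```xquery
let $name := 'greeting'
return element { $name } {
  attribute lang { 'en' },
  text { 'Hello' }
}
```

This should construct the same node as the direct constructor <greeting lang="en">Hello</greeting>.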

Copying nodes into other nodes

When an expression embedded inside a node constructor evaluates to strings (or any other type of atomic value), the values become a text node, formed by concatenating them with a space in between. However, when the expression evaluates to nodes, they are copied and become children of the surrounding node. This can occasionally be deceptive. Consider a query that embeds a path selecting a version attribute inside a p element constructor. It won't output a p element whose content is the value of the version attribute; it will instead copy the attribute onto the p element, and the result is not even valid XHTML. When one wants the value of the attribute rather than the attribute itself, the approach is to extract the value, for instance with the string() function.
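
The difference can be sketched with an inline html element; the attribute value is an assumption.

```xquery
let $html := <html version="1.1"><body/></html>
return (
  <p>{ 1, 2, 3 }</p>,                 (: atomic values: <p>1 2 3</p> :)
  <p>{ $html/@version }</p>,          (: attribute copied: <p version="1.1"/> :)
  <p>{ string($html/@version) }</p>   (: value extracted: <p>1.1</p> :)
)
```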

Escaping Characters

In the XQuery syntax, a set of characters are given special meaning. For instance, apostrophes or quotes start and terminate string literals. These can be escaped by writing the character twice.

For instance, a string literal delimited by quotes can contain a quote by doubling it. However, it is sometimes easiest to delimit the string literal with apostrophes instead of quotes if the string contains quotes. One can also use XML entity and character references, such as &amp; or a numeric reference like &#xA0;, to express characters that cannot be directly represented in the encoding of the file containing the query.

When curly braces should appear as literal characters inside node constructors, one can again escape them by doubling them, or use character references.
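
The escaping rules can be sketched as follows; the strings and the element name code are hypothetical.

```xquery
(
  "He said ""hello""",          (: doubled quotes:  He said "hello" :)
  'He said "hello"',            (: apostrophe delimiters, same value :)
  "Fish &amp; chips",           (: entity reference: Fish & chips :)
  <code>{{ braces }}</code>     (: doubled braces: <code>{ braces }</code> :)
)
```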

Dates, Times, Numbers and other Atomic Values

Apart from nodes, XQuery has atomic values and they are just what one would think they are: small, primitive values, that have a similar role to C++'s plain old data structures like float or long. In total there are about twenty of them, some of the most common being:

Name              Description
xs:integer        A 64 bit integer
xs:boolean        A boolean value, false or true
xs:double         A 64 bit floating point value
xs:string         A string where each codepoint is an XML 1.0 character (essentially Unicode)
xs:date           A date, such as when you're born: 1984-10-15
xs:time           A time, such as when you show up at work: 09:00:00
xs:dateTime       A date followed by a time: 1974-10-15T05:00:00
xs:duration       A time interval such as P5Y2M10DT15H, which represents five years, two months, 10 days, and 15 hours
xs:base64Binary   Represents data, possibly binary data, in Base 64 encoding

Atomic values can be seen as types which have:

Creating Atomic Values

Apart from the builtin functions that return atomic values, such as current-dateTime(), constructor functions can be used to create atomic values.

Integers, decimals, doubles and strings can be created by using literal expressions. Booleans are created with the functions true() or false() (just true or false would be name tests), and for the rest constructor functions must be used.

Essentially, each atomic type can construct a value from a string. While doing so it validates the input string to ensure it has a proper format, and if not, it issues a dynamic error. These formats tend to be what one would guess them to be. For instance, if one passes "1.five" to xs:decimal's constructor, as opposed to "1.5", it will halt the query so that the bug can be corrected.

Values do not have to be constructed from strings: an xs:boolean can, for example, be created from an xs:integer, because values can be created, or converted, from a range of different types. For instance, an xs:double can be created from an xs:decimal, or an xs:boolean can be converted to an xs:string. Which conversions are possible depends on the types, but they tend to be intuitive. One of the specifications has a nifty table outlining those.
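
A sketch of constructor functions and conversions; the particular values are assumptions chosen for illustration.

```xquery
(
  xs:date("2008-05-06"),          (: constructed from a string :)
  xs:decimal("1.5"),              (: "1.five" here would raise a dynamic error :)
  xs:boolean(1),                  (: an xs:boolean created from an xs:integer :)
  xs:double(xs:decimal("1.5")),   (: xs:decimal converted to xs:double :)
  xs:string(true())               (: xs:boolean converted to xs:string :)
)
```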

Using Atomic Values

Once atomic values have been constructed, via one of the methods mentioned above, or as return values from functions or by evaluating variables, one can pass them along to functions, convert them to strings and attach them as part of text nodes to nodes, or use operators between them. Let's look at the latter.

Apart from the usual arithmetic operators between numbers one would expect, operators are also available between more exotic types. Consider a query that subtracts two dates, which returns an xs:dayTimeDuration, and subsequently compares the result against another xs:dayTimeDuration. The query finally evaluates to a single atomic value of type xs:boolean.
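
A query of that shape might look like the sketch below. The dates and the duration are assumptions, not the values from the original snippet.

```xquery
(: The subtraction yields an xs:dayTimeDuration of 126 days, :)
(: which is then compared with a duration of 30 days.        :)
xs:date("2008-05-06") - xs:date("2008-01-01") > xs:dayTimeDuration("P30D")
```

With these values the query should evaluate to true.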

The available operators, and the types they can be used between, are summarized in a table in the main XQuery specification.

Further Reading

XQuery is a big language that is hard to cover in an overview. If one wants a good understanding of the subject, a good option is to get a book on the topic.

Another alternative is to ask a question or two on the mailing list qt-interest, or on talk at x-query.com.

FunctX is a collection of XQuery functions that can be both useful and educational.

Of course, the specifications are one alternative, but one has to take a deep breath before diving into those. Here are the links to (some of) them:

FAQ

Path expressions miss

Often this is because the names that the axis steps match differ from the names of the nodes being matched. For instance, suppose a query applies steps such as html and body to index.html, and index.html is an XHTML document, so its elements reside in the namespace http://www.w3.org/1999/xhtml. The path won't match, since its steps look for {}html and so forth, while the actual name is {http://www.w3.org/1999/xhtml}html. The fix is straightforward: declare a prefix for the XHTML namespace and use it in the name tests.

Path expressions also pick up the default element namespace if one is declared. If a query declares the XHTML namespace as the default element namespace, the nodes created by its direct element constructors will be in the XHTML namespace, but so will the names in its path expressions. Hence they look for {http://www.w3.org/1999/xhtml}tests and so forth, while testResult.xml is perhaps in a different namespace, or no namespace at all.

Another reason could be that the context item is not what one expects it to be. For instance, a path that applies a step for a child of html directly to the result of doc() won't match, because the node that doc() returns is not the top element node (html); it is the document node.
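
Both pitfalls can be sketched on an inline XHTML document. The prefix x and the content are assumptions; the inline document stands in for index.html.

```xquery
declare namespace x = "http://www.w3.org/1999/xhtml";
let $doc := document {
  <html xmlns="http://www.w3.org/1999/xhtml"><body><p>text</p></body></html>
}
return (
  count($doc/html/body/p),         (: 0: the steps look for names in no namespace :)
  count($doc/x:html/x:body/x:p),   (: 1: the names resolve to the XHTML namespace :)
  count($doc/x:body)               (: 0: the focus is the document node, not html :)
)
```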

Variable in for loop is out of scope

Due to expression precedence, it might be necessary to wrap the return expression of a for clause in parentheses. This matters most with the comma operator, which binds more loosely than the for expression: without the parentheses, everything after the comma would fall outside the for expression, with the whole for expression as its left operand, and since the scope of the variable $d ends with the return expression, a reference to $d there would be out of scope.
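
A sketch of the parenthesized form, using the comma operator; the values are hypothetical.

```xquery
(: The parentheses keep both operands of the comma inside the return   :)
(: expression. Written as "return $d, $d * 10", the second $d would    :)
(: lie outside the for expression and therefore be out of scope.       :)
for $d in (1, 2, 3)
return ($d, $d * 10)
```

This should evaluate to 1 10 2 20 3 30.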

Expressions aren't evaluated

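If an expression is inside a node constructor it must be surrounded by curly braces, otherwise it is interpreted as text. Without the braces, the characters of the expression are copied into the result literally; with the braces, the expression is evaluated and its value becomes the content.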
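
For example (the element name p is arbitrary):

```xquery
(
  <p>1 + 1</p>,       (: no braces: the text "1 + 1" is kept as written :)
  <p>{ 1 + 1 }</p>    (: braces: the expression is evaluated, giving <p>2</p> :)
)
```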

Filters select the wrong things

When using predicates, consider what the predicate applies to. For instance, the query //p/span[1] evaluates to the first span element in each p element, while the query (//p/span)[1] evaluates to only one span element, the one that occurred first in the result of the path expression as a whole. In the first case the filter expression is applied to the span step; in the second it is applied to the result of the whole path.
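
The two forms can be compared on an inline document with hypothetical content:

```xquery
let $doc := document {
  <body>
    <p><span>a1</span><span>a2</span></p>
    <p><span>b1</span><span>b2</span></p>
  </body>
}
return (
  $doc//p/span[1],     (: the first span in each p: a1 and b1 :)
  ($doc//p/span)[1]    (: the first span overall: a1 only :)
)
```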

FLWOR doesn't behave as expected

Note that a for clause generates a so-called tuple stream, while a let clause is an ordinary variable binding. For instance, if a let binding is placed inside a for binding, it is created for each tuple. The order by clause in turn applies to the tuple stream that the for clause evaluates to. For instance, a query of this kind can evaluate to 4 2 -2 2 -8 2: the tuples are sorted, but the items inside each let binding keep their order.

By contrast, an expression that binds a whole sequence with let and then applies order by would not be sorted, since the items the let clause binds aren't dealt with on an individual basis.
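
One query that is consistent with the output 4 2 -2 2 -8 2 is sketched below. It is a reconstruction from that output, not necessarily the original snippet.

```xquery
(: for creates one tuple per number; let binds a two-item sequence    :)
(: in each tuple; order by sorts the tuples by $a: -4, 2, 8.          :)
for $a in (8, -4, 2)
let $b := ($a * -1, 2)
order by $a
return $b
```

The tuples are returned in the order -4, 2, 8, so the bound sequences (4, 2), (-2, 2) and (-8, 2) are concatenated in that order.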

Nodes are created in the wrong order

If nodes are created in the wrong order, it may be because the document order between nodes created with node constructors is undefined. For that reason node sorting, which is invoked by path expressions for instance, returns such nodes in an undefined order. Hence, one gets nodes in an arbitrary order if node constructors are placed somewhere in a path expression, or indirectly, if nodes are created inside a user-declared function which is called from a path step. Consider a path expression over feed.rss whose last step constructs a p element for each item element. The query evaluates to a sequence of p elements, but they are not necessarily in the same order as the item elements appear in feed.rss. The order is, counterintuitive as it may seem, undefined.

One approach is to use a for loop instead, which doesn't perform node sorting on its result.
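
A sketch of the for loop approach, with a small inline feed standing in for feed.rss; the element content is hypothetical.

```xquery
let $feed := document {
  <rss><channel>
    <item><title>first</title></item>
    <item><title>second</title></item>
  </channel></rss>
}
(: The for loop returns its results in iteration order, with no node sorting. :)
for $item in $feed//item
return <p>{ string($item/title) }</p>
```

The p elements should come out in the same order as the item elements.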

true or false doesn't work

Boolean values, that is atomic values of type xs:boolean, cannot be created by writing true or false inside the query, since those are steps, name tests to be precise. The safest and easiest way to create boolean values is to use the builtin functions false() or true().

Another way is to invoke the xs:boolean constructor function, for example xs:boolean("true").
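
A short sketch of both ways of creating boolean values:

```xquery
(
  true(),                (: builtin function :)
  false(),               (: builtin function :)
  xs:boolean("true"),    (: constructor function, from a string :)
  xs:boolean("0")        (: "0" and "false" both construct false :)
)
```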


Copyright © 2008 Trolltech Trademarks
Qt Jambi 4.4.0_01