XQuery is a language for querying XML data or non-XML data that can be modeled as XML. XQuery is specified by the W3C.
A path expression looks somewhat like a typical file pathname for locating a file in a hierarchical file system. It is a sequence of one or more steps separated by a slash '/' or a double slash '//'. Although path expressions are used for traversing XML trees, not file systems, QtXmlPatterns can model a file system to look like an XML tree, so in QtXmlPatterns we can use XQuery to traverse a file system. See the file system example.
Think of a path expression as an algorithm for traversing an XML tree to find and collect items of interest. This algorithm is evaluated by evaluating each step (sub-expression) moving from left to right through the sequence. A sub-expression is evaluated with a set of input items (nodes and atomic values), sometimes called the focus. The sub-expression is evaluated for each item in the focus. These evaluations produce a new set of items, called the result, which then becomes the focus that is passed into the next step. Evaluation of the final step produces the final result, which is the result of the XQuery. The items in the result set are presented in document order and without duplicates.
With QtXmlPatterns, a standard way to present the initial focus to a query is to call QXmlQuery::setFocus(). Another common way is to let the XQuery itself create the initial focus by letting the first step of the path be a call to the XQuery doc() function, which loads an XML document and returns its document node. The document node is the top-level node that represents the entire XML document (i.e., the document node is not the outermost XML element in the document). The document node then becomes the single item in the initial focus set.

Consider the XQuery doc('index.html')//p. The doc() function loads index.html to get its document node. The document node becomes the focus for the next step, //p. The double slash says to select all p elements found below the document node, regardless of where they appear in the document tree. This query selects all p elements in the document index.html.
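Conceptually, evaluation of a path expression is similar to a set of nested for loops. Consider the XQuery doc('index.html')//p/span, which builds on the previous one. This XQuery is a single path expression composed of three steps. The first step is a filter step, which creates the initial focus set by calling the doc() function. We can paraphrase what the query engine does with the focus set at each step:

1. The doc() call produces the initial focus set, which contains only the document node.
2. For each node in that focus set, the engine collects every descendant p element; together they form the next focus set.
3. For each p element in that focus set, the engine collects its span child elements; together they form the final result.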
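The nested-loop reading can also be written out as an explicit FLWOR expression. The following is a minimal sketch in standard XQuery 1.0; the inline document is a hypothetical stand-in for index.html so that the query is self-contained, and it has not been verified against QtXmlPatterns.

```xquery
(: Hypothetical stand-in for index.html, built inline. :)
let $doc := document {
  <html>
    <body>
      <p>First <span>a</span> and <span>b</span></p>
      <div><p>Second <span>c</span></p></div>
    </body>
  </html>
}
(: Path form of the same selection: $doc//p/span :)
for $d in $doc            (: step 1: the document node is the initial focus :)
for $p in $d//p           (: step 2: every p element below it :)
for $span in $p/span      (: step 3: every span child of that p :)
return $span
```

Both forms select the three span elements here. In general the loop form is only an approximation, because a path expression additionally removes duplicates and sorts its result into document order.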
The second and third steps in the example XQuery above are both axis steps. Both apply the element() node test to nodes encountered along some axis. In the example, however, the two axis steps are written using an abbreviated syntax, where the axis specifier and the node test are not written but implied. XQueries are normally written in this shorthand form. If we rewrite the XQuery using the unabbreviated syntax, it looks like this: doc('index.html')/descendant-or-self::element(p)/child::element(span)

The two axis steps have been expanded. The first step (//p) has been rewritten as /descendant-or-self::element(p), where descendant-or-self:: is the axis specifier, and element(p) is the node test. The second step (/span) has been rewritten as /child::element(span), where child:: is the axis specifier, and element(span) is the node test. The output of the expanded XQuery will be exactly the same as the output of the shorthand form.
To create an axis step, concatenate an axis specifier and a node test. The following sections list the axis specifiers and node tests that are supported in QtXmlPatterns.

Axis Specifiers
An axis specifier defines the direction you want the query engine to take, when it navigates away from the context node. QtXmlPatterns supports the following axes.
self:: | the context node itself |
attribute:: | all attribute nodes of the context node |
child:: | all child nodes of the context node (not attributes) |
descendant:: | all descendants of the context node (children, grandchildren, etc) |
descendant-or-self:: | all nodes in descendant + self |
parent:: | the parent node of the context node, or empty if there is no parent |
ancestor:: | all ancestors of the context node (parent, grandparent, etc) |
ancestor-or-self:: | all nodes in ancestor + self |
following:: | all nodes in the tree containing the context node that follow the context node in the document, not including its descendants |
preceding:: | all nodes in the tree containing the context node that precede the context node in the document, not including its ancestors |
following-sibling:: | all children of the context node's parent that follow the context node in the document |
preceding-sibling:: | all children of the context node's parent that precede the context node in the document |
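To make the axes concrete, here is a small sketch that navigates away from one context node along several of them. The inline document and its element names are hypothetical, and the query is plain XQuery 1.0 that has not been verified against QtXmlPatterns.

```xquery
(: Hypothetical sample tree. :)
let $doc := document {
  <list>
    <item n="1"/>
    <item n="2"><sub/></item>
    <item n="3"/>
  </list>
}
(: The context node for the steps below: the second item element. :)
let $second := $doc/child::list/child::item[attribute::n = '2']
return (
  count($second/parent::node()),            (: 1: the list element :)
  count($second/ancestor::node()),          (: 2: the list element and the document node :)
  count($second/descendant::element()),     (: 1: the sub element :)
  count($second/preceding-sibling::item),   (: 1: the item with n="1" :)
  count($second/following-sibling::item),   (: 1: the item with n="3" :)
  count($second/attribute::node())          (: 1: the n attribute :)
)
```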
QtXmlPatterns supports the following node tests. The tests that have a name parameter also test the node's name and are often called the Name Tests.
node() | node |
text() | text node |
comment() | comment node |
element() | element (same as star: *) |
element(name) | element called name |
attribute() | attribute |
attribute(name) | attribute called name |
processing-instruction() | processing-instruction |
processing-instruction(name) | processing-instruction called name |
document-node() | document node |
document-node(element(name)) | document node with document element name |
```xquery
declare namespace s = "http://www.w3.org/2000/svg";
let $doc := doc('image.svg')
return $doc/s:svg
```

This query declares the namespace prefix s to represent the namespace http://www.w3.org/2000/svg. This namespace is added to the list of statically known namespaces so it can be used when resolving the namespace prefix of s:svg. The query can then match the document element from the SVG file image.svg.
We could instead declare the same namespace to be the default element namespace, and rewrite the query this way:
```xquery
declare default element namespace "http://www.w3.org/2000/svg";
let $doc := doc('image.svg')
return $doc/svg
```

It also returns the document element from file image.svg. But consider a third version:
```xquery
let $doc := doc('image.svg')
return $doc/svg
```

This query fails. It does not return the file's document element because, as we saw from the examples above, the document element in an SVG file is in the http://www.w3.org/2000/svg namespace, and in this version, the query does not declare that namespace with either of the forms demonstrated.
The following are various kind tests.

Abbreviated syntax
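For many common tasks the full axis step syntax is a bit verbose, and for that reason simplified alternatives exist, which typically combine axes and node tests. Some examples: a bare name such as p is short for child::element(p), @id is short for an attribute:: step that selects the attribute called id, and .. is short for parent::node().

More on Focus and Filtering: Predicates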
In addition to steps as a way to filter content, XPath and XQuery have the predicate expression: an expression with a second expression to its right enclosed in square brackets. For instance, a predicate on a p step can select the paragraph that has an ID attribute with the value thatSpecialOne.
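A minimal sketch of such a predicate follows. The attribute name id and the inline document are assumptions made for illustration; only the value thatSpecialOne comes from the text above.

```xquery
(: Hypothetical stand-in for an HTML document. :)
let $doc := document {
  <html>
    <body>
      <p>An ordinary paragraph</p>
      <p id="thatSpecialOne">The special paragraph</p>
    </body>
  </html>
}
(: The predicate in brackets keeps only the p elements whose id attribute matches. :)
return $doc//p[@id = 'thatSpecialOne']
```

The query returns the second p element only, because the predicate is evaluated once for each p that the step selects.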
Like steps in path expressions, predicates also make use of the focus. The predicate is applied to each item in the source sequence, and if the item passes the filter, it is part of the result. Inside a predicate (and inside steps too) the current context item can be accessed by using the dot expression. Consider a query that applies such a predicate to the p elements of a document. For each p element that the node test returns, the predicate is evaluated. If the predicate expression evaluates to true, the node is returned, which happens if the string value of the predicate's context item is zero.
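One predicate that fits this description compares the context item with the string '0'. This is a hedged guess at the kind of query meant, with a hypothetical inline document, not a reconstruction of the original example.

```xquery
(: Hypothetical sample document. :)
let $doc := document {
  <body>
    <p>0</p>
    <p>1</p>
    <p>0</p>
  </body>
}
(: The dot refers to the p element currently being tested by the predicate. :)
return $doc//p[. = '0']
```

The first and third p elements are returned, since their string value is "0".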
There are two kinds of predicates: numeric predicates and truth predicates.

In addition to position(), the function last() also returns a number related to the focus: the position of the last item. Used by itself inside a predicate, last() simply selects the last item in the input sequence, but it can also be combined with, for instance, an offset such as last() - 1, which would return the next-to-last paragraph in the document. Positions inside a focus start from one, not zero.

Hence, in the above query, node constructors appear in two places.

However, if the string contains quotes, the easiest approach is sometimes to delimit the string literal with apostrophes instead of quotes. One can also use XML entity and character references, such as &amp; or a numeric reference like &#xA;, to express characters that cannot be directly represented in the encoding of the file containing the query. When curly braces should appear inside node constructors, one can again escape them with double braces or use character references.

Integers, decimals, doubles and strings can be created by using literal expressions, and Booleans with the functions true() or false() (just true or false would be name tests). For the rest, constructor functions must be used. Essentially, each atomic type can construct a value from a string. While doing so, it validates the input string to ensure that it has a proper format, and if it does not, it raises a dynamic error. These formats tend to be what one would guess. For instance, if one passes "1.five" to xs:decimal's constructor, as opposed to "1.5", it will halt the query so that the bug can be corrected.

In the example, an xs:boolean was created from an xs:integer as opposed to from a string. That is because values do not have to be constructed from strings; they can be created, or converted, from a range of different types. For instance, an xs:double can be created from an xs:decimal, or an xs:boolean can be converted to an xs:string. Which conversions are possible depends on the types, but they tend to be intuitive. One of the specifications has a table outlining them.

Apart from the usual arithmetic operators between numbers, operators are also available between more exotic types. Consider a query that subtracts two dates, which returns an xs:dayTimeDuration, and subsequently compares the result against another xs:dayTimeDuration. Such a query evaluates to a single atomic value of type xs:boolean. The available operators, and the types they apply to, are summarized in a table in the main XQuery specification.

Another alternative is to ask a question or two on the mailing lists qt-interest or talk at x-query.com.
FunctX is a collection of XQuery functions that can be both useful and educational. Of course, the specifications are one alternative, but one has to take a deep breath before diving into those.

Another reason could be that the context item is not what one expects it to be. For instance, an expression that treats the result of doc() as the html element won't match, because the node that the doc() function returns is not the top element node (html); it is the document node.

An expression that binds a whole sequence with a let clause wouldn't be sorted, since the items that the let clause binds aren't dealt with on an individual basis.

One approach to this is to use the for loop instead, which doesn't perform node sorting on its result. Another way is to invoke its constructor function.

Select based on Positions and Numeric Ranges
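A minimal sketch of position-based predicates, using a hypothetical inline document with eight paragraphs so that the query is self-contained. It is plain XQuery 1.0 and has not been verified against QtXmlPatterns.

```xquery
(: Hypothetical document with eight numbered paragraphs. :)
let $doc := document {
  <body>{ for $i in 1 to 8 return <p>{$i}</p> }</body>
}
let $ps := $doc/body/p
return (
  $ps[1],                (: numeric predicate: the first paragraph :)
  $ps[position() > 5],   (: paragraphs 6, 7 and 8 :)
  $ps[last()],           (: the last paragraph, 8 :)
  $ps[last() - 1]        (: the next-to-last paragraph, 7 :)
)
```

Positions are counted from one, so $ps[1] is the first paragraph and $ps[0] would select nothing.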
While a predicate is applied to its focus, the current context position can be obtained by using the function position(). For instance, the predicate [position() > 5] applied to a p step selects all the paragraphs except the first five.

Filtering based on Logic
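If a predicate doesn't evaluate to a number, it is considered a truth predicate. A truth predicate takes the value that the predicate expression evaluates to and computes its effective boolean value. The rules for how that is done are as follows:

- If the value is the empty sequence, the result is false.
- If the value is a sequence whose first item is a node, the result is true.
- If the value is a single xs:boolean, the result is that value.
- If the value is a single string (xs:string, xs:anyURI or xs:untypedAtomic), the result is false if the string is empty and true otherwise.
- If the value is a single number, the result is false if the number is zero or NaN and true otherwise.
- In all other cases, an error is raised.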
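The standard function boolean() computes the effective boolean value of its argument, so it can be used to try out how a truth predicate would treat a given value. A small sketch in plain XQuery 1.0, not verified against QtXmlPatterns:

```xquery
(
  boolean(()),         (: empty sequence: false :)
  boolean(<a/>),       (: first item is a node: true :)
  boolean(""),         (: zero-length string: false :)
  boolean("false"),    (: non-empty string, whatever it says: true :)
  boolean(0),          (: numeric zero: false :)
  boolean(42),         (: non-zero number: true :)
  boolean(false())     (: an xs:boolean keeps its own value: false :)
)
```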
For instance, a query ending in p[table] selects all paragraphs that have a table as a child, since the predicate evaluates to true if the contained step, table, matches any nodes. This is of course very different from p/table, which returns the tables found inside paragraphs (which should be none, since they cannot appear there).

Creating nodes
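As a hypothetical illustration of the direct node constructors this section covers, the following is a complete query that consists of nothing but a small document. Its element names and text are assumptions, not the original example.

```xquery
<html>
  <head>
    <title>A small document</title>
  </head>
  <body>
    <p>Hello from a direct element constructor.</p>
  </body>
</html>
```

Evaluating the query constructs the html element together with all of its descendants.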
While the XQuery language has a lot of functions and expressions for selecting and filtering existing content, it can also create new content using its node constructors. Consider a query that consists of nothing but a small XML document. While this looks like an XML document, and in fact is one, it is also a valid XQuery query. Node constructors are by and large just like XML, so if one knows XML, one can simply continue to write XML inside queries whenever one needs to have nodes created. There are however two things that set direct node constructors apart from XML: one can embed XQuery expressions inside of them, and they are expressions themselves. Let's first look at the former.

Computing values inside nodes
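A short sketch of expressions embedded in element content and in an attribute value. The concrete expressions are assumptions chosen to produce the values this section mentions; they are not necessarily the original examples.

```xquery
(
  (: The enclosed expression is evaluated; the element gets the text node "6". :)
  <p>{1 + 2 + 3}</p>,
  (: Inside an attribute, the items of the sequence are joined with spaces,
     giving class="important example obsolete". :)
  <p class="important {'example', 'obsolete'}"/>
)
```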
Creating a value inside a node at runtime is done by embedding expressions inside curly braces. For instance, embedding an expression that evaluates to 6 in an element constructor, simple as it is, constructs an element with the text node "6" inside of it. Similarly, one can embed expressions inside attributes, for instance to create an element whose attribute called class has the value "important example obsolete", without quotes.

Node Constructors are Expressions
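The following sketch shows node constructors used as ordinary expressions, in both branches of a conditional. The use of doc-available() to test whether the file can be loaded is an assumption about how such a check could be written in standard XQuery; the file name is the one discussed in this section.

```xquery
if (doc-available('maybeNotWellformed.xml'))
then
  (: One para element per p element, with the child nodes of p copied into it. :)
  for $p in doc('maybeNotWellformed.xml')//p
  return <para>{$p/node()}</para>
else
  (: Fallback when the document cannot be loaded. :)
  <para>maybeNotWellformed.xml could not be loaded.</para>
```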
Because node constructors are expressions, just like function calls, paths and literals, they can appear anywhere an expression can appear. For instance, consider a query that tries to read maybeNotWellformed.xml. If the document can be read successfully, it creates a para element for each p element that appears anywhere in the document and copies p's child nodes into it. But if the document cannot be loaded, a single para element is created that contains a descriptive message.

Computing Node Names at Runtime
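A minimal sketch of computed constructors, where one element name is supplied by an expression. The variable $name and all names and text in the query are hypothetical.

```xquery
let $name := "p"   (: hypothetical: a name that is only known at run time :)
return
  element html {
    element body {
      (: The element name is computed from $name instead of being written literally. :)
      element {$name} {
        attribute class {"computed"},
        text {"Hello from a computed constructor."}
      }
    }
  }
```

The result is an html element containing a body element, which in turn contains a p element with a class attribute and a text node.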
Direct node constructors are fine, but what if one doesn't know the names of the nodes to construct when writing the query? For each direct element constructor, there exists a computed node constructor that takes the name and the node value as arbitrary expressions. For instance, the query seen above that produced a small XML document can also be written with computed constructors.

Copying nodes into other nodes
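The difference between copying an attribute node and extracting its value can be sketched as follows. The inline document and the attribute value are hypothetical.

```xquery
(: Hypothetical source document with a version attribute. :)
let $doc := document { <html version="1.0"><body/></html> }
return (
  (: The attribute node itself is copied onto the new element: <p version="1.0"/> :)
  <p>{$doc/html/@version}</p>,
  (: string() extracts the value, which becomes text content: <p>1.0</p> :)
  <p>{string($doc/html/@version)}</p>
)
```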
When an expression embedded inside a node constructor evaluates to strings (or any other type of atomic value), the values become a text node by being concatenated with a space in between. However, when the expression evaluates to nodes, they are copied and become children of the surrounding node. This can occasionally be deceptive. Consider a query that embeds a reference to a version attribute inside a p element constructor. This won't output a p element whose content is the value of the version attribute; it will instead copy the attribute onto the p element, the result of which is not even valid XHTML. When the value of the attribute is wanted rather than the attribute itself, the approach is to extract it using, for instance, the string() function.

Escaping Characters
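A few escaping forms side by side, as a sketch in plain XQuery 1.0. The strings are arbitrary examples.

```xquery
(
  "He said ""hi""",             (: doubled quotes inside a quoted literal :)
  'It''s here',                 (: doubled apostrophes inside an apostrophe literal :)
  'He said "hi"',               (: switching the delimiter avoids escaping :)
  "Fish &amp; chips",           (: entity reference for the ampersand :)
  <p>{{ not evaluated }}</p>    (: doubled braces give literal braces in the text :)
)
```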
In the XQuery syntax, a set of characters is given special meaning. For instance, apostrophes or quotes start and terminate string literals. These can be escaped by writing the character twice.

Dates, Times, Numbers and other Atomic Values
Apart from nodes, XQuery has atomic values, and they are just what one would think they are: small, primitive values that have a similar role to C++'s plain old data types like float or long. In total there are about twenty of them, some of the most common being:

xs:integer | A 64 bit integer |
xs:boolean | A boolean value, false or true |
xs:double | A 64 bit floating point value |
xs:string | A string where each codepoint is an XML 1.0 character (essentially Unicode) |
xs:date | A date, such as when you were born: 1984-10-15 |
xs:time | A time, such as when you show up at work: 09:00:00 |
xs:dateTime | A date followed by a time: 1974-10-15T05:00:00 |
xs:duration | A time interval such as P5Y2M10DT15H, which represents five years, two months, 10 days, and 15 hours |
xs:base64Binary | Represents data, possibly binary data, in Base 64 encoding |

Creating Atomic Values
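A brief sketch of constructor functions and conversions between atomic types. The values are arbitrary, and the query is plain XQuery 1.0 that has not been verified against QtXmlPatterns.

```xquery
(
  xs:integer("4"),                   (: constructed from a string :)
  xs:decimal("1.5"),
  xs:date("2008-01-31"),
  xs:boolean(1),                     (: constructed from an xs:integer: true :)
  xs:string(true()),                 (: an xs:boolean converted to the string "true" :)
  "1.five" castable as xs:decimal    (: false: xs:decimal("1.five") would raise an error :)
)
```

The castable as expression is one way to check a string before constructing a value from it, instead of letting the query halt.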
Apart from the builtin functions that return atomic values, such as current-dateTime(), constructor functions can be used to create them.

Using Atomic Values
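Operators also work between non-numeric atomic values. As a sketch, with dates chosen arbitrarily, the query below subtracts one date from another and compares the resulting duration with a second duration.

```xquery
(: The subtraction yields an xs:dayTimeDuration of 29 days (2008 is a leap year),
   which is then compared with a duration of 20 days. :)
xs:date("2008-03-01") - xs:date("2008-02-01") > xs:dayTimeDuration("P20D")
```

The whole query evaluates to a single xs:boolean, here true.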
Once atomic values have been constructed, via one of the methods mentioned above, or as return values from functions or by evaluating variables, one can pass them along to functions, convert them to strings and attach them as part of text nodes to nodes, or use operators between them. Let's look at the latter.

Further Reading
XQuery is a big language that is hard to cover in an overview. For a good understanding of the subject, a book on the topic is a good place to start.

FAQ
Path expressions miss
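The most common cause, a namespace mismatch, can be reproduced with a small sketch. The prefix x and the inline document are hypothetical; the namespace is the XHTML one.

```xquery
declare namespace x = "http://www.w3.org/1999/xhtml";
(: Hypothetical XHTML document: all of its elements are in the XHTML namespace. :)
let $doc := document {
  <html xmlns="http://www.w3.org/1999/xhtml"><body><p>Hi</p></body></html>
}
return (
  count($doc/html/body/p),         (: 0: the steps look for names in no namespace :)
  count($doc/x:html/x:body/x:p)    (: 1: the steps use the XHTML namespace :)
)
```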
Often this happens because the names that the axis steps look for differ from the names of the nodes being matched. For instance, suppose that a query applies a path of unprefixed name tests, starting with html, to index.html, and that index.html is an XHTML document whose elements hence reside in the namespace http://www.w3.org/1999/xhtml. The path won't match, since its steps look for {}html and so forth, while the actual name is {http://www.w3.org/1999/xhtml}html. The fix is straightforward: declare the namespace in the query and use it in the name tests.

Path expressions also pick up the default element namespace if one is declared. When a query declares XHTML as the default element namespace, the nodes created by its direct element constructors will be in the XHTML namespace, but so will the names in its path expressions. Hence they look for {http://www.w3.org/1999/xhtml}tests and so forth, while testResult.xml is perhaps in a different namespace, or no namespace at all.

Variable in for loop is out of scope
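A sketch of the scoping problem this entry describes, using the comma operator in the return clause. The values are arbitrary.

```xquery
for $d in (1, 2, 3)
return ($d, $d * 10)
(: Without the parentheses the query would parse as
     (for $d in (1, 2, 3) return $d), $d * 10
   and the second reference to $d would be out of scope. :)
```

With the parentheses in place the query evaluates to 1 10 2 20 3 30.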
Due to expression precedence, it might be necessary to wrap the return expression of a for clause in parentheses. A for expression binds more tightly than the comma operator, so without the parentheses an expression that follows a comma would have the whole for expression as its left operand, and since the scope of the variable $d ends with the return expression, the variable reference would be out of scope.

Expressions aren't evaluated
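The effect of the curly braces can be seen by comparing two otherwise identical constructors. The expression 1 + 1 is an arbitrary example.

```xquery
(
  <p>1 + 1</p>,      (: no braces: the content is the literal text "1 + 1" :)
  <p>{1 + 1}</p>     (: braces: the expression is evaluated, giving <p>2</p> :)
)
```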
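If an expression is inside a node constructor, it must be surrounded by curly braces; otherwise it is interpreted as text. Without the braces, the expression appears verbatim as text in the constructed node, while with the braces the node contains the value that the expression evaluates to.

Filters select the wrong things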
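The two placements of a positional predicate can be compared on a small, hypothetical document:

```xquery
(: Hypothetical sample document. :)
let $doc := document {
  <body>
    <p><span>a</span><span>b</span></p>
    <p><span>c</span></p>
  </body>
}
return (
  $doc//p/span[1],      (: predicate on the span step: the first span of each p, a and c :)
  ($doc//p/span)[1]     (: predicate on the whole path: the first span overall, a :)
)
```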
When using predicates, consider what the predicate applies to. For instance, a query ending in p/span[1] evaluates to the first span element in each p element, while a query that wraps the whole path in parentheses before applying [1] evaluates to only one span element, the one that occurred first in the result of the path expression as a whole. In the first case the predicate was applied to the span step.

FLWOR doesn't behave as expected
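The interplay of for, let and order by can be sketched with a small query. It is a hypothetical example constructed to produce the sequence 4 2 -2 2 -8 2, not the original one.

```xquery
for $a in (-4, 2, -1)     (: the for clause generates one tuple per item :)
let $b := 2               (: the let binding is created once for each tuple :)
order by $a descending    (: sorts the tuples: 2, -1, -4 :)
return ($a * $b, $b)      (: evaluated once per tuple, after sorting :)
```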
Note that a for clause generates a so-called tuple stream, while a let clause is an ordinary variable binding. For instance, if a let binding is placed inside a for binding, it is created for each tuple. The order by clause in turn applies to the tuple stream that the for clause generates. Consider a query that evaluates to 4 2 -2 2 -8 2.

Nodes are created in the wrong order
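The ordering issue can be sketched by constructing nodes in a path step and, for comparison, in a for loop. The inline document is a hypothetical stand-in for a feed.

```xquery
(: Hypothetical stand-in for an RSS feed. :)
let $doc := document {
  <rss><item>one</item><item>two</item><item>three</item></rss>
}
let $items := $doc//item
return (
  (: Constructor as a path step: the new p elements are sorted as nodes,
     and their relative order is not defined. :)
  $items/<p>{string(.)}</p>,
  (: for loop: no node sorting, so the p elements follow the order of the items. :)
  for $i in $items return <p>{string($i)}</p>
)
```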
If nodes are created in the wrong order, the cause may be that the document order between nodes created with node constructors is undefined. For that reason node sorting, which is invoked by path expressions for instance, returns such nodes in an order which is undefined. Hence, one gets nodes in an arbitrary order if node constructors are placed somewhere in a path expression or, indirectly, if nodes are created inside a user-declared function which is called from a path step. Consider a path expression over feed.rss whose last step constructs a p element for each item element. This query evaluates to a sequence of p elements. However, they are not necessarily in the same order as the item elements appear in feed.rss. The order is, counterintuitive as it may seem, undefined.

true or false doesn't work
Boolean values, that is atomic values of type xs:boolean, cannot be created by writing true or false inside the query, since those are steps, name tests to be precise. The safest and easiest way to create boolean values is to use the builtin functions false() or true().
Copyright © 2008 Trolltech