The select() API
In Java and C#, as an alternative to XPath, the select() method (Select() in C#) on XdmValue provides a powerful way to navigate XDM trees. The method takes a Step as its argument (a function from one node to related nodes) and returns a sequence of nodes as its result. A library class, Steps, supplies Step functions corresponding to all the XPath axes. The results of applying a Step can be filtered using predicates, and a second library class, Predicates, supplies the commonly used ones. The names Step, Steps, and Predicates are the same in Java and C#.
The select API is a Java API designed to perform navigation over, and selection of, nodes and values in XML trees. It provides functionality roughly equivalent to the standard XPath navigation and selection operations, but is much more convenient than constructing, evaluating, and interpreting the results of an XPath expression.
Suppose, for example, that you have a book in which each chapter has an author and is in either draft or final status. Something like this:
    <book>
      ...
      <chapter status="draft">
        <author>John Smith</author>
        <title>...</title>
        ...
      </chapter>
      ...
    </book>

An XPath expression like this one would select the authors of all draft chapters:

    //chapter[@status='draft']/author/text()

You could evaluate that XPath expression with Saxon using code along these lines:

    // import net.sf.saxon.s9api.*;
    // processor = … make a Processor
    // document = … load the book
    ArrayList<String> authors = new ArrayList<>();
    try {
        XPathCompiler compiler = processor.newXPathCompiler();
        XPathExecutable exec = compiler.compile("//chapter[@status='draft']/author/text()");
        XPathSelector selector = exec.load();
        selector.setContextItem(document);
        for (XdmValue author : selector.evaluate()) {
            authors.add(author.toString());
        }
    } catch (SaxonApiException ex) {
        // do something
    }

Alternatively, you could use the select API:

    // import static net.sf.saxon.s9api.streams.Steps.*;
    // import static net.sf.saxon.s9api.streams.Predicates.*;
    // processor = … make a Processor
    // document = … load the book
    ArrayList<String> authors = new ArrayList<>();
    document.select(descendant("chapter")
                    .where(attributeEq("status", "draft"))
                    .then(child("author"))
                    .then(text()))
            .asList().forEach(item -> authors.add(item.toString()));

The select() API provides three different ways to match nodes by name: with a simple string, with a pair of strings representing the namespace and local name, or with a QName. Unlike XPath, in the select() API, the simple string form matches names in any namespace.
Here are a few examples:
| The step: | Matches: |
|---|---|
| child("*") | Any element in any namespace, including none |
| child("div") | div elements in any namespace, including none |
| child("http://www.w3.org/1999/xhtml", "div") | div elements in the XHTML namespace |
| child(QName("div")) | div elements in no namespace |
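The three name-matching forms can be modelled with the JDK alone. The following is a minimal sketch, not Saxon code: the class NameMatchDemo, the helper named(...), and the sample document are hypothetical, and javax.xml.namespace.QName stands in for Saxon's own QName class. It only illustrates the matching rules described above, using a namespace-aware DOM parse.

```java
import java.io.StringReader;
import java.util.function.Predicate;
import java.util.stream.IntStream;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical model of the name-matching rules; it does not call Saxon.
class NameMatchDemo {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // DOM reports "no namespace" as null; normalize it to the empty string.
    static String nsOf(Node n) {
        return n.getNamespaceURI() == null ? "" : n.getNamespaceURI();
    }

    /** Simple string form: local name only, any namespace; "*" matches every element. */
    static Predicate<Node> named(String local) {
        return n -> local.equals("*") || local.equals(n.getLocalName());
    }

    /** Two-string form: namespace URI and local name must both match. */
    static Predicate<Node> named(String uri, String local) {
        return n -> uri.equals(nsOf(n)) && local.equals(n.getLocalName());
    }

    /** QName form: matches exactly that expanded name. */
    static Predicate<Node> named(QName name) {
        return named(name.getNamespaceURI(), name.getLocalPart());
    }

    /** Counts the element children of the sample root that satisfy the test. */
    static long countChildren(Predicate<Node> test) {
        // Sample data: one XHTML div, one no-namespace div, one XHTML p.
        String xml = "<root xmlns:h='" + XHTML + "'><h:div/><div/><h:p/></root>";
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getLocalName()
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList kids = doc.getDocumentElement().getChildNodes();
            return IntStream.range(0, kids.getLength())
                    .mapToObj(kids::item)
                    .filter(k -> k.getNodeType() == Node.ELEMENT_NODE)
                    .filter(test)
                    .count();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countChildren(named("*")));               // 3
        System.out.println(countChildren(named("div")));             // 2
        System.out.println(countChildren(named(XHTML, "div")));      // 1
        System.out.println(countChildren(named(new QName("div"))));  // 1
    }
}
```

In this model the simple string form is the only one that ignores the namespace, which is why it counts both div elements while the other two forms count one each.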
Steps
The select API is built on top of the host language’s underlying stream interface. The select() method (Select() in C#) on an XdmValue returns an XdmStream, which wraps a Java or C# Stream. A selection takes a Step as input. There are many possible steps; most correspond to XPath axes:
| The method: | Returns a step that selects: |
|---|---|
| root() | The root of the tree, if started on a node |
| ancestor(*) | The ancestors |
| ancestorOrSelf(*) | The ancestors and self |
| attribute(*) | The attributes |
| child(*) | The children |
| descendant(*) | The descendants |
| descendantOrSelf(*) | The descendants and self |
| following(*) | The following nodes |
| followingSibling(*) | The following siblings |
| followingOrSelf(*) | The following nodes and self |
| followingSiblingOrSelf(*) | The following siblings and self |
| namespace() | The namespaces |
| namespace(String localName) | The namespaces with the prefix localName |
| parent(*) | The parent |
| precedingSibling(*) | The preceding siblings |
| preceding(*) | The preceding nodes |
| precedingOrSelf(*) | The preceding nodes and self |
| precedingSiblingOrSelf(*) | The preceding siblings and self |
| self(*) | Self |
| text() | Text nodes |
| nothing() | Nothing |
The steps marked “(*)” optionally take one of the following arguments:
- A string, which matches the local name in any namespace
- A namespace URI and local name, as strings, which match just that name
- A QName, which matches just that name
- A Predicate (see below)
There are also a few additional steps for convenience.
- atomize(): Returns the original value atomized.
- castAs(type): Returns the original value cast as the specified type.
- path(steps): Joins a sequence of steps together into a compound path.
- path(strings): Joins a sequence of strings (simple XPath fragments) into a compound path. The strings may be “/”, “//”, “..”, “*”, “name” or “@name”.
- tokenize(): Returns the original value tokenized on whitespace boundaries.
- id(doc): Returns the node in doc that has an ID value equal to the original value.
All steps have several methods for selecting and combining items from the resulting stream.
| The method: | Has the effect of: |
|---|---|
| where(predicate) | Selecting the items that match the predicate |
| cat(otherStep) | Concatenating the stream so far with the stream from otherStep |
| first() | Returning a stream that contains only the first item |
| last() | Returning a stream that contains only the last item |
| at(position) | Returning a stream that contains only the item at the position |
| then(next) | Continuing navigation from the current point |
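To make the idea of a step as a function from one node to a stream of related nodes concrete, here is a minimal JDK-only sketch built on org.w3c.dom. It is a toy model, not Saxon's implementation: the names ToyStepDemo and ToyStep, the helper methods, and the sample book (including the authors other than John Smith) are all hypothetical. It shows how where, then, cat, and first can be expressed as filter, flatMap, concat, and limit on the underlying stream.

```java
import java.io.StringReader;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Toy model only: none of these types belong to Saxon.
class ToyStepDemo {

    /** A step is a function from one node to a stream of related nodes. */
    interface ToyStep extends Function<Node, Stream<Node>> {
        default ToyStep where(Predicate<Node> p) { return n -> apply(n).filter(p); }
        default ToyStep then(ToyStep next)       { return n -> apply(n).flatMap(next); }
        default ToyStep cat(ToyStep other)       { return n -> Stream.concat(apply(n), other.apply(n)); }
        default ToyStep first()                  { return n -> apply(n).limit(1); }
    }

    static Stream<Node> children(Node n) {
        NodeList kids = n.getChildNodes();
        return IntStream.range(0, kids.getLength()).mapToObj(kids::item);
    }

    // Descendants in document order: each child, followed by its own descendants.
    static Stream<Node> descendants(Node n) {
        return children(n).flatMap(k -> Stream.concat(Stream.of(k), descendants(k)));
    }

    static Predicate<Node> isElementNamed(String local) {
        return k -> k.getNodeType() == Node.ELEMENT_NODE && local.equals(k.getLocalName());
    }

    static ToyStep child(String local)      { return n -> children(n).filter(isElementNamed(local)); }
    static ToyStep descendant(String local) { return n -> descendants(n).filter(isElementNamed(local)); }
    static ToyStep text() {
        return n -> children(n).filter(k -> k.getNodeType() == Node.TEXT_NODE);
    }

    static Predicate<Node> attributeEq(String name, String value) {
        return n -> n instanceof Element && value.equals(((Element) n).getAttribute(name));
    }

    // Hypothetical sample data modelled on the book example.
    static final String BOOK = "<book>"
            + "<chapter status='draft'><author>John Smith</author><title>One</title></chapter>"
            + "<chapter status='final'><author>Jane Doe</author><title>Two</title></chapter>"
            + "<chapter status='draft'><author>Ann Lee</author><title>Three</title></chapter>"
            + "</book>";

    /** Parses the XML, applies the step to the document node, returns string values. */
    static List<String> select(String xml, ToyStep step) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getLocalName()
            Node doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return step.apply(doc).map(Node::getTextContent).collect(Collectors.toList());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static List<String> draftAuthors() {
        return select(BOOK, descendant("chapter")
                .where(attributeEq("status", "draft"))
                .then(child("author"))
                .then(text()));
    }

    static List<String> firstTitle() {
        return select(BOOK, descendant("title").first());
    }

    static List<String> firstChapterTitleThenAuthor() {
        return select(BOOK, descendant("chapter").first()
                .then(child("title").cat(child("author"))));
    }

    public static void main(String[] args) {
        System.out.println(draftAuthors());                // [John Smith, Ann Lee]
        System.out.println(firstTitle());                  // [One]
        System.out.println(firstChapterTitleThenAuthor()); // [One, John Smith]
    }
}
```

Because each combinator returns another step, a whole path is itself a single function that can be applied to any starting node, which is the property the fluent style relies on.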
Predicates
Predicates allow you to select items from the stream.
| The predicate: | Selects items: |
|---|---|
| isNode() | that are nodes |
| isElement() | that are elements |
| isAttribute() | that are attributes |
| isText() | that are text nodes |
| isComment() | that are comments |
| isProcessingInstruction() | that are processing instructions |
| isDocument() | that are documents |
| isNamespace() | that are namespaces |
| isAtomic() | that are atomic values |
| isFunction() | that are functions |
| isMap() | that are maps |
| isArray() | that are arrays |
| empty(step) | where the result of step is empty |
| not(predicate) | where predicate is false |
| exists(step) | where the result of step is not empty |
| hasName(name) | where the item has the specified name |
| hasLocalName(name) | where the item has the specified local name |
| hasNamespace(uri) | where the item has the specified namespace |
| hasAttribute(name) | where the item has an attribute with the specified name |
| attributeEq(name, value) | where the item has an attribute with the specified name and value |
| hasType(type) | where the item has the specified type |
| some(step, predicate) | where at least one item selected by step satisfies the predicate |
| every(step, predicate) | where every item selected by step satisfies the predicate |
| eq(value) | where the item equals value |
| matchesRegex(regex) | where the item matches the regex |
| eq(step, value) | where the result of step has the specified value |
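The predicates that take a step (empty, exists, some, every) can be read as questions about the stream that the step produces for each item. The sketch below models that reading with plain JDK types. It is an illustration under stated assumptions, not Saxon's implementation: ToyPredicatesDemo, the REVIEWS sample data, and the REVIEWS_OF step are hypothetical, and a step is represented as a Function from an item to a Stream.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Toy model only: these helpers mirror the names in the table but are not Saxon's.
class ToyPredicatesDemo {

    // Hypothetical sample data: chapter id -> review verdicts, in insertion order.
    static final Map<String, List<String>> REVIEWS = new LinkedHashMap<>();
    static {
        REVIEWS.put("ch1", Arrays.asList("approved", "approved"));
        REVIEWS.put("ch2", Arrays.asList("approved", "rejected"));
        REVIEWS.put("ch3", Collections.<String>emptyList());
    }

    /** Toy step: from a chapter id to the stream of its review verdicts. */
    static final Function<String, Stream<String>> REVIEWS_OF = ch -> REVIEWS.get(ch).stream();

    static final Predicate<String> APPROVED = verdict -> verdict.equals("approved");

    static <T, U> Predicate<T> some(Function<T, Stream<U>> step, Predicate<U> p) {
        return item -> step.apply(item).anyMatch(p);
    }

    static <T, U> Predicate<T> every(Function<T, Stream<U>> step, Predicate<U> p) {
        return item -> step.apply(item).allMatch(p);
    }

    static <T, U> Predicate<T> exists(Function<T, Stream<U>> step) {
        return item -> step.apply(item).findAny().isPresent();
    }

    static <T, U> Predicate<T> empty(Function<T, Stream<U>> step) {
        return item -> !step.apply(item).findAny().isPresent();
    }

    static <T> Predicate<T> not(Predicate<T> p) {
        return p.negate();
    }

    /** Returns the chapter ids that satisfy the predicate, in insertion order. */
    static List<String> chaptersWhere(Predicate<String> p) {
        return REVIEWS.keySet().stream().filter(p).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(chaptersWhere(some(REVIEWS_OF, APPROVED)));      // [ch1, ch2]
        System.out.println(chaptersWhere(every(REVIEWS_OF, APPROVED)));     // [ch1, ch3]
        System.out.println(chaptersWhere(exists(REVIEWS_OF)));              // [ch1, ch2]
        System.out.println(chaptersWhere(empty(REVIEWS_OF)));               // [ch3]
        System.out.println(chaptersWhere(not(some(REVIEWS_OF, APPROVED)))); // [ch3]
    }
}
```

Note that in this model every(...) is true for ch3, whose step result is empty, because allMatch on an empty stream returns true. Whether the library behaves the same way in that edge case is an assumption worth checking against its documentation.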
The XdmStream object also provides appropriate implementations for the
methods defined on the underlying Stream object: filter, map,
distinct, etc.
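As a reminder of what those inherited operations do, the following sketch chains filter, map, and distinct on a plain java.util.stream.Stream of strings. The plain stream is a stand-in for an XdmStream, and the class name StreamOpsDemo, the method distinctSurnames, and the sample names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Uses a plain JDK Stream as a stand-in for an XdmStream.
class StreamOpsDemo {

    /** Keeps "First Last" style names, maps each to its surname, removes duplicates. */
    static List<String> distinctSurnames(Stream<String> authors) {
        return authors
                .filter(name -> name.contains(" "))                      // drop single-word names
                .map(name -> name.substring(name.lastIndexOf(' ') + 1))  // keep the last word
                .distinct()                                              // first occurrence wins
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream<String> authors = Stream.of("John Smith", "Jane Smith", "Anonymous", "Ann Lee");
        System.out.println(distinctSurnames(authors)); // [Smith, Lee]
    }
}
```

With the select API, the same pipeline would start from the stream returned by select() rather than from Stream.of.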