The select() API

In Java and C#, as an alternative to XPath, the select() method (Select() in C#) on XdmValue provides powerful navigation around XDM trees. The method takes a Step as its argument (a function from one node to related nodes) and returns a sequence of nodes. A library of Step functions (Steps, in both Java and C#) corresponds to all the XPath axes. The results of applying a Step can be filtered using predicates, and a library of commonly used predicates (Predicates, in both Java and C#) is also provided.

The select API is a Java API designed to perform navigation over, and selection of, nodes and values in XML trees. It provides functionality roughly equivalent to the standard XPath navigation and selection operations, but is much more convenient than constructing, evaluating, and interpreting the results of an XPath expression.

Suppose, for example, that you have a book in which each chapter has an author and is in either draft or final status. Something like this:

<book>
  ...
  <chapter status="draft">
    <author>John Smith</author>
    <title>...</title>
    ...
  </chapter>
  ...
</book>

An XPath expression like this one would select all of the authors we’re interested in:

//chapter[@status='draft']/author/text()
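To see what this expression selects, here is a minimal sketch that evaluates it with the XPath 1.0 engine built into the JDK (javax.xml.xpath), not with Saxon. The sample document, the class name DraftAuthorsXPath and the method draftAuthors are assumptions made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DraftAuthorsXPath {
    // Hypothetical sample data: two draft chapters and one final chapter.
    static final String BOOK =
        "<book>"
      + "<chapter status='draft'><author>John Smith</author><title>One</title></chapter>"
      + "<chapter status='final'><author>Jane Doe</author><title>Two</title></chapter>"
      + "<chapter status='draft'><author>Ann Lee</author><title>Three</title></chapter>"
      + "</book>";

    // Returns the text of every author of a draft chapter, in document order.
    static List<String> draftAuthors(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//chapter[@status='draft']/author/text()",
                          doc, XPathConstants.NODESET);
            List<String> authors = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                authors.add(nodes.item(i).getNodeValue());
            }
            return authors;
        } catch (Exception ex) {
            // Wrap checked parser and XPath exceptions to keep the sketch short.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(draftAuthors(BOOK)); // prints [John Smith, Ann Lee]
    }
}
```

The author of the final chapter is excluded by the predicate on the status attribute.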

You could evaluate that XPath expression with Saxon using code along these lines:

import net.sf.saxon.s9api.*;
import java.util.ArrayList;

// processor = … make a Processor
// document = … load the book
ArrayList<String> authors = new ArrayList<>();
try {
    // Compile the expression once, then bind it to the document
    XPathCompiler compiler = processor.newXPathCompiler();
    XPathExecutable exec = compiler.compile("//chapter[@status='draft']/author/text()");
    XPathSelector selector = exec.load();
    selector.setContextItem(document);
    // Iterating over the result yields one item per selected text node
    for (XdmItem author : selector.evaluate()) {
        authors.add(author.toString());
    }
} catch (SaxonApiException ex) {
    // do something
}

Alternatively, you could use the select API:

import static net.sf.saxon.s9api.streams.Steps.*;
import static net.sf.saxon.s9api.streams.Predicates.*;

// processor = … make a Processor
// document = … load the book
ArrayList<String> authors = new ArrayList<>();
document.select(descendant("chapter")
                    .where(attributeEq("status", "draft"))
                    .then(child("author"))
                    .then(text()))
        .asList()
        .forEach(item -> authors.add(item.toString()));

The select() API provides three different ways to match nodes by name: with a simple string, with a pair of strings representing the namespace and local name, or with a QName. Unlike XPath, in the select() API, the simple string form matches names in any namespace.

Here are a few examples:

child("*")                                    Matches any element in any namespace, including none
child("div")                                  Matches div elements in any namespace, including none
child("http://www.w3.org/1999/xhtml", "div")  Matches div elements in the XHTML namespace
child(new QName("div"))                       Matches div elements in no namespace
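The difference between these forms can be spelled out in plain XPath terms. The sketch below uses the JDK's built-in XPath 1.0 engine, not Saxon, to count div elements under each interpretation. The sample document and the class name NameMatchingDemo are hypothetical, and the pairing of each query with a select() name form follows the descriptions above rather than anything checked against Saxon itself.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NameMatchingDemo {
    // Hypothetical sample: one div in the XHTML namespace, one in no
    // namespace, and one in an unrelated namespace.
    static final String SAMPLE =
        "<root>"
      + "<div xmlns='http://www.w3.org/1999/xhtml'/>"
      + "<div/>"
      + "<x:div xmlns:x='urn:example'/>"
      + "</root>";

    // Counts the nodes selected by an XPath 1.0 expression over SAMPLE.
    static int count(String xpath) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace-uri() to work
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
            Double n = (Double) XPathFactory.newInstance().newXPath()
                .evaluate("count(" + xpath + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // Local name only, any namespace: like the simple string form
        System.out.println(count("/root/*[local-name()='div']")); // 3
        // Namespace URI plus local name: like the two-string form
        System.out.println(count("/root/*[local-name()='div' and "
            + "namespace-uri()='http://www.w3.org/1999/xhtml']")); // 1
        // No namespace: like the QName form with no namespace
        System.out.println(count("/root/*[local-name()='div' and namespace-uri()='']")); // 1
        // An unprefixed XPath 1.0 name test also means "no namespace"
        System.out.println(count("/root/div")); // 1
    }
}
```

The last query shows the contrast with XPath: an unprefixed name test there selects only the div in no namespace, while the simple string form described above would match all three.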

Steps

The select API is built on top of the host language’s underlying stream interface. The select() method (Select() in C#) on an XdmValue returns an XdmStream, which wraps a Java or C# Stream. A selection takes a Step as input. There are many possible steps, and most correspond to XPath axes:

The method:                  Returns a step that selects:
root()                       The root of the tree, if started on a node
ancestor(*)                  The ancestors
ancestorOrSelf(*)            The ancestors and self
attribute(*)                 The attributes
child(*)                     The children
descendant(*)                The descendants
descendantOrSelf(*)          The descendants and self
following(*)                 The following nodes
followingSibling(*)          The following siblings
followingOrSelf(*)           The following nodes and self
followingSiblingOrSelf(*)    The following siblings and self
namespace()                  The namespaces
namespace(String localName)  The namespaces with the prefix localName
parent(*)                    The parent
precedingSibling(*)          The preceding siblings
preceding(*)                 The preceding nodes
precedingOrSelf(*)           The preceding nodes and self
precedingSiblingOrSelf(*)    The preceding siblings and self
self(*)                      Self
text()                       Text nodes
nothing()                    Nothing

The steps marked “(*)” take optional arguments:

  1. A string, that matches the local name in any namespace
  2. A namespace URI and local name, as strings, that match just that name
  3. A QName, that matches just that name
  4. A Predicate (see below)

There are also a few additional steps for convenience.

All steps have several methods for selecting and combining items from the list.

The method:       Has the effect of:
where(predicate)  Selecting items that match the predicate
cat(otherStep)    Concatenating the stream so far with the stream from otherStep
first()           Returning a stream that contains only the first item
last()            Returning a stream that contains only the last item
at(position)      Returning a stream that contains only the item at the position
then(next)        Continuing navigation from the current point
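One way to picture how these methods combine is a toy model built directly on java.util.stream. The sketch below is not Saxon's implementation: the Node class, the Step interface and the child() helpers are hypothetical stand-ins, written only to show that a step can be treated as a function from one node to a stream of nodes, with then, where, cat and first expressed as ordinary stream operations.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StepModel {
    // Hypothetical miniature tree node standing in for an XDM node.
    static final class Node {
        final String name;
        final List<Node> children;
        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // A step maps one node to a stream of related nodes.
    interface Step extends Function<Node, Stream<Node>> {
        // then: continue navigation from every node this step selects
        default Step then(Step next) {
            return node -> apply(node).flatMap(next);
        }
        // where: keep only the selected nodes that satisfy the predicate
        default Step where(Predicate<Node> predicate) {
            return node -> apply(node).filter(predicate);
        }
        // cat: the results of this step followed by those of the other step
        default Step cat(Step other) {
            return node -> Stream.concat(apply(node), other.apply(node));
        }
        // first: at most the first selected node
        default Step first() {
            return node -> apply(node).limit(1);
        }
    }

    static Step child() {
        return node -> node.children.stream();
    }

    static Step child(String name) {
        return child().where(n -> n.name.equals(name));
    }

    // Applies a step to a start node and collects the names it selects.
    static List<String> names(Node start, Step step) {
        return step.apply(start).map(n -> n.name).collect(Collectors.toList());
    }

    // Hypothetical sample: a book with two chapters and an appendix.
    static Node sampleBook() {
        return new Node("book",
            new Node("chapter", new Node("author"), new Node("title")),
            new Node("chapter", new Node("author")),
            new Node("appendix", new Node("author")));
    }

    public static void main(String[] args) {
        Node book = sampleBook();
        System.out.println(names(book, child("chapter").then(child("author")))); // [author, author]
        System.out.println(names(book, child("chapter").cat(child("appendix")))); // [chapter, chapter, appendix]
        System.out.println(names(book, child().first())); // [chapter]
    }
}
```

In this model then is a flatMap, where is a filter, and cat is a stream concatenation, which is why the combinators compose freely in any order.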

Predicates

Predicates allow you to select items from the stream.

The predicate:              Selects items:
isNode()                    that are nodes
isElement()                 that are elements
isAttribute()               that are attributes
isText()                    that are text nodes
isComment()                 that are comments
isProcessingInstruction()   that are processing instructions
isDocument()                that are documents
isNamespace()               that are namespaces
isAtomic()                  that are atomic values
isFunction()                that are functions
isMap()                     that are maps
isArray()                   that are arrays
empty(step)                 for which step selects nothing
not(predicate)              for which predicate is false
exists(step)                for which step selects at least one item
hasName(name)               that have the specified name
hasLocalName(name)          that have the specified local name
hasNamespace(uri)           that have the specified namespace
hasAttribute(name)          that have an attribute with the specified name
attributeEq(name, value)    that have an attribute with the specified name and value
hasType(type)               that have the specified type
some(step, predicate)       for which at least one item selected by step satisfies the predicate
every(step, predicate)      for which every item selected by step satisfies the predicate
eq(value)                   that are equal to value
matchesRegex(regex)         that match the regex
eq(step, value)             for which the result of step has the specified value
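The predicates that take a step as an argument can be read as quantifiers over the items that step selects. The following sketch models that reading with plain java.util.function.Predicate and stream operations. It is an illustration under stated assumptions, not Saxon code: the Node class and the helper methods are hypothetical, and they only mirror the names in the table above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Stream;

class PredicateModel {
    // Hypothetical miniature tree node standing in for an XDM node.
    static final class Node {
        final String name;
        final List<Node> children;
        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // A stand-in for the child step: one node to a stream of its children.
    static Stream<Node> children(Node node) {
        return node.children.stream();
    }

    static Predicate<Node> hasName(String name) {
        return n -> n.name.equals(name);
    }

    // exists(step): the step selects at least one item
    static Predicate<Node> exists(Function<Node, Stream<Node>> step) {
        return n -> step.apply(n).findAny().isPresent();
    }

    // empty(step): the step selects nothing; modeled here as negated exists
    static Predicate<Node> empty(Function<Node, Stream<Node>> step) {
        return exists(step).negate();
    }

    // some(step, p): at least one selected item satisfies p
    static Predicate<Node> some(Function<Node, Stream<Node>> step, Predicate<Node> p) {
        return n -> step.apply(n).anyMatch(p);
    }

    // every(step, p): all selected items satisfy p (true when none are selected)
    static Predicate<Node> every(Function<Node, Stream<Node>> step, Predicate<Node> p) {
        return n -> step.apply(n).allMatch(p);
    }

    // Hypothetical sample: a chapter with an author and a title.
    static Node sampleChapter() {
        return new Node("chapter", new Node("author"), new Node("title"));
    }

    public static void main(String[] args) {
        Node chapter = sampleChapter();
        System.out.println(some(PredicateModel::children, hasName("author")).test(chapter));  // true
        System.out.println(every(PredicateModel::children, hasName("author")).test(chapter)); // false
        System.out.println(empty(PredicateModel::children).test(new Node("title")));          // true
    }
}
```

In this model every is true for a node with no children, because allMatch on an empty stream returns true.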

The XdmStream object also provides appropriate implementations for the methods defined on the underlying Stream object: filter, map, distinct, etc.