Package net.sf.saxon.s9api.streams

This package (introduced in Saxon 9.9) provides methods to manipulate XDM values using Java 8 streams.

A value in the XDM data model is represented by an instance of XdmValue. This is in general a sequence; the members of the sequence are instances of XdmItem. XdmItem is an abstract class; its concrete subclasses include XdmNode for nodes, XdmAtomicValue for atomic values, XdmFunctionItem for function items, XdmMap for maps, and XdmArray for arrays.

Given an XdmNode N, it is possible to select other nodes using an expression such as N.select(child("author")).asNode(). The way this works is as follows:

  • The select() method (which applies to any XdmValue, not only an XdmNode) takes as its argument a Step, and returns as its result an XdmStream. A Step is a function that takes an XdmItem as input, and returns a stream of items as its result; the select() method combines these streams into a single stream (in much the same way as the Java 8 flatMap() operation) and returns the result.
  • child("author") invokes a static method in class Steps, which delivers a Step whose effect is to find the children of a supplied node that have local name "author", returning these as a Stream. The Steps class provides a large collection of useful implementations of Step, including support for all the XPath axes (parent, child, descendant, following-sibling and so on).
  • The class XdmStream (which will typically not be used explicitly) implements the standard Java 8 java.util.stream.Stream interface. It does so by wrapping a standard Java 8 stream and delegating all standard methods to the wrapped stream. In addition to the standard Stream methods, it supplies extra methods appropriate to streams of XDM items. For example, in the expression above, asNode() terminates the stream pipeline by converting the result of the stream to a single XdmNode value (an unchecked exception occurs if the content of the stream is anything other than a single node).
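The navigation described above can be sketched as a small, self-contained program. This is a minimal sketch, assuming Saxon 9.9 or later (Saxon-HE is sufficient) is on the classpath; the class name SelectAuthor, the helper parse(), and the sample XML are illustrative and not part of the package.

```java
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XdmNode;

import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;

import static net.sf.saxon.s9api.streams.Steps.child;

public class SelectAuthor {
    // Saxon-HE processor (false = no licensed features required)
    private static final Processor PROCESSOR = new Processor(false);

    // Hypothetical helper: parse an XML string into a document node, rethrowing errors unchecked
    static XdmNode parse(String xml) {
        try {
            DocumentBuilder builder = PROCESSOR.newDocumentBuilder();
            return builder.build(new StreamSource(new StringReader(xml)));
        } catch (SaxonApiException e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }

    // Return the string value of the single <author> child of the <book> element
    static String authorOf(String xml) {
        XdmNode doc = parse(xml);
        // select() applies the Step to the document node; asNode() requires exactly one node
        XdmNode book = doc.select(child("book")).asNode();
        XdmNode author = book.select(child("author")).asNode();
        return author.getStringValue();
    }

    public static void main(String[] args) {
        System.out.println(authorOf("<book><title>Dune</title><author>Herbert</author></book>"));
    }
}
```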

The power of the approach rests on the range of Step implementations available and the ways they can be combined, and on the terminal operations that deliver the result of a stream in a useful form. Although many standard implementations are provided, they can be augmented with user-written ones; since a Step is just a Java 8 Function, and an XdmStream is just a Java 8 Stream, all the standard Java 8 facilities for creating and combining functions and streams are available.
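As a minimal sketch of mixing the off-the-shelf steps with ordinary Java 8 code, assuming Saxon 9.9+ on the classpath: a plain lambda serves as the predicate passed to where(), and the standard map() and collect() operations finish the pipeline. The class name, the parse() helper, and the sample XML are hypothetical.

```java
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XdmNode;

import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

import static net.sf.saxon.s9api.streams.Steps.descendant;

public class UserWrittenPredicate {
    private static final Processor PROCESSOR = new Processor(false);

    // Hypothetical helper: parse an XML string into a document node
    static XdmNode parse(String xml) {
        try {
            DocumentBuilder builder = PROCESSOR.newDocumentBuilder();
            return builder.build(new StreamSource(new StringReader(xml)));
        } catch (SaxonApiException e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }

    // Names of <author> elements whose string value starts with the given prefix
    static List<String> authorsStartingWith(String xml, String prefix) {
        // An ordinary Java lambda used as the Predicate; nothing Saxon-specific is needed
        Predicate<XdmNode> startsWithPrefix = n -> n.getStringValue().startsWith(prefix);
        return parse(xml).select(descendant("author").where(startsWithPrefix))
                .map(XdmNode::getStringValue)   // standard Stream.map()
                .collect(Collectors.toList());  // standard Collectors.toList()
    }

    public static void main(String[] args) {
        String xml = "<lib><author>Austen</author><author>Asimov</author><author>Herbert</author></lib>";
        System.out.println(authorsStartingWith(xml, "A"));
    }
}
```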

Steps

The steps that are available "off the shelf" from the Steps class include the following:

  • For each of the 13 XPath axes, taking child as an example, four Step implementations are provided:

    • child() selects all the children of a node.
    • child("lname") selects all the children of a node that have the local name "lname".
    • child("ns", "lname") selects all the children of a node that have the namespace URI "ns" and local name "lname".
    • child(predicate) selects all the children of a node that satisfy the given predicate. The predicate may be any Java 8 java.util.function.Predicate, but the class Predicates provides some off-the-shelf predicates that are particularly designed for navigating XDM information.
  • Two steps S and T may be combined into a single step using the then() method, which applies T to each item selected by S: for example child("X").then(attribute("A")).

  • More generally, any sequence of steps may be combined using the static path() method, for example path(child("table"), child("thead"), child("trow"), attribute("class")).

  • Where the steps in a path are of the most commonly used kinds (such as child and attribute steps that select by local name), the path may be written using an abbreviated notation: path("table", "thead", "trow", "@class") is equivalent to the previous example.

  • The results of two steps may also be concatenated: for example child("author").cat(child("editor")) concatenates the results of the two steps into a single stream.

  • The atomize() step reduces nodes (and arrays) to atomic values, as defined in the XPath specification.

  • If S is a Step and P is a Predicate, then S.where(P) is also a Step, which selects those items selected by S that satisfy P. For example, child().where(isElement()) selects those children of the starting node that are elements (although this particular example can be written more simply as child(isElement())).
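The combinators listed above can be exercised together in one sketch, assuming Saxon 9.9+ on the classpath. The sample documents, the class name, and the helpers parse() and strings() are illustrative. If the steps behave as the text describes, the three table methods yield the same list of class values, and child().where(isElement()) and child(isElement()) select the same element children.

```java
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XdmItem;
import net.sf.saxon.s9api.XdmNode;
import net.sf.saxon.s9api.streams.XdmStream;

import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

import static net.sf.saxon.s9api.streams.Predicates.isElement;
import static net.sf.saxon.s9api.streams.Steps.atomize;
import static net.sf.saxon.s9api.streams.Steps.attribute;
import static net.sf.saxon.s9api.streams.Steps.child;
import static net.sf.saxon.s9api.streams.Steps.path;

public class CombiningSteps {
    private static final Processor PROCESSOR = new Processor(false);
    static final String TABLE = "<table><thead><trow class='h1'/><trow class='h2'/></thead></table>";
    // The leading "note" text node makes child() differ from child(isElement())
    static final String BOOK = "<book>note<author>Austen</author><editor>Smith</editor><price>8</price></book>";

    static XdmNode parse(String xml) {
        try {
            DocumentBuilder builder = PROCESSOR.newDocumentBuilder();
            return builder.build(new StreamSource(new StringReader(xml)));
        } catch (SaxonApiException e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }

    // String value of every item in a stream, in stream order
    static List<String> strings(XdmStream<? extends XdmItem> stream) {
        return stream.map(XdmItem::getStringValue).collect(Collectors.toList());
    }

    // then(): each step is applied to every item selected by the previous one
    static List<String> viaThen() {
        return strings(parse(TABLE).select(
                child("table").then(child("thead")).then(child("trow")).then(attribute("class"))));
    }

    // path() with explicit steps
    static List<String> viaPath() {
        return strings(parse(TABLE).select(
                path(child("table"), child("thead"), child("trow"), attribute("class"))));
    }

    // Abbreviated path notation for the same navigation
    static List<String> viaAbbreviatedPath() {
        return strings(parse(TABLE).select(path("table", "thead", "trow", "@class")));
    }

    // cat(): all <author> children followed by all <editor> children
    static List<String> authorsThenEditors() {
        return strings(parse(BOOK).select(child("book").then(child("author").cat(child("editor")))));
    }

    // atomize(): reduce the <price> element to an atomic value
    static String price() {
        return parse(BOOK).select(child("book").then(child("price")).then(atomize()))
                .asAtomic().getStringValue();
    }

    // Counts for child(), child().where(isElement()) and child(isElement())
    static int[] childCounts() {
        XdmNode book = parse(BOOK).select(child("book")).asNode();
        return new int[] {
                book.select(child()).asList().size(),
                book.select(child().where(isElement())).asList().size(),
                book.select(child(isElement())).asList().size()
        };
    }

    public static void main(String[] args) {
        System.out.println(viaThen() + " " + viaPath() + " " + viaAbbreviatedPath());
        System.out.println(authorsThenEditors() + " " + price());
    }
}
```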

Predicates

Predicates can be used either in the where() method to construct a new Step, or they may be used in the standard Java 8 filter() method to filter the items in a stream. Any predicate may be used in either context. The utility class Predicates provides a range of predicates that are particularly suited to XDM navigation.

These predicates include:

  • Predicates isNode(), isAtomicValue(), isMap(), isArray(), etc., to test whether an XdmItem is a particular kind of item;
  • Predicates isElement(), isAttribute(), isText(), etc., to test the kind of a node;
  • The predicate hasType(), to test whether an item matches a specific item type: for example, hasType(ItemType.XS_DATE_TIME) tests whether the item is an instance of xs:dateTime;
  • The predicate eq("string"), which tests whether the string value of the item is equal to "string";
  • The predicate eq(XdmAtomicValue), which performs a typed comparison, for example comparing two values as numbers;
  • The predicate matchesRegex("regex"), which tests whether the string value of the item matches the regular expression "regex";
  • If S is a Step and P is a Predicate, then some(S, P) is a predicate that returns true if some item returned by S satisfies P; similarly, every(S, P) tests whether all items returned by S satisfy P. For example, some(attribute("id"), eq("A123")) is true for an element that has an id attribute equal to "A123". This particular condition can also be expressed more concisely as eq(attribute("id"), "A123").
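A sketch of several of these predicates in use, assuming Saxon 9.9+ on the classpath. The library document, the class name, and the helper methods are hypothetical. The regular expression "E.*" was chosen so that the result is the same whether the match is anchored or not.

```java
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XdmNode;

import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

import static net.sf.saxon.s9api.streams.Predicates.eq;
import static net.sf.saxon.s9api.streams.Predicates.matchesRegex;
import static net.sf.saxon.s9api.streams.Predicates.some;
import static net.sf.saxon.s9api.streams.Steps.attribute;
import static net.sf.saxon.s9api.streams.Steps.child;
import static net.sf.saxon.s9api.streams.Steps.descendant;

public class PredicateExamples {
    private static final Processor PROCESSOR = new Processor(false);
    static final String LIB = "<library>"
            + "<book id='A123'><title>Dune</title></book>"
            + "<book id='B456'><title>Emma</title></book>"
            + "</library>";

    static XdmNode parse(String xml) {
        try {
            DocumentBuilder builder = PROCESSOR.newDocumentBuilder();
            return builder.build(new StreamSource(new StringReader(xml)));
        } catch (SaxonApiException e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }

    // Titles of the books satisfying the condition, in document order
    static List<String> titlesOfBooksWhere(Predicate<? super XdmNode> condition) {
        return parse(LIB).select(descendant("book").where(condition).then(child("title")))
                .map(XdmNode::getStringValue)
                .collect(Collectors.toList());
    }

    // some(S, P): some attribute selected by attribute("id") has the given string value
    static List<String> titlesWithIdViaSome(String id) {
        return titlesOfBooksWhere(some(attribute("id"), eq(id)));
    }

    // eq(S, value): the more concise form of the same condition
    static List<String> titlesWithIdViaEq(String id) {
        return titlesOfBooksWhere(eq(attribute("id"), id));
    }

    // matchesRegex(): titles whose string value matches the regular expression
    static List<String> titlesMatching(String regex) {
        return parse(LIB).select(descendant("title").where(matchesRegex(regex)))
                .map(XdmNode::getStringValue)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(titlesWithIdViaSome("A123") + " " + titlesWithIdViaEq("A123"));
        System.out.println(titlesMatching("E.*"));
    }
}
```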

Operations on XDM streams

An XdmStream (as delivered by XdmValue.select(step)) is an implementation of a Java 8 Stream, so all the standard Stream methods are available: for example filter(), map(), flatMap(), reduce(), and collect(). Where appropriate, these are specialized to return an XdmStream rather than a generic Stream.
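A minimal sketch of the standard Stream operations applied to an XdmStream, assuming Saxon 9.9+ on the classpath. It also shows that a predicate can be folded into the step with where() or applied afterwards with filter(), with the same outcome. The document, class, and helper names are illustrative.

```java
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XdmNode;

import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;

import static net.sf.saxon.s9api.streams.Predicates.eq;
import static net.sf.saxon.s9api.streams.Steps.attribute;
import static net.sf.saxon.s9api.streams.Steps.descendant;

public class StandardStreamOps {
    private static final Processor PROCESSOR = new Processor(false);
    static final String LIB = "<library>"
            + "<book id='A123'><price>10</price></book>"
            + "<book id='B456'><price>8</price></book>"
            + "</library>";

    static XdmNode parse(String xml) {
        try {
            DocumentBuilder builder = PROCESSOR.newDocumentBuilder();
            return builder.build(new StreamSource(new StringReader(xml)));
        } catch (SaxonApiException e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }

    // Standard map() and reduce(): add up the <price> values as integers
    static int totalPrice() {
        return parse(LIB).select(descendant("price"))
                .map(price -> Integer.parseInt(price.getStringValue()))
                .reduce(0, Integer::sum);
    }

    // The predicate folded into the Step with where()
    static long countWithIdViaWhere(String id) {
        return parse(LIB).select(descendant("book").where(eq(attribute("id"), id))).count();
    }

    // The same predicate applied afterwards with the standard filter()
    static long countWithIdViaFilter(String id) {
        return parse(LIB).select(descendant("book")).filter(eq(attribute("id"), id)).count();
    }

    public static void main(String[] args) {
        System.out.println(totalPrice());
        System.out.println(countWithIdViaWhere("A123") + " " + countWithIdViaFilter("A123"));
    }
}
```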

XdmStream provides some additional terminal operations designed to make it convenient to convert the contents of the stream into usable form. These include:

  • first() - deliver the first item of the stream
  • last() - deliver the last item of the stream
  • at(N) - deliver the item at position N in the stream
  • exists() - return true if the stream is non-empty
  • subStream() - deliver a stream containing a subsequence of the input stream
  • asXdmValue() - deliver the contents as an XdmValue
  • asList() - deliver the contents as a List<XdmItem>
  • asListOfNodes() - deliver the contents as a List<XdmNode>
  • asOptionalNode() - deliver the contents as an Optional<XdmNode>
  • asNode() - deliver the contents as a single XdmNode
  • asListOfAtomic() - deliver the contents as a List<XdmAtomicValue>
  • asOptionalAtomic() - deliver the contents as an Optional<XdmAtomicValue>
  • asAtomic() - deliver the contents as a single XdmAtomicValue
  • asOptionalString() - deliver the contents as an Optional<String> by taking the string value of each item
  • asString() - deliver the contents as a single String
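A sketch of several of these terminal operations, assuming Saxon 9.9+ on the classpath; the library document, class name, and helper methods are hypothetical. Each method picks the terminal operation whose cardinality matches what the query is expected to return.

```java
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XdmNode;

import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.util.List;
import java.util.Optional;

import static net.sf.saxon.s9api.streams.Predicates.eq;
import static net.sf.saxon.s9api.streams.Steps.attribute;
import static net.sf.saxon.s9api.streams.Steps.child;
import static net.sf.saxon.s9api.streams.Steps.descendant;

public class TerminalOperations {
    private static final Processor PROCESSOR = new Processor(false);
    static final String LIB = "<library>"
            + "<book id='A123'><title>Dune</title></book>"
            + "<book id='B456'><title>Emma</title><editor>Smith</editor></book>"
            + "</library>";

    static XdmNode parse(String xml) {
        try {
            DocumentBuilder builder = PROCESSOR.newDocumentBuilder();
            return builder.build(new StreamSource(new StringReader(xml)));
        } catch (SaxonApiException e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }

    // asString(): exactly one item expected, delivered as its string value
    static String titleOf(String id) {
        return parse(LIB).select(descendant("book").where(eq(attribute("id"), id)).then(child("title")))
                .asString();
    }

    // asListOfNodes(): any number of nodes, delivered as a List<XdmNode>
    static int bookCount() {
        List<XdmNode> books = parse(LIB).select(descendant("book")).asListOfNodes();
        return books.size();
    }

    // asOptionalString(): zero or one item, delivered as an Optional<String>
    static Optional<String> editorOf(String id) {
        return parse(LIB).select(descendant("book").where(eq(attribute("id"), id)).then(child("editor")))
                .asOptionalString();
    }

    // exists(): true if the stream delivers at least one item
    static boolean hasAnyEditor() {
        return parse(LIB).select(descendant("editor")).exists();
    }

    public static void main(String[] args) {
        System.out.println(titleOf("B456") + " " + bookCount());
        System.out.println(editorOf("A123") + " " + editorOf("B456") + " " + hasAnyEditor());
    }
}
```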

The choice of terminal operation determines the return type (for example asOptionalNode() returns Optional<XdmNode>), and also causes a run-time check that the value actually conforms to these expectations. For example, if asNode() is used, then an unchecked exception occurs if the sequence has a length other than 1 (one), or if its single item is not a node.
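The run-time cardinality check can be observed directly. This sketch assumes Saxon 9.9+ on the classpath; because the text promises only "an unchecked exception", it catches RuntimeException rather than assuming a specific subclass. The class and method names are illustrative.

```java
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XdmNode;

import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;

import static net.sf.saxon.s9api.streams.Steps.descendant;

public class CardinalityCheck {
    private static final Processor PROCESSOR = new Processor(false);

    static XdmNode parse(String xml) {
        try {
            DocumentBuilder builder = PROCESSOR.newDocumentBuilder();
            return builder.build(new StreamSource(new StringReader(xml)));
        } catch (SaxonApiException e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }

    // asNode() on a stream of two <book> elements is expected to fail
    static boolean asNodeRejectsTwoBooks() {
        XdmNode doc = parse("<library><book/><book/></library>");
        try {
            doc.select(descendant("book")).asNode();
            return false;
        } catch (RuntimeException e) {
            // No specific exception class is assumed here
            return true;
        }
    }

    // asOptionalNode() tolerates an empty stream and returns an empty Optional
    static boolean asOptionalNodeAcceptsEmpty() {
        return !parse("<library/>").select(descendant("book")).asOptionalNode().isPresent();
    }

    public static void main(String[] args) {
        System.out.println(asNodeRejectsTwoBooks() + " " + asOptionalNodeAcceptsEmpty());
    }
}
```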

Other ways of generating an XdmStream include:

  • As the result of an XPath expression, using the method XPathSelector.stream()
  • As the result of an XQuery expression, using the method XQueryEvaluator.stream()
  • As the result of XdmSequenceIterator.stream()
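A sketch of the first two routes, assuming Saxon 9.9+ on the classpath. The XPath and XQuery expressions, the class name, and the helper methods are illustrative; the resulting XdmStream is consumed with the standard map() and collect() operations.

```java
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XPathSelector;
import net.sf.saxon.s9api.XQueryEvaluator;
import net.sf.saxon.s9api.XdmItem;
import net.sf.saxon.s9api.XdmNode;

import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

public class OtherStreamSources {
    private static final Processor PROCESSOR = new Processor(false);

    static XdmNode parse(String xml) {
        try {
            DocumentBuilder builder = PROCESSOR.newDocumentBuilder();
            return builder.build(new StreamSource(new StringReader(xml)));
        } catch (SaxonApiException e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }

    // XPathSelector.stream(): evaluate an XPath expression against a context node
    static List<String> bookIds(String xml) {
        try {
            XPathSelector selector = PROCESSOR.newXPathCompiler().compile("//book/@id").load();
            selector.setContextItem(parse(xml));
            return selector.stream().map(XdmItem::getStringValue).collect(Collectors.toList());
        } catch (SaxonApiException e) {
            throw new IllegalStateException(e);
        }
    }

    // XQueryEvaluator.stream(): evaluate a query that needs no context item
    static List<String> doubledNumbers() {
        try {
            XQueryEvaluator evaluator =
                    PROCESSOR.newXQueryCompiler().compile("for $i in 1 to 3 return $i * 2").load();
            return evaluator.stream().map(XdmItem::getStringValue).collect(Collectors.toList());
        } catch (SaxonApiException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bookIds("<library><book id='A123'/><book id='B456'/></library>"));
        System.out.println(doubledNumbers());
    }
}
```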