Builder Module 1.0

EXPath Module 27 October 2020

This version:
http://expath.org/spec/builder/1.0
Latest version:
http://expath.org/spec/builder
Editor:
Michael Kay <mike@saxonica.com>

This document is also available in these non-normative formats: XML and Revision Markup.


Abstract

This proposal provides an API for XPath 3.1 to construct XDM node trees.

The module homepage, with more information, is on the EXPath website at http://expath.org/modules/binary/.

Table of Contents

1 Status of this document
2 Introduction
    2.1 Namespace conventions
    2.2 Error management
    2.3 Relationship to XSLT and XQuery
    2.4 Node names
    2.5 Node identity
    2.6 Schema awareness
    2.7 Test suite
3 Functions
    3.1 build:attribute
    3.2 build:comment
    3.3 build:document
    3.4 build:element
    3.5 build:namespace
    3.6 build:processing-instruction
    3.7 build:text

Appendices

A References
B Summary of error conditions


1 Status of this document

This document is a first draft specification for community review.

2 Introduction

2.1 Namespace conventions

This module defines several functions, all in the namespace http://expath.org/ns/builder. In this document, the build prefix, when used, is bound to this namespace URI.

Error codes are defined in the same namespace (http://expath.org/ns/builder), and in this document are displayed with the same prefix, build.

2.2 Error management

TODO: should we use XSLT error codes, XQuery error codes, or our own error codes?

2.3 Relationship to XSLT and XQuery

Both XSLT and XQuery provide native syntax for constructing XDM node trees.

There are a number of reasons for providing a function library to achieve the same thing:

  • It provides a mechanism that is usable in free-standing XPath expressions, when XPath is not used within XSLT or XQuery.
  • It provides a way of creating nodes that is portable between XQuery and XSLT, which is useful for people developing code for both environments.
  • It provides the same functionality in the form of function calls, which may sometimes be more convenient than custom syntax. This is particularly true in XSLT, where XSLT instructions cannot be directly invoked from within an XPath expression.
  • It is particularly useful with higher-order functions: for example, given a function replace-with(string, regex, function) that applies the supplied function to the matching substrings, the call replace-with($in, "\[.*?\]", build:element("cite", ?)) turns See [Kay, 1970] into See <cite>Kay, 1970</cite>.
  • The use of functions with an options parameter allows for setting of properties that are not easy to control using XSLT and XQuery language facilities, for example, the base URI of a document node.

The semantics of operations for node construction in XSLT and XQuery are very similar. There are however a few minor differences, and in these cases the XSLT rules have been chosen in preference. The differences include:

  • In the content sequence of an element, XSLT allows several attributes to have the same name, and uses the last of the duplicates; XQuery reports an error.
  • If comments and processing instructions contain invalid content (such as "--" in a comment), XSLT repairs the value while XQuery reports an error.
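The first difference can be illustrated with a minimal Python sketch. It models the attributes of a content sequence as (name, value) pairs, which is an assumption of this example and not an XDM representation; the function names are hypothetical. XQDY0025 is the XQuery error code for duplicate attribute names.

```python
def merge_attributes_xslt(attributes):
    """XSLT-style rule: when names collide, the last attribute wins."""
    merged = {}
    for name, value in attributes:
        merged[name] = value  # a later duplicate overwrites the earlier value
    return merged


def merge_attributes_xquery(attributes):
    """XQuery-style rule: a duplicate attribute name is an error (XQDY0025)."""
    merged = {}
    for name, value in attributes:
        if name in merged:
            raise ValueError(f"duplicate attribute name: {name}")
        merged[name] = value
    return merged


content = [("id", "a1"), ("class", "note"), ("id", "a2")]
print(merge_attributes_xslt(content))  # {'id': 'a2', 'class': 'note'}
```

Because this module follows the XSLT rules, a conforming implementation would be expected to behave like the first function.
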

XSLT and XQuery define different error codes for the same conditions. This specification currently uses the XSLT codes, but will change to use its own error codes.

2.4 Node names

In function arguments where a node name is required, the required type is denoted as union(xs:string, xs:QName). Such a type can be defined in a schema, but in the absence of a schema, it cannot be directly expressed as a SequenceType in XPath 3.1 syntax. Implementations may therefore use the nearest supertype, namely xs:anyAtomicType, with a dynamic check that the supplied value conforms to the required constraints.

2.5 Node identity

The functions are defined to be non-deterministic with respect to node identity (as defined in F+O §1.7.4). This means that it is implementation-dependent whether two calls with the same arguments return the same node or different nodes. This approach gives implementations maximum freedom for optimization; for example, it allows function calls to be extracted from a loop.
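As a rough illustration of what this permits, the following Python sketch shows two hypothetical implementation strategies that would both be allowed: one creates a fresh node on every call, the other caches results by argument. The TextNode class and both function names are assumptions made for this example only.

```python
from functools import lru_cache


class TextNode:
    """Stand-in for an XDM text node; object identity models node identity."""

    def __init__(self, content):
        self.content = content


def build_text_fresh(content):
    # Strategy 1: every call constructs a new node.
    return TextNode(content)


@lru_cache(maxsize=None)
def build_text_cached(content):
    # Strategy 2: calls with the same argument share a single node.
    return TextNode(content)


print(build_text_fresh("x") is build_text_fresh("x"))    # False
print(build_text_cached("x") is build_text_cached("x"))  # True
```

Portable code should therefore not depend on the result of an identity comparison between nodes returned by separate calls.
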

2.6 Schema awareness

Options are available to request validation of constructed nodes against a schema, in the same way as for instructions such as xsl:element in XSLT. The semantics of the various operations are outlined briefly in this specification, but the intent is that the rules should be the same as XSLT (which defines them in much more detail) unless otherwise specified.

Processors that do not support schema-awareness should raise an error if options dependent on a schema are selected.

2.7 Test suite

A suite of test-cases for all the functions defined in this module, in [QT3] format, will be created.

3 Functions

This EXPath module defines the following functions:

3.1 build:attribute

Summary

Returns an attribute node, with given name and content.

Signatures

build:attribute($name as union(xs:QName, xs:string),
$content as xs:string?) as attribute()
build:attribute($name as union(xs:QName, xs:string),
$content as xs:string?,
$options as map(*)) as attribute()

Rules

The effect of the two-argument form of this function is the same as calling the three-argument form with an empty map as the value of the $options argument.

The name of the attribute may be supplied in a number of ways:

  • The first argument may be an xs:QName, in which case the attribute's name will have the prefix, namespace URI, and local name supplied in this QName.
  • The first argument may be an xs:string that conforms to the rules for a valid xs:NCName. The attribute's name will have this local part, with no namespace URI or prefix.
  • The first argument may be an xs:string in the format Q{uri}local. The attribute's local name and namespace URI will be taken from this value, and will have a system-allocated prefix.
  • The first argument may be an xs:string in the format prefix:local. In this case the prefix must be declared in the static context of the function call, and the attribute's name will use this prefix and local name, together with the namespace URI associated with this prefix in the static context.

If the attribute name has a URI but no prefix, then the system will allocate an arbitrary prefix. If the attribute name is given as an xs:QName with a prefix but no URI, then the prefix will be ignored.
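The naming rules above can be sketched in Python as follows. This is an illustrative model only: the QName tuple, the function name, the ASCII-only NCName pattern, the dictionary standing in for the static namespace context, and the allocated prefix "ns0" are all assumptions of the example, not part of this specification.

```python
import re
from typing import NamedTuple

# Simplified, ASCII-only approximation of the NCName production (assumption).
NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"


class QName(NamedTuple):
    prefix: str
    uri: str
    local: str


def resolve_attribute_name(name, static_namespaces):
    """Map the $name argument to a (prefix, uri, local) triple."""
    if isinstance(name, QName):
        prefix, uri, local = name
    elif re.fullmatch(NCNAME, name):
        # Plain NCName: no namespace URI and no prefix.
        prefix, uri, local = "", "", name
    elif m := re.fullmatch(r"Q\{([^{}]*)\}(" + NCNAME + ")", name):
        # Q{uri}local: the prefix is allocated below.
        prefix, uri, local = "", m.group(1), m.group(2)
    elif m := re.fullmatch("(" + NCNAME + "):(" + NCNAME + ")", name):
        # prefix:local: the prefix must be declared in the static context.
        prefix, local = m.group(1), m.group(2)
        if prefix not in static_namespaces:
            raise ValueError(f"undeclared prefix: {prefix}")
        uri = static_namespaces[prefix]
    else:
        raise ValueError(f"invalid attribute name: {name!r}")

    if uri and not prefix:
        prefix = "ns0"  # hypothetical system-allocated prefix
    elif prefix and not uri:
        prefix = ""     # a prefix without a URI is ignored
    return QName(prefix, uri, local)


print(resolve_attribute_name("Q{http://example.com/ns}a", {}))
```

A real implementation would also need to pick an allocated prefix that does not clash with other namespace bindings on the containing element.
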

The content of the attribute node (that is, the string value of the node) is formed by evaluating the second argument. If this is an empty sequence, the string value of the attribute will be a zero-length string.

The type annotation of the new attribute node will be xs:untypedAtomic.

If the attribute has the name xml:id then xml:id processing is performed, and the attribute will have the is-id property.

The function imposes rules preventing the misuse of reserved names such as "xml" and "xmlns", in the same way as the xsl:attribute instruction in XSLT, or the attribute constructor expression in XQuery. The error codes used are those defined in XSLT 3.0.

The entries that may appear in the $options map are as follows. The option parameter conventions apply.

Key: type
Meaning: Causes the attribute value to be validated against a named simple type. The option value must be the name of a type in the in-scope schema definitions. The supplied string is validated against this type and a dynamic error occurs if it is not valid. The returned attribute node has this type annotation. Validation may also affect the string value of the attribute, for example by collapsing whitespace.
  • Type: xs:QName
  • Default: absent

If the function is called twice with the same arguments, it is unpredictable whether it returns the same attribute node or different attribute nodes from the two invocations.

Error Conditions
Notes

  1. These rules are similar to the XQuery rules for the element {...} expression. However, there are some differences. Most notably, the XSLT rules allow multiple attribute nodes with the same name to appear in the content sequence (the last one wins). Furthermore, the error codes used for invalid conditions (such as the presence of maps or functions or conflicting namespace nodes in the content) are those given in the XSLT 3.0 specification.

  2. The XSLT/XQuery rules for constructing simple content do not apply. The value must be supplied as a string, or as a value that is converted to a string by virtue of the function conversion rules.

  3. Since the declared type of the first argument is namespace sensitive, error XPTY0117 will be raised if an untyped atomic value (or an untyped node) is supplied as the actual argument. Conversion to a string should therefore be done explicitly. For example, to convert the element <prop name="x" value="y"/> to the attribute node x="y", use build:attribute(string(@name), string(@value))

3.2 build:comment

Summary

Returns a comment node, with given content.

Signature

build:comment($content as xs:string?) as comment()

Rules

The content of the comment (that is, the string value of the node) is formed by evaluating the first argument. If this is an empty sequence, the string value will be a zero-length string.

If the function is called twice with the same arguments, it is unpredictable whether it returns the same node or different nodes from the two invocations.

If the content contains the substring "--", or if it ends in "-", this is handled in the same way as the xsl:comment instruction in XSLT: the value is adjusted by inserting spaces.
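A minimal Python sketch of this adjustment follows. It assumes the XSLT rule is to insert a space after any hyphen that is followed by another hyphen or that ends the comment; the function name is hypothetical, and None stands in for the empty sequence.

```python
import re


def comment_string_value(content):
    """Return the string value that a build:comment call would produce."""
    value = "" if content is None else content
    # Insert a space after each "-" that precedes another "-" or ends the value.
    return re.sub(r"-(?=-|\Z)", "- ", value)


print(repr(comment_string_value("a--b")))  # 'a- -b'
print(repr(comment_string_value("end-")))  # 'end- '
```
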

Notes

The XSLT/XQuery rules for constructing simple content do not apply. The value must be supplied as a string, or as a value that is converted to a string by virtue of the function conversion rules.

3.3 build:document

Summary

Returns a document node, with given content.

Signatures

build:document($content as item()*) as document-node()
build:document($content as item()*, $options as map(*)) as document-node()

Rules

The effect of calling the single-argument function is the same as the effect of calling the two-argument function supplying an empty map as the second argument.

The content of the document node (that is, the children of the node) is formed by evaluating the first argument, and applying the rules given in the XSLT 3.0 specification section 5.7.1, Constructing Complex Content.

The base URI of the new document node is taken from the static base URI of the calling expression.

If the function is called twice with the same arguments, it is unpredictable whether it returns the same document node or different document nodes from the two invocations.

It is not required that the resulting document should satisfy the XML rules for a well-formed document; specifically, the node may contain multiple element and text nodes among its children.

The entries that may appear in the $options map are as follows. The option parameter conventions apply.

Key: base-uri
Meaning: Determines the base URI of the returned document node. This should be an absolute URI.
  • Type: xs:string
  • Default: The static base URI of the function call

Key: type
Meaning: Causes the content of the outermost element to be validated against a named schema type. The validation and type options are mutually exclusive. The value must be the name of a type in the in-scope schema definitions. The supplied content is validated against this type and a dynamic error occurs if it is not valid.
  • Type: xs:QName
  • Default: absent

Key: validation
Meaning: Causes the content of the outermost element to be validated against a schema. The validation and type options are mutually exclusive.
  • Type: xs:string
  • Default: skip
  • Values:
      strict: There must be an element declaration with matching name in the in-scope schema definitions. The element is validated against this declaration.
      lax: There may be an element declaration with matching name in the in-scope schema definitions: if there is, then the element is validated against this declaration.
      skip: The content is not validated.

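The way these options interact can be sketched in Python. The function name and the dictionary representation of the options map are assumptions of this example; unrecognized keys are simply ignored here, and no actual schema validation is performed.

```python
DOCUMENT_VALIDATION_VALUES = {"strict", "lax", "skip"}


def normalize_document_options(options, static_base_uri):
    """Apply defaults and consistency checks to a build:document options map."""
    if "type" in options and "validation" in options:
        raise ValueError("'type' and 'validation' are mutually exclusive")
    validation = options.get("validation", "skip")
    if validation not in DOCUMENT_VALIDATION_VALUES:
        raise ValueError(f"unsupported validation value: {validation!r}")
    return {
        # base-uri defaults to the static base URI of the function call
        "base-uri": options.get("base-uri", static_base_uri),
        # None models an absent type option
        "type": options.get("type"),
        "validation": validation,
    }


print(normalize_document_options({"validation": "lax"}, "file:///base/"))
```
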
Notes

These rules are almost identical to the XQuery rules for the document {...} expression. However, the error codes used for invalid conditions (such as the presence of attributes, namespace nodes, maps or functions in the content) are those given in the XSLT 3.0 specification.

3.4 build:element

Summary

Returns an element node, with given name and content.

Signatures

build:element($name as union(xs:QName, xs:string),
$content as item()*) as element(*)
build:element($name as union(xs:QName, xs:string),
$content as item()*,
$options as map(*)) as element(*)

Rules

The effect of the two-argument form of this function is the same as calling the three-argument form with an empty map as the value of the $options argument.

The name of the element node is determined by the first argument. This may be supplied as an instance of either xs:string or xs:QName.

The name of the element may be supplied in a number of ways:

  • The first argument may be an xs:QName, in which case the element's name will have the prefix, namespace URI, and local name supplied in this QName.
  • The first argument may be an xs:string that conforms to the rules for a valid xs:NCName. The element's name will have this local part, with no namespace URI or prefix.
  • The first argument may be an xs:string in the format Q{uri}local. The element's local name and namespace URI will be taken from this value, and the name will have no prefix (that is, the URI will be the default namespace).
  • The first argument may be an xs:string in the format prefix:local. In this case the prefix must be declared in the static context of the function call, and the element's name will use this prefix and local name, together with the namespace URI associated with this prefix in the static context.

The content of the element node (that is, the children of the node) is formed by evaluating the second argument, and applying the rules given in the XSLT 3.0 specification section 5.7.1, Constructing Complex Content.

The base URI of the new element node is taken from the static base URI of the calling expression.

The type annotation of the new element node will be xs:untyped.

Namespace fixup is applied to the new element as described in the XSLT 3.0 specification to ensure that all namespaces used in element and attribute names are properly declared.

The entries that may appear in the $options map are as follows. The option parameter conventions apply.

Key: base-uri
Meaning: Determines the base URI of the returned element node. This should be an absolute URI.
  • Type: xs:string
  • Default: The static base URI of the function call

Key: is-id
Meaning: Determines whether the element has the is-id property.
  • Type: xs:boolean
  • Default: false

Key: is-idrefs
Meaning: Determines whether the element has the is-idrefs property.
  • Type: xs:boolean
  • Default: false

Key: inherit-namespaces
Meaning: Determines whether the namespaces of the newly constructed element are propagated to the copies of its descendants. The semantics correspond to the inherit-namespaces attribute of the xsl:element instruction.
  • Type: xs:boolean
  • Default: true

Key: type
Meaning: Causes the content to be validated against a named schema type. The validation and type options are mutually exclusive. The value must be the name of a type in the in-scope schema definitions. The supplied content is validated against this type and a dynamic error occurs if it is not valid. The returned element node has this type annotation.
  • Type: xs:QName
  • Default: absent

Key: validation
Meaning: Causes the content to be validated against a schema. The validation and type options are mutually exclusive.
  • Type: xs:string
  • Default: skip
  • Values:
      strict: There must be an element declaration with matching name in the in-scope schema definitions. The element is validated against this declaration.
      lax: There may be an element declaration with matching name in the in-scope schema definitions: if there is, then the element is validated against this declaration.
      preserve: The content of the new element is not validated, but any descendant nodes that are copied retain their type annotations.
      strip: The content is not validated, and any descendant nodes that are copied have their type annotation changed to xs:untyped.

If the function is called twice with the same arguments, it is unpredictable whether it returns the same element node or different element nodes from the two invocations.

Notes
  1. These rules are similar to the XQuery rules for the element {...} expression. However, there are some differences. Most notably, the XSLT rules allow multiple attribute nodes with the same name to appear in the content sequence (the last one wins). Furthermore, the error codes used for invalid conditions (such as the presence of maps or functions or conflicting namespace nodes in the content) are those given in the XSLT 3.0 specification.

  2. Any attribute nodes in the content sequence become attributes of the constructed element; they are not atomized to form text nodes.

  3. Since the declared type of the first argument is namespace sensitive, error XPTY0117 will be raised if an untyped atomic value (or an untyped node) is supplied as the actual argument. Conversion to a string should therefore be done explicitly. For example, to convert <prop name="x" value="y"/> to <x>y</x>, use build:element(string(@name), string(@value))

  4. Supplying a simple NCName as the first argument means the element will be in no namespace. The default namespace for elements is NOT used.

3.5 build:namespace

Summary

Creates a namespace node.

Signature

build:namespace($prefix as xs:string, $uri as xs:string) as namespace-node()

Rules

This function creates a new parentless namespace node. The first argument gives the name of the namespace node (that is, the namespace prefix), while the second gives the namespace URI. The prefix may be "" to create a default namespace; otherwise it must be a valid NCName. The URI must not be the empty string.
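The argument checks described above can be sketched in Python. The NamespaceNode class, the function name, and the ASCII-only NCName pattern are assumptions of this example; the checks shown are only those stated in this section.

```python
import re
from dataclasses import dataclass

# Simplified, ASCII-only approximation of the NCName production (assumption).
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


@dataclass(frozen=True)
class NamespaceNode:
    prefix: str  # "" denotes the default namespace
    uri: str


def build_namespace(prefix, uri):
    """Create a parentless namespace node after checking both arguments."""
    if prefix != "" and not NCNAME.fullmatch(prefix):
        raise ValueError(f"prefix is not a valid NCName: {prefix!r}")
    if uri == "":
        raise ValueError("the namespace URI must not be the empty string")
    return NamespaceNode(prefix, uri)


print(build_namespace("", "http://example.com/default"))
```
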

3.6 build:processing-instruction

Summary

Returns a new processing instruction node, with given name and content.

Signature

build:processing-instruction($name as xs:string,
$content as xs:string?) as processing-instruction()

Rules

This function constructs a new parentless processing instruction node.

The name of the processing instruction is determined by the first argument. This must be an instance of xs:string that conforms to the rules for an xs:NCName; it must not match the name "xml" in a case-blind comparison.

The content of the processing instruction (that is, the string value of the node) is formed by evaluating the second argument. If this is an empty sequence, the string value will be a zero-length string.

Any substring of the string value that matches ?> is replaced by ? > (that is, a space is inserted).
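The name check and the content adjustment can be sketched together in Python. The function name, the returned (name, value) pair, and the ASCII-only NCName pattern are assumptions of this example; None stands in for the empty sequence.

```python
import re

# Simplified, ASCII-only approximation of the NCName production (assumption).
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def processing_instruction_parts(name, content):
    """Return the (name, string value) pair for a new processing instruction."""
    if not NCNAME.fullmatch(name):
        raise ValueError(f"name is not a valid NCName: {name!r}")
    if name.lower() == "xml":
        raise ValueError("the name must not match 'xml' in any letter case")
    value = "" if content is None else content
    # Break up any "?>" so the content cannot terminate the instruction early.
    return name, value.replace("?>", "? >")


print(processing_instruction_parts("target", "a ?> b"))  # ('target', 'a ? > b')
```
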

If the function is called twice with the same arguments, it is unpredictable whether it returns the same node or different nodes from the two invocations.

Notes

The XSLT/XQuery rules for constructing simple content do not apply. The value must be supplied as a string, or as a value that is converted to a string by virtue of the function conversion rules.

3.7 build:text

Summary

Returns a new text node, with given content.

Signature

build:text($content as xs:string?) as text()

Rules

This function constructs a new parentless text node.

The content of the text node (that is, the string value of the node) is formed by evaluating the first argument. If this is an empty sequence, the string value will be a zero-length string.

If the function is called twice with the same arguments, it is unpredictable whether it returns the same node or different nodes from the two invocations.

Notes

The XSLT/XQuery rules for constructing simple content do not apply. The value must be supplied as a string, or as a value that is converted to a string by virtue of the function conversion rules.