The saxon:preprocess facet

Saxon provides the saxon:preprocess facet as an addition to the standard facets defined in the XSD 1.1 specification. It is available only when XSD 1.1 support is enabled.

Like xs:whiteSpace, this is a pre-lexical facet. It transforms the supplied lexical value of an element or attribute from the form as written (but after whitespace normalization) into the lexical space of the base type. Constraining facets such as pattern, enumeration, and minLength apply to the value after the saxon:preprocess facet has done its work. In addition, if the primitive type is, say, xs:date or xs:decimal, the built-in lexical rules for parsing a date or a decimal number are applied only after saxon:preprocess has transformed the value. This makes it possible, for example, to accept yes and no as values of an xs:boolean, 3,14159 as the value of an xs:decimal, or 13DEC1987 as the value of an xs:date.
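As an illustration of the yes/no case mentioned above, the following is a minimal sketch rather than an example taken from the Saxon distribution. The type name yesNo is hypothetical. Values other than yes and no are passed through unchanged, so the standard xs:boolean forms are still accepted, and anything else is rejected by the built-in lexical rules for xs:boolean.

```xml
<!-- Hypothetical type: accepts yes/no in addition to the standard boolean forms -->
<xs:simpleType name="yesNo">
  <xs:restriction base="xs:boolean">
    <!-- action maps yes/no to true/false; other input is left as written -->
    <!-- reverse maps the string form of the boolean back to yes/no -->
    <saxon:preprocess
        action="if ($value = 'yes') then 'true' else if ($value = 'no') then 'false' else $value"
        reverse="if ($value = 'true') then 'yes' else 'no'"
        xmlns:saxon="http://saxon.sf.net/"/>
  </xs:restriction>
</xs:simpleType>
```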

Like other facets, saxon:preprocess may be used as a child of xs:restriction when restricting a simple type, or a complex type with simple content.
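The complex-type case can be sketched as follows. Both type names (measure and euroMeasure) and the unit attribute are hypothetical and chosen only for illustration; the preprocessing expression simply swaps commas and full stops.

```xml
<!-- Hypothetical base type: a decimal amount with a unit attribute -->
<xs:complexType name="measure">
  <xs:simpleContent>
    <xs:extension base="xs:decimal">
      <xs:attribute name="unit" type="xs:string"/>
    </xs:extension>
  </xs:simpleContent>
</xs:complexType>

<!-- Restriction of the complex type: the facet applies to the simple content -->
<xs:complexType name="euroMeasure">
  <xs:simpleContent>
    <xs:restriction base="measure">
      <saxon:preprocess action="translate($value, ',', '.')"
                        reverse="translate($value, '.', ',')"
                        xmlns:saxon="http://saxon.sf.net/"/>
    </xs:restriction>
  </xs:simpleContent>
</xs:complexType>
```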

The attributes are:

| Attribute | Usage |
|---|---|
| id | Standard attribute. |
| action | Mandatory. An XPath expression. The rules for writing the XPath expression are generally the same as the rules for the test expression of xs:assert. The value to be transformed is supplied (as a string) as the value of the variable $value; the context item is undefined. The expression must return a single string. If evaluation of the expression fails with a dynamic error, this is interpreted as a validation failure. |
| reverse | Optional. An XPath expression used to reverse the transformation. It is used (in XPath, XSLT, and XQuery) when a value of this type is converted to a string: the value is first converted according to the rules of the base type, the resulting string is passed to the XPath expression as the value of the variable $value, and the result of the expression is used as the final output. This attribute does not affect the schema validation process itself. |
| xpathDefaultNamespace | The default namespace for element names (unlikely to appear in practice) and types. |
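The interplay between action, reverse, and dynamic errors can be sketched with a hypothetical percentage type (the name and the notation are assumptions made for this illustration). If the input has no % sign, substring-before returns an empty string, the xs:decimal constructor raises a dynamic error, and the value is therefore reported as invalid.

```xml
<!-- Hypothetical type: accepts 50% and stores it as the decimal 0.5 -->
<xs:simpleType name="percentage">
  <xs:restriction base="xs:decimal">
    <!-- action: strip the % sign and divide by 100; a missing % sign or a
         non-numeric prefix causes a dynamic error, i.e. a validation failure -->
    <!-- reverse: multiply by 100 and append the % sign again -->
    <saxon:preprocess
        action="string(xs:decimal(substring-before($value, '%')) div 100)"
        reverse="concat(string(xs:decimal($value) * 100), '%')"
        xmlns:saxon="http://saxon.sf.net/"/>
  </xs:restriction>
</xs:simpleType>
```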

The following example converts a string to upper-case before testing it against the enumeration facet.

<xs:simpleType name="currency">
  <xs:restriction base="xs:string">
    <saxon:preprocess action="upper-case($value)"
                      xmlns:saxon="http://saxon.sf.net/"/>
    <xs:enumeration value="USD"/>
    <xs:enumeration value="EUR"/>
    <xs:enumeration value="GBP"/>
  </xs:restriction>
</xs:simpleType>

Of course, it is not only the constraining facets that see the preprocessed value (in this case, the upper-case value): any XPath operation that makes use of the typed value of an element or attribute node also sees the value after preprocessing. However, the string value of the node is unchanged.
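The difference between the two values can be illustrated with a hypothetical element named code that uses the currency type. The element name and instance value are assumptions, and the comments restate the behaviour described here rather than showing captured output.

```xml
<!-- Hypothetical declaration: an element using the currency type -->
<xs:element name="code" type="currency"/>

<!-- Hypothetical instance: valid, because 'usd' is upper-cased
     before the enumeration facet is tested -->
<code>usd</code>

<!-- After validation:
       string(code) is 'usd'   (the string value is unchanged)
       data(code)   is 'USD'   (the typed value reflects preprocessing) -->
```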

The following example converts any commas appearing in the input to full stops, allowing decimal numbers to be represented in Continental European style as 3,15. On output, the process is reversed, so that full stops are replaced by commas. (Note that in this example, the user-defined type also accepts numbers written in the "standard" style 3.15.)

<xs:simpleType name="euroDecimal">
  <xs:restriction base="xs:decimal">
    <saxon:preprocess action="translate($value, ',', '.')"
                      reverse="translate($value, '.', ',')"
                      xmlns:saxon="http://saxon.sf.net/"/>
  </xs:restriction>
</xs:simpleType>

The following example allows an xs:time value to be written with the seconds part omitted. Again, it also accepts the standard hh:mm:ss notation:

<xs:simpleType name="hoursAndMinutes">
  <xs:restriction base="xs:time">
    <saxon:preprocess action="concat($value, ':00'[string-length($value) = 5])"
                      xmlns:saxon="http://saxon.sf.net/"/>
  </xs:restriction>
</xs:simpleType>

The following example uses extension function calls within the XPath expression to support integers written in hexadecimal notation:

<xs:simpleType name="hexInteger">
  <xs:restriction base="xs:long">
    <saxon:preprocess action="Long:parseLong($value, 16)"
                      reverse="Long:toHexString(xs:long($value))"
                      xmlns:Long="java:java.lang.Long"
                      xmlns:saxon="http://saxon.sf.net/"/>
  </xs:restriction>
</xs:simpleType>

Given the input <val>0040</val>, validated against this schema, the query (val*3) cast as hexInteger will produce the output c0.

If the xs:restriction element defines facets other than saxon:preprocess, for example xs:enumeration or xs:minInclusive, then the values supplied in these other facets are validated against the rules for the base type: that is, they are not subject to preprocessing. So a type that accepts dates in US format (MM-DD-YYYY) earlier than a certain date might look like this, with the bound written in standard xs:date notation:

<xs:simpleType name="us-date-before-2012">
  <xs:restriction base="xs:date">
    <saxon:preprocess action="concat(substring($value, 7, 4), '-', substring($value, 1, 2), '-', substring($value, 4, 2))"
                      xmlns:saxon="http://saxon.sf.net/"/>
    <xs:maxInclusive value="2011-12-31"/>
  </xs:restriction>
</xs:simpleType>

However, if the type is further restricted, then the facet values given in the derived type are validated against its base type, which now includes the preprocessing step. So an alternative formulation of the above type, with the bound written in US notation, would be:

<xs:simpleType name="us-date">
  <xs:restriction base="xs:date">
    <saxon:preprocess action="concat(substring($value, 7, 4), '-', substring($value, 1, 2), '-', substring($value, 4, 2))"
                      xmlns:saxon="http://saxon.sf.net/"/>
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="us-date-before-2012">
  <xs:restriction base="us-date">
    <xs:maxInclusive value="12-31-2011"/>
  </xs:restriction>
</xs:simpleType>

The preprocess facet is not currently implemented for list or union types.