XML Schema 1.0 implementation

Saxon now implements enumeration facets on union and list types as the authors of the specification intended. Although the specification as written has problems (bug 5328 has been raised), the intent is that each value of an enumeration facet should be interpreted as an instance of the type being restricted, so that instance values are compared as typed values. Previously, enumeration facets on union and list types were implemented as a string comparison on the lexical value.
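To illustrate the difference, here is a minimal Java sketch, assuming a list type whose item type is xs:integer and which is restricted by enumeration facets. The class and method names are hypothetical and are not part of Saxon's API; the old behaviour is modelled simply as an exact string match.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a list of xs:integer restricted by enumeration facets.
// Not Saxon's API; it only contrasts lexical and typed-value comparison.
public class ListEnumerationDemo {

    // Interpret a lexical value as an instance of the list type:
    // split on XML whitespace and convert each token to an integer value.
    static List<BigInteger> parseIntegerList(String lexical) {
        List<BigInteger> items = new ArrayList<>();
        for (String token : lexical.trim().split("[ \\t\\n\\r]+")) {
            if (!token.isEmpty()) {
                items.add(new BigInteger(token)); // "01" and "+1" both become 1
            }
        }
        return items;
    }

    // Old behaviour (modelled): compare lexical forms as strings.
    static boolean matchesLexically(String instance, List<String> enumeration) {
        return enumeration.contains(instance);
    }

    // Intended behaviour: interpret each enumeration value as an instance
    // of the list type, then compare typed values item by item.
    static boolean matchesByValue(String instance, List<String> enumeration) {
        List<BigInteger> value = parseIntegerList(instance);
        for (String e : enumeration) {
            if (parseIntegerList(e).equals(value)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> enumeration = List.of("1 2 3", "10 20");
        String instance = " 01  +2 3 ";
        System.out.println(matchesLexically(instance, enumeration)); // false
        System.out.println(matchesByValue(instance, enumeration));   // true
    }
}
```

With value-based comparison, " 01  +2 3 " is accepted as an instance of the enumerated value "1 2 3", whereas a string comparison rejects it.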

The reporting of keyRef validation errors has been improved. Multiple errors can now be reported in a single schema validation run, and the line number given with the error message reflects the location of the unresolved keyRef value, rather than the end of the document as before.
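From the application side, receiving several errors from one validation run depends on an error handler that records each error rather than throwing. The following sketch uses only the standard JAXP javax.xml.validation API. Run against the JDK's built-in schema validator (not Saxon), it shows the mechanism, but the number of keyref errors and the line numbers reported depend on which validator is in use. The class name and the sample schema are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class KeyrefErrorCollector {

    // Schema with a key on item/@id and a keyref from ref/@to to that key.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + " <xs:element name='root'><xs:complexType><xs:sequence>"
      + "  <xs:element name='item' maxOccurs='unbounded'><xs:complexType>"
      + "   <xs:attribute name='id' type='xs:string'/></xs:complexType></xs:element>"
      + "  <xs:element name='ref' maxOccurs='unbounded'><xs:complexType>"
      + "   <xs:attribute name='to' type='xs:string'/></xs:complexType></xs:element>"
      + " </xs:sequence></xs:complexType>"
      + " <xs:key name='itemKey'><xs:selector xpath='item'/><xs:field xpath='@id'/></xs:key>"
      + " <xs:keyref name='itemRef' refer='itemKey'><xs:selector xpath='ref'/><xs:field xpath='@to'/></xs:keyref>"
      + " </xs:element></xs:schema>";

    // Instance with two keyref values ("b" and "c") that match no key.
    static final String XML =
        "<root>\n<item id='a'/>\n<ref to='a'/>\n<ref to='b'/>\n<ref to='c'/>\n</root>";

    // Validate, recording every non-fatal error instead of stopping at the first.
    static List<String> validate(String xsd, String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Validator validator = factory.newSchema(new StreamSource(new StringReader(xsd))).newValidator();
        List<String> errors = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) {
                errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }
            @Override public void fatalError(SAXParseException e) throws SAXParseException {
                throw e; // well-formedness errors still abort validation
            }
        });
        validator.validate(new StreamSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        for (String error : validate(XSD, XML)) {
            System.out.println(error);
        }
    }
}
```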

A new configuration option is available to control whether the schema processor takes notice of (and attempts to dereference) xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes encountered in an instance document that is being validated. It is available as the named property USE_XSI_SCHEMA_LOCATION on the TransformerFactory and Configuration classes; via methods on the S9API and .NET SchemaValidator classes and on the XQJ class SaxonXQDataSource; and via the -xsiloc option on the Validate, Transform, and Query command-line interfaces.
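The xsi:schemaLocation attribute holds whitespace-separated pairs of namespace URI and schema document location, while xsi:noNamespaceSchemaLocation holds a single location. The sketch below shows the hints a validator would collect, with a boolean parameter standing in for the USE_XSI_SCHEMA_LOCATION setting. The class and method are hypothetical; a real processor would also resolve relative URIs and fetch and compile the documents.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: which schema documents would be requested from xsi:* hints.
public class XsiSchemaLocationHints {

    // Returns a map from namespace URI ("" for no namespace) to schema location.
    static Map<String, String> schemaHints(String schemaLocation,
                                           String noNamespaceSchemaLocation,
                                           boolean useXsiSchemaLocation) {
        Map<String, String> hints = new LinkedHashMap<>();
        if (!useXsiSchemaLocation) {
            return hints; // option off: the attributes are ignored entirely
        }
        if (schemaLocation != null) {
            String[] tokens = schemaLocation.trim().split("\\s+");
            // Tokens come in pairs: namespace URI, then location.
            for (int i = 0; i + 1 < tokens.length; i += 2) {
                hints.put(tokens[i], tokens[i + 1]);
            }
        }
        if (noNamespaceSchemaLocation != null && !noNamespaceSchemaLocation.trim().isEmpty()) {
            hints.put("", noNamespaceSchemaLocation.trim());
        }
        return hints;
    }

    public static void main(String[] args) {
        String loc = "http://a.example/ns a.xsd http://b.example/ns b.xsd";
        System.out.println(schemaHints(loc, "c.xsd", true));
        System.out.println(schemaHints(loc, "c.xsd", false)); // {}
    }
}
```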

New methods have been added to the class com.saxonica.schema.SchemaCompiler to allow setting of "deferred validation mode". In this mode a sequence of calls on readSchema() can be made, followed by a single call on compile(). The effect is to defer all generation of the finite state machines used for run-time validation until compile() is called. This avoids repeated (and wasted) recompilation of complex types every time a new element is added to a substitution group, or every time a new complex type is derived by extension from an existing type. This facility was developed with XBRL as the primary use case; for the XBRL collection of schema documents it reduces compilation time from 400 seconds to 560 milliseconds.
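The saving comes from doing the expensive step once rather than after every change. The following Java sketch is a hypothetical cost model, not Saxon's SchemaCompiler: each readSchema() call adds one member to a substitution group that is referenced by a fixed number of complex types, and the sketch counts how many finite state machines get built in eager and in deferred mode.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cost model of eager versus deferred FSM generation.
public class DeferredCompilationDemo {
    private final boolean deferred;
    private final int typesUsingGroup;                 // complex types whose FSM depends on the group
    private final List<String> groupMembers = new ArrayList<>();
    private int fsmBuilds = 0;
    private boolean dirty = false;                     // FSMs out of date, rebuild pending

    DeferredCompilationDemo(boolean deferred, int typesUsingGroup) {
        this.deferred = deferred;
        this.typesUsingGroup = typesUsingGroup;
    }

    // Adding a substitution group member invalidates every dependent FSM.
    void readSchema(String newMember) {
        groupMembers.add(newMember);
        if (deferred) {
            dirty = true;      // just remember that work is needed
        } else {
            rebuildAll();      // eager: rebuild immediately, possibly wasted later
        }
    }

    // In deferred mode, all FSMs are built exactly once here.
    void compile() {
        if (dirty) {
            rebuildAll();
            dirty = false;
        }
    }

    private void rebuildAll() {
        fsmBuilds += typesUsingGroup;
    }

    int fsmBuilds() {
        return fsmBuilds;
    }

    public static void main(String[] args) {
        for (boolean deferred : new boolean[] {false, true}) {
            DeferredCompilationDemo c = new DeferredCompilationDemo(deferred, 50);
            for (int i = 0; i < 100; i++) {
                c.readSchema("member" + i);
            }
            c.compile();
            // prints "eager: 5000 FSM builds" then "deferred: 50 FSM builds"
            System.out.println((deferred ? "deferred" : "eager") + ": " + c.fsmBuilds() + " FSM builds");
        }
    }
}
```

In this model the eager cost grows with the product of schema documents read and dependent types, while the deferred cost depends only on the number of types.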

When minOccurs and maxOccurs constraints with values other than 0, 1, or unbounded appear on an element or wildcard particle, Saxon now implements the finite state machine using simple counters to count the number of occurrences, rather than "unfolding" the FSM as before. This removes the limits on the values of minOccurs and maxOccurs, as well as the cost in time and memory of handling large finite values. The unfolding technique is still used when minOccurs and maxOccurs appear on other kinds of particle, specifically on sequence or choice groups, or when "vulnerable" repeated element and wildcard particles appear within a model group that can itself be repeated (a particle is vulnerable if all the other particles in the model group are optional). A side-effect of this change is that the diagnostics are more specific when a validation failure occurs.
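A counter lets a single state track how many occurrences of a particle have been seen, so the automaton no longer grows with maxOccurs (unfolding item{2,1000} would need a chain of about a thousand states). The Java sketch below, with hypothetical names, checks one element particle with a counter, treating that particle as the whole content model, and also encodes the "vulnerable" test as stated in the paragraph above. It is an illustration of the technique, not Saxon's implementation.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of counter-based checking of name{min,max}.
public class CountingParticle {
    private final String name;
    private final int min;
    private final int max; // a negative value stands for maxOccurs="unbounded"

    CountingParticle(String name, int min, int max) {
        this.name = name;
        this.min = min;
        this.max = max;
    }

    // Checks child element names against this particle alone.
    // Returns null if valid, otherwise a diagnostic naming the failed bound.
    String validate(List<String> children) {
        int count = 0; // one counter instead of one unfolded state per occurrence
        for (String child : children) {
            if (!child.equals(name)) {
                return "unexpected <" + child + "> after " + count + " <" + name + "> element(s)";
            }
            count++;
            if (max >= 0 && count > max) {
                return "too many <" + name + "> elements: maxOccurs is " + max;
            }
        }
        if (count < min) {
            return "too few <" + name + "> elements: found " + count + ", minOccurs is " + min;
        }
        return null;
    }

    // A particle is vulnerable if every other particle in its model group is optional.
    static boolean isVulnerable(int index, int[] minOccursOfGroup) {
        for (int i = 0; i < minOccursOfGroup.length; i++) {
            if (i != index && minOccursOfGroup[i] > 0) {
                return false;
            }
        }
        return true;
    }

    // Counters are used unless a vulnerable particle sits in a group that can repeat.
    static boolean canUseCounter(boolean groupCanRepeat, int index, int[] minOccursOfGroup) {
        return !(groupCanRepeat && isVulnerable(index, minOccursOfGroup));
    }

    public static void main(String[] args) {
        CountingParticle item = new CountingParticle("item", 2, 1000);
        System.out.println(item.validate(Collections.nCopies(500, "item")));  // null
        System.out.println(item.validate(Collections.nCopies(1001, "item"))); // too many ...
        System.out.println(item.validate(List.of("item")));                   // too few ...
    }
}
```

Because the counter value is known at the point of failure, the sketch can report which bound was violated, which mirrors the more specific diagnostics mentioned above.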

Another side-effect, hopefully temporary, is that some rather artificial type derivations are no longer allowed: specifically, those in which a wildcard with maxOccurs in the base type is specialized to a sequence of specific element particles in the derived type.