XML Schema 1.0 implementation
The command line interface
com.saxonica.Validate has been completely redesigned, allowing
multiple schema documents to be loaded and multiple instance documents to be validated.
This release of Saxon introduces preliminary support for assertions in a schema, based on the
current (31 August 2006) draft of XML Schema version 1.1. This allows a complex type to contain an assertion
about the content of the corresponding element expressed as an arbitrary XPath 2.0 expression. Please note that this
facility in the Working Draft is likely to change, and the Saxon implementation will change accordingly. For further
The XML Schema specification imposes a rule that when one type R is derived from another type B by restriction, then every element particle ER in the content model of R must be compatible with the corresponding element particle EB in B. One aspect of this is that the identity constraints defined in the declaration of ER (that is, unique, key, and keyref) must be a superset of the constraints defined for EB. The specification doesn't say how to decide whether two constraints are equivalent for this purpose, and Saxon has previously ignored this requirement. At this release a check is introduced which partially implements the rule. Specifically, Saxon will count the number of constraints that are defined, and will report an error if EB has more constraints of any particular kind (unique, key, or keyref) than ER has. If EB has at least one constraint and ER has one or more, then Saxon will output a warning saying that it was unable to check whether the constraints were compatible with each other.
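The counting check described above can be sketched as follows. This is an illustrative sketch only, not Saxon's code: the class `ConstraintCountCheck`, the `Kind` enum, and the message wording are all hypothetical. It also assumes that the warning is issued only when no error has been reported, which the text does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ConstraintCountCheck {

    /** The three kinds of identity constraint named in the text. */
    enum Kind { UNIQUE, KEY, KEYREF }

    /**
     * Compares the number of constraints of each kind on the base particle EB
     * with the number on the restricted particle ER (hypothetical helper).
     * Returns messages prefixed "ERROR" or "WARNING"; an empty list means
     * there is nothing to report.
     */
    static List<String> check(Map<Kind, Integer> base, Map<Kind, Integer> restricted) {
        List<String> messages = new ArrayList<>();
        int baseTotal = 0;
        int restrictedTotal = 0;
        for (Kind kind : Kind.values()) {
            int eb = base.getOrDefault(kind, 0);
            int er = restricted.getOrDefault(kind, 0);
            baseTotal += eb;
            restrictedTotal += er;
            // Error: EB has more constraints of this kind than ER has.
            if (eb > er) {
                messages.add("ERROR: base has " + eb + " " + kind
                        + " constraint(s), restriction has only " + er);
            }
        }
        // Assumption: warn only when the counts passed and both sides have constraints.
        if (messages.isEmpty() && baseTotal >= 1 && restrictedTotal >= 1) {
            messages.add("WARNING: unable to check whether the identity constraints are compatible");
        }
        return messages;
    }

    public static void main(String[] args) {
        Map<Kind, Integer> eb = Map.of(Kind.KEY, 2, Kind.UNIQUE, 1);
        Map<Kind, Integer> er = Map.of(Kind.KEY, 1, Kind.UNIQUE, 1);
        check(eb, er).forEach(System.out::println);
    }
}
```

Because only counts are compared, a restriction that replaces a key with a quite different key of the same kind passes the check, which is why the warning is needed.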
It is now possible when requesting validation of an instance to specify the required name of the top-level element
in the document being validated. This is possible through the option
-top:clarkname on the
com.saxonica.Validate command, or via a new property on the
The property is also available on the
DocumentBuilder in the .NET API and in the new s9api Java API.
A validation error occurs if the document being validated has a top-level element with a different name.
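The value of the -top option is a name in Clark notation, that is, {namespace-uri}local-name, or just the local name when the element is in no namespace. As a small illustration, the standard JAXP class `javax.xml.namespace.QName` produces this form from its `toString()` method. The class `ClarkNameExample`, the namespace URI, and the element name below are made up for the example.

```java
import javax.xml.namespace.QName;

class ClarkNameExample {

    /** Builds a Clark-notation name; pass "" for an element in no namespace. */
    static String clarkName(String namespaceUri, String localName) {
        // QName.toString() yields "{uri}local", or just "local" when the URI is empty.
        return new QName(namespaceUri, localName).toString();
    }

    public static void main(String[] args) {
        // Hypothetical element names, shown as they would follow "-top:".
        System.out.println("-top:" + clarkName("http://example.com/ns", "invoice"));
        // prints "-top:{http://example.com/ns}invoice"
        System.out.println("-top:" + clarkName("", "invoice"));
        // prints "-top:invoice"
    }
}
```

On most shells the braces in the option value need quoting.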
I discovered that Saxon allows you to use the types
in a schema as built-in types. XML Schema 1.0 doesn't recognize these types (though I can't find a rule that says it is
absolutely non-conformant to accept them). I have changed the code to give an interoperability warning if they are
used. I have also disallowed the use of the type
xs:anyAtomicType, which has no defined validation
The mechanisms for comparing values in the course of schema validation and processing have now been separated completely from the mechanisms used when implementing XPath operators. This means that the semantics of comparison and ordering should now follow the XML Schema specification precisely. Previously some operations were implemented according to the XPath semantics.
A duplicate xsi:noNamespaceSchemaLocation attribute is now
ignored (previously it was rejected under the rule that such an attribute cannot appear after the first element
in the relevant namespace). Duplicates can arise naturally from XInclude processing, so they are now accepted
and ignored. The schema specification permits this but does not require it. To be considered duplicates, the
declarations must match in the namespace URI and in the absolutized schemaLocation URI.
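The duplicate test can be pictured as a lookup keyed on the namespace URI together with the schema location after it has been resolved against the base URI. The sketch below is an assumption about how such a test might look, not Saxon's implementation: the class `SchemaLocationHints` and its `register` method are hypothetical, and it treats two absolutized URIs as equal only if their string forms are identical.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SchemaLocationHints {

    // Each entry is a (namespace URI, absolute schema location) pair.
    private final Set<List<String>> seen = new HashSet<>();

    /**
     * Records a schema location hint (hypothetical helper).
     * Returns true if the hint is new, false if it duplicates an earlier one.
     * Use "" as the namespace URI for xsi:noNamespaceSchemaLocation.
     */
    boolean register(String namespaceUri, String schemaLocation, URI baseUri) {
        // Absolutize the location against the base URI of the containing element.
        URI absolute = baseUri.resolve(schemaLocation).normalize();
        return seen.add(List.of(namespaceUri, absolute.toString()));
    }

    public static void main(String[] args) {
        SchemaLocationHints hints = new SchemaLocationHints();
        URI docA = URI.create("http://example.com/docs/a.xml");
        URI docB = URI.create("http://example.com/docs/b.xml");
        System.out.println(hints.register("urn:po", "schemas/po.xsd", docA));
        // Same namespace and same absolute location, reached from an included document.
        System.out.println(hints.register("urn:po", "./schemas/po.xsd", docB));
        // Different namespace, so not a duplicate.
        System.out.println(hints.register("urn:other", "schemas/po.xsd", docA));
    }
}
```

Resolving before comparing matters for the XInclude case: the included fragment may carry a different base URI, so the same relative location can name a different schema, and different relative locations can name the same one.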
Result tree validation
Saxon now does more extensive compile-time checking where an
instruction requests validation of the result tree. This means that validation errors that were previously detected
at stylesheet execution time are now sometimes detected at compile time. Previously these checks were only done when
validation was requested on an element-constructor instruction.
Expansion of attribute and element defaults
When the input or output of a query or transformation is validated, it is now possible to request that fixed and default
element and attribute values defined in the schema should not be expanded. This is done using the option
on the command line, or equivalent options in the
The same option also applies to DTD-based attribute default expansion, provided that the XML parser reports sufficient information to the application.
Serializing a Schema Component Model
It is now possible to export the contents of the schema cache held in the
object to an XML file (with the conventional extension
.scm for Schema Component Model). The contents
can subsequently be reloaded. This is faster than reloading the original source schema documents,
because it allows most of the validation to be skipped, along with the sometimes expensive operation of constructing
and determinizing finite state machines. This facility is intended to be used in conjunction with XQuery
Java code generation: it allows the schemas that were imported by a compiled query to be saved on disk alongside
the compiled query itself, for rapid reloading at run time.
The serialized SCM file is also designed to be easy for applications to process. The representation of schema components is more uniform than in source .xsd documents (there are fewer defaults, and fewer alternative ways of expressing the same information). This makes it a suitable representation for applications that need to process or analyze schema information, as an alternative to using the Java API.
report elements threatened to make this even more complex. So a simple XSLT transformation was written to take the finite state machines in the SCM version of the schema-for-schemas and generate Java code from them. This means that Saxon's schema validation logic is now derived directly from the published schema-for-schemas, while retaining the efficiency of hard-coded Java.
Changes to the Schema Component Model API
Changes have been made to the API for the schema component model (package
to align it more closely with the abstract model defined in the W3C specifications.
All named components now consistently expose
to provide access to the local part of the name and the namespace URI respectively.
The wide variety of existing names for these accessors have been retained for the
time being as deprecated methods. The new names are chosen because they correspond
to the names used for these properties in the W3C schema component model.
FacetCollection has disappeared; its functionality has been merged into
Compositor has been renamed
ModelGroup, and its subclasses
ChoiceCompositor have been renamed accordingly. In the W3C schema model, the
compositor (all, choice, sequence) is one of the properties of the
ModelGroup. This is now
available using the method
getCompositorName() on the
Particle is now an abstract class rather than an interface, and the previous
AbstractParticle no longer exists. There are three subclasses of
ModelGroupParticle. This means there is now a distinction between the
which represents a reference to a
ModelGroup, and the
ModelGroupDefinition (which represents a named model group) no longer
Particle; it is now a subclass of
GroupReference; it is no longer
necessarily a reference to a (named)
ModelGroupDefinition, but now can be a reference
to any (named or unnamed)
AttributeWildcard are no longer subclasses
Wildcard is now a helper class to which these
two classes delegate. Instead,
ElementWildcard is now a subclass of
getTerm() method of
ElementWildcard returns the
(previously it returned the
ElementWildcard object itself).
The use of exceptions
ValidationException has been made
more consistent. A
SchemaException indicates that the schema is invalid, and should occur only
while the schema is being loaded and validated. A
ValidationException indicates that an instance
document is invalid against the schema, and should occur only during instance validation. Errors relating to the
consistency of a stylesheet or query against a valid schema should result in an
XPathException being thrown.
An inconsistency in the schema found during instance validation is an internal error, and should result in an
IllegalStateException. The exception is an unresolved reference to a missing schema component (which the schema
specification defines as not constituting a schema invalidity); this results in an UnresolvedReferenceException.
Because it can occur almost anywhere,
UnresolvedReferenceException is an unchecked exception.
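The practical effect of making the exception unchecked can be shown with a stand-in class. Nothing below is taken from the Saxon API: the nested `UnresolvedReferenceException` is a local substitute declared only for this sketch, and `resolve` and `describe` are hypothetical helpers.

```java
import java.util.Map;

class UncheckedReferenceSketch {

    /** Stand-in for illustration only; this is NOT the Saxon class. */
    static class UnresolvedReferenceException extends RuntimeException {
        UnresolvedReferenceException(String message) {
            super(message);
        }
    }

    /**
     * Hypothetical component lookup. Because the exception is unchecked,
     * no throws clause is needed here or in any caller up the stack.
     */
    static String resolve(Map<String, String> components, String name) {
        String component = components.get(name);
        if (component == null) {
            throw new UnresolvedReferenceException("No schema component named " + name);
        }
        return component;
    }

    /** A caller that chooses to handle the failure at a convenient point. */
    static String describe(Map<String, String> components, String name) {
        try {
            return "resolved: " + resolve(components, name);
        } catch (UnresolvedReferenceException e) {
            return "unresolved: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        Map<String, String> components = Map.of("addressType", "complex type");
        System.out.println(describe(components, "addressType"));
        System.out.println(describe(components, "missingType"));
    }
}
```

With a checked exception, every method between the lookup and the handler would need to declare it, which is impractical for a condition that can surface almost anywhere.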