Schema Processing using JAXP
Applications can invoke schema processing using the APIs provided in JAXP 1.3. This makes Saxon
interchangeable with other schema processors implementing this interface. There is full information on
these APIs in the Java documentation. The two main mechanisms are the Validator
class, and the ValidatorHandler class. Sample applications using these interfaces are
provided in the samples/java directory of the saxon-resources download. Saxon also supplies the class
com.saxonica.jaxp.ValidatingReader, which implements the SAX2 XMLReader
interface, allowing it to be used as a schema-validating XML parser.
The main steps are:
- Create a SchemaFactory by calling SchemaFactory.newInstance() with the argument "http://www.w3.org/2001/XMLSchema", and with the Java system properties set up to ensure that Saxon is loaded as the chosen schema processor. Saxon will normally be loaded as the default schema processor if Saxon-EE is present on the classpath, but to make absolutely sure, set the system property javax.xml.validation.SchemaFactory:http://www.w3.org/2001/XMLSchema to the value com.saxonica.jaxp.SchemaFactoryImpl. Note that if you set this property using a property file, colons in the property name must be escaped as "\:".
- Process a schema document by calling one of the several newSchema methods on the returned SchemaFactory.
- Create either a Validator or a ValidatorHandler from the returned Schema.
- Use the Validator or ValidatorHandler to process one or more source documents.
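The four steps above can be sketched with the standard javax.xml.validation classes. This is a minimal illustration rather than one of the shipped samples: the one-element schema, the class name JaxpValidatorSketch and the helper isValid are hypothetical, and the code runs against whichever schema processor JAXP selects (Saxon-EE only if it is on the classpath).

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class JaxpValidatorSketch {

    // Hypothetical one-element schema, used only for illustration.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='quantity' type='xs:positiveInteger'/>"
      + "</xs:schema>";

    static boolean isValid(String xsd, String xml) throws Exception {
        // Optional: force Saxon-EE as the processor (needs Saxon-EE on the classpath).
        // System.setProperty(
        //     "javax.xml.validation.SchemaFactory:http://www.w3.org/2001/XMLSchema",
        //     "com.saxonica.jaxp.SchemaFactoryImpl");

        // Step 1: obtain a SchemaFactory for the W3C XML Schema language.
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

        // Step 2: compile the schema document (a SAXException here means a bad schema).
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));

        // Step 3: create a Validator from the compiled Schema.
        Validator validator = schema.newValidator();

        // Step 4: validate a source document. With no ErrorHandler set,
        // the first validation error is thrown as a SAXException.
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException invalid) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(XSD, "<quantity>3</quantity>"));
        System.out.println(isValid(XSD, "<quantity>-3</quantity>"));
    }
}
```

In a real application the schema and instance would normally come from files or URLs rather than strings; StreamSource accepts either.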
Saxon also provides the class SchemaFactory11 which automatically enables support
for XSD 1.1. When the JAXP search mechanism is used, this schema factory will be selected if the
schema language required is set to http://www.w3.org/XML/XMLSchema/v1.1. Saxon also
recognizes the generic language identifier http://www.w3.org/XML/XMLSchema and the
XSD 1.0 identifier http://www.w3.org/XML/XMLSchema/vX.Y as requests for an
XSD 1.0 processor.
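As a sketch of how the language identifier drives the JAXP search, the following asks for an XSD 1.1 factory and falls back to XSD 1.0 when no 1.1 processor is found. The fallback policy and the names Xsd11FactorySketch and newFactoryPreferring11 are assumptions made for illustration; whether the first lookup succeeds depends on what is on the classpath.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

class Xsd11FactorySketch {

    // Language identifier for XSD 1.1, as given in the text above.
    static final String XSD_11 = "http://www.w3.org/XML/XMLSchema/v1.1";

    // Returns an XSD 1.1 factory if the JAXP search finds one,
    // otherwise an XSD 1.0 factory.
    static SchemaFactory newFactoryPreferring11() {
        try {
            return SchemaFactory.newInstance(XSD_11);
        } catch (IllegalArgumentException noXsd11Processor) {
            // Thrown by JAXP when no implementation supports the requested language.
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        }
    }

    public static void main(String[] args) {
        SchemaFactory factory = newFactoryPreferring11();
        System.out.println("Selected factory: " + factory.getClass().getName());
    }
}
```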
Note that additional schemas referenced from the xsi:schemaLocation attributes
within the source documents will be loaded as necessary. A target namespace is ignored if there is already
a loaded schema for that namespace; Saxon makes no attempt to load multiple schemas for the same
namespace and check them for consistency.
Although the API is defined in such a way that a Validator or ValidatorHandler
is created for a particular Schema, in the Saxon implementation the schema components that
are available to the validator are not only the components within that schema, but all the components that form part
of any schema registered with the Configuration.
Another way to control validation from a Java application is to run a JAXP
identity transformation, having first set the option to perform schema validation.
The sample application QuickValidator.java illustrates this approach.
If you set an ErrorListener on the TransformerFactory, then you can control
the way that error messages are output.
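A minimal sketch of the identity-transformation approach follows; it is not the QuickValidator.java listing. It assumes that the Saxon configuration property "http://saxon.sf.net/feature/schema-validation-mode" with the value "strict" is the option that switches validation on (check this against the FeatureKeys JavaDoc for your release), and the class and method names are hypothetical. If the TransformerFactory in use does not recognize the property, the sketch reports that and performs a plain, non-validating copy.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class IdentityValidationSketch {

    // Assumed Saxon property name; verify it against the FeatureKeys JavaDoc.
    static final String SCHEMA_VALIDATION_MODE =
        "http://saxon.sf.net/feature/schema-validation-mode";

    // Collects messages instead of letting them go to the default output.
    static class CollectingListener implements ErrorListener {
        final List<String> messages = new ArrayList<>();
        @Override public void warning(TransformerException e) {
            messages.add("warning: " + e.getMessage());
        }
        @Override public void error(TransformerException e) {
            messages.add("error: " + e.getMessage());
        }
        @Override public void fatalError(TransformerException e) throws TransformerException {
            messages.add("fatal: " + e.getMessage());
            throw e;
        }
    }

    // Discards the copied document, counting elements only to show it was processed.
    static class ElementCounter extends DefaultHandler {
        int elements;
        @Override public void startElement(String uri, String local, String qName, Attributes atts) {
            elements++;
        }
    }

    // Returns true if the factory accepted the validation option.
    static boolean requestStrictValidation(TransformerFactory factory) {
        try {
            factory.setAttribute(SCHEMA_VALIDATION_MODE, "strict");
            return true;
        } catch (IllegalArgumentException notSupported) {
            return false; // not a schema-aware Saxon factory
        }
    }

    // Runs the identity transformation; validation failures surface as TransformerException.
    static int copyThrough(TransformerFactory factory, String xml) throws TransformerException {
        Transformer identity = factory.newTransformer();
        ElementCounter sink = new ElementCounter();
        identity.transform(new StreamSource(new StringReader(xml)), new SAXResult(sink));
        return sink.elements;
    }

    public static void main(String[] args) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setErrorListener(new CollectingListener());
        System.out.println("validation requested: " + requestStrictValidation(factory));
        System.out.println("elements copied: " + copyThrough(factory, "<a><b/></a>"));
    }
}
```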
To validate against a schema without hard-coding the URI of the schema into the source
document, pre-load the schema into the TransformerFactory. The sample application
QuickValidator.java also illustrates this.
You can preload as many schemas as you like using the addSchema method. Such schemas are parsed,
validated, and compiled once, and can be used as often as you like for validating multiple source documents. You
cannot unload a schema once it has been loaded. If you want to remove or replace a schema, start afresh with a
new TransformerFactory.
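The same compile-once, validate-many pattern can be sketched with the standard JAXP Schema object in place of Saxon's addSchema method. This is an analogy under stated assumptions, not the Saxon mechanism itself: the schema text and the names CompileOnceSketch, compile and countValid are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class CompileOnceSketch {

    // Hypothetical schema, used only for illustration.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='quantity' type='xs:positiveInteger'/>"
      + "</xs:schema>";

    // Parse, check and compile the schema a single time.
    static Schema compile(String xsd) throws SAXException {
        return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(xsd)));
    }

    // Reuse the compiled Schema for every document; Validators are cheap to create.
    static int countValid(Schema schema, List<String> documents) throws IOException {
        int valid = 0;
        for (String xml : documents) {
            Validator validator = schema.newValidator();
            try {
                validator.validate(new StreamSource(new StringReader(xml)));
                valid++;
            } catch (SAXException invalid) {
                // Document failed validation; leave it out of the count.
            }
        }
        return valid;
    }
}
```

A compiled Schema is immutable and may be shared between threads, whereas each Validator should be used by one thread at a time.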
Behind the scenes, the TransformerFactory uses a Configuration object to hold all
the configuration information. The basic Saxon product uses the class net.sf.saxon.TransformerFactoryImpl
for the TransformerFactory, and net.sf.saxon.Configuration for the underlying
configuration information. The schema-aware product subclasses these with
com.saxonica.config.SchemaAwareTransformerFactory
and com.saxonica.config.EnterpriseConfiguration respectively.
You can get hold of the Configuration object by casting the TransformerFactory
to a Saxon TransformerFactoryImpl and calling the getConfiguration() method. This
gives you more precise control: for example, it allows you to retrieve the Schema object containing
the schema components for a given target namespace, and to inspect the compiled schema to establish its properties.
See the JavaDoc documentation for further details.
Saxon currently implements its own API for access to the schema components. This API should be regarded as temporary. In the longer term, it is possible that Saxon will offer an API for schema access that has been proposed in a member submission to W3C.
The programming approach outlined above, of using an identity transformer,
is suitable for a wide class of applications. For example,
it enables you to insert a validation step into a SAX-based pipeline. However, for finer control, there are
lower-level interfaces available in Saxon that you can also use. See for example the JavaDoc for the
EnterpriseConfiguration class, which includes methods such as getElementValidator. This
constructs a Receiver which acts as a validating XML event filter. This can be inserted into a pipeline
of Receivers. Saxon also provides classes to bridge between SAX events and Receiver
events in each direction: ReceivingContentHandler (SAX events to Receiver events) and
ContentHandlerProxy (Receiver events to SAX events).
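A validation step in a SAX pipeline can also be expressed with the standard ValidatorHandler mentioned at the start of this section, without using the Saxon Receiver interfaces. The following is a sketch under assumptions: the schema text, the class ValidatorHandlerPipelineSketch and its nested handlers are hypothetical, and the downstream stage merely counts elements.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ValidatorHandlerPipelineSketch {

    // Hypothetical schema, used only for illustration.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='quantity' type='xs:positiveInteger'/>"
      + "</xs:schema>";

    // Downstream stage: receives the events that passed through the validator.
    static class Downstream extends DefaultHandler {
        int elements;
        @Override public void startElement(String uri, String local, String qName, Attributes atts) {
            elements++;
        }
    }

    // Records validation errors instead of aborting the pipeline.
    static class ErrorCounter extends DefaultHandler {
        int errors;
        @Override public void error(SAXParseException e) {
            errors++;
        }
    }

    // Returns { elements seen downstream, validation errors reported }.
    static int[] run(String xsd, String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(xsd)));

        // parser -> ValidatorHandler -> downstream ContentHandler
        ValidatorHandler validator = schema.newValidatorHandler();
        ErrorCounter errors = new ErrorCounter();
        Downstream downstream = new Downstream();
        validator.setErrorHandler(errors);
        validator.setContentHandler(downstream);

        SAXParserFactory parsers = SAXParserFactory.newInstance();
        parsers.setNamespaceAware(true); // ValidatorHandler expects namespace-aware events
        XMLReader reader = parsers.newSAXParser().getXMLReader();
        reader.setContentHandler(validator);
        reader.parse(new InputSource(new StringReader(xml)));

        return new int[] { downstream.elements, errors.errors };
    }
}
```

Because the ErrorCounter does not rethrow, events continue to flow downstream after a validation error; an application that must stop at the first error can throw the SAXParseException from error() instead.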