Schema validation using JAXP
Applications can invoke schema processing using the APIs provided in JAXP 1.3. This
makes Saxon interchangeable with other schema processors implementing this interface.
There is full information on these APIs in the Java documentation. The two main
mechanisms are the Validator class, and the ValidatorHandler
class. Sample applications using these interfaces are provided in the
samples/java directory of the saxon-resources download (see
SchemaValidatorExample.java and SchemaValidatorHandlerExample.java).
Saxon also supplies the class com.saxonica.ee.jaxp.ValidatingReader, which implements the SAX2
XMLReader interface, allowing it to be used as a schema-validating XML
parser.
The main steps are:
- Create a SchemaFactory, by calling SchemaFactory.newInstance() with the argument "http://www.w3.org/2001/XMLSchema", and with the Java system properties set up to ensure that Saxon is loaded as the chosen schema processor. Saxon will normally be loaded as the default schema processor if Saxon-EE is present on the classpath, but to make absolutely sure, set the system property javax.xml.validation.SchemaFactory:http://www.w3.org/2001/XMLSchema to the value com.saxonica.ee.jaxp.SchemaFactoryImpl. Note that if you set this property using a property file, colons in the property name must be escaped as "\:".
- Process a schema document by calling one of the several newSchema() methods on the returned SchemaFactory.
- Create either a Validator or a ValidatorHandler from this returned Schema.
- Use the Validator or ValidatorHandler to process one or more source documents.
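The four steps above can be sketched using only the standard JAXP classes. This is a minimal illustration, not one of the Saxon sample applications: the class name JaxpValidatorSketch and the inline schema are hypothetical, and the code runs with whichever schema processor JAXP selects (Saxon-EE if it is configured as described above, otherwise the processor built into the JDK).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class JaxpValidatorSketch {

    // Hypothetical schema used only for this illustration:
    // a single <note> element containing a string.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "  <xs:element name='note' type='xs:string'/>"
      + "</xs:schema>";

    /** Steps 1 and 2: obtain a SchemaFactory and compile a schema document. */
    static Schema compile(String xsd) {
        // W3C_XML_SCHEMA_NS_URI is the constant for "http://www.w3.org/2001/XMLSchema".
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Schema could not be compiled", e);
        }
    }

    /** Steps 3 and 4: create a Validator and run it over one source document. */
    static boolean isValid(Schema schema, String xml) {
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // With no ErrorHandler set, validation errors surface as exceptions.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = compile(XSD);   // compiled once, reusable for many documents
        System.out.println(isValid(schema, "<note>hello</note>"));
        System.out.println(isValid(schema, "<memo>hello</memo>"));
    }
}
```

The first document conforms to the schema and the second has an undeclared root element, so it is rejected. Calling validator.setErrorHandler() before validating would let an application collect the individual error messages instead of stopping at the first exception.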
Note that additional schemas referenced from the xsi:schemaLocation
attributes within the source documents will be loaded as necessary. A schema location hint
is ignored if a schema for its target namespace has already been loaded; Saxon makes no
attempt to load multiple schemas for the same namespace and check them for
consistency.
Although the API is defined in such a way that a Validator or
ValidatorHandler is created for a particular Schema, in the
Saxon implementation the schema components that are available to the validator are not
only the components within that schema, but all the components that form part of any
schema registered with the Configuration.
Another way to control validation from a Java application is to run a JAXP identity
transformation, having first set the option to perform schema validation. The following
code (from the sample application QuickValidator.java) illustrates
this:
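As a minimal sketch of this approach (not the text of QuickValidator.java itself), the code below requests strict validation and then runs an identity transformation whose output is discarded. It assumes that the validation mode is controlled by the Saxon configuration feature named http://saxon.sf.net/feature/schema-validation-mode with the value "strict"; check this name against the FeatureKeys documentation for your release. The class and method names are hypothetical. A factory that is not schema-aware is expected to reject the attribute, in which case no validation takes place.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.helpers.DefaultHandler;

class IdentityValidationSketch {

    // Assumption: name of the Saxon configuration feature for the validation mode.
    static final String SCHEMA_VALIDATION_MODE =
        "http://saxon.sf.net/feature/schema-validation-mode";

    /** Asks the factory for strict validation; returns false if it rejects the attribute. */
    static boolean requestStrictValidation(TransformerFactory factory) {
        try {
            factory.setAttribute(SCHEMA_VALIDATION_MODE, "strict");
            return true;
        } catch (IllegalArgumentException e) {
            return false; // the factory does not recognise this attribute
        }
    }

    /** Runs an identity transformation over the document, discarding the output events. */
    static boolean passes(TransformerFactory factory, String xml) {
        try {
            Transformer identity = factory.newTransformer();
            identity.transform(new StreamSource(new StringReader(xml)),
                               new SAXResult(new DefaultHandler()));
            return true;
        } catch (TransformerException e) {
            return false; // parsing failed, or validation failed where it is enabled
        }
    }

    public static void main(String[] args) {
        TransformerFactory factory = TransformerFactory.newInstance();
        if (!requestStrictValidation(factory)) {
            System.err.println("Factory is not schema-aware; no validation will take place");
        }
        System.out.println(passes(factory, "<note>hello</note>"));
    }
}
```

With Saxon-EE selected as the TransformerFactory, the schema would be located through the xsi:schemaLocation or xsi:noNamespaceSchemaLocation attribute in the source document.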
If you set an ErrorListener on the TransformerFactory, then
you can control the way that error messages are output.
If you want to validate against a schema without hard-coding the URI of the schema into
the source document, you can do this by pre-loading the schema into the
TransformerFactory. This extended example (again from the sample
application QuickValidator.java) illustrates this:
You can preload as many schemas as you like using the addSchema() method.
Such schemas are parsed, validated, and compiled once, and can be used as often as you
like for validating multiple source documents. You cannot unload a schema once it has
been loaded. If you want to remove or replace a schema, start afresh with a new
TransformerFactory.
Behind the scenes, the TransformerFactory uses a Configuration
object to hold all the configuration information. The basic Saxon product (Saxon-HE and
Saxon-PE) uses the class net.sf.saxon.TransformerFactoryImpl for the TransformerFactory, and
net.sf.saxon.Configuration
for the underlying configuration information. The schema-aware product (Saxon-EE)
subclasses these with com.saxonica.config.EnterpriseTransformerFactory and com.saxonica.config.EnterpriseConfiguration respectively. You can get hold of
the Configuration object by casting the TransformerFactory to
a Saxon TransformerFactoryImpl and calling the
getConfiguration() method. This gives you more precise control: for
example, it allows you to retrieve the Schema object containing the schema
components for a given target namespace, and to inspect the compiled schema to establish
its properties. See the JavaDoc documentation for further details.
The programming approach outlined above, of using an identity transformer, is suitable
for a wide class of applications. For example, it enables you to insert a validation
step into a SAX-based pipeline. However, for finer control, there are lower-level
interfaces available in Saxon that you can also use. See for example the JavaDoc for the
EnterpriseConfiguration class, which includes methods such as
getElementValidator(). This constructs a Receiver which acts as a validating XML event
filter. This can be inserted into a pipeline of Receivers. Saxon also
provides classes to bridge between SAX events and Receiver events: ReceivingContentHandler
(SAX events to Receiver events) and ContentHandlerProxy (Receiver events to SAX events).
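Where the standard JAXP interface is sufficient, a validation step can also be placed in a SAX pipeline using the ValidatorHandler described earlier. The following is a minimal sketch under that assumption. It does not use the Saxon Receiver classes, and the class name, the inline schema, and the element-counting stage are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ValidatingPipelineSketch {

    // Hypothetical schema: a <list> of one or more integer <item> elements.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + " <xs:element name='list'><xs:complexType><xs:sequence>"
      + "  <xs:element name='item' type='xs:integer' maxOccurs='unbounded'/>"
      + " </xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Downstream pipeline stage: counts the elements that reach it. */
    static class ElementCounter extends DefaultHandler {
        int count;
        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            count++;
        }
    }

    /**
     * Parses the document, passing SAX events through a ValidatorHandler
     * to the counter. Returns the element count, or -1 if parsing or
     * validation fails.
     */
    static int countValidElements(String xsd, String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));

            ElementCounter counter = new ElementCounter();
            ValidatorHandler validating = schema.newValidatorHandler();
            validating.setContentHandler(counter);   // validator feeds the next stage

            SAXParserFactory parsers = SAXParserFactory.newInstance();
            parsers.setNamespaceAware(true);          // schema validation needs namespaces
            XMLReader reader = parsers.newSAXParser().getXMLReader();
            reader.setContentHandler(validating);     // parser feeds the validator
            reader.parse(new InputSource(new StringReader(xml)));
            return counter.count;
        } catch (SAXException | ParserConfigurationException e) {
            return -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countValidElements(XSD, "<list><item>1</item><item>2</item></list>"));
        System.out.println(countValidElements(XSD, "<list><item>x</item></list>"));
    }
}
```

Because no ErrorHandler is set on the ValidatorHandler, the first validation error is thrown as an exception and stops the parse. Events already passed downstream are not withdrawn, so a stage that must see only valid documents should buffer its input until the end of the document.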