Writing input filters
SaxonJ generally takes its input from a JAXP SAXSource object, which represents a sequence of SAX events as output by an XML parser. These events are sent to the internal class ReceivingContentHandler, which converts them to a slightly different format and passes them to a Saxon Receiver. In a typical scenario, the events are passed through a pipeline of Receivers, each of which modifies the events in some way.
Examples of the steps on this pipeline include:
- A whitespace stripper, responsible for removing whitespace as directed by the
- A schema validator, responsible for performing schema validation (which not only validates the input against the schema, but also adds type annotations and expands default values for absent attributes).
- An annotation stripper, responsible for removing type annotations as directed by the input-type-annotations="strip" attribute in a stylesheet.
At the end of this pipeline, the events are typically passed to one of:
- A tree builder, which builds a tree of nodes, ready for query or transformation.
- A streaming XSLT transformation.
- A serializer (to implement an identity transformation).
It is possible to add a user-written filter to the input pipeline. This might be used, for example, to:
- Rename elements or attributes, perhaps changing their namespace.
- Add or remove elements or attributes.
- Strip comments or processing instructions.
- Expand processing instructions (for example, a processing instruction might contain a SQL query to access a database).
- Perform a complete XSLT transformation, streamed or unstreamed.
A filter can be inserted either to process SAX events, before they are converted to Receiver events, or to process the events after the conversion.
To filter events at the SAX level, the techniques include:
- Generate the transformation as an XMLFilter using the newXMLFilter() method of the TransformerFactory. This works with XSLT only. A drawback of this approach is that it is not possible to supply parameters to the transformation using standard JAXP facilities. It is possible, however, by casting the XMLFilter to a net.sf.saxon.jaxp.FilterImpl and calling its getTransformer() method, which returns a Transformer object offering the usual methods for setting parameters.
- Generate the transformation as a SAX ContentHandler using the newTransformerHandler() method. The pipeline stages after the transformation can be added by giving the transformation a SAXResult as its destination. This again is XSLT only.
- Implement the pipeline step before the transformation or query as an XMLFilter, and use this as the XMLReader part of a SAXSource, pretending to be an XML parser. This technique works with both XSLT and XQuery, and it can even be used from the command line, by nominating the XMLFilter as the source parser using the -x option on the command line.
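The last of these techniques can be sketched using only the standard JAXP and SAX classes. The example below is a minimal illustration rather than a definitive implementation: the class names PiStrippingDemo and PiStripper and the sample document are hypothetical. The filter extends org.xml.sax.helpers.XMLFilterImpl, discards processing instructions, and is handed to a SAXSource in place of the parser. An identity transformation stands in for the real stylesheet or query; whichever JAXP implementation TransformerFactory.newInstance() finds on the classpath will do.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class PiStrippingDemo {

    /** Hypothetical filter: drops processing instructions, forwards all other events. */
    static class PiStripper extends XMLFilterImpl {
        PiStripper(XMLReader parent) {
            super(parent);
        }

        @Override
        public void processingInstruction(String target, String data) {
            // Deliberately empty: the event is not passed downstream.
        }
    }

    /** Parses the given XML through the filter and serializes the filtered result. */
    public static String run(String xml) throws Exception {
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader parser = parserFactory.newSAXParser().getXMLReader();

        // The filter wraps the real parser and is itself an XMLReader.
        XMLReader filter = new PiStripper(parser);
        SAXSource source = new SAXSource(filter, new InputSource(new StringReader(xml)));

        // Identity transformation, standing in for a real stylesheet.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<doc><?sql select 1?><item>text</item></doc>"));
    }
}
```

Because the filter only overrides one callback, everything else (elements, attributes, text) reaches the transformation unchanged.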
To insert a filter for Receiver events, it is usual to implement the filter by extending the class ProxyReceiver, overriding only the methods for those events that need to be changed. The filter can be injected into the pipeline by supplying the document in the form of an AugmentedSource: a typical example would be:
MyFilter is typically a class that extends ProxyReceiver by overriding some of its methods: for example, you might override the comment() method to do nothing, which has the effect of stripping comments from the source document.
Filters inserted into the pipeline in this way are applied after any system-defined filters such as the schema validator.
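The idea of extending a forwarding class and overriding a single event method can be illustrated without Saxon on the classpath. The following is only a sketch of the pattern: EventSink, ForwardingSink, CommentStripper and Recorder are hypothetical stand-ins invented for this example, not Saxon classes, and the real Receiver interface has more events and different method signatures.

```java
import java.util.ArrayList;
import java.util.List;

public class ProxyPatternDemo {

    /** Simplified stand-in for a push-event receiver (NOT Saxon's Receiver). */
    interface EventSink {
        void startElement(String name);
        void endElement(String name);
        void characters(String text);
        void comment(String text);
    }

    /** Stand-in for a proxy base class: forwards every event unchanged. */
    static class ForwardingSink implements EventSink {
        protected final EventSink next;

        ForwardingSink(EventSink next) {
            this.next = next;
        }

        public void startElement(String name) { next.startElement(name); }
        public void endElement(String name) { next.endElement(name); }
        public void characters(String text) { next.characters(text); }
        public void comment(String text) { next.comment(text); }
    }

    /** The user-written filter: overrides only the event it wants to change. */
    static class CommentStripper extends ForwardingSink {
        CommentStripper(EventSink next) {
            super(next);
        }

        @Override
        public void comment(String text) {
            // Do nothing: comments never reach the next stage.
        }
    }

    /** End of the pipeline: records the events it receives. */
    static class Recorder implements EventSink {
        final List<String> events = new ArrayList<>();

        public void startElement(String name) { events.add("start:" + name); }
        public void endElement(String name) { events.add("end:" + name); }
        public void characters(String text) { events.add("chars:" + text); }
        public void comment(String text) { events.add("comment:" + text); }
    }

    /** Pushes a small event sequence through the filter and returns what arrived. */
    public static List<String> demo() {
        Recorder recorder = new Recorder();
        EventSink pipeline = new CommentStripper(recorder);
        pipeline.startElement("doc");
        pipeline.comment("hidden");
        pipeline.characters("text");
        pipeline.endElement("doc");
        return recorder.events;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [start:doc, chars:text, end:doc]
    }
}
```

The comment event is dropped while the other three events pass through untouched, which is the behaviour the comment-stripping filter described above relies on.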