Controlling Parsing of Source Documents

Saxon does not include its own XML parser. By default:

An error reported by the XML parser is generally fatal. It is not possible to process ill-formed XML.
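The fatal behaviour comes from the parser itself. The minimal sketch below uses only the JDK's built-in JAXP SAX parser, not Saxon's API, to show that an ill-formed document produces a `SAXParseException` instead of a partial result. The class name `WellFormednessCheck` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormednessCheck {

    /** Returns null if the input is well-formed, otherwise the parser's error message. */
    static String check(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // DefaultHandler.fatalError rethrows the exception, so parsing stops at the first error
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(check("<a><b/></a>")); // well-formed
        System.out.println(check("<a><b></a>"));  // mismatched end tag: fatal error
    }
}
```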

There are several ways you can cause a different XML parser to be used:

Saxonica traditionally recommended the Apache Xerces parser in preference to the version bundled with the JDK, which was known to have some serious bugs. However, there is some evidence that the version bundled with Java 8 is more reliable.
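One way for a Java caller to control which parser runs is to create the `XMLReader` itself and supply it, together with the input, in a `javax.xml.transform.sax.SAXSource`. The sketch below is an assumption-laden illustration using plain JAXP: it asks for Apache Xerces by its JAXP factory class name and falls back to the JDK's bundled parser if Xerces is not on the classpath. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class ParserChoice {

    // Apache Xerces's JAXP factory; it can only be loaded if xercesImpl is on the classpath
    static final String XERCES_FACTORY = "org.apache.xerces.jaxp.SAXParserFactoryImpl";

    /** Creates an XMLReader from the named factory class, falling back to the JDK default. */
    static XMLReader newReader(String factoryClassName) {
        SAXParserFactory factory;
        try {
            factory = SAXParserFactory.newInstance(factoryClassName, ParserChoice.class.getClassLoader());
        } catch (FactoryConfigurationError e) {
            factory = SAXParserFactory.newInstance(); // the parser bundled with the JDK
        }
        factory.setNamespaceAware(true);
        try {
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Pairs the chosen parser with the input, so the caller, not the processor, picks the parser. */
    static SAXSource sourceFor(String xml) {
        return new SAXSource(newReader(XERCES_FACTORY), new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) {
        SAXSource src = sourceFor("<doc/>");
        System.out.println("Using parser: " + src.getXMLReader().getClass().getName());
    }
}
```

The fallback keeps the sketch usable without Xerces installed; a deployment that depends on Xerces might prefer to fail loudly instead.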

By default, Saxon invokes the parser in non-validating mode (that is, without requesting DTD validation). Note, however, that the parser still needs to read the DTD if one is present, because it may contain entity definitions that need to be expanded. DTD validation can be requested using -dtd:on on the command line, or by the equivalent API or configuration options.
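The difference between the two modes can be seen with the JDK's own SAX parser, used directly here rather than through Saxon. In this sketch the document's internal DTD subset declares an entity and a content model that the document breaks: without validation the DTD is still read, so the entity is expanded and the invalid content is accepted; with `setValidating(true)` the same document is rejected. The class name `DtdModes` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class DtdModes {

    // Internal DTD subset: declares an entity, plus a content model that <extra/> violates
    static final String DOC =
        "<!DOCTYPE note [<!ELEMENT note (#PCDATA)><!ENTITY org 'Saxonica'>]>"
        + "<note>Made by &org;<extra/></note>";

    /** Parses the document and returns its character content; throws on any reported error. */
    static String parse(String xml, boolean validateAgainstDtd) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(validateAgainstDtd); // requests DTD validation only
            StringBuilder text = new StringBuilder();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }

                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e; // DefaultHandler ignores validity errors; make them fatal here
                }
            });
            return text.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Non-validating: the DTD is still read, so &org; is expanded and <extra/> is accepted
        System.out.println(parse(DOC, false)); // Made by Saxonica
        // Validating: <extra/> is not allowed by (#PCDATA), so parsing fails
        try {
            parse(DOC, true);
        } catch (IllegalStateException e) {
            System.out.println("invalid: " + e.getCause().getMessage());
        }
    }
}
```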

Saxon is issued with local copies of commonly used W3C DTDs such as the XHTML, SVG, and MathML DTDs. When Saxon itself instantiates the XML parser, it uses an EntityResolver that causes these local copies to be used rather than fetching the public copies from the web (the W3C servers increasingly fail to serve these requests because the volume of traffic is too high). This can be overridden using the configuration setting ENTITY_RESOLVER_CLASS, which can be set to the name of a user-supplied EntityResolver class, or to the empty string to indicate that no EntityResolver should be used. Saxon does not add this EntityResolver when the XML parser instance is supplied by the caller as part of a SAXSource object. It does add it to a parser instantiated from the class named by the -x and -y command line options, unless either its use has been suppressed through the ENTITY_RESOLVER_CLASS configuration option, or the instantiated parser already has an EntityResolver registered.
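The mechanism described above rests on the standard SAX `org.xml.sax.EntityResolver` interface. The following is a minimal sketch, not Saxon's own resolver: it maps public identifiers to locally held DTD text and installs itself only when the reader has no resolver registered, mirroring the rule above. The class name, the public identifier `-//EXAMPLE//DTD Demo 1.0//EN`, and the `example.invalid` URL are all hypothetical; the local "DTD" is an in-memory string standing in for a bundled file.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class LocalDtdResolver implements EntityResolver {

    // Hypothetical catalogue: public identifier -> local DTD text (normally a bundled resource)
    static final Map<String, String> DEMO_CATALOGUE =
        Map.of("-//EXAMPLE//DTD Demo 1.0//EN", "<!ENTITY product 'Saxon'>");

    static final String DEMO_DOC =
        "<!DOCTYPE doc PUBLIC \"-//EXAMPLE//DTD Demo 1.0//EN\" \"http://example.invalid/demo.dtd\">"
        + "<doc>Parsed by &product;</doc>";

    private final Map<String, String> localCopies;

    LocalDtdResolver(Map<String, String> localCopies) {
        this.localCopies = localCopies;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String dtd = publicId == null ? null : localCopies.get(publicId);
        if (dtd == null) {
            return null; // unknown entity: the parser falls back to fetching systemId itself
        }
        InputSource local = new InputSource(new StringReader(dtd));
        local.setSystemId(systemId); // keep the original ID for error messages
        return local;
    }

    /** Installs the resolver only if the reader does not already have one. */
    static void installIfAbsent(XMLReader reader, EntityResolver resolver) {
        if (reader.getEntityResolver() == null) {
            reader.setEntityResolver(resolver);
        }
    }

    static XMLReader newReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses xml with the resolver installed and returns its character content. */
    static String textOf(String xml, EntityResolver resolver) {
        XMLReader reader = newReader();
        installIfAbsent(reader, resolver);
        StringBuilder text = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // The external DTD is served from the local copy, so no network request is made
        System.out.println(textOf(DEMO_DOC, new LocalDtdResolver(DEMO_CATALOGUE)));
    }
}
```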

Saxon never asks the XML parser to perform schema validation. If schema validation is required, it should be requested using the command line option -val:strict or -val:lax, or their API equivalents. Saxon then uses its own schema processor to validate the document as it emerges from the XML parser. Schema validation is done in parallel with parsing, using a SAX-like pipeline.
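To illustrate the general shape of such a pipeline (validation as a filter between the parser and the consumer, in a single pass), the sketch below uses the JDK's `javax.xml.validation.ValidatorHandler` as a stand-in. This is an assumption for illustration only: it is not Saxon's schema processor and does not reproduce its behaviour. The schema, class name, and element-counting consumer are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class StreamingValidation {

    // Hypothetical schema: a single <qty> element holding a positive integer
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='qty' type='xs:positiveInteger'/>"
        + "</xs:schema>";

    /** Returns the number of elements the consumer received, or -1 if validation failed. */
    static int validateAndCount(String xml) {
        try {
            ValidatorHandler validator = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)))
                .newValidatorHandler(); // with no ErrorHandler set, validity errors are thrown
            int[] count = {0};
            // Downstream consumer: receives each event after the validator has seen it
            validator.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    count[0]++;
                }
            });
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // ValidatorHandler expects namespace-aware events
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(validator); // parser -> validator -> consumer, one pass
            reader.parse(new InputSource(new StringReader(xml)));
            return count[0];
        } catch (SAXException e) {
            return -1;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(validateAndCount("<qty>5</qty>"));
        System.out.println(validateAndCount("<qty>0</qty>"));
    }
}
```

Because the document is never materialised before validation, an invalid value is reported while the input is still streaming, which is the property the pipeline design is meant to deliver.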