Controlling parsing of source documents
Saxon does not include its own XML parser. By default:
- SaxonJ uses the default SAX parser provided as part of the JDK. With the Oracle JDK, this is a variant of the Apache Xerces parser customized by Oracle.
- SaxonCS uses the System.Xml.XmlReader parser family.
An error reported by the XML parser is generally fatal. It is not possible to process ill-formed XML.
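The fatal nature of a parser error can be seen with the JDK's default SAX parser on its own. The following is a minimal sketch that does not involve Saxon at all; the class name IllFormedDemo and the helper isWellFormed are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class IllFormedDemo {

    /** Hypothetical helper: true if the JDK's default SAX parser accepts the text. */
    static boolean isWellFormed(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // DefaultHandler ignores content; its fatalError() rethrows the exception.
            DefaultHandler handler = new DefaultHandler();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            // A well-formedness error stops the parse; no usable document results.
            return false;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<doc><item/></doc>")); // prints "true"
        System.out.println(isWellFormed("<doc><item></doc>"));  // prints "false"
    }
}
```

An application that may receive ill-formed input would typically catch the exception at this level and report the line and column carried by the SAXParseException.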
With SaxonJ, there are several ways you can cause a different XML parser to be used:
- The -x and -y options on the command line can be used to specify the class name of a SAX parser, which Saxon will load in preference to the default SAX parser. The -x option is used for source XML documents, the -y option for schemas and stylesheets. The equivalent options can be set programmatically or by using the configuration file.
- By default Saxon uses the SAXParserFactory mechanism to load a parser. This can be configured by setting the system property javax.xml.parsers.SAXParserFactory, by means of the file lib/jaxp.properties in the JRE directory, or by adding another parser to the lib/endorsed directory.
- The source for parsing can be supplied in the form of a SAXSource object, which has an XMLReader property containing the parser instance to be used.
- For a document read using the doc() or document() functions, the parser (XMLReader) to be used can be specified using the query parameter ?parser=full.class.name in the document URI, but only if the StandardURIResolver is used and the feature is enabled by calling Configuration.setParameterizedURIResolver() or by setting -p:on on the Query or Transform command lines. For example, parser=org.ccil.cowan.tagsoup.Parser causes John Cowan's TagSoup parser for HTML to be used.
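The SAXSource option can be sketched using only JDK classes. The example below wraps the default parser in a hypothetical counting filter (CountingReader), places it in a SAXSource, and runs a JAXP identity transformation to show that the supplied XMLReader is the one that does the parsing. The class and method names are illustrative assumptions, and the JAXP identity Transformer stands in for Saxon here; the assumption is that the same SAXSource could be passed to Saxon wherever it accepts a JAXP Source.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class SaxSourceDemo {

    /** Hypothetical filter that counts elements, to show which reader did the parsing. */
    static class CountingReader extends XMLFilterImpl {
        int elements = 0;

        CountingReader(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            elements++;
            super.startElement(uri, localName, qName, atts);
        }
    }

    /** Parses the text through a caller-supplied XMLReader carried by a SAXSource. */
    static int countElementsViaSaxSource(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            CountingReader reader = new CountingReader(factory.newSAXParser().getXMLReader());

            // The SAXSource pairs the parser instance with the input to be parsed.
            SAXSource source = new SAXSource(reader, new InputSource(new StringReader(xml)));

            // Identity transformation: the consumer pulls events from our reader.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.transform(source, new StreamResult(new StringWriter()));
            return reader.elements;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElementsViaSaxSource("<doc><a/><b/></doc>")); // prints "3"
    }
}
```

Supplying the reader this way affects only the one document, whereas the system property and the -x option change the parser for every document processed.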
Saxonica traditionally recommended use of the Xerces parser from Apache in preference to the version bundled in the JDK, which was known to have some serious bugs. However, the version bundled in Java 8 appears to be more reliable.
By default, Saxon invokes the parser in non-validating mode (that is, without requesting DTD validation). Note, however, that the parser still needs to read the DTD if one is present, because it may contain entity definitions that need to be expanded. DTD validation can be requested using -dtd:on on the command line, or equivalent API or configuration options.
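The point that a non-validating parser still reads the DTD can be illustrated with the JDK's SAX parser alone. This is a sketch under stated assumptions: the class EntityExpansionDemo and the helper textOf are hypothetical names, and Saxon itself is not involved. The document declares an entity in its internal DTD subset, and the parser expands it even though validation is switched off.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class EntityExpansionDemo {

    /** Hypothetical helper: returns the character content seen by a non-validating parse. */
    static String textOf(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false); // already the JAXP default; stated for clarity
            XMLReader reader = factory.newSAXParser().getXMLReader();

            StringBuilder text = new StringBuilder();
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    // Text may arrive in several chunks, so append rather than assign.
                    text.append(ch, start, length);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return text.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE doc [<!ENTITY greeting \"hello\">]><doc>&greeting;</doc>";
        System.out.println(textOf(xml)); // prints "hello"
    }
}
```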
Saxon never asks the XML parser to perform schema validation. If schema validation is required, it should be requested using the command line options -val:strict or -val:lax, or their API equivalents. Saxon will then use its own schema processor to validate the document as it emerges from the XML parser. Schema processing is done in parallel with parsing, by use of a SAX-like pipeline.