Resolving entities

XML documents may contain references to external entities, including general entities, DTDs, and parameter entities. Typically these references include a system identifier (URI) and/or a public ID. It is the responsibility of the XML parser to resolve these references, but the process can be influenced using Saxon interfaces.

On Java, most applications use a SAX parser such as Xerces. Entity resolution with a SAX parser is controlled by supplying an EntityResolver. If no other EntityResolver is supplied, then when Saxon instantiates a SAX parser, it constructs an EntityResolver that invokes the configuration-level ResourceResolver, which has the capability to resolve entity references using catalogs (as well as supporting the classpath and data URI schemes).

If the SAX parser is instantiated by user code, however (for example when a SAXSource is supplied), then it is the responsibility of the user code to initialize the parser's EntityResolver as required: Saxon will not modify the settings.
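As an illustration of user code taking on this responsibility, the following minimal sketch builds a SAXSource whose parser rejects every external entity. It uses only standard JAXP and SAX classes; the class name GuardedSaxSource and the reject-everything policy are assumptions made for the example, and a real application might consult a catalog or an allow-list instead.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical helper: not part of Saxon or JAXP.
final class GuardedSaxSource {

    // Policy chosen for this sketch: refuse every external entity.
    static final EntityResolver REJECT_ALL = (publicId, systemId) -> {
        throw new SAXException("External entity not permitted: " + systemId);
    };

    // Creates a SAXSource whose parser has been configured by the caller,
    // so the resolver set here stays in force when the source is consumed.
    static SAXSource create(InputSource input)
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setEntityResolver(REJECT_ALL);
        return new SAXSource(reader, input);
    }
}
```

A SAXSource created this way can be passed wherever a JAXP Source is accepted. Because the parser was supplied by the application, the resolver installed above is the one in effect.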

Saxon will also accept input from a StAX parser. In this case, configuring the parser for entity resolution is entirely the responsibility of the calling application.
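For the StAX case, a calling application might configure the parser factory along the following lines. This is a sketch under stated assumptions: the class name GuardedStaxReader is hypothetical, only standard javax.xml.stream calls are used, and how the resulting reader is handed to Saxon is left out.

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: not part of Saxon or StAX.
final class GuardedStaxReader {

    // Returns a factory that does not fetch external entities.
    static XMLInputFactory newFactory() {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // Standard StAX property: do not resolve external parsed entities.
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        // Second line of defence: any lookup that still reaches the resolver is refused.
        factory.setXMLResolver((publicId, systemId, baseUri, namespace) -> {
            throw new XMLStreamException("External entity not permitted: " + systemId);
        });
        return factory;
    }

    // Opens a stream reader over the supplied character stream.
    static XMLStreamReader open(Reader input) throws XMLStreamException {
        return newFactory().createXMLStreamReader(input);
    }
}
```

Whether a blocked reference is silently skipped or reported as an error depends on the StAX implementation in use, so applications that need a specific behaviour should test against their chosen parser.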

On .NET, Saxon always uses the Microsoft System.Xml parser. Entity resolution in this parser is controlled using the System.Xml.XmlResolver interface. When Saxon instantiates a System.Xml parser, it constructs an XmlResolver that invokes the configuration-level ResourceResolver, which has the capability to resolve entity references using catalogs (as well as supporting the data URI scheme).

Many XML users are concerned about security vulnerabilities associated with external entity references. If source documents containing untrusted entity references are accepted, those references can be used to read files in the local filestore that might contain sensitive data. It is good practice to configure the XML parser to avoid these risks. If all entity references are resolved using a user-supplied ResourceResolver, then the resolver has total control over which URIs are accepted and which are rejected.

On Java, JAXP interfaces provide a number of configuration properties to control access to external resources directly: see the JAXP Security Guide. Saxon recognizes the properties FEATURE_SECURE_PROCESSING, ACCESS_EXTERNAL_DTD, ACCESS_EXTERNAL_SCHEMA, and ACCESS_EXTERNAL_STYLESHEET in its implementations of relevant JAXP interfaces.
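A minimal sketch of setting these properties through the standard JAXP TransformerFactory API follows. The class name RestrictedTransformerFactory and the particular values chosen (no external DTD access, stylesheets from file URIs only) are assumptions for illustration. Which implementation TransformerFactory.newInstance() returns depends on the classpath and system properties, and the sketch assumes that implementation supports these properties.

```java
import javax.xml.XMLConstants;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;

// Hypothetical helper: not part of Saxon or JAXP.
final class RestrictedTransformerFactory {

    static TransformerFactory create() throws TransformerConfigurationException {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Ask the implementation to apply its secure-processing defaults.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // An empty list denies all access to external DTDs and entities.
        factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        // Example policy: stylesheet modules may be fetched from file URIs only.
        factory.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "file");
        return factory;
    }
}
```

ACCESS_EXTERNAL_SCHEMA is omitted from this sketch because it is normally set on schema-related factories rather than on a TransformerFactory.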

Generalizing this mechanism, Saxon also provides the configuration property ALLOWED_PROTOCOLS. Its value has the same format as the JAXP properties (a comma-separated list of permitted URI schemes), and it is enforced by the default configuration-level ResourceResolver. Note that not all URIs are processed using this mechanism: for example, a URI that is resolved by a local ResourceResolver set on an XsltTransformer or XQueryEvaluator bypasses these checks.
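To make the list format concrete, the following sketch shows how a URI might be checked against such a comma-separated list. It is not Saxon's implementation: the class AllowedProtocols and its permits method are hypothetical, the handling of "all" and of the empty string follows the JAXP convention, and the JAXP "jar:scheme" form is deliberately not handled.

```java
import java.net.URI;

// Hypothetical illustration of the list format: not Saxon's implementation.
final class AllowedProtocols {

    // Returns true if the URI's scheme appears in the comma-separated list.
    // Following the JAXP convention, "all" permits every scheme and an
    // empty list permits none. Scheme comparison is case-insensitive.
    static boolean permits(String allowedProtocols, URI uri) {
        String scheme = uri.getScheme();
        if (scheme == null) {
            // Assumption for this sketch: relative URIs must be resolved
            // against a base URI before they are checked.
            return false;
        }
        String list = allowedProtocols.trim();
        if (list.equalsIgnoreCase("all")) {
            return true;
        }
        for (String entry : list.split(",")) {
            if (entry.trim().equalsIgnoreCase(scheme)) {
                return true;
            }
        }
        return false;
    }
}
```

With the list "file,https", for example, a URI using the https scheme would be permitted and one using the http scheme would not.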