Saxonica.com

Handling minOccurs and maxOccurs

Prior to release 9.1, Saxon used the validation algorithm described in Thompson and Tobin 2003. This algorithm can be very inefficient when large bounded values of minOccurs and maxOccurs are used in a content model; indeed, the finite state machine can become too large to fit in memory, in which case an out-of-memory error occurs.

From Saxon 9.1, many common cases of minOccurs and maxOccurs are handled using a finite state machine that makes use of counters at run-time. This eliminates the need to have one state in the machine for each possible number of occurrences of the repeating item. Instead, counters are maintained at run-time and compared against the minOccurs and maxOccurs values.
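The difference between the two approaches can be illustrated with a minimal sketch. It assumes the simplest possible content model, a single repeating element, and the class name CountedParticle and its methods are hypothetical, invented for illustration; this is not Saxon's implementation. The point is only that a counter uses constant memory regardless of the bounds, whereas an expanded machine needs a state for each possible number of occurrences.

```java
import java.util.List;

/**
 * Hypothetical sketch (not Saxon's code): checks a content model consisting of
 * a single repeated element, using a counter instead of one state per count.
 */
final class CountedParticle {
    private final String elementName; // name of the repeating element
    private final int minOccurs;      // lower bound from the schema
    private final int maxOccurs;      // upper bound; Integer.MAX_VALUE stands for "unbounded"

    CountedParticle(String elementName, int minOccurs, int maxOccurs) {
        this.elementName = elementName;
        this.minOccurs = minOccurs;
        this.maxOccurs = maxOccurs;
    }

    /** Returns true if every child matches and the count lies within the bounds. */
    boolean accepts(List<String> childNames) {
        int count = 0; // the run-time counter
        for (String child : childNames) {
            if (!child.equals(elementName)) {
                return false; // unexpected element
            }
            count++;
            if (count > maxOccurs) {
                return false; // upper bound exceeded; stop early
            }
        }
        return count >= minOccurs; // lower bound is checked at the end of the content
    }
}
```

A particle declared with minOccurs="2" and maxOccurs="100000" costs the same here as one with maxOccurs="3": three fields and one integer counter.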

This technique is used under the following circumstances:

In cases where counters cannot be used, Saxon will still attempt to compile a finite state machine, but will use configuration-defined limits on minOccurs and maxOccurs to approximate the values requested. If the values used in the schema exceed these limits, Saxon will therefore approximate by generating a schema that does not strictly enforce the specified minOccurs and maxOccurs. The default limits are 100 for minOccurs and 250 for maxOccurs. Different limits can be set on the command line or via the Java API on the Configuration object. Note, however, that when several nested repeating groups are defined, out-of-memory conditions can still occur, even with quite modest values of minOccurs and maxOccurs.
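To make the effect of such an approximation concrete, here is a minimal sketch of one way bounds could be relaxed to fit the limits. The relaxation rule shown (lower an oversized minOccurs to its limit, treat an oversized maxOccurs as unbounded) is an assumption made for illustration; the text above does not say exactly how Saxon approximates, and the class RelaxedBounds is hypothetical rather than part of the Saxon API.

```java
/**
 * Hypothetical sketch: one way occurrence bounds could be relaxed to fit limits.
 * The relaxation rule is an assumption for illustration, not a statement of
 * what Saxon actually does.
 */
final class RelaxedBounds {
    static final int UNBOUNDED = Integer.MAX_VALUE;

    final int min; // effective lower bound after relaxation
    final int max; // effective upper bound, or UNBOUNDED

    private RelaxedBounds(int min, int max) {
        this.min = min;
        this.max = max;
    }

    /** Relaxes the declared bounds so that neither exceeds its limit. */
    static RelaxedBounds relax(int minOccurs, int maxOccurs, int minLimit, int maxLimit) {
        // Assumed rule: an oversized minimum is lowered to the limit.
        int min = Math.min(minOccurs, minLimit);
        // Assumed rule: an oversized maximum is no longer enforced at all.
        int max = maxOccurs > maxLimit ? UNBOUNDED : maxOccurs;
        return new RelaxedBounds(min, max);
    }

    /** Returns true if the relaxed bounds permit this number of occurrences. */
    boolean allows(int count) {
        return count >= min && count <= max;
    }
}
```

Under this assumed rule and the default limits of 100 and 250, a particle declared with minOccurs="500" and maxOccurs="1000" would be checked as if it were 100 to unbounded, so 150 occurrences would pass even though the schema requires at least 500. Bounds that are within the limits are unchanged.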