Class AlphaCode


  • public class AlphaCode
    extends java.lang.Object

    An AlphaCode is a compact, context-independent string representation of a SequenceType.

    The syntax actually handles ItemTypes as well as SequenceTypes; in addition, it can handle the two forms of NodeTest that are not item types, namely *:local and uri:*. It can therefore be used in the SEF (stylesheet export file) wherever a SequenceType, ItemType, or NodeTest is required.

    The first character of an alphacode is the occurrence indicator. This is one of: * (zero or more), + (one or more), ? (zero or one), 0 (exactly zero), 1 (exactly one). If the first character is not one of these, then "1" is assumed; but the occurrence indicator is generally omitted only when representing an item type as distinct from a sequence type.
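    The occurrence-indicator rule can be sketched in Java as follows. This is an illustration, not Saxon's implementation: the class OccurrenceSplit and its methods are hypothetical names, and the code assumes a well-formed alphacode. It relies only on the fact that no primary alphacode begins with one of the five indicator characters, so a missing indicator can safely default to "1".

```java
// Hypothetical helper, not part of Saxon: reads the occurrence indicator of an alphacode.
final class OccurrenceSplit {

    /** The five characters that may appear as the first character of an alphacode. */
    private static final String INDICATORS = "*+?01";

    private static boolean hasIndicator(String alphaCode) {
        return !alphaCode.isEmpty() && INDICATORS.indexOf(alphaCode.charAt(0)) >= 0;
    }

    /** Returns the occurrence indicator, assuming "1" when it is omitted. */
    static char occurrence(String alphaCode) {
        return hasIndicator(alphaCode) ? alphaCode.charAt(0) : '1';
    }

    /** Returns the alphacode with any leading occurrence indicator removed. */
    static String withoutOccurrence(String alphaCode) {
        return hasIndicator(alphaCode) ? alphaCode.substring(1) : alphaCode;
    }

    /** Whether a sequence of the given length satisfies the occurrence indicator. */
    static boolean allows(char indicator, int count) {
        switch (indicator) {
            case '*': return count >= 0;   // zero or more
            case '+': return count >= 1;   // one or more
            case '?': return count <= 1;   // zero or one
            case '0': return count == 0;   // exactly zero
            case '1': return count == 1;   // exactly one
            default:
                throw new IllegalArgumentException("Unknown occurrence indicator: " + indicator);
        }
    }
}
```

    For example, occurrence("*FA v[+AS]") yields '*', whereas occurrence("ADI") falls back to '1' because an item-type alphacode normally carries no indicator.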

    The occurrence indicator is immediately followed by the "primary alphacode" for the item type. These are chosen so that alphacode(T) is a prefix of alphacode(U) if and only if T is a supertype of U. For example, the primary alphacode for xs:integer is "ADI", and the primary alphacode for xs:decimal is "AD", reflecting the fact that xs:integer is a subtype of xs:decimal. The primary alphacodes are as follows:

    • "" (zero-length string): item()
    • A: xs:anyAtomicType
    • AB: xs:boolean
    • AS: xs:string
    • ASN: xs:normalizedString
    • ASNT: xs:token
    • ASNTL: xs:language
    • ASNTK: xs:NMTOKEN
    • ASNTN: xs:Name
    • ASNTNC: xs:NCName
    • ASNTNCI: xs:ID
    • ASNTNCE: xs:ENTITY
    • ASNTNCR: xs:IDREF
    • AQ: xs:QName
    • AU: xs:anyURI
    • AA: xs:date
    • AM: xs:dateTime
    • AMP: xs:dateTimeStamp
    • AT: xs:time
    • AR: xs:duration
    • ARD: xs:dayTimeDuration
    • ARY: xs:yearMonthDuration
    • AG: xs:gYear
    • AH: xs:gYearMonth
    • AI: xs:gMonth
    • AJ: xs:gMonthDay
    • AK: xs:gDay
    • AD: xs:decimal
    • ADI: xs:integer
    • ADIN: xs:nonPositiveInteger
    • ADINN: xs:negativeInteger
    • ADIP: xs:nonNegativeInteger
    • ADIPP: xs:positiveInteger
    • ADIPL: xs:unsignedLong
    • ADIPLI: xs:unsignedInt
    • ADIPLIS: xs:unsignedShort
    • ADIPLISB: xs:unsignedByte
    • ADIL: xs:long
    • ADILI: xs:int
    • ADILIS: xs:short
    • ADILISB: xs:byte
    • AO: xs:double
    • AF: xs:float
    • A2: xs:base64Binary
    • AX: xs:hexBinary
    • AZ: xs:untypedAtomic
    • N: node()
    • NE: element(*)
    • NA: attribute(*)
    • NT: text()
    • NC: comment()
    • NP: processing-instruction()
    • ND: document-node()
    • NN: namespace-node()
    • F: function(*)
    • FM: map(*) -- including record types
    • FA: array(*)
    • E: xs:error
    • X: external (wrapped) object
    • XJ: external Java object
    • XN: external .NET object
    • XS: external Javascript object

    Every item belongs to one or more of these types, and among them there is always a "most specific" type; that is the one whose code is used for the item.
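    Because of the prefix property, a subtype check on primary alphacodes reduces to a string prefix test. The sketch below is hypothetical (PrimaryCodes is not a Saxon class) and compares primary codes only: it ignores supplementary codes such as a user-defined type name or a map's key type, which a complete check would also have to consider. It also assumes, as the list above suggests, that the codes applicable to a single item form a chain of prefixes, so the most specific one is simply the longest.

```java
import java.util.List;

// Hypothetical helper, not part of Saxon: subtype tests based on the prefix property.
final class PrimaryCodes {

    /** Returns the primary alphacode: no occurrence indicator, nothing after the first space. */
    static String primary(String alphaCode) {
        String s = alphaCode;
        if (!s.isEmpty() && "*+?01".indexOf(s.charAt(0)) >= 0) {
            s = s.substring(1);
        }
        int space = s.indexOf(' ');
        return space < 0 ? s : s.substring(0, space);
    }

    /** True if the primary type of t is the same as, or a supertype of, the primary type of u. */
    static boolean isSupertypeOf(String t, String u) {
        return primary(u).startsWith(primary(t));
    }

    /** Given the primary codes of all listed types an item belongs to, returns the most specific. */
    static String mostSpecific(List<String> codes) {
        String best = "";   // "" is item(), the fallback
        for (String code : codes) {
            if (code.length() > best.length()) {
                best = code;
            }
        }
        return best;
    }
}
```

    For instance, isSupertypeOf("AD", "1ADI") holds (xs:decimal is a supertype of xs:integer), while isSupertypeOf("ADI", "AD") does not.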

    Following the occurrence indicator and primary alphacode are zero or more supplementary codes. Each is preceded by a single space, is identified by a single letter, and is followed by a parameter value. For example, the sequence type "element(BOOK)" is coded as "1NE nQ{}BOOK": here 1 is the occurrence indicator, NE indicates an element node, and nQ{}BOOK is the required element name, identified by the letter "n". The supplementary codes (which may appear in any order) are as follows:

    • n - Name, as a URI-qualified name. Used for node names when the primary alphacode is one of (NE, NA, NP). Also used for the XSD type name when the type is a user-defined atomic or union type: the basic alphacode then represents the lowest common supertype that is a built-in type. (Note: we assume that type names are globally unique. This cannot be guaranteed when deploying a SEF file: the schema at the receiving end might vary from that of the sender.) Also used for the class name in the case of external object types (in this case the namespace part will always be "Q{}"). Note that strictly speaking, the forms *:name and name:* can appear in a NameTest, but never in a SequenceType. However, they can be represented in alphacodes using the syntax "n*:name" and "nQ{uri}*" respectively. The syntax "~localname" is used for a name in the XSD namespace.

    • c - Node content type (XSD type annotation), as a URI-qualified name optionally followed by "?" to indicate nillable. The syntax "~localname" is used for a name in the XSD namespace. Optionally present when the basic code is (NE, NA); omitted for NE when the content is xs:untyped, and for NA when the content is xs:untypedAtomic. Only relevant for schema-aware code.

    • k - Key type, present when the basic code is FM (i.e. for maps), omitted if the key type is xs:anyAtomicType. The value is the alphacode of the key type, enclosed in square brackets: it will always start with "1A".

    • v - Value type, present when the basic code is (FM, FA) (i.e. for maps and arrays), omitted if the value type is item()*. The value is the alphacode of the value type, enclosed in square brackets. For example the alphacode for array(xs:string+)* is "*FA v[+AS]".

    • r - Return type, always present for functions. The value is the alphacode of the return type, enclosed in square brackets.

    • a - Argument types, always present for functions. The value is an array of alphacodes, enclosed in square brackets and separated by commas. For example, the alphacode for the function fn:dateTime#2 (with signature ($arg1 as xs:date?, $arg2 as xs:time?) as xs:dateTime?) is "1F r[?AM] a[?AA,?AT]"

      Also used for record types: indicates the types declared for the fields of the record type. As a special case, a self-reference field within a record type is represented by "%.." where % is the occurrence indicator, for example "1.." for a self-reference field with cardinality one.

    • m - Member types of an anonymous union type. The value is an array of alphacodes for the member types (these will always be atomic types), enclosed in square brackets and comma-separated. The basic code in this case will be "A", indicating xs:anyAtomicType. This is not used for the built-in union type xs:numeric, nor for user-defined atomic types defined in a schema; it is used only for anonymous union types defined using the Saxon extension syntax "union(a, b, c)".

    • e - Element type of a document-node() type, present optionally when the basic code is ND. The value is an alphacode, which will always start with "1NE".

    • f, F - Fields of a record type (previously called tuple type). The value is a comma-separated list of tokens, enclosed in square brackets, where each token comprises the name of the component, optionally followed by a question mark if the field is optional. Any ASCII characters in the field name that are not valid NCName characters are escaped by preceding them with a backslash.

    • i, u, d - Venn type. The item type is the intersection, union, or difference of two item types. The letter "i", "u", or "d" indicates intersection, union, or difference respectively, followed by a list of (currently always two) item types enclosed in square brackets and separated by a comma. The principal type will typically be "N" or "NE". Saxon uses venn types internally to give a more precise inferred type for expressions; it is probably largely unused at run-time, and can therefore be safely ignored when reading a SEF file.
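    Because parameter values such as v[...] and a[...] can themselves contain complete alphacodes with spaces and commas, splitting on every space is not enough. The following is a minimal sketch, assuming a well-formed alphacode: it tracks square-bracket depth, skips over Q{...} URIs (which cannot contain braces), and keeps backslash-escaped characters from record field names intact. SupplementaryCodes and its methods are hypothetical, and the result is a flat letter-to-value map rather than the XDM map produced by toXdmMap.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Saxon: splits an alphacode into its supplementary codes.
final class SupplementaryCodes {

    /** Splits s at each top-level occurrence of sep (outside [...] and Q{...}, not escaped). */
    static List<String> splitTopLevel(String s, char sep) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;            // nesting depth of square brackets
        boolean inBraces = false; // inside a Q{...} URI, which cannot contain braces
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (inBraces) {
                if (c == '}') {
                    inBraces = false;
                }
            } else if (c == '\\' && i + 1 < s.length()) {
                current.append(c);    // keep the backslash...
                c = s.charAt(++i);    // ...and the escaped character, which is never a separator
            } else if (c == '{') {
                inBraces = true;
            } else if (c == '[') {
                depth++;
            } else if (c == ']') {
                depth--;
            } else if (c == sep && depth == 0) {
                parts.add(current.toString());
                current.setLength(0);
                continue;
            }
            current.append(c);
        }
        parts.add(current.toString());
        return parts;
    }

    /** Maps each supplementary code letter to its parameter value, minus any enclosing brackets. */
    static Map<Character, String> supplementary(String alphaCode) {
        List<String> tokens = splitTopLevel(alphaCode, ' ');
        Map<Character, String> codes = new LinkedHashMap<>();
        // tokens.get(0) is the occurrence indicator plus the primary alphacode
        for (String token : tokens.subList(1, tokens.size())) {
            String value = token.substring(1);
            if (value.startsWith("[") && value.endsWith("]")) {
                value = value.substring(1, value.length() - 1);
            }
            codes.put(token.charAt(0), value);
        }
        return codes;
    }
}
```

    For "1F r[?AM] a[?AA,?AT]", the map holds r → ?AM and a → ?AA,?AT; calling splitTopLevel on the a value with ',' then separates the individual argument types.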

    Named union types have a basic alphacode of "A", followed by the name of the union type in the form "A nQ{uri}local". The syntax "~localname" is used for a name in the XSD namespace, so the built-in union types xs:numeric and xs:error are represented as "A n~numeric" and "A n~error" respectively.
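    The value of an "n" code can take four forms: Q{uri}local, ~localname for the XSD namespace (http://www.w3.org/2001/XMLSchema), and the two wildcard forms *:local and Q{uri}* that arise only from NodeTests. A hypothetical decoder (AlphaCodeName is not a Saxon class) might look like this, with null standing for a wildcard part. It does not handle the trailing "?" that a "c" code may carry to indicate nillability.

```java
// Hypothetical helper, not part of Saxon: decodes the value of an "n" supplementary code.
final class AlphaCodeName {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    /** Namespace URI, or null when any namespace matches ("*:local"). */
    final String uri;
    /** Local name, or null when any local name matches ("Q{uri}*"). */
    final String local;

    private AlphaCodeName(String uri, String local) {
        this.uri = uri;
        this.local = local;
    }

    static AlphaCodeName parse(String value) {
        if (value.startsWith("~")) {                 // shorthand for the XSD namespace
            return new AlphaCodeName(XSD_NS, value.substring(1));
        }
        if (value.startsWith("*:")) {                // any namespace, given local name
            return new AlphaCodeName(null, value.substring(2));
        }
        if (value.startsWith("Q{")) {
            int close = value.indexOf('}');
            if (close < 0) {
                throw new IllegalArgumentException("Unterminated Q{...}: " + value);
            }
            String uri = value.substring(2, close);
            String local = value.substring(close + 1);
            return new AlphaCodeName(uri, local.equals("*") ? null : local);  // "*" = any local name
        }
        throw new IllegalArgumentException("Not a URI-qualified name: " + value);
    }
}
```

    With this sketch, the built-in union type coded as "A n~numeric" decodes to the local name "numeric" in the XSD namespace.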

    TODO: the documentation for union types is not aligned with the current implementation

    • Constructor Detail

      • AlphaCode

        public AlphaCode()
    • Method Detail

      • toXdmMap

        public static MapItem toXdmMap(java.lang.String input)
        Parse an AlphaCode into an XDM map
        Parameters:
        input - the input alphacode
        Returns:
        the resulting map
        Throws:
        java.lang.IllegalArgumentException - if the input is not a valid AlphaCode
      • fromXdmMap

        public static java.lang.String fromXdmMap(MapItem map)
        Serialize the XDM map representation of an alphacode
        Parameters:
        map - the alphacode represented as an XDM map
        Returns:
        the corresponding alphacode as a string
      • toSequenceType

        public static SequenceType toSequenceType(java.lang.String input,
                                                  Configuration config)
        Convert an AlphaCode to a SequenceType
        Parameters:
        input - the input alphacode
        config - the Saxon Configuration (which must contain any user-defined types that are referenced in the Alphacode)
        Returns:
        the corresponding SequenceType
        Throws:
        java.lang.IllegalArgumentException - if the input is not a valid AlphaCode
      • toItemType

        public static ItemType toItemType(java.lang.String input,
                                          Configuration config)
        Convert an AlphaCode to an ItemType. The occurrence indicator of the alphacode may be omitted, or may be "1": any other value is treated as an error.
        Parameters:
        input - the input alphacode
        config - the Saxon Configuration (which must contain any user-defined types that are referenced in the Alphacode)
        Returns:
        the corresponding ItemType
        Throws:
        java.lang.IllegalArgumentException - if the input is not a valid AlphaCode
      • fromItemType

        public static java.lang.String fromItemType(ItemType type)
        Convert an item type to an alphacode
        Parameters:
        type - the item type to be converted
        Returns:
        the corresponding alphacode. Note that this will have no occurrence indicator.
      • fromSequenceType

        public static java.lang.String fromSequenceType(SequenceType type)
        Convert a sequence type to an alphacode
        Parameters:
        type - the sequence type to be converted
        Returns:
        the corresponding alphacode (including occurrence indicator as the first character)