SAXONICA
Collations used for comparing strings can be specified by means of a URI. A collation URI may be used as an argument to many of the standard functions, as the collation attribute of xsl:sort in XSLT, and in the order by clause of a FLWOR expression in XQuery.
Saxon provides a range of mechanisms for binding collation URIs. The language specifications simply say that collations used in sorting and in string-comparison functions are identified by a URI, and leave it up to the implementation how these URIs are defined.
There is one predefined collation that cannot be changed. This is the Unicode Codepoint Collation defined in the W3C specifications, currently http://www.w3.org/2005/04/xpath-functions/collation/codepoint. This collates strings based on the integer values assigned by Unicode to each character; for example, "ah!" sorts before "ah?" because the Unicode codepoints for "ah!" are (97, 104, 33) while the codepoints for "ah?" are (97, 104, 63).
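As an illustration of codepoint ordering, the Java sketch below compares two strings codepoint by codepoint. It is not Saxon's implementation, and the class name CodepointCollation is hypothetical. It walks codepoints explicitly rather than calling String.compareTo, because compareTo compares UTF-16 code units, which gives a different order for characters outside the Basic Multilingual Plane.

```java
import java.util.Comparator;

// Hypothetical sketch of Unicode codepoint ordering; not Saxon's implementation.
public class CodepointCollation implements Comparator<String> {

    @Override
    public int compare(String a, String b) {
        int i = 0;
        while (i < a.length() && i < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(i);
            if (ca != cb) {
                // The first differing codepoint decides the order.
                return Integer.compare(ca, cb);
            }
            // Equal codepoints occupy the same number of UTF-16 units in both strings.
            i += Character.charCount(ca);
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return Integer.compare(a.length(), b.length());
    }

    public static void main(String[] args) {
        CodepointCollation c = new CodepointCollation();
        // "ah!" = (97, 104, 33) and "ah?" = (97, 104, 63)
        System.out.println(c.compare("ah!", "ah?") < 0); // true

        // U+1F600 is stored as the surrogate pair 0xD83D 0xDE00.
        String emoji = new String(Character.toChars(0x1F600));
        String fullwidthTilde = "\uFF5E";
        // Codepoint order: 0x1F600 > 0xFF5E
        System.out.println(c.compare(emoji, fullwidthTilde) > 0); // true
        // UTF-16 code unit order disagrees: 0xD83D < 0xFF5E
        System.out.println(emoji.compareTo(fullwidthTilde) < 0); // true
    }
}
```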
In addition, by default, Saxon allows a collation URI to take the form http://saxon.sf.net/collation?keyword=value;keyword=value;... The query parameters in the URI can be separated either by ampersands or by semicolons, but semicolons are usually more convenient, because an ampersand must be escaped as &amp; when the URI is written inside an XML document such as a stylesheet. The keywords available are as follows:
keyword | values | effect
class | fully-qualified Java class name of a class that implements java.util.Comparator | This parameter should not be combined with any other parameter. An instance of the requested class is created and used to perform the comparisons. Note that if the collation is to be used in functions such as contains() and starts-with(), this class must also be a java.text.Collator.
lang | any value allowed for xml:lang | This is used to find the collation appropriate to a Java locale. The collation may be further tailored using the strength and decomposition parameters.
strength | primary, secondary, tertiary, or identical | Indicates the differences that are considered significant when comparing two strings. A/B is a primary difference; a/á is a secondary difference; A/a is a tertiary difference (though this varies by language). So with strength=primary, A=a and a=á are both true; with strength=secondary, A=a is true but a=á is false; with strength=tertiary, A=a and a=á are both false.
decomposition | none, standard, full | Indicates how the collator handles Unicode composed characters. See the JDK documentation for java.text.Collator for details.
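The Java sketch below suggests how the lang, strength and decomposition keywords might map onto the standard java.text.Collator API. It is a hypothetical illustration, not Saxon's own URI handling: the class CollationUriSketch and its fromUri method are invented for this example, the class keyword is not handled, and mapping standard to Collator.CANONICAL_DECOMPOSITION and full to Collator.FULL_DECOMPOSITION is an assumption based on the JDK constant names.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical sketch: turns lang/strength/decomposition keywords into a java.text.Collator.
public class CollationUriSketch {

    public static final String PREFIX = "http://saxon.sf.net/collation?";

    public static Collator fromUri(String uri) {
        if (!uri.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a saxon.sf.net collation URI: " + uri);
        }
        Locale locale = Locale.getDefault();
        String strength = null;
        String decomposition = null;
        // Parameters may be separated by ';' or '&'.
        for (String param : uri.substring(PREFIX.length()).split("[;&]")) {
            if (param.isEmpty()) continue;
            int eq = param.indexOf('=');
            if (eq < 0) throw new IllegalArgumentException("Missing '=' in: " + param);
            String key = param.substring(0, eq);
            String value = param.substring(eq + 1);
            switch (key) {
                case "lang": locale = Locale.forLanguageTag(value); break;
                case "strength": strength = value; break;
                case "decomposition": decomposition = value; break;
                default: throw new IllegalArgumentException("Unsupported keyword: " + key);
            }
        }
        Collator collator = Collator.getInstance(locale);
        if (strength != null) {
            switch (strength) {
                case "primary": collator.setStrength(Collator.PRIMARY); break;
                case "secondary": collator.setStrength(Collator.SECONDARY); break;
                case "tertiary": collator.setStrength(Collator.TERTIARY); break;
                case "identical": collator.setStrength(Collator.IDENTICAL); break;
                default: throw new IllegalArgumentException("Bad strength: " + strength);
            }
        }
        if (decomposition != null) {
            // Assumed mapping of the keyword values onto the JDK constants.
            switch (decomposition) {
                case "none": collator.setDecomposition(Collator.NO_DECOMPOSITION); break;
                case "standard": collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION); break;
                case "full": collator.setDecomposition(Collator.FULL_DECOMPOSITION); break;
                default: throw new IllegalArgumentException("Bad decomposition: " + decomposition);
            }
        }
        return collator;
    }

    public static void main(String[] args) {
        String aAcute = "\u00E1"; // a with acute accent
        Collator primary = fromUri(PREFIX + "lang=en-US;strength=primary");
        Collator secondary = fromUri(PREFIX + "lang=en-US&strength=secondary");
        Collator tertiary = fromUri(PREFIX + "lang=en-US;strength=tertiary");
        System.out.println(primary.compare("A", "a") == 0);      // true
        System.out.println(primary.compare("a", aAcute) == 0);   // true
        System.out.println(secondary.compare("A", "a") == 0);    // true
        System.out.println(secondary.compare("a", aAcute) == 0); // false
        System.out.println(tertiary.compare("A", "a") == 0);     // false
    }
}
```

In the main method both the ; and & separators are accepted, and with the JDK's English rules the case difference A/a is ignored at primary and secondary strength, while the accent difference a/á is ignored only at primary strength.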
This format of URI, http://saxon.sf.net/collation?keyword=value;keyword=value;..., is handled by Saxon's default CollationURIResolver. It is possible to replace or supplement this mechanism by registering a user-written CollationURIResolver. This must be an implementation of the interface net.sf.saxon.sort.CollationURIResolver, which requires only a single method, resolve(), to be implemented. The result of the method is in general a Comparator, though if the collation is to be used in functions such as contains(), which match parts of a string rather than the whole string, then the result must also be an instance of java.text.Collator.
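To illustrate the distinction between a plain Comparator and a Collator, the sketch below uses a one-method resolver. The interface SimpleCollationResolver, its resolve(String) signature, and the urn:example URIs are all hypothetical; the real net.sf.saxon.sort.CollationURIResolver interface has its own signature, which is not reproduced here.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;

public class ResolverSketch {

    /** Hypothetical one-method resolver; NOT the signature of Saxon's CollationURIResolver. */
    public interface SimpleCollationResolver {
        /** Returns a comparator for the URI, or null if the URI is not recognized. */
        Comparator<Object> resolve(String uri);
    }

    public static final SimpleCollationResolver RESOLVER = uri -> {
        if (uri.equals("urn:example:french")) {
            // java.text.Collator implements Comparator<Object>
            return Collator.getInstance(Locale.FRENCH);
        } else if (uri.equals("urn:example:length")) {
            // A plain Comparator that orders values by the length of their string form
            return Comparator.comparingInt(o -> o.toString().length());
        }
        return null; // URI not recognized
    };

    /** In this sketch, only a java.text.Collator is treated as usable by contains()-style functions. */
    public static boolean supportsSubstringMatching(Comparator<Object> collation) {
        return collation instanceof Collator;
    }

    public static void main(String[] args) {
        Comparator<Object> french = RESOLVER.resolve("urn:example:french");
        Comparator<Object> byLength = RESOLVER.resolve("urn:example:length");
        System.out.println(supportsSubstringMatching(french));   // true
        System.out.println(supportsSubstringMatching(byLength)); // false
        System.out.println(byLength.compare("abc", "xy") > 0);   // true
    }
}
```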
A user-written CollationURIResolver is registered with the Configuration object, either directly or, in the case of XSLT, by using the JAXP setAttribute() method on the TransformerFactory (the relevant property name is FeatureKeys.COLLATION_URI_RESOLVER). This applies to all stylesheets and queries compiled and executed under that configuration.
In addition, the APIs provided for executing XPath and XQuery expressions allow named collations to be registered by the calling application, as part of the static context.
For XQuery, the class StaticQueryContext also allows collations to be registered directly under individual names. The system attempts to resolve a URI using these directly registered names before it invokes the CollationURIResolver.
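The lookup order for XQuery (directly registered names first, then the CollationURIResolver) can be sketched as follows. The class CollationLookupSketch and its register and lookup methods are hypothetical stand-ins for StaticQueryContext and the resolver, not Saxon APIs; a java.util.function.Function plays the role of the resolver.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the lookup order: registered names first, then the resolver.
public class CollationLookupSketch {

    private final Map<String, Comparator<Object>> registered = new HashMap<>();
    private final Function<String, Comparator<Object>> resolver;

    public CollationLookupSketch(Function<String, Comparator<Object>> resolver) {
        this.resolver = resolver;
    }

    /** Registers a collation under a name (any string; it need not be a legal URI). */
    public void register(String name, Comparator<Object> collation) {
        registered.put(name, collation);
    }

    /** Directly registered names win; otherwise the resolver is asked (it may return null). */
    public Comparator<Object> lookup(String uri) {
        Comparator<Object> direct = registered.get(uri);
        return direct != null ? direct : resolver.apply(uri);
    }

    public static void main(String[] args) {
        Collator english = Collator.getInstance(Locale.ENGLISH);
        Collator german = Collator.getInstance(Locale.GERMAN);
        // The resolver would also answer for "urn:example:shared"...
        CollationLookupSketch ctx = new CollationLookupSketch(
                uri -> uri.equals("urn:example:shared") ? german : null);
        // ...but the directly registered collation is found first.
        ctx.register("urn:example:shared", english);
        System.out.println(ctx.lookup("urn:example:shared") == english); // true
        System.out.println(ctx.lookup("urn:example:unknown") == null);   // true
    }
}
```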
In Saxon XSLT stylesheets, collations may also be described using a saxon:collation element as a top-level declaration in the stylesheet. In this case the value of the name attribute of the saxon:collation element may be used as a collation URI. There is no constraint on the form this URI takes; indeed, there is no requirement that it be a legal URI.
See saxon:collation for more details.