Collation URIs

Collations used for comparing strings can be specified by means of a URI. A collation URI may be used as an argument to many of the standard functions, and also as an attribute of xsl:sort in XSLT, and in the order by clause of a FLWOR expression in XQuery.

The W3C specifications leave the details of collation URIs entirely implementation-defined. This section explains the collation URIs that can be used with Saxon.

In Saxon XSLT stylesheets, collations may be described using a saxon:collation element as a top-level declaration in the stylesheet. In this case the value of the name attribute of the saxon:collation element may be used as a collation URI. There is no constraint on the form this URI takes; indeed, there is no requirement that it be a legal URI. See saxon:collation for more details.

A collation URI may also be constructed directly. This enables collation URIs to be used in XPath and XQuery applications as well as in XSLT stylesheets. Such a collation URI takes the form http://saxon.sf.net/collation?keyword=value;keyword=value;.... The query parameters in the URI can be separated either by ampersands or semicolons, but semicolons are usually more convenient. The keywords available are as follows:

| keyword | values | effect |
| --- | --- | --- |
| class | fully-qualified name of a Java class that implements java.util.Comparator | This parameter should not be combined with any other parameter. An instance of the requested class is created and used to perform the comparisons. Note that if the collation is to be used in functions such as contains() and starts-with(), this class must also be a java.text.RuleBasedCollator. This approach allows a user-defined collation to be implemented in Java. |
| lang | any value allowed for xml:lang, for example en-US for US English | This is used to find the collation appropriate to a Java locale. The collation may be further tailored using the parameters strength and decomposition. |
| strength | primary, secondary, tertiary, or identical | Indicates the differences that are considered significant when comparing two strings. With the JDK collators, A/B is a primary difference; a/ä (an accent) is a secondary difference; A/a (case) is a tertiary difference (though this varies by language). So if strength=primary then A=a and a=ä are both true; with strength=secondary then a=ä is false but A=a is true; with strength=tertiary then A=a is false. |
| decomposition | none, standard, full | Indicates how the collator handles Unicode composed characters. See the JDK documentation for java.text.Collator for details. |
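The way these keywords relate to a JDK collator can be sketched in Java. The following is an illustration only, not Saxon's implementation: the class name CollationUriSketch and its methods are hypothetical, and the mapping of none, standard and full onto the NO_DECOMPOSITION, CANONICAL_DECOMPOSITION and FULL_DECOMPOSITION constants of java.text.Collator is an assumption. The class keyword is not handled.

```java
import java.text.Collator;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch only; this is not how Saxon itself is implemented. */
final class CollationUriSketch {

    static final String BASE = "http://saxon.sf.net/collation";

    /** Splits the query part of the URI on ';' or '&' into keyword/value pairs. */
    static Map<String, String> parseParams(String uri) {
        Map<String, String> params = new LinkedHashMap<>();
        int q = uri.indexOf('?');
        if (q < 0 || q == uri.length() - 1) {
            return params; // no query part, so no parameters
        }
        for (String pair : uri.substring(q + 1).split("[;&]")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    /** Builds a JDK collator from the lang, strength and decomposition keywords. */
    static Collator toCollator(String uri) {
        Map<String, String> p = parseParams(uri);
        Locale locale = p.containsKey("lang")
                ? Locale.forLanguageTag(p.get("lang"))
                : Locale.getDefault();
        Collator c = Collator.getInstance(locale);

        String strength = p.get("strength");
        if (strength != null) {
            switch (strength) {
                case "primary":   c.setStrength(Collator.PRIMARY);   break;
                case "secondary": c.setStrength(Collator.SECONDARY); break;
                case "tertiary":  c.setStrength(Collator.TERTIARY);  break;
                case "identical": c.setStrength(Collator.IDENTICAL); break;
                default: throw new IllegalArgumentException("strength=" + strength);
            }
        }

        // Assumed mapping of the keyword values onto the JDK constants.
        String decomposition = p.get("decomposition");
        if (decomposition != null) {
            switch (decomposition) {
                case "none":     c.setDecomposition(Collator.NO_DECOMPOSITION);        break;
                case "standard": c.setDecomposition(Collator.CANONICAL_DECOMPOSITION); break;
                case "full":     c.setDecomposition(Collator.FULL_DECOMPOSITION);      break;
                default: throw new IllegalArgumentException("decomposition=" + decomposition);
            }
        }
        return c;
    }

    public static void main(String[] args) {
        Collator primary = toCollator(BASE + "?lang=en-US;strength=primary");
        Collator tertiary = toCollator(BASE + "?lang=en-US;strength=tertiary");
        // Case is ignored at primary strength but significant at tertiary strength.
        System.out.println(primary.compare("apple", "APPLE") == 0);
        System.out.println(tertiary.compare("apple", "APPLE") == 0);
    }
}
```

The sketch accepts either separator, so lang=en-US;strength=primary and lang=en-US&strength=primary are parsed identically. The semicolon form avoids having to escape the ampersand when the URI appears inside an XML attribute.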

It is also possible to specify the Unicode Codepoint Collation defined in the W3C specifications, currently http://www.w3.org/2003/11/xpath-functions/collation/codepoint.
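For comparison with the locale-based collations, the following Java sketch shows what ordering by Unicode codepoint means. It is an illustration under stated assumptions, not Saxon's code: the class name CodepointOrder is hypothetical. Note that String.compareTo in Java compares UTF-16 code units, which gives a different order from codepoint comparison once characters outside the Basic Multilingual Plane are involved.

```java
import java.util.Comparator;

/** Illustrative comparator that orders strings by Unicode codepoint. */
final class CodepointOrder implements Comparator<String> {

    @Override
    public int compare(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n) {
            // Read a whole codepoint (one or two UTF-16 units) from each string.
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(i);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return Integer.compare(a.length(), b.length());
    }

    public static void main(String[] args) {
        String bmpMax = "\uFFFF";                                   // U+FFFF
        String supplementary = new String(Character.toChars(0x10000)); // U+10000
        // Codepoint order puts U+FFFF first; UTF-16 code unit order does not.
        System.out.println(new CodepointOrder().compare(bmpMax, supplementary) < 0);
        System.out.println(bmpMax.compareTo(supplementary) < 0);
    }
}
```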

In addition, the APIs provided for executing XPath and XQuery expressions allow named collations to be registered by the calling application, as part of the static context.
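Whether a collation is registered through the API or named with the class keyword, the application supplies a comparator object. The Java sketch below shows one possible user-defined collation built on java.text.RuleBasedCollator, so that it would also be usable with functions such as contains(). The class name LowerFirstCollation and its three-letter rule string are hypothetical, chosen only for illustration, and the registration call itself is not shown because it depends on the Saxon API in use.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

/**
 * Hypothetical user-defined collation over the letters a to c:
 * each lower-case letter sorts just before its upper-case form.
 * To be instantiated by name via the class keyword, a class like this
 * would normally need to be public with a public no-argument constructor.
 */
final class LowerFirstCollation extends RuleBasedCollator {

    LowerFirstCollation() throws ParseException {
        // '<' introduces a primary difference, ',' a tertiary (case) difference.
        super("< a, A < b, B < c, C");
    }

    /** Convenience factory that hides the checked exception. */
    static LowerFirstCollation create() {
        try {
            return new LowerFirstCollation();
        } catch (ParseException e) {
            throw new IllegalStateException("invalid collation rules", e);
        }
    }

    public static void main(String[] args) {
        RuleBasedCollator c = create();
        // 'a' precedes 'B' here, although 'B' has the lower codepoint.
        System.out.println(c.compare("a", "B") < 0);
        // 'a' and 'A' differ only at the tertiary level.
        System.out.println(c.compare("a", "A") < 0);
    }
}
```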
