There are several issues that need to be addressed if one aims
to truly support multi-lingual text. Some of these issues have
been resolved in Passepartout, while others are still waiting for
solutions. First of all, you need a way to represent all
possible letters and symbols of all languages in a single
file. The solution to this is Unicode. XML files use Unicode in
the form of UTF-8 as the default encoding. If you are using a
different encoding, you can still access any Unicode symbol
using a so-called character entity, e.g. &#x3B1; will yield the
Greek letter α.
ISO-8859-1 is perfectly adequate for writing something like "Zwölf Boxkämpfer jagten Victor quer über den großen Sylter Deich", but it wouldn't be sufficient for Esperanto: "Laŭ Ludoviko Zamenhof bongustas freŝa ĉeĥa manĝaĵo kun spicoj", so you would need to use a lot of character entities (Dr. Zamenhof lived before the computer revolution, else he would surely have used ASCII for Esperanto). The use of UTF-8 is strongly recommended.
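To see the difference between the two sample sentences for yourself, here is a minimal Python sketch. It is not part of Passepartout; the function names fits_latin1 and with_entities are made up for this illustration. It checks whether a text fits into ISO-8859-1 and, if not, shows what the text looks like once every missing character has been replaced by a numeric character entity.

```python
GERMAN = "Zwölf Boxkämpfer jagten Victor quer über den großen Sylter Deich"
ESPERANTO = "Laŭ Ludoviko Zamenhof bongustas freŝa ĉeĥa manĝaĵo kun spicoj"


def fits_latin1(text):
    """Return True if every character of text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


def with_entities(text, encoding="iso-8859-1"):
    """Replace characters missing from the encoding by numeric entities."""
    # "xmlcharrefreplace" is a standard Python codec error handler that
    # emits decimal references such as &#945; for unencodable characters.
    return text.encode(encoding, errors="xmlcharrefreplace").decode(encoding)


if __name__ == "__main__":
    print(fits_latin1(GERMAN))
    print(fits_latin1(ESPERANTO))
    print(with_entities(ESPERANTO))
```

The German sentence passes unchanged, while the Esperanto sentence comes out full of entities such as &#365; for ŭ, which is exactly the clutter that UTF-8 lets you avoid.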
Secondly, in order to display Unicode text, you will need a typeface that contains glyphs for all the characters in the text. Most computer fonts only cover one alphabet. One of the few fonts which can display this entire text is called Gentium. Ideally, if a given font didn't contain a certain glyph, Passepartout should look for that glyph in the other fonts available. This is yet to be implemented, however.
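The fallback idea can be sketched in a few lines of Python. This is only an illustration of the principle, not how Passepartout works or will work: the font names and their coverage tables are hypothetical, and a real implementation would have to read the coverage from the font files themselves.

```python
# Hypothetical coverage tables: font name -> set of supported code points.
COVERAGE = {
    "HypotheticalLatin": set(range(0x20, 0x7F)),    # printable ASCII
    "HypotheticalGreek": set(range(0x370, 0x400)),  # Greek and Coptic block
}


def pick_font(char, preferred, coverage):
    """Return a font that covers char, trying the preferred font first."""
    if ord(char) in coverage.get(preferred, ()):
        return preferred
    for name, codepoints in coverage.items():
        if ord(char) in codepoints:
            return name
    return None  # no available font has a glyph for this character


def assign_fonts(text, preferred, coverage):
    """Pair every character of text with the font chosen for it."""
    return [(ch, pick_font(ch, preferred, coverage)) for ch in text]


if __name__ == "__main__":
    for ch, font in assign_fonts("a\u03b1\u0414", "HypotheticalLatin", COVERAGE):
        print(ch, font)
```

With these made-up tables the Latin letter stays in the preferred font, the Greek letter falls back to the second font, and the Cyrillic letter finds no font at all.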
In the Olden Days (unfortunately the Olden Days haven't quite ended yet), PostScript Type1 fonts for displaying Cyrillic text, such as "В чащах юга жил бы цитрус? Да, но фальшивый экземпляр!", would often pretend to be Latin fonts. For example, D might render as Д. Such fonts are of no use to Passepartout.
The third issue is the fact that not all scripts are written from the left to the right. Examples of languages which are written from the right to the left include Arabic and Hebrew. While Unicode has support for controlling direction of writing, Passepartout can only render text written from the left to the right as of this writing. Asian ideographic languages often allow more than one direction of writing, but the field of Asian typography is something the current developer team has little knowledge of anyway.
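If you want to know in advance whether a text contains right-to-left characters, the Unicode character database can tell you. The following Python sketch is not part of Passepartout and the name contains_rtl is made up for this example. It uses the standard unicodedata module and looks for the bidirectional classes "R" (right-to-left, e.g. Hebrew letters) and "AL" (Arabic letters). It only detects such characters; it does not implement the Unicode bidirectional algorithm.

```python
import unicodedata

# Bidirectional classes of strongly right-to-left characters.
RTL_CLASSES = {"R", "AL"}


def contains_rtl(text):
    """Return True if text contains at least one right-to-left character."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


if __name__ == "__main__":
    print(contains_rtl("Greek \u03b1 and Latin a"))
    print(contains_rtl("\u05e9\u05dc\u05d5\u05dd"))  # Hebrew
    print(contains_rtl("\u0633\u0644\u0627\u0645"))  # Arabic
```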
Some ancient scripts from the Mediterranean area were written in boustrophedon (from the Greek βους στρεφειν), i.e. alternating between left-to-right and right-to-left between each line of text.
The fourth issue concerns the frequent use of ligatures in some scripts. A ligature is a single glyph which is used to represent more than one letter. In western typography, only a few ligatures, such as fi and ff, are commonly used (compare fig and ﬁg), but ligatures are much more common when typesetting Arabic script. Passepartout has no automatic support for ligatures yet, although they may be inserted manually if they exist as separate symbols in Unicode.
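Inserting ligatures manually gets tedious quickly, so you might want to preprocess your text. Here is a minimal Python sketch, assuming that you only care about the five Latin f-ligatures that exist as separate Unicode symbols (U+FB00 to U+FB04) and that your font actually contains them. The function names are made up for this example and nothing here is part of Passepartout.

```python
# The Latin f-ligatures available as single Unicode characters.
LIGATURES = {
    "ffi": "\uFB03",
    "ffl": "\uFB04",
    "ff": "\uFB00",
    "fi": "\uFB01",
    "fl": "\uFB02",
}


def insert_ligatures(text):
    """Replace letter sequences by their single-character ligatures."""
    # Longer sequences go first, so that "ffi" is not split into "ff" + "i".
    for letters in sorted(LIGATURES, key=len, reverse=True):
        text = text.replace(letters, LIGATURES[letters])
    return text


def expand_ligatures(text):
    """Undo insert_ligatures, e.g. before searching or spell checking."""
    for letters, ligature in LIGATURES.items():
        text = text.replace(ligature, letters)
    return text


if __name__ == "__main__":
    print(insert_ligatures("fig"))
    print(len(insert_ligatures("fig")))
```

Note that a blind replacement like this is too eager for some languages. German, for instance, avoids ligatures across the parts of a compound word, so treat the result as a starting point for manual review.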
The fifth and final issue has to do with typographic practices
being slightly different in different countries. For instance,
the number 65536 would be written in English typography as
65,536, but in Sweden the comma would be replaced by a period,
or preferably a small space. The German language has a rule that
says that the letter combination ck is hyphenated as k-k when
surrounded by vowels. Quotation marks also vary a lot. French
typography uses « guillemets ». Some of these rules can be
addressed with different XSLT stylesheets; others will require
localized rules.
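To give an idea of what such localized rules might look like, here is a small Python sketch of the two examples above. It is an illustration only, not Passepartout code. The names group_digits and break_at_ck are made up, the choice of the thin space (U+2009) as the "small space" is an assumption, and the list of vowels is a simplification.

```python
import re

THIN_SPACE = "\u2009"  # assumption: one possible "small space"

# "ck" with a vowel directly before and after it (simplified vowel list).
CK_BETWEEN_VOWELS = re.compile(
    r"(?<=[aeiouyäöü])ck(?=[aeiouyäöü])", re.IGNORECASE
)


def group_digits(number, separator):
    """Format an integer with the given thousands separator."""
    # Python groups with commas; swap the comma for the local separator.
    return f"{number:,}".replace(",", separator)


def break_at_ck(word):
    """Split word at its first ck between vowels, following the k-k rule.

    Returns the two halves as a tuple, or None if the rule does not apply.
    """
    match = CK_BETWEEN_VOWELS.search(word)
    if match is None:
        return None
    return word[:match.start()] + "k-", "k" + word[match.end():]


if __name__ == "__main__":
    print(group_digits(65536, ","))         # English style
    print(group_digits(65536, "."))         # period as separator
    print(group_digits(65536, THIN_SPACE))  # small space as separator
    print(break_at_ck("Zucker"))
```

The point of the sketch is that the separator and the hyphenation rule are data that depend on the language of the text, so they have to be selectable per document or per paragraph rather than hard-coded.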
In short, the problem of international typography is a difficult one, but we hope and aim for Passepartout to be useful for typesetting languages other than English and Swedish.