The stc program reads a string table source from a plain text
file. This text file contains strings for several languages.
A directory structure for language/region/encoding is created below
the output directory. Binary string tables - one per
language/region/encoding triple - are placed in the
subdirectories.
The dkstt module can read these string tables. The file finding
mechanism in the dkapp module chooses the string table matching the
user's preferred language.
The input file is read line by line. A comment starts with the
hash (#) character and ends at the end of the line.
The input consists of a list of string mappings.
Each string mapping consists of a key string and a list of
scope/value pairs. A scope starts with a language identifier;
region and encoding may follow. Scope and value are separated by
"=". The value is the string for that scope, enclosed in quotes.
Additionally the file may contain optional special information introduced by the keywords "$version", "$author" and "$comment".
$version 1.0
$author Dirk Krause
$comment A demo string table to show the structure

"/msg/h"
en = "Hello!"
de = "Hallo!"
fr = "Bonjour!"

"/msg/g"
en = "Goodbye!"
de = "Mach's gut!"
fr = "Au revoir!"
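The line structure described above can be sketched as a small parser. This is a minimal illustration in Python, not the code used by stc. The function names are hypothetical, and two points are assumptions inferred from the example rather than taken from the stc sources: that each keyword, key and scope/value pair stands on its own line, and that a "#" inside quotes does not start a comment.

```python
import re

# Assumed line shapes (inferred from the example, not from stc itself):
#   $keyword rest-of-line    special information
#   "key"                    starts a new string mapping
#   scope = "value"          adds one scope/value pair to the current key
SPECIAL = re.compile(r'^\$(\w+)\s+(.*)$')
KEY = re.compile(r'^"([^"]*)"$')
PAIR = re.compile(r'^([^\s=]+)\s*=\s*"(.*)"$')


def strip_comment(line):
    """Cut the line at the first '#' that is not inside double quotes."""
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == '#' and not in_quotes:
            return line[:i]
    return line


def parse_string_table(text):
    """Return (info, table).

    info maps a keyword such as "version" to its text;
    table maps each key string to a dict {scope: value}.
    """
    info, table, current = {}, {}, None
    for number, raw in enumerate(text.splitlines(), start=1):
        line = strip_comment(raw).strip()
        if not line:
            continue  # blank line or comment-only line
        m = SPECIAL.match(line)
        if m:
            info[m.group(1)] = m.group(2)
            continue
        m = KEY.match(line)
        if m:
            current = table.setdefault(m.group(1), {})
            continue
        m = PAIR.match(line)
        if m and current is not None:
            current[m.group(1)] = m.group(2)
            continue
        raise ValueError(f"line {number}: cannot parse {raw!r}")
    return info, table
```

Calling parse_string_table() on the demo table above would give a dictionary with the keys "/msg/h" and "/msg/g", each holding the entries for en, de and fr.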
The conversion is done by
stc <input-file> <target-directory>
The target directory is typically a directory below ${prefix}/share. For example, to install string tables for an application named foo, the command line may look like
stc foo.str /usr/local/share/foo
Today's computer systems allow users to choose their favorite
language, region and encoding settings. Some languages require
UTF-8 encoding, other languages leave it up to the user whether or
not to use UTF-8 encoding. Programs must be able to interact with
the user in the selected language and using the selected encoding.
So it is necessary to provide texts for some languages both UTF-8
encoded and 8-bit ASCII encoded.
The stc program from dklibs 1.8.0 (and above) has an automatic
encoding completion feature for some languages (languages which
mainly use characters in the Unicode range
0x00000000...0x000000FF). If a text is specified only in UTF-8
encoding, the missing 8-bit ASCII encoded text is created
automatically, and vice versa.
At this time, automatic encoding completion is supported for de, en, fr, nl, be, sp, pt, pl, cs, hu, sv, no and da. This list is likely incomplete; see below for how to add languages.
The auto_complete_languages[] string in the stc.c module contains the list of languages for which automatic encoding completion is enabled. You may modify this list before building and installing the dklibs package. If there are languages I should add to the distribution, please use the feature-request/bug-tracking mechanisms on SourceForge.
An administrator can add further languages (e.g. "aa", "bb" and "cc") by placing a section
[*/stc]
/languages/mostly-ascii7=de en fr nl be sp pt pl cs hu sv no da aa bb cc
in the system-wide preferences file ${prefix}/etc/appdefaults.
A user can add languages by placing a section
[stc]
/languages/mostly-ascii7=de en fr nl be sp pt pl cs hu sv no da aa bb cc
in the user preferences file ${HOME}/.defaults/all.
When running stc, use the command line option
--/languages/mostly-ascii7="de en fr nl be sp pt pl cs hu sv no da aa bb cc"
to add languages.
When editing string table source files you should enter either
only 8-bit ASCII encoded strings or only UTF-8 encoded strings. Do
not mix encodings!
For new projects use only UTF-8 encoded strings.
Before editing string table files, check whether your text
editor saves files UTF-8 encoded or non-UTF-8 encoded. Do not
simply rely on the language/region/encoding settings of the system.
Some "intelligent" editors switch back to non-UTF-8 encoded texts
if they think it might be useful. Examples: On my Fedora Core 3
system, set up for "de_DE.UTF-8", vim converts files containing
8-bit ASCII encoded text to UTF-8 when opening them and converts
them back to 8-bit ASCII when saving, so these files are saved
8-bit ASCII encoded. For new files and for files containing only
7-bit ASCII, no conversion is done in either direction; these files
are saved UTF-8 encoded. On the same system, nedit prints a message
"...switching from de_DE.UTF-8 to de_DE" when run from a terminal
window. Users running nedit from the menu will never see this
message unless they inspect system log files. Files are saved
non-UTF-8 encoded.
To test your editor, create a small text file and enter a short
text, e.g. "xöx" (instead of the German umlaut ö you can use
any other character in the Unicode range 0x00000080...0x000000FF).
Save the file and open it in a hex viewer or hex editor. If only
one byte is used for the special character between the two "x"
characters, your editor saves files non-UTF-8 encoded. If multiple
bytes are used for the special character, it saves UTF-8 encoded
files.
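If no hex viewer is at hand, the same check can be done with a few lines of Python. This is only a sketch of the byte-counting rule described above. The function name classify_sample and its return values are made up for this illustration, and it assumes the test file contains the special character between two "x" characters and nothing else but an optional line end.

```python
def classify_sample(data):
    """Classify the raw bytes of a test file such as "xöx".

    Counts the bytes between the first and the last "x":
    one byte means a single-byte (non-UTF-8) encoding,
    several bytes mean a multi-byte encoding, which the
    text above takes to be UTF-8.
    """
    first = data.index(b"x")
    last = data.rindex(b"x")
    between = data[first + 1:last]
    if len(between) == 1:
        return "single-byte"
    if len(between) > 1:
        return "utf-8"
    raise ValueError("no special character found between the two x")


# Typical use: read the saved file in binary mode and pass the bytes on,
# for example classify_sample(open("test.txt", "rb").read()),
# where "test.txt" stands for the file written by your editor.
```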
An example file containing UTF-8 encoded texts might look like
this:
$version 1.1
$author Dirk Krause
$comment A new demo string table to show the structure

"/msg/h"
en.utf-8 = "Hello, nice weather today!"
de.utf-8 = "Hallo, schönes Wetter heute!"
fr.utf-8 = "Bonjour!"

"/msg/g"
en.utf-8 = "Goodbye!"
de.utf-8 = "Mach's gut!"
fr.utf-8 = "Au revoir!"
When stc is run, the missing texts for "en", "de" and "fr" are
created on the fly.
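The completion step can be pictured roughly as follows. This is a sketch in Python of the idea only and makes no claim about how stc.c implements it. The function name complete_encodings is hypothetical, texts are held as decoded strings, and only the two scope forms shown in this document ("de" and "de.utf-8") are handled; scopes with a region are left untouched because their syntax is not shown here.

```python
# Default language list as quoted in the text above.
AUTO_COMPLETE_LANGUAGES = "de en fr nl be sp pt pl cs hu sv no da".split()


def complete_encodings(scopes, languages=AUTO_COMPLETE_LANGUAGES):
    """Return a copy of scopes with missing counterparts added.

    scopes maps a scope name ("de" or "de.utf-8") to its text.
    For every listed language, a text given only for "xx.utf-8"
    is also stored under "xx", and the other way round.
    Existing entries are never overwritten.
    """
    completed = dict(scopes)
    for scope, text in scopes.items():
        language, dot, encoding = scope.partition(".")
        if language not in languages:
            continue  # completion is not enabled for this language
        if encoding.lower() == "utf-8":
            counterpart = language
        elif not dot:
            counterpart = language + ".utf-8"
        else:
            continue  # some other encoding: leave it alone
        completed.setdefault(counterpart, text)
    return completed
```

For the example file above, the entry "de.utf-8" of "/msg/h" would gain a sibling entry "de" holding the same text; the conversion of that text into the target byte encoding is a separate step.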
If the text contains characters outside the Unicode range
0x00000000...0x000000FF, there is no way to convert them to
8-bit ASCII encoding. A dot is inserted into the text in place of
each such character.
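The substitution rule can be illustrated in a few lines of Python. This is a sketch under the assumption that every character in the range 0x00...0xFF is written as the single byte with the same value; the function name to_8bit is made up for this example and does not appear in dklibs.

```python
def to_8bit(text):
    """Encode text using one byte per character.

    Characters with a code point up to 0xFF keep their value;
    any character above that range is replaced by a dot.
    """
    return bytes(ord(ch) if ord(ch) <= 0xFF else ord(".") for ch in text)


def utf8_to_8bit(data):
    """Convert UTF-8 encoded bytes to the one-byte-per-character form."""
    return to_8bit(data.decode("utf-8"))
```

With this sketch the German "schönes" keeps its umlaut as the single byte 0xF6, while a character such as the euro sign (U+20AC) comes out as ".".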