The Translate Toolkit might have changed how it functions in certain cases. This page lists what has changed, how it might affect you and how to work around the change either to bring your files in line or to use the old behaviour if required.
This section describes the changes in the (unreleased) development files. It should reflect upcoming changes in the next version of the toolkit.
From version 1.0.1, pot2po will create empty PO files corresponding to new POT files that might have been introduced. If some new POT files are present in the input to pot2po, you will see a new PO file appear in your output directory that was not in your templates (old PO files).
Many toolkit tools that previously worked only with PO files can now also work with XLIFF files. For example, pogrep, pocount, pomerge, and pofilter all work with XLIFF.
All XML formats should now be more human readable, and the converters to Qt .ts files should work correctly again.
Fuzzy matching can now be entirely disabled in pot2po with the --nofuzzymatching parameter. This should make it much faster, although pot2po is in any case substantially faster than in earlier versions, especially if python-Levenshtein is installed.
The toolkit's PO output should now resemble Gettext PO files more closely. Long lines are wrapped correctly, and messages with long initial lines will start with a ‘msgid ""’ entry. The reason for this change is to ensure that differences between files reflect content changes rather than format changes, no matter what tool you use.
To understand the problem more clearly, consider a user who creates POT files with e.g. oo2po and then edits them in a PO editor or manipulates them with the Gettext tools. The layout of the file after manipulation was often different from the original produced by the Toolkit, making it hard to tell which were content changes as opposed to layout changes.
The changes will affect you as follows:
Our suggestion is that, if you are about to suffer a major reflow, your initial merge should contain only reflow and update changes. Make content changes in subsequent steps. Once you have gone through the reflow you should see no layout changes, only content changes.
The toolkit is gradually becoming more aware of the differences between languages. Currently this mostly affects pofilter checks (and therefore also Pootle), where tests involving punctuation and capitalisation will be more aware of the differences between English and some other languages. Provisional customisation for the following languages is in place, and we will welcome more work on the language module: Amharic, Arabic, Greek, Persian, French, Armenian, Japanese, Khmer, Vietnamese, all types of Chinese.
The escapes test has been refined with two new tests, newlines and tabs. This makes identifying the errors easier and makes it easier to control the results of the tests. You shouldn’t have to change your testing behaviour in any way.
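As a rough illustration of what the two split-out tests look at, the following Python sketch compares how many newline and tab characters appear in a source string and its translation. The function names are hypothetical and this is not pofilter's actual implementation; it only shows why separating the two checks makes each failure easier to pinpoint.

```python
def newlines_consistent(source: str, target: str) -> bool:
    """Return True if source and target contain the same number of newlines."""
    return source.count("\n") == target.count("\n")


def tabs_consistent(source: str, target: str) -> bool:
    """Return True if source and target contain the same number of tabs."""
    return source.count("\t") == target.count("\t")


source = "Name:\tValue\n"
target = "Naam: Waarde\n"  # the tab was replaced by a space

print(newlines_consistent(source, target))  # the newline check passes
print(tabs_consistent(source, target))      # the tab check fails
```

With separate checks, a translator sees that only the tab is missing rather than a generic escaping failure.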
pomerge now handles fuzzy states:
pomerge -t old -i merge -o new
Messages that are fuzzy in merge will now also be fuzzy in new. Similarly if a fuzzy state is present in old but removed in merge then the message in new will not be fuzzy.
Previously no fuzzy states were changed during a merge.
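The new behaviour can be pictured with a toy model. The sketch below is an assumption-laden simplification, not pomerge's code: the Unit class and merge function are hypothetical, and messages are modelled as a plain dictionary keyed by msgid.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """A hypothetical, simplified stand-in for one PO message."""
    target: str
    fuzzy: bool = False


def merge(old: dict, changes: dict) -> dict:
    """Overlay `changes` on `old`, taking text and fuzzy state from `changes`."""
    new = {msgid: Unit(u.target, u.fuzzy) for msgid, u in old.items()}
    for msgid, changed in changes.items():
        if msgid in new:
            new[msgid] = Unit(changed.target, changed.fuzzy)
    return new


old = {"File": Unit("Leer", fuzzy=True), "Edit": Unit("Redigeer")}
changes = {"File": Unit("Lêer", fuzzy=False), "Edit": Unit("Wysig", fuzzy=True)}

new = merge(old, changes)
print(new["File"].fuzzy)  # fuzzy was removed in the merge input, so it is cleared
print(new["Edit"].fuzzy)  # fuzzy in the merge input, so it is carried over
```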
If you use pofilter with the --mozilla option then accelerator failures will produce a serious filter error, i.e. the message will be marked as fuzzy. This has been done because accelerator problems in your translations have the potential to break Mozilla applications.
We have added the --personality option to allow a user to select output in either java or mozilla style (Java property files use escaped Unicode, while Mozilla uses actual Unicode characters). This functionality was always available but was not exposed to the user, and we always defaulted to the Mozilla style.
When using po2moz the behaviour is unchanged for the user, as the program will ensure that the properties converter uses Mozilla style.
However, when using po2prop the default style is now java, so if you are converting a single .properties file as part of a Mozilla conversion you will need to add --personality=mozilla to your conversion. Thus
po2prop -t moz.properties moz.properties.po my-moz.properties
Would become
po2prop --personality=mozilla -t moz.properties moz.properties.po my-moz.properties
Note: output in java style escaped Unicode will still be usable by Mozilla but will be harder to read.
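To make the difference between the two styles concrete, here is a minimal Python sketch. The function names java_escape and mozilla_escape are hypothetical, and the sketch covers only the Unicode handling described above (real .properties escaping has other rules that are ignored here).

```python
def java_escape(text: str) -> str:
    """Escape non-ASCII characters as \\uNNNN, using UTF-16 code units."""
    out = []
    for char in text:
        if ord(char) < 128:
            out.append(char)
        else:
            # Encode to UTF-16 so that characters outside the Basic
            # Multilingual Plane become a surrogate pair (two escapes).
            units = char.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append("\\u%04x" % int.from_bytes(units[i:i + 2], "big"))
    return "".join(out)


def mozilla_escape(text: str) -> str:
    """Leave Unicode characters as they are (to be written out as UTF-8)."""
    return text


value = "Lêer"
print(java_escape(value))     # L\u00eaer
print(mozilla_escape(value))  # Lêer
```

Both outputs describe the same string, which is why the escaped form remains usable even though it is harder to read.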
There is some initial support for reading from and writing to compressed files. Single files compressed with gzip or bzip2 are supported, but not tarballs. Most tools don’t support it yet, but pocount and the --tm parameter to pot2po will work with it, for example. Naturally it is slower than working with uncompressed files. Hopefully more tools can support it in future.
In po2oo we made the default filter action none, i.e. do nothing and don’t warn. Until we have a way of clearly marking false positives we’ll have to disable this functionality. We also renamed exclude to exclude-all so that it is clearer what it does, i.e. it excludes ‘all’ as opposed to excluding ‘serious’.
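For readers scripting around such files, the general idea of transparently opening a single compressed file can be sketched with Python's standard library. The helper name open_maybe_compressed is hypothetical, and choosing the decompressor by file extension is an assumption of this sketch, not a description of how the toolkit detects compression.

```python
import bz2
import gzip
import os
import tempfile


def open_maybe_compressed(path: str, mode: str = "rt", encoding: str = "utf-8"):
    """Open `path`, (de)compressing on the fly based on its extension."""
    if path.endswith(".gz"):
        return gzip.open(path, mode, encoding=encoding)
    if path.endswith(".bz2"):
        return bz2.open(path, mode, encoding=encoding)
    return open(path, mode, encoding=encoding)


# Round-trip a tiny PO fragment through a gzip-compressed file.
content = 'msgid "File"\nmsgstr "Lêer"\n'
path = os.path.join(tempfile.mkdtemp(), "example.po.gz")
with open_maybe_compressed(path, "wt") as handle:
    handle.write(content)
with open_maybe_compressed(path) as handle:
    print(handle.read() == content)
```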
The xmltags check now handles some cases that previously produced false positives, e.g. “<Error>”, which looks like XML/HTML but should actually be translated. These are handled by
Conversion from PO to XLIFF is greatly improved in 0.10 and this was done according to the specification at http://xliff-tools.freedesktop.org/wiki/Projects/XliffPoGuide - please let us know if there are features lacking.
pot2po has undergone major changes, which means that it now respects your header entries, can resurrect obsolete messages, does fuzzy matching using the Levenshtein distance algorithm, will correctly match messages with KDE style comments and can use an external Translation Memory. You can now use pot2po instead of Gettext’s msgmerge, and it can also replace pomigrate2. You may still want to use pomigrate2 if there were file movements between versions, as pot2po still cannot do intelligent matching of PO and POT files. pomigrate2 has also been adapted so that it can use pot2po as its background merging tool.
pomigrate2 --use-compendium --pot2po <old> <pot> <new>
This will migrate the files using a compendium built from the PO files in <old> and will use pot2po as its conversion engine.
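For readers unfamiliar with the Levenshtein distance used for fuzzy matching, the following pure-Python sketch shows the idea. It is not the toolkit's implementation (which can use python-Levenshtein for speed), and the similarity function with its 0 to 100 scale is a hypothetical helper added for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance to a 0-100 score (100 means identical)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


print(levenshtein("kitten", "sitting"))  # 3
print(similarity("Open file", "Open files"))
```

A fuzzy matcher would typically accept a candidate translation only when such a score exceeds some threshold, and mark the result as fuzzy for review.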
When using templates for generating translated .properties files we will now preserve the formatting around the equal sign.
# Previously if the template had property = value
# We output property=translation
# We will now output property = translation
This change ensures that there is less noise when checking differences against the template file. However, there will be quite a bit of noise when you make your first .properties commits with the new pretty layout. Our suggestion is that you make a single commit of .properties files without any translation changes to get the formatting correct.
Previously each converter handled escaping, which made it a nightmare every time we identified an escaping-related error or added a new format. Escaping has now been moved into the format classes as much as possible, the result being that formats exchange Python strings and manage their own escaping.
In doing this migration we revisited some of the format migration. We found that we were escaping elements in our output DTD files. DTDs should have no escaping, i.e. \n is a literal ‘\’ followed by an ‘n’, not a newline.
A result of this change is that older PO files will have different escaping to what po2moz will now expect, probably resulting in bad output .dtd files.
We did not make this backward compatible as the fix is relatively simple and is one you would have done for any migration of your PO files.
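The idea of keeping the template's spacing can be sketched in a few lines of Python. This is a simplified illustration, not the converter's code: the translate_line helper and its regular expression are hypothetical, and only the plain key = value form is handled.

```python
import re

# Key, the separator with its surrounding whitespace, then the value.
PROPERTY_LINE = re.compile(r"^(?P<key>[^=\s]+)(?P<sep>\s*=\s*)(?P<value>.*)$")


def translate_line(template_line: str, translations: dict) -> str:
    """Replace the value but keep the template's spacing around '='."""
    match = PROPERTY_LINE.match(template_line)
    if match is None or match.group("key") not in translations:
        return template_line  # comments, blank lines, untranslated keys
    key = match.group("key")
    return key + match.group("sep") + translations[key]


print(translate_line("property = value", {"property": "translation"}))
# property = translation
print(translate_line("property=value", {"property": "translation"}))
# property=translation
```

Because the separator is copied from the template, a diff between the template and the translated file shows only the changed values.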
# 1. Create a new set of POT files
moz2po -P mozilla pot
# 2. Migrate your old PO files
pomigrate2 old new pot
# 3a. Fix all the fuzzy translations by editing your PO files
# 3b. Use pofilter to check for escaping problems and fix them
pofilter -t escapes new new-check
# Edit the files in new-check in your PO editor
pomerge -t new -i new-check -o new-check
All filters are being, or have been, migrated to a base class. This move makes it easier to add new formats, interchange formats and create converters. Thus xx2po and xx2xlf become easier to create. Adding a new format should also be as simple as working towards the API exposed in the base class. An unexpected side effect is that Pootle should be able to work directly with any base class file (although that will not be the normal Pootle operation).
We have checks in place to ensure that the current operation remains correct. However, nothing is perfect and unfortunately the only way to really expose all bugs is to release this software.
If you discover a bug please report it on Bugzilla or on the Pootle mailing list. If you have the skills, please check on HEAD to see whether it is already fixed, and if you regard it as critical, discuss backporting the fix on the mailing list (note that some fixes will not be backported because they may be too invasive for the stable branch). If you are a developer please write a test to expose the bug and a fix if possible.
We added the --duplicatestyle option to allow duplicate messages to be merged, commented or simply appear in the PO unmerged. Initially we used the msgid_comments option as the default. This adds a KDE style comment to all affected messages, which created a good balance: users could see duplicates in the PO file while still creating a valid PO file.
‘msgid_comments’ was the default for 0.8 (check); however, it seemed to create more confusion than it solved. Thus we have reverted to using ‘merge’ as the default (this then completely mimics Gettext behaviour).
As Gettext will soon introduce the msgctxt attribute we may revert to using that to manage disambiguation messages instead of KDE comments. This, we feel, will put us back at a good balance of usefulness and usability. We will only release this when the msgctxt version of the Gettext tools is released.
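The difference between the ‘merge’ and ‘msgid_comments’ styles can be sketched as follows. This is a toy model under stated assumptions: messages are (location, msgid) pairs, both function names are hypothetical, and the exact text the toolkit places in the KDE style comment is assumed here to be the message location.

```python
from collections import defaultdict


def merge_duplicates(messages):
    """'merge' style: one entry per msgid, collecting every location."""
    locations = defaultdict(list)
    for location, msgid in messages:
        locations[msgid].append(location)
    return [(locs, msgid) for msgid, locs in locations.items()]


def comment_duplicates(messages):
    """'msgid_comments' style: prefix duplicated msgids with a KDE style comment."""
    counts = defaultdict(int)
    for _, msgid in messages:
        counts[msgid] += 1
    return [
        ([location], f"_: {location}\n{msgid}" if counts[msgid] > 1 else msgid)
        for location, msgid in messages
    ]


messages = [("menu.file", "Open"), ("dialog.title", "Open"), ("menu.edit", "Copy")]
print(merge_duplicates(messages))
print(comment_duplicates(messages))
```

In the merged form a translator sees "Open" once, as with Gettext; in the commented form each occurrence stays separate and every msgid remains unique, so the PO file is still valid.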
The main use of the .properties converter class is to translate Mozilla files, although .properties files are actually a Java standard. The old Mozilla way, and still the Java way, of working with .properties files is to escape any Unicode characters using the \uNNNN convention. Mozilla now allows you to use Unicode in UTF-8 encoding for these files. Thus from version 0.9 of the Toolkit we output UTF-8 encoded properties files. Bug #114 tracks the status of this, and we hope to add a feature to prop2po to restore the correct Java convention as an option.