Internationalization and localization of shell scripts

Presentation

Purpose, scope and intended audience

This document is intended to help developers, maintainers and translators write, maintain and translate internationalized and then localized shell scripts, using the tools provided by GNU gettext.

The reference document is the manual titled GNU `gettext' utilities.

The manual covers all programming languages usable with gettext, with a special focus on the C language.

The POSIX specification is recommended reading, in particular the volumes Base Definitions and Shell & Utilities.

In contrast with the manual, the scope of the present document is limited to shell scripts.

Theory of operation

The goal is to display the messages (usually pieces of text) output by shell scripts in the user's preferred language on the user's system.

The user states this preference by setting the LANG or the LANGUAGE environment variable (the latter holds a prioritized list of languages that may be used for displaying messages).

The internationalization process (abbreviated as I18N) consists of:

  • marking the text strings in the shell scripts that form the output messages to be translated,
  • then using the gettext tools to build a template catalog of messages from this set of marked scripts.

A template catalog of messages is usually called a “Portable Object Template” or POT file.

A POT file, which is human-readable plain text, mainly consists of the extracted text strings, each preceded by “msgid” (short for “message identifier”) and followed by a slot for the translation of that message, preceded by “msgstr”. In the POT file itself the “msgstr” strings of the messages are still empty.

The localization process (abbreviated as L10N) consists of:

  • generating a separate “Portable Object” or PO file for each target language from the single POT file,
  • filling in every “msgstr” string with a translation in each PO file,
  • checking/verifying these PO files after translation,
  • compiling each PO file individually into a “Machine Object” or MO file.

The MO files, which are intended for the machine rather than for humans (hence the name), are traditionally stored as:

/usr/share/locale/<locale>/LC_MESSAGES/<software naam>.mo

In the path above, <locale> is a locale code in the form ll[_TT], where ll is the two-letter code of the target language as defined in the ISO 639-1 standard, and TT (if present) is the two-letter country code of this locale as defined in ISO 3166.
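As a small illustration of this path layout, here is a minimal sketch that builds the storage path of a catalog and splits a locale code into its two parts. The function names mo_path and locale_parts are hypothetical (they are not part of gettext), and /usr/share/locale is assumed as the base directory.

```shell
#!/bin/sh
# Hypothetical helper: print where the MO file for a locale would be stored.
# $1 = locale code in the form ll[_TT], e.g. "fr" or "fr_FR"
# $2 = software name
mo_path() {
    printf '%s\n' "/usr/share/locale/$1/LC_MESSAGES/$2.mo"
}

# Hypothetical helper: split a locale code into language and territory.
locale_parts() {
    lang_code=${1%%_*}                  # "fr_FR" -> "fr"
    case $1 in
        *_*) territory=${1#*_} ;;       # "fr_FR" -> "FR"
        *)   territory= ;;              # no territory given
    esac
    printf 'language=%s territory=%s\n' "$lang_code" "$territory"
}
```

For instance, "mo_path fr_FR software" prints /usr/share/locale/fr_FR/LC_MESSAGES/software.mo.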

Each marked script must contain the following command:

export TEXTDOMAIN=<software naam>

When the script runs, this allows 'gettext' to find the right MO file and to display each marked message in the preferred language, as determined by the LANG or LANGUAGE environment variable.
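To make this concrete, here is a minimal sketch of a marked script. The text domain "software" and the function name greet are placeholders chosen for the illustration.

```shell
#!/bin/sh
# "software" is a placeholder text domain, not a real package.
TEXTDOMAIN=software
export TEXTDOMAIN

# gettext looks up its argument as a msgid in the catalog of the text
# domain and prints the translation. If no catalog or no translation is
# found, it prints the argument unchanged.
greet() {
    gettext "Good morning"
    echo                    # gettext does not append a newline itself
}
```

Without an installed catalog, calling greet simply prints the original English text, so a marked script keeps working before any translation exists.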

Process diagrams

Let's assume that a given piece of software comprises a set of shell scripts that we want to internationalize and localize.

The following diagrams give an overview of each of the processes involved: internationalization, localization, usage and maintenance.

These diagrams are hybrid, i.e. they show data as well as actions.

Among these actions is the execution of some programs of the gettext suite:

  • gettext: marks strings to be internationalized, then displays localized messages during the scripts' execution
  • xgettext: extracts marked strings from a set of shell scripts to build a POT or a PO file
  • msgcmp: checks a PO file against another PO or a POT file for consistency
  • msginit: writes a PO file using a POT file as its input
  • msgfmt: generates an MO file using a PO file as its input
  • msgmerge: merges or updates PO or POT files

In the diagrams below, gettext programs are surrounded by square brackets.

(1) Internationalization

 Set of shell scripts ───> Preparation ───> Marked shell scripts ───╮
                                                                    │
          ╭──────────────<── software.pot <─── [xgettext] <─────────╯
          │
          ├────> (2) Localization
          │
          ╰────> (4) Maintenance

(2) Localization (example for French and Dutch languages).

     ╭──────────────<── software.pot <── (1) Internationalization
     │
     │
     ├──> [msginit] ──> fr_FR.po ──> PO editor ──> fr_FR.po ──> [msgcmp] ──╮
     │                                                                     │
     │   ╭─ installation <── fr_FR.mo <── [msgfmt] <─┬─ fr_FR.po checked <─╯
     │   │                                           │
     │   │                                           ╰────> (4) Maintenance
     │   │
     │   ╰──> /usr/share/locale/fr_FR/LC_MESSAGES/software.mo ─> (3) Usage
     │
     ╰──> [msginit] ──> nl_NL.po ──> PO editor ──> nl_NL.po ──> [msgcmp] ──╮
                                                                           │
         ╭─ installation <── nl_NL.mo <── [msgfmt] <─┬─ nl_NL.po checked <─╯
         │                                           │
         │                                           ╰────> (4) Maintenance
         │
         ╰───> /usr/share/locale/nl_NL/LC_MESSAGES/software.mo ─> (3) Usage

(3) Usage

Let's assume that one of the scripts, “myscript.sh”, includes the following command:

 gettext "Good morning"

and that “Good morning” is translated as follows in the message catalogs:

 /usr/share/locale/fr_FR/LC_MESSAGES/PACKAGE.mo ─> "Bonjour"
 /usr/share/locale/nl/LC_MESSAGES/PACKAGE.mo ─> "Goedemorgen"

Here is what the user will see, depending on the LANG setting:

            ╭──────────────<── (2) Localization
            │
 LANG=fr_FR ├───> sh myscript.sh or ./myscript.sh ───> "Bonjour"
            │
 LANG=nl_NL ╰───> sh myscript.sh or ./myscript.sh ───> "Goedemorgen"
         

(4) Maintenance

The maintenance process can be triggered by the creation, modification or deletion of a script.

In the diagram below, the part of the process beginning with the msgmerge command should be repeated for each available PO file.

It is therefore advisable to keep an up-to-date list of the available translations in the form of PO files.

      Shell scripts updated and marked ───> [xgettext] ───> software.pot ──╮
                                                                           │
                                   (2) Localization ──> <locale>.po ───>┬<─╯
                                                                        │
╭─ [msgcmp] <── <locale>.po <─ PO editor <─ <locale>.po <─ [msgmerge] <─╯
│
╰──> <locale>.po checked ─> [msgfmt] ─> <locale>.mo ─> installation  ────╮
                                                                         │
                  /usr/share/locale/<locale>/LC_MESSAGES/software.mo <───╯
                   
                          

The maintenance process can also be triggered by a modification of a messages catalog for a specific language (to correct an error, for instance).

This variant of the process is shorter:

 
╭─ [msgcmp] <── <locale>.po <── PO editor <── <locale>.po <── Update needed
│
╰─> <locale>.po checked ──> [msgfmt] ──> <locale>.mo ──> installation  ──╮
                                                                         │
                   /usr/share/locale/<locale>/LC_MESSAGES/software.mo <──╯

Internationalization process

This chapter is intended for developers and maintainers.

The internationalization process comprises the following tasks:

  1. Prepare scripts for internationalization
  2. Mark messages to be localized
  3. Use 'xgettext' to produce a template catalog of messages

Prepare scripts for internationalization

This task is needed for shell scripts that do not yet fulfill the requirements for internationalization.

Technical note: gettext's requirements for shell scripts.

The list of requirements below is not complete.

It includes only the main ones that, based on my experience, I recommend the developer or maintainer check.

At run time, gettext replaces the text strings output by:

  • an “echo” command or
  • a program (like 'dialog', for instance)

with translated text strings (found in a messages catalog for the language set by $LANG or $LANGUAGE).

But the replacement only occurs if the following conditions are fulfilled:

  • An MO file is available in the path computed from the TEXTDOMAIN environment variable as <dir_name>/<locale>/LC_MESSAGES/<text_domain>.mo.
    For instance, if TEXTDOMAIN=software and LANG=de_DE.utf8, gettext will look for: <dir_name>/de_DE/LC_MESSAGES/software.mo
    <dir_name> can be set through the value of the TEXTDOMAINDIR environment variable, otherwise a default value is used.
    In Slackware Linux for instance, the default value is /usr/share/locale.
    There are fallbacks: for instance, if <locale> is “de_DE”, the MO file could be placed in <dir_name>/de/LC_MESSAGES/ instead of <dir_name>/de_DE/LC_MESSAGES/.
  • The TEXTDOMAIN variable is exported before any *gettext command occurs.
  • gettext.sh, which provides the eval_gettext and eval_ngettext functions, is sourced before any occurrence of one of these functions.
  • A msgid string in the MO file matches exactly the argument of gettext (or eval_gettext if the text string includes a parameter expansion).
  • The corresponding msgstr string does not include a backslash followed by a white space.
  • The msgstr string begins and ends with a newline or not, as the msgid does.
  • If the text string includes a parameter expansion, eval_gettext is used instead of gettext.
  • “The variable names must consist solely of alphanumeric or underscore ASCII characters, not start with a digit and be nonempty; otherwise such a variable reference is ignored.” (gettext manual)
  • Parameter expansions are escaped with a single backslash like this:
    \$parameter or \${parameter}
    unless the eval_gettext command is inside a backquoted command substitution like this:
    "`eval_gettext "…"`"
    In that case, three backslashes are needed like this:
    \\\$parameter or \\\${parameter}
    because the backquotes consume one level of backslashes. Inside the "$(eval_gettext "…")" form, a single backslash is enough.
  • Only the forms $parameter and ${parameter} of parameter expansion are used inside an argument of eval_gettext (all other forms are forbidden).
  • Positional parameters, special parameters and command substitutions are *not* used inside an argument of gettext or eval_gettext.

As a practical consequence of the last two rules, it is advisable to assign all positional parameters, special parameters, command substitutions and disallowed forms of parameter expansion to named variables upstream, then to expand these variables in the text string argument of eval_gettext or eval_ngettext.
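The following sketch shows one way to apply this advice. The text domain "software", the function name report_install and the variable name pkg_name are placeholders invented for the example; the guard around the sourcing of gettext.sh is my own precaution, not a gettext requirement.

```shell
#!/bin/sh
TEXTDOMAIN=software        # placeholder text domain
export TEXTDOMAIN

# gettext.sh provides eval_gettext and eval_ngettext. It is sourced here
# only if it can be found in PATH.
if command -v gettext.sh >/dev/null 2>&1; then
    . gettext.sh
fi

# Hypothetical function: $1 is a positional parameter, so it is first
# copied to a named variable that eval_gettext is allowed to expand.
report_install() {
    pkg_name=$1
    # The backslash keeps the shell from expanding $pkg_name too early:
    # the msgid stays "Package $pkg_name has been installed."
    eval_gettext "Package \$pkg_name has been installed."
    echo
}
```

The translator then sees $pkg_name in the msgid and can move it to wherever it belongs in the translated sentence.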

Tip: if a text string has been included as a msgid in a catalog of messages and is assigned to a named variable in a script, then the commands: “gettext $parameter” and “gettext ${parameter}” will output the translated string at run time, even though 'xgettext' would discard that command when parsing the script, because 'gettext' is used instead of 'eval_gettext'. This can be handy. In this case the parameter expansion should not be escaped.
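A minimal sketch of this tip follows. The names status_msg and show_status are hypothetical, and the example assumes that the string "Operation completed" has been added to the catalog by some other means, since 'xgettext' cannot extract it from this command.

```shell
#!/bin/sh
TEXTDOMAIN=software        # placeholder text domain
export TEXTDOMAIN

# The message text lives in a named variable.
status_msg="Operation completed"

show_status() {
    # No backslash here: the shell expands the variable first, and gettext
    # receives the text itself, which it looks up as a msgid.
    gettext "$status_msg"
    echo
}
```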

Mark messages to be localized

I recommend marking these messages:

  • arguments of an 'echo' command that is not redirected,
  • arguments of redirected 'echo' commands whenever further processing displays them on the user's screen,
  • arguments of other commands that display the message, for instance the 'dialog' program.

Conversely, I recommend not marking:

  • comments intended for readers of the script,
  • text strings whose value will be processed later, for instance as arguments of a 'case' compound command, or <tag> arguments of a 'dialog --menu' command.
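As an illustration of this distinction, here is a short sketch. The function name run_action and the action words are invented for the example.

```shell
#!/bin/sh
TEXTDOMAIN=software        # placeholder text domain
export TEXTDOMAIN

# Hypothetical function: $1 is a keyword processed by the script.
run_action() {
    case $1 in
        # Not marked: "install" and "remove" are compared by the script,
        # never displayed, so they must stay the same in every language.
        install)
            # Marked: this text is displayed to the user.
            gettext "Installing the package."; echo ;;
        remove)
            gettext "Removing the package."; echo ;;
        *)
            gettext "Unknown action."; echo
            return 1 ;;
    esac
}
```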

Sometimes the shell script writes other shell scripts.

Then the developer or maintainer has to decide on a case-by-case basis what to mark, depending on the intended scope of internationalization.

Use 'xgettext' to produce a template catalog of messages

You have to choose between producing a single POT file for the software as a whole and producing one POT file per set of scripts. Consider, for instance, which choice will minimize maintenance work, how the localization work can be organized, the relative frequency of updates for the different sets of scripts that make up the software, and the relevance of distinguishing groups of features like setup vs configuration vs package management.

I'm inclined to produce only one POT file, but the choice is yours.

If the software comprises numerous scripts located in different places or included in several packages, it can be handy to collect a copy of all scripts in a single directory, and/or to record a list of all of them with their paths in a text file.

The POT file will be generated using the 'xgettext' command (see the manual or 'xgettext --help' for details).

Include the following options in the command:

-L Shell (of course!)
--strict (to facilitate checks and management of the messages catalogs)
-c       (to include comments useful for the translators in the POT file)
-n       (to identify the source file and the line number of each message.
         This is the default.)
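Put together, these options could be used as in the sketch below. The wrapper name make_pot and the file names in the usage example are hypothetical; only the xgettext options come from the list above, plus -o to name the output file.

```shell
#!/bin/sh
# Hypothetical wrapper around xgettext.
# $1 = POT file to write, remaining arguments = shell scripts to scan
make_pot() {
    pot_file=$1
    shift
    xgettext -L Shell --strict -c -n -o "$pot_file" "$@"
}
```

For instance: make_pot software.pot setup.sh configure.sh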

Once the POT file is generated, you can check that it includes entries for all *gettext invocations in the shell script(s).

Localization process

Once the POT file is available, the 'msginit' command writes a PO file for each target language.

In PO files the “msgid” strings should never be modified, otherwise the translation won't occur at run time.

The 'msgcmp' command checks each PO file against the POT file after translation, to make sure all messages are translated.

The translator can use the 'msgfmt' command to check the layout of the translated text.

The PO file should be carefully saved somewhere, as it will be needed for subsequent maintenance (it is still possible to 'msgunfmt' an MO file to re-create a PO file, but then you would lose the context, which would make it almost useless).

The checked PO file is handed over to the maintainer, who runs 'msgfmt' to produce the MO file, then installs it.
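The commands of this process can be chained as in the following sketch. The function names init_po and check_and_build are hypothetical, and the --no-translator option is my own choice to keep 'msginit' from asking for the translator's e-mail address; adapt it to your workflow.

```shell
#!/bin/sh
# Hypothetical helper: write <locale>.po in the current directory.
# $1 = POT file, $2 = locale (e.g. fr_FR)
init_po() {
    msginit --no-translator --input="$1" --locale="$2" --output-file="$2.po"
}

# Hypothetical helper, to be run after translation: compare the PO file
# with the POT file, then compile it only if the comparison succeeds.
# $1 = translated PO file, $2 = POT file, $3 = MO file to write
check_and_build() {
    msgcmp "$1" "$2" && msgfmt -c -o "$3" "$1"
}
```

For instance, "init_po software.pot fr_FR" is run once by the translator, and "check_and_build fr_FR.po software.pot fr_FR.mo" by the maintainer before installation.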

Usage process

The only thing the user has to take care of is setting up their preferred language(s).

The primary way to do that is setting the LANG environment variable.

This can be done at run time, by preceding the command used to run the script with LANG=<locale>, but usually the user will set it up permanently.

For instance, in Slackware Linux this is done by editing the file(s) /etc/profile.d/lang.sh and/or /etc/profile.d/lang.csh (see these files).

The changes will take effect at the next login.

I suggest using a UTF-8 locale, as is needed for reading this document.

If the user is a polyglot, another option is to set the gettext-specific LANGUAGE environment variable to a prioritized list of languages.

For instance, if LANGUAGE is set to 'de:fr' then a German translation will be used if available, else a French translation will be used if available, else messages will be displayed in the original language, usually English. See gettext's manual for details.
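For a one-off run, both variables can be set for a single command, as in this sketch. The helper names run_in_locale and run_with_languages are hypothetical, as is the script name in the usage example.

```shell
#!/bin/sh
# Hypothetical helper: run a command with LANG set for that command only.
# $1 = locale (e.g. fr_FR.UTF-8), remaining arguments = command to run
run_in_locale() {
    wanted_locale=$1
    shift
    env LANG="$wanted_locale" "$@"
}

# Hypothetical helper: run a command with a prioritized list of languages.
# $1 = colon-separated list (e.g. de:fr), remaining arguments = command
# Note: GNU gettext ignores LANGUAGE when the locale is set to "C".
run_with_languages() {
    language_list=$1
    shift
    env LANGUAGE="$language_list" "$@"
}
```

For instance: run_in_locale fr_FR.UTF-8 sh myscript.sh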

Maintenance process

In most cases the maintenance process will be triggered by a script's creation, modification or deletion.

In such a case the maintainer will generate a new POT file with 'xgettext', then hand it over to the translators.

The translators will use the new POT file to update their respective (saved) PO files with the 'msgmerge --update' command.

Then they will edit/complete the translations, focusing on the not yet translated messages and on those marked as “fuzzy” in the PO files, using a PO editor.

After that the PO file will be checked against the POT file with 'msgcmp', carefully saved, handed over to the maintainer who will generate the new MO file with 'msgfmt' and install it as in the initial localization process.
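Since the update has to be repeated for each available PO file, a loop can help. This is only a sketch: the function name update_po_files and the idea of keeping all PO files in one directory are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical helper: refresh every PO file of a directory from a new POT.
# $1 = new POT file, $2 = directory holding the <locale>.po files
update_po_files() {
    for po_file in "$2"/*.po; do
        [ -e "$po_file" ] || continue      # the pattern matched no file
        msgmerge --update "$po_file" "$1" || return 1
    done
}
```

After the update, new messages appear untranslated in each PO file and changed ones are marked as “fuzzy”, ready for the translators.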

The maintenance process triggered by a needed modification of a PO file for a specific language is similar, only shorter: it will begin with the update of the relevant PO file by the translator. To minimize the workload caused by this type of maintenance, I suggest that the maintainer demand that he or she be provided only with complete and well reviewed translations.

Practical recommendations for developers and maintainers

Many English words are polysemous: their meaning can only be determined from the context of their usage. As a practical consequence, the more context you provide, the more accurate the translation can be.

Example: recently, while downloading some software, I saw something like this:
31min gauche
Go figure! After a while I realized that “left” had been translated as “gauche” (as in “left hand”).

Also, word order in a sentence varies from language to language, and not all languages are written left to right. Thus, mark entire paragraphs, or at least entire sentences, rather than lines, let alone isolated words, except in special cases.

For instance, if text paragraphs were split in lines displayed by 'echo' commands, replace all consecutive 'echo' commands by a single 'gettext' or 'eval_gettext' command.
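Here is a sketch of such a replacement. The message text and the function name show_done are invented for the example.

```shell
#!/bin/sh
TEXTDOMAIN=software        # placeholder text domain
export TEXTDOMAIN

# Before: two messages, translated separately and out of context.
#   echo "The installation is complete."
#   echo "Please reboot your computer."

# After: one message covering the whole paragraph.
show_done() {
    gettext "The installation is complete.
Please reboot your computer."
    echo
}
```

The translator then gets a single msgid containing both sentences and is free to reorder or rephrase them.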

Do not be afraid to include variable substitutions in the sentences: a PO editor will check that they are present as is in the translations.

Recommendations for 'dialog' program.

The 'dialog' program provides a UI in the form of dialog boxes.

There are other programs with similar features, to which I guess (but it is only a guess) these recommendations also apply.

Bear in mind the following considerations when making or reviewing the design choices for dialog boxes.

  • Messages translated into other languages will often be significantly longer than the original ones (usually in English).
  • In situations where only VGA drivers are available (e.g. in text installers), the screen display is generally restricted to 25 rows of 80 columns with the most widely used fonts, but in practice word wrapping can occur if a line is longer than 74 characters.
    As a consequence, for static layouts text lines should be at most 74 characters long.
  • Vertical scrolling of text is widely accepted, as it is frequently used to display web pages, and is sometimes unavoidable.
    Horizontal scrolling, on the other hand, should be avoided as much as possible.

Therefore I suggest that you:

  • give up tightly adjusting the dimensions of the boxes to the size of the English text, as the translation will probably break your carefully crafted layout, unless you impose unreasonable (IMO) constraints on the translators,
  • in particular, do not narrow the boxes' width to what is strictly needed for displaying the English text, especially in tabular layouts where the text can't flow onto the next lines,
  • favor a fluid layout of the displayed text over a fixed one, to avoid overly long lines in translations, whose complete display would then require horizontal scrolling (which, moreover, is not always possible).

In particular, I recommend favoring options that take a text string instead of a file as their first argument, to allow line wrapping. It is still possible to preserve the intended layout using white space for indentation.

For instance,
dialog <common-options> --textbox <file> <height> <width>
can be replaced with
dialog --no-collapse <common-options> --msgbox "`cat <file>`" <height> <width>

Practical recommendations for translators

Depending on the amount of work needed and the available resources, there can be one translator or a team of translators per target language. In all cases, I recommend that at least one person be responsible for organizing the team's work, checking the translations and transmitting the checked PO file to the maintainer(s). Let's call this person the team coordinator.

Don't feel obliged to translate verbatim. Not only is this rarely the best way to convey the meaning, but in addition this often leads to sentences too long to fit in allowed space.

Use a specialized PO editor, 'not' a general text editor. This will not only prevent you from inadvertently editing 'msgid' strings but also make your work easier and automate additional checks, such as the presence of a variable in the translation with the same spelling as in the original.

While translating, choose a serif fixed-width (or “monospaced”) font, like Courier. That makes it possible to visually distinguish characters that would otherwise look the same, and to check a line's length when that matters.

If possible, check the layout of the messages. You can do that by looking at the context in the relevant source file. Even better, simply run the translated script.

This is especially important if you are translating dialog boxes. In particular, take care not to write overly long sentences on a single line if it appears that the text can't flow onto the next one.

Bear in mind that in VGA mode (used in text installers, in particular), a line's width is theoretically limited to 80 characters, but in practice often to 74.

Do not add question marks that are not present in the original message.

If the message refers to tags (text on the buttons) of dialog boxes, like “OK”, “Yes”, “No”, “Continue”, “Cancel”, check how these tags are translated into your language in dialog's interface and use the same words.

Avoid colloquialisms and technical slang.

To “cut” (or end) a line inside a “dialog” box you should type \n: pressing [Enter] will 'not' insert a “new line” character in the text seen by the user.

In addition, you will have to comply with gettext's requirements for it to work:

  • If a word beginning with a dollar sign is included in the original text it should be present in the translation with exactly the same spelling (case matters).
  • The translation text should include a “new line” character (or line feed, represented by “\n”) at the beginning or at the end, exactly as the original text does. Conversely, if the original text doesn’t have the character, then the translation shouldn’t have it.
  • A single backslash character “\” is not allowed in the translation.

To check your translation against gettext's requirements you can run the following command:

msgfmt -c <name of the PO file>
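If you only want the verdict and no MO file, the output can be discarded, as in this sketch. The function name check_po is hypothetical; writing to /dev/null is my own choice to avoid leaving a messages.mo file behind.

```shell
#!/bin/sh
# Hypothetical helper: check a PO file without keeping the compiled result.
# $1 = PO file to check; the exit status is non-zero if msgfmt finds errors,
# for instance a msgstr that does not end with \n while its msgid does.
check_po() {
    msgfmt -c -o /dev/null "$1"
}
```

For instance: check_po fr_FR.po && echo "fr_FR.po passed the checks"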

Warning about translation of man pages

Carefully preserve the markup syntax found in the English man pages. For instance, don't replace:

  • 'B<' with 'B <' (don't insert a space)
  • 'B<' with 'b<' (keep the B as a capital letter - and don't replace it by the Greek capital letter BETA that looks the same on the screen)
  • “I” with '|' (don't replace the capital letter I with a pipe symbol)

When translating shell commands, preserve the English names of paths when needed. But you may, and should, translate arguments that are to be replaced by a value, like 'packagename'.

Didier Spaier

Sources

  • Originally written by Didier Spaier
  • Dutch translation by Eric Hameleers
