Checking That It's All Translatable

When building a translated application, it's important to check that all of the text will actually be translated, but that's hard to tell until the translation has been done. Until then, even when you switch languages you still see English everywhere. Only once the real translations arrive do you see the site in the other language, and at that point any English messages that were never marked for translation stick out like a sore thumb. But that's usually very late in the process. How can we catch those errors earlier?

One trick that works surprisingly well is to do a "fake" translation. If you can programmatically modify your English text in a recognizable way and pretend that's your actual translation, when you run the site you can see the messages that have not been modified, and know they need to be marked for translation.

(I didn't come up with this idea, but saw it used in a previous job to great effect.)

Here's how I'm doing this for Django.

  1. Process the messages in the application to a .po file:

    python manage.py makemessages -l en

  2. Run the fake translation tool, taking the English .po file as input and producing a .po file for another language as output:

    python locale/en/LC_MESSAGES/django.po locale/ar/LC_MESSAGES/django.po

  3. Compile the new .po file:

    python manage.py compilemessages

  4. Run the site, switch to the "translated" language and look for untranslated strings:

    python manage.py runserver

We've skipped over the fake translation tool, so now let's see how we can build that. The strategy will be to read the English .po file, go through the messages and "translate" each one, then write a new .po file:

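#!/usr/bin/env python
# -*- python -*-

# Usage: python inputfile.po outputfile.po

import sys

import polib

def translate(s):
    # Placeholder: the "translation" itself is decided further down.
    raise NotImplementedError

# Read the English catalog given as the first argument.
po = polib.pofile(sys.argv[1])
for entry in po:
    if entry.msgid_plural:
        # Plural entries need one string per plural form.
        entry.msgstr_plural = {
            0: translate(entry.msgid),
            1: translate(entry.msgid_plural),
        }
    elif entry.msgid:
        entry.msgstr = translate(entry.msgid)

# Write the "translated" catalog to the path given as the second argument.
po.save(sys.argv[2])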

There's one Python package it depends on, polib, which handles reading and writing the .po files for us.

I won't go into the reason for the special handling of the plural case. If you're curious, you can read all about how .po files work here.
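
In brief, though, a plural message stores one string per plural form, indexed by number. Here is a minimal sketch, assuming the usual English rule from a .po header (`nplurals=2; plural=(n != 1)`). The `pick_plural` helper and the sample messages are hypothetical, for illustration only, and are not part of polib or gettext.

```python
# Plural translations are stored by index, mirroring entry.msgstr_plural.
msgstr_plural = {
    0: "**%d file**",   # singular form, built from msgid
    1: "**%d files**",  # plural form, built from msgid_plural
}

def pick_plural(forms, n):
    # Assumed English rule: index 0 when n == 1, otherwise index 1.
    return forms[int(n != 1)] % n

print(pick_plural(msgstr_plural, 1))
print(pick_plural(msgstr_plural, 3))
```

Languages with more than two plural forms would need more indices, which is why the plural case can't be treated as a single string.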

It just remains to decide how we will "translate" the messages. We want the messages to still be readable, but to mark them somehow as "translated". So this will almost do it:

def translate(s):
    return "**%s**" % s

This just puts "**" at the beginning and end of each message, which makes it easy to spot any text in the site that hasn't been "translated".
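
As a quick illustration, the sketch below applies this version of translate to a couple of made-up messages, then uses a hypothetical helper, `is_marked`, to tell wrapped strings from untouched ones. The helper and the sample strings are assumptions for demonstration and are not part of the tool.

```python
def translate(s):
    return "**%s**" % s

def is_marked(text):
    # Hypothetical check: "translated" text is wrapped in "**" on both sides.
    return len(text) >= 4 and text.startswith("**") and text.endswith("**")

# Made-up messages; note that format placeholders pass through unchanged.
samples = ["Log in", "Welcome back, %(name)s!"]
for msgid in samples:
    print(translate(msgid))

# A string that never went through translation shows up without markers.
print(is_marked(translate("Log in")))
print(is_marked("Log in"))
```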

The only problem occurs if any messages have leading whitespace. compilemessages is smart enough to make sure that the translated messages still start with the same whitespace (to catch translation errors, presumably), and if we put "**" in front of the whitespace, it raises a fatal error. To preserve any leading whitespace, we end up with:

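def translate(s):
    # Keep leading whitespace outside the markers; the "s and" guard
    # avoids an IndexError when the remaining string is empty.
    if s and s[0].isspace():
        return s[0] + translate(s[1:])
    return "**%s**" % s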
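
The same idea can be written without recursion and extended to trailing whitespace, in case the gettext tooling also compares how messages end (trailing newlines, for example). This is a sketch under that assumption, not the version the tool above uses, and `translate_keep_whitespace` is a hypothetical name.

```python
def translate_keep_whitespace(s):
    core = s.strip()
    if not core:
        # Empty or whitespace-only message: leave it untouched.
        return s
    # Slice off the whitespace on each side so it stays outside the markers.
    lead = s[:len(s) - len(s.lstrip())]
    trail = s[len(s.rstrip()):]
    return "%s**%s**%s" % (lead, core, trail)

print(repr(translate_keep_whitespace("\n  Hello, world\n")))
```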