Localizing Slack – Several People Are Coding


Localization is so easy!

…said no one ever.

This week, we launched French, German, and Spanish localization in Slack. I’d like to share some of the lessons learned, tooling, and processes we put in place to complete this project and build localization into our ongoing workflow.

Handling Strings

The first step in localizing Slack was to prepare the strings in our codebase for localization. While mobile platforms provide clear frameworks for localization that keep strings separate from code, our web and desktop codebase’s strings were embedded in HTML templates and business logic. We needed to create frameworks for representing strings in each programming language we used, and then touch every string in the code to implement the new helpers. We referred to this as “wrapping strings”, since each string needed to be wrapped in a block or function call.

We chose the ICU MessageFormat syntax to represent strings. After comparing it to gettext, we appreciated the power of its select and plural blocks to deal with some of our more complex strings, as well as its flexibility and robust implementations for JavaScript and PHP. We created ICU helpers for our templating languages, Handlebars and Smarty. While ICU was more powerful than gettext, it was not directly supported by any Translation Management System (TMS) we could find, which meant we needed to turn off most of the TMS’s built-in parsing and validation of translations and do that ourselves.

We debated whether to use named keys for each string in the code (like Android’s strings.xml), or use the English string as the key (like iOS’s NSLocalizedString). We chose the latter to keep the code more readable, using a hash of the English as a key. This made string wrapping much simpler as well, but did mean that changing the English string even slightly would require a re-translation.
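As a rough illustration of keying translations by a hash of the English text, here is a minimal Python sketch. The hash algorithm (SHA-256), the function names, the sample table, and the fall-back-to-English behavior are all assumptions made for the example; the article does not say which hash or lookup code Slack uses.

```python
import hashlib
from typing import Dict


def string_key(english: str) -> str:
    """Derive a lookup key from the English source text (SHA-256 is an assumed choice)."""
    return hashlib.sha256(english.encode("utf-8")).hexdigest()


# Hypothetical translation table, keyed by the hash of the English string.
FRENCH: Dict[str, str] = {
    string_key("Save changes"): "Enregistrer les modifications",
}


def translate(english: str, table: Dict[str, str]) -> str:
    """Return the translation, or the English text when no entry exists (assumed fallback)."""
    return table.get(string_key(english), english)
```

Under these assumptions, `translate("Save changes", FRENCH)` finds the French text, while `translate("Save changes.", FRENCH)` does not: adding a single period changes the key, which is the re-translation cost described above.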

Here’s an example of a string before localization from our notifications e-mail template:

<p>
You have {if $activity.num_dms == 1}a new direct message{else}{$activity.num_dms|number_word} new direct messages{/if}.
</p>

And after:

<p>
{t
num_dms=$activity.num_dms
dms_number_word=$activity.num_dms|number_word_localized
}
You have {num_dms, plural, =1 {a new direct message} other {{dms_number_word} new direct messages}}.
{/t}
</p>

The {t} block serves two purposes:

  1. We look for it in static analysis to extract the strings for upload to our Translation Management System (TMS).
  2. At runtime (or build time) it hashes the string, looks up its translation, and renders it using an ICU MessageFormat library.
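The extraction step in the first point could look something like the following Python sketch. It is a simplification built on assumptions: a regular expression stands in for real static analysis, whitespace is collapsed before hashing, and SHA-256 is an arbitrary choice of hash. The names `T_BLOCK` and `extract_strings` are hypothetical.

```python
import hashlib
import re
from typing import Dict

# Matches {t ...params...} body {/t}. Assumes the parameter list contains no "}".
T_BLOCK = re.compile(r"\{t\b([^}]*)\}(.*?)\{/t\}", re.DOTALL)


def extract_strings(template: str) -> Dict[str, str]:
    """Map hash -> source string for every {t} block found in a template."""
    found = {}
    for match in T_BLOCK.finditer(template):
        text = " ".join(match.group(2).split())  # collapse runs of whitespace
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        found[key] = text
    return found


TEMPLATE = """<p>
{t
num_dms=$activity.num_dms
dms_number_word=$activity.num_dms|number_word_localized
}
You have {num_dms, plural, =1 {a new direct message} other {{dms_number_word} new direct messages}}.
{/t}
</p>"""

strings = extract_strings(TEMPLATE)
```

Here `strings` holds a single entry whose value is the full ICU message, ready to be uploaded for translation. A production extractor would more likely hook into each template language's parser than rely on a regular expression.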

We needed to do this for about 20,000 strings across 2,000 files, which was a massive undertaking. We decided to ask the entire Web Engineering team to help implement these changes. The i18n team held “string jams” where we kicked off with a presentation on how to wrap strings (Keep complete sentences together! Provide context comments with examples for translators!) and sat with each Engineering team for a couple of days to review their code and answer questions. This served as an excellent training exercise to get everyone familiar with how to write localization-friendly code going forward. We invited attendees to a channel in Slack, which proved invaluable for discussions, pull requests, and questions in the following months.

One of the most helpful things at this stage was pseudolocalization, which accents each character, yielding a visibly different but still legible string. Since we didn’t have translations yet, enabling this mode allowed engineers and QA to make sure that every string in a given view was ready for localization and that the locale setting was being properly applied. We implemented this in each programming language we used, adding some smarts to keep things like HTML tags, placeholders, and emoji unmangled. We also added extra tilde (~) characters to each word to make them 35% longer, simulating longer words from other languages to identify inflexible UI elements.
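A minimal Python sketch of pseudolocalization along these lines follows. The accent table, the protection regex, and the rounding of the 35% padding are assumptions for illustration. Only HTML tags and simple (non-nested) placeholders are protected here; emoji handling is left out.

```python
import re

# Assumed accent mapping; letters not listed are left unchanged.
ACCENTS = str.maketrans("aceinouyACEINOUY", "áçéíñóúýÁÇÉÍÑÓÚÝ")

# HTML tags and simple {placeholders} must pass through untouched.
PROTECTED = re.compile(r"(<[^>]+>|\{[^{}]*\})")


def pad(word: str, growth: float = 0.35) -> str:
    """Append tildes so the word is roughly 35% longer."""
    return word + "~" * round(len(word) * growth)


def pseudolocalize(text: str) -> str:
    out = []
    # With one capturing group, re.split puts protected tokens at odd indices.
    for i, part in enumerate(PROTECTED.split(text)):
        if i % 2 == 1:
            out.append(part)
        else:
            accented = part.translate(ACCENTS)
            out.append(re.sub(r"\w+", lambda m: pad(m.group(0)), accented))
    return "".join(out)


result = pseudolocalize("Hello <b>{name}</b>, welcome")
```

In this sketch the tag and placeholder in `result` survive intact while the surrounding words are accented and padded, so any string that still appears in plain English in the UI has visibly not been wrapped.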

[Figure: Testing Slack with pseudolocalization]

Release Process and Tooling

We set ambitious requirements for our launch: Slack should have a consistent voice in each language with high quality translations, localization should be built into the workflow of every team, and all new features should be translated at release. To address the first point, we hired a full-time translation team who wrote a glossary and style guide for each language and worked alongside contractors to translate all of the words. We developed a set of tools and processes in service of the other two points.

Linting

Slack has robust linting tools to enforce code style conventions and prevent common mistakes. We augmented these tools to validate ICU syntax of each source string (there can be a lot of curly braces to get right), and ensure the correct parameters are passed in. We created checks so that every user-visible HTML element’s strings are wrapped for localization. Coupled with training and code review, these tools help avoid localization regressions during development.
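To give a feel for what such a lint check might do, here is a simplified Python sketch that verifies brace balance and that every argument named in a message is actually passed in. It is an assumption-laden approximation, not Slack's linter: it ignores ICU apostrophe quoting, and it would mistake a one-word sub-message such as `{message}` for an argument. The name `lint_message` is hypothetical.

```python
import re
from typing import List, Set


def lint_message(message: str, params: Set[str]) -> List[str]:
    """Return a list of problems found in an ICU-style message (empty if none)."""
    errors = []
    depth = 0
    for ch in message:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                errors.append("unexpected closing brace")
                break
    if depth > 0:
        errors.append("unclosed brace")

    # An argument is approximated as an identifier right after "{",
    # followed by "," (typed argument) or "}" (simple placeholder).
    used = set(re.findall(r"\{\s*([A-Za-z_]\w*)\s*[,}]", message))
    for name in sorted(used - params):
        errors.append(f"missing parameter: {name}")
    return errors


MSG = ("You have {num_dms, plural, =1 {a new direct message} "
       "other {{dms_number_word} new direct messages}}.")
```

With these definitions, `lint_message(MSG, {"num_dms", "dms_number_word"})` reports nothing, while omitting `dms_number_word` from the parameters produces a "missing parameter" error. A real linter would more likely reuse the parser of an ICU MessageFormat library than count braces by hand.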

Release Process

Slack’s web codebase is updated and continuously deployed more than 100 times every day. We built additional tooling to ensure that new feature releases and copy changes won’t cause users to see English strings in an otherwise translated experience.

For feature releases, Slack uses a feature flag system, which allows us to get new features into the code early while keeping them disabled for most users with conditional blocks. Features can be enabled in the development environment only, or just for our own Slack team, or rolled out to a percentage of teams. We wrote code to identify the set of conditional statements enclosing each string in order to determine whether that string would be visible to users in production.

This was particularly challenging as the move to React on the frontend carried with it a new syntax, so code to parse if statements had to be written for PHP, Smarty, JavaScript, Handlebars, and React / JSX. However, the benefits are significant. This tooling allows us to add a check so that a feature can’t be enabled for users until all the strings are translated. We also augmented this tooling to build a dashboard displaying all the strings related to a feature, along with their translation status. We created a Slack channel where engineers and PMs can post a link to the dashboard to help translators prioritize and communicate about progress.
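The release check described above might reduce to something like this Python sketch once the conditional analysis has produced a mapping from feature flags to string keys. That mapping, the data shapes, and the function names are all hypothetical; the parsing of if statements across five syntaxes is the hard part and is not shown.

```python
from typing import Dict, List, Set


def missing_translations(
    keys: Set[str], translations: Dict[str, Dict[str, str]]
) -> Dict[str, List[str]]:
    """Return {locale: [string keys with no translation]} for a set of strings."""
    missing = {}
    for locale, table in translations.items():
        gaps = sorted(k for k in keys if k not in table)
        if gaps:
            missing[locale] = gaps
    return missing


def can_enable(
    flag: str,
    strings_by_flag: Dict[str, Set[str]],
    translations: Dict[str, Dict[str, str]],
) -> bool:
    """A feature may ship only when every string behind its flag is translated."""
    return not missing_translations(strings_by_flag.get(flag, set()), translations)


# Hypothetical output of the conditional analysis and the TMS export.
STRINGS_BY_FLAG = {"new_sidebar": {"k1", "k2"}}
TRANSLATIONS = {
    "fr": {"k1": "Canaux", "k2": "Messages"},
    "de": {"k1": "Channels"},
}
```

In this example `can_enable("new_sidebar", STRINGS_BY_FLAG, TRANSLATIONS)` is false because German is missing `k2`. The same `missing_translations` result could feed a per-feature status dashboard.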

Sometimes, we just need to make a small change to an English string, such as updating punctuation, capitalization, or using synonyms. For changes that don’t materially impact the meaning of the string, we wanted an easy way to keep the previous translation until the new string was translated. Because we use the hash of the English string in the code as a key, this isn’t trivial. We came up with a flag on our {t} blocks called “fallback_hash”, which let engineers specify the hash of a previous version of a string, and made sure all the tooling understood this option.
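The lookup order implied by “fallback_hash” can be sketched in Python as follows. Only the option name comes from the article; the SHA-256 hash, the function signature, and the sample data are assumptions for illustration.

```python
import hashlib
from typing import Dict, Optional


def string_key(english: str) -> str:
    """Hash of the English source text (SHA-256 is an assumed choice)."""
    return hashlib.sha256(english.encode("utf-8")).hexdigest()


def lookup(
    english: str, table: Dict[str, str], fallback_hash: Optional[str] = None
) -> Optional[str]:
    """Prefer a translation of the current string, then the previous version's."""
    key = string_key(english)
    if key in table:
        return table[key]
    if fallback_hash is not None:
        return table.get(fallback_hash)
    return None


OLD = "Save changes"
NEW = "Save changes."  # punctuation-only edit, so the key changes
GERMAN = {string_key(OLD): "Änderungen speichern"}
```

Here `lookup(NEW, GERMAN)` finds nothing, but `lookup(NEW, GERMAN, fallback_hash=string_key(OLD))` returns the existing German text until a translation of the new string arrives, at which point the new one takes precedence.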

Additional Challenges

There were many other challenges we had to solve to meet our quality bar. Here’s a summary of some of the more interesting ones:

  • Emoji names are localized client side and stored in a canonical English form in the data model. But since learned behaviors are hard to change, you can always type them in English even if you switch locales
  • We updated the /remind command to be less sentence-based for non-English locales using interactive messages to select a schedule
  • Our search index was updated to apply stemming in each language
  • To keep a consistent experience across platforms, we chose to override the device locale on mobile and let Slack users choose their locale preference instead
  • We taught Slackbot to be keenly aware of which language to speak: the user locale when messaging a user, the team locale when posting into a channel
  • We started putting ICU custom date formats in sentences, but they were a huge burden for translators. Instead we compiled a list of supported date formats for each locale and used Moment.js to build a library, passing dates into sentences as strings
  • We built a helper for formatting comma-separated lists with “and” and “or” cases for each locale
  • We built a helper for possessives, supporting language-specific rules
  • Some English words like “they”, which are ambiguously singular or plural, can lead to strings which are impossible to translate in many languages. We created separate singular and plural strings, even though the English was the same in both cases
  • If the subject of a sentence can be “you” or someone else, the verb form changes in many languages, so we needed to use a separate sentence for the “you” case
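As one concrete example from the list above, a helper for comma-separated lists might look like this Python sketch. The conjunction table, the choice to use a serial comma only in English, and the name `format_list` are assumptions, and real locales have exceptions the sketch ignores (in Spanish, for instance, “y” becomes “e” before a word starting with an “i” sound).

```python
from typing import Sequence

# Assumed per-locale conjunctions for the "and" and "or" cases.
CONJUNCTIONS = {
    "en": {"and": "and", "or": "or"},
    "fr": {"and": "et", "or": "ou"},
    "de": {"and": "und", "or": "oder"},
    "es": {"and": "y", "or": "o"},
}

# Assumption: only English puts a comma before the final conjunction.
SERIAL_COMMA = {"en"}


def format_list(items: Sequence[str], locale: str = "en", kind: str = "and") -> str:
    """Join items into a sentence-ready list for the given locale."""
    word = CONJUNCTIONS[locale][kind]
    items = list(items)
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} {word} {items[1]}"
    separator = ", " if locale in SERIAL_COMMA else " "
    return ", ".join(items[:-1]) + separator + word + " " + items[-1]
```

Centralizing this in one helper keeps the punctuation and conjunction rules out of individual strings, so translators never have to reassemble list fragments by hand.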

Looking Forward

Localizing Slack has been a massive effort. It took almost exactly a year to complete, and nearly everyone in the company supported the effort in some way. Looking forward, we continue to evolve as an engineering organization. We want to ensure we can support people all over the world — in additional languages — to make their working lives simpler, more pleasant, and more productive.


