Machine Translation: Game Changer or Marketing Hype?

Gaining Momentum

Machine Translation (MT) has seen promising advancements in the past few years. Neural MT (NMT) in particular continues to show improved translation quality over the older Statistical MT (SMT) approach.

Although several experiments have fueled optimism, the measured output quality of MT also depends on the scoring model used. In addition, the reliability and usefulness of existing scoring models remain controversial.

In general, human translation services have always been haunted by the pressures of time, cost, and quality. While technology solutions have helped improve the delivery of human translations, fully replacing human abilities remains a challenge. More specifically, achieving high translation quality still relies on human translators despite the headway in MT.

Generally speaking, though, MT is a trending technology in the wake of Artificial Intelligence (AI) and Natural Language Processing (NLP). Many Language Service Providers (LSPs) are integrating MT into their service delivery to lower cost and improve turnaround time. In addition, technology providers such as Memsource, Smartling, and XTM International build MT capabilities into their translation management system (TMS) solutions. And given how few standards are dedicated to the localization industry, ISO 18587, which covers post-editing of MT output, further underscores the growing momentum of MT utilization.

This raises the question of whether MT is a game changer or just marketing hype.

Evaluating MT Output Quality

The Challenge of Measuring Translation Quality

The potential cost and time improvements of MT are undeniable. Furthermore, not all translations require the same level of quality to be useful. However, most content that has a business application or is needed in a professional setting requires higher quality.

Therefore, closing the quality gap between MT output and human translations is the biggest obstacle to overcome. Nevertheless, solving this problem should not come at the expense of more human intervention. Otherwise, we would simply be adding crutches to enable MT and achieving MT effectiveness only through extra human pre-editing and post-editing effort. The overall net change could even be negative if the additional effort outweighs the gains from MT.
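
The net-change argument above can be illustrated with simple arithmetic. The following is a minimal sketch, not a costing model: the function name and all hour figures are hypothetical and chosen only to show how editing overhead can turn a gain into a loss.

```python
def net_hours_saved(human_hours, mt_hours, pre_edit_hours, post_edit_hours):
    """Return hours saved by an MT workflow versus human-only translation.

    A negative result means the editing overhead outweighs the MT gains.
    """
    mt_workflow_hours = mt_hours + pre_edit_hours + post_edit_hours
    return human_hours - mt_workflow_hours


# Hypothetical figures for a single project, for illustration only.
light_editing = net_hours_saved(human_hours=40, mt_hours=1, pre_edit_hours=4, post_edit_hours=15)
heavy_editing = net_hours_saved(human_hours=40, mt_hours=1, pre_edit_hours=12, post_edit_hours=32)

print(light_editing)  # 20
print(heavy_editing)  # -5
```

In the second case, the pre-editing and post-editing effort exceeds what a human-only translation would have required, so the net change is negative.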

Overall, translation quality has long been a fiercely debated topic. Depending on whether you are a solution provider or a translation buyer, your views of translation quality likely differ. More importantly, gauging translation quality has a subjective component, which is difficult to quantify through objective measures. The best way to describe the effect of this subjectivity is in the form of a simple question: Do I like the translation?

A solution provider might apply common industry metrics to quantify translation quality (by focusing on accuracy). In contrast, a translation buyer will often rely on personal preferences to assess quality. In the business world, the translation buyer ultimately answers the question about the quality of a translation. And this can be a moving target if different translation buyers offer conflicting feedback for the same translation. A good example of such a scenario is the in-country review process involving client reviewers. Many LSPs could share stories of their struggle to establish a clear baseline for client terminology and style.

Popular Quality Models (Scoring Metrics)

To help facilitate the quality debate, the research community has developed several quantitative models for evaluating MT output quality. Following are some of the prevalent models (in alphabetical order):

  • BLEU (Bilingual Evaluation Understudy) metric
  • LEPOR (Length Penalty, Precision, n-gram Position difference Penalty and Recall) metric
  • METEOR (Metric for Evaluation of Translation with Explicit ORdering) metric
  • NIST (National Institute of Standards and Technology) metric
  • ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metric
  • TER (Translation Error/Edit Rate) metric
  • WER (Word Error Rate) metric

Generally, the scoring method differs for each model and a direct comparison of scores may not be possible. Moreover, several practitioners have proposed variations of the listed models to better accommodate human subjectivity.
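
To make the point about non-comparable scores concrete, here is a minimal sketch of two calculations. The first is WER as it is commonly defined (word-level edit distance divided by the reference length). The second is a clipped unigram precision, which is only one building block of BLEU and not the full metric (no higher-order n-grams, no brevity penalty). The function names and example sentences are assumptions made for illustration.

```python
from collections import Counter


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length (lower is better)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] is the edit distance between the ref words seen so far and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


def clipped_unigram_precision(reference, hypothesis):
    """Share of hypothesis words found in the reference, with clipped counts (higher is better)."""
    ref_counts = Counter(reference.split())
    hyp_counts = Counter(hypothesis.split())
    if not hyp_counts:
        return 0.0
    matched = sum(min(count, ref_counts[word]) for word, count in hyp_counts.items())
    return matched / sum(hyp_counts.values())


# Hypothetical reference translation and MT output.
reference = "the contract must be signed by both parties"
hypothesis = "the contract has to be signed by both parties"

print(round(word_error_rate(reference, hypothesis), 3))            # 0.25
print(round(clipped_unigram_precision(reference, hypothesis), 3))  # 0.778
```

The same sentence pair yields 0.25 on one scale and 0.778 on the other, and the two scales even point in opposite directions (lower is better for WER, higher is better for precision). Note also that "has to" is a perfectly acceptable rendering of "must", yet both calculations penalize it, which hints at why such scores struggle with human subjectivity.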

Achieving MT Output Quality

Regardless of the scoring model, MT output quality depends on several factors:

  • Source text
  • Subject matter
  • MT engine
  • Vocabulary
  • Target language

Source Text

The quality of the source text has always played an important role in achieving high translation quality. This applies to human translations as much as to MT. However, while human translators compensate for flawed source text, MT might not.

For example, the ability to recognize context, cultural nuances, conceptual complexity, and register (e.g., formality), and to identify obvious mistakes in the train of thought, are uniquely human traits. Because MT is susceptible to poor source text, pre-editing of source text might be required to achieve the desired output quality. Still, this also means that the added effort can disproportionately counter the potential time and cost savings from MT.

Subject Matter

Subject matter (knowledge domain) is another factor that determines the potential success of MT. That is, complex subject matters can affect semantics and inhibit the correct rendition into another language. In addition, subject matter impacts our interpretation of concepts, words, and sentences. Overall, more complex subject matters, including creative texts, routinely challenge today’s capabilities of MT.

MT Engine

An MT engine is the software that consists of the logic (language model) and algorithms that generate the MT output. And there are many MT engines available from different providers. Some are offered for free while others involve a fee. If there is a fee, it usually depends on the type of usage (personal/professional) and subscription plan.

Moreover, most MT engines use proprietary technology and algorithms, which affect the output quality and suitability for some applications. Plus, not all MT engines have the same capabilities or support the same languages. This means that users of MT often rely on multiple MT engines to meet their needs.

More recently, AI promises to help optimize MT engines to improve their effectiveness. Numerous providers of MT engines and third-party solution providers emphasize the use of AI as part of their offering. However, it is not always clear to translation buyers how AI-enabled translation services benefit them. Although you can easily quantify cost and time improvements from MT, quantifying quality gains can be problematic.


Vocabulary

Vocabulary is the fuel of an MT engine. Without it, the engine simply will not perform well. However, there is more to this. The quality and relevance of the vocabulary determine how effective an MT engine will be. Likewise, a large vocabulary is no guarantee of good output. The vocabulary must be both of high quality (optimized) and pertinent to the specific application.

In other words, the success of MT is also a matter of creating and maintaining viable vocabularies. And depending on user needs, this could involve multiple vocabularies. Good vocabularies establish a controlled linguistic baseline for different scopes of work and subject matters (knowledge domains).
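
One way to picture a vocabulary acting as a controlled linguistic baseline is a terminology check against a per-domain glossary. The sketch below is an assumption-laden illustration, not how any particular MT engine works: the glossaries, the function name, and the sentences are hypothetical, and the plain substring matching ignores inflection and word boundaries.

```python
def missing_terms(source, target, glossary):
    """List approved target terms that the translation should contain but does not."""
    source_lower, target_lower = source.lower(), target.lower()
    return [
        approved
        for term, approved in glossary.items()
        if term.lower() in source_lower and approved.lower() not in target_lower
    ]


# Hypothetical per-domain vocabularies (English to German), for illustration only.
glossaries = {
    "automotive": {"torque wrench": "Drehmomentschlüssel"},
    "legal": {"power of attorney": "Vollmacht"},
}

source = "Tighten the bolt with a torque wrench."
compliant = "Ziehen Sie die Schraube mit einem Drehmomentschlüssel an."
deviating = "Ziehen Sie die Schraube mit einem Schraubenschlüssel an."

print(missing_terms(source, compliant, glossaries["automotive"]))  # []
print(missing_terms(source, deviating, glossaries["automotive"]))  # ['Drehmomentschlüssel']
```

Keeping one glossary per knowledge domain mirrors the idea of multiple vocabularies for different scopes of work.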

In addition, AI-enabled MT has the potential to help automate the optimization of vocabularies through deep learning. This is particularly meaningful for large sets of vocabularies where manual optimization would be cumbersome and time-consuming. At the same time, an increasingly large vocabulary can add too much complexity, which can affect the training and performance of the MT engine.

There is one other major point about vocabularies. Many companies require translation of proprietary and confidential content. Consequently, the vocabularies for such content usually remain the property of the company as the content owner. The broad use of these vocabularies in the public domain would likely infringe on copyrights and expose sensitive content. Thus, successful deployment of MT for professional use closely correlates with the available internal/external vocabularies.

Target Language

As indicated earlier, different MT engines might not offer support for all or the same languages. Therefore, some translations might still entirely rely on human translators. Or they involve multiple MT engines to achieve a consistent output quality for all required languages. Furthermore, if the available vocabularies for a particular language are insufficient (too small), using MT might not be an option regardless of MT engine.

MT and the Translation Process

One might think that MT fundamentally changes the translation process. Luckily, it does not, which makes it easy to integrate into an existing workflow. Basically, MT is an optional process block that automates the pre-translation of source text. Like Translation Memories (TMs), MT helps expedite translation activities and reduce cost.

In addition, MT can fully substitute TMs if these are not available. Or MT and TMs can work together in a hybrid setup. The specific workflow setup depends on the type of content and whether content reuse is part of a company’s content strategy.

The following illustration depicts a hybrid translation process (example).

[Figure: Hybrid Translation Process]

A key advantage of MT over TMs is the flexibility to translate any content, new or reused, on the fly. Likewise, MT is indifferent towards content (e.g., technical vs. non-technical). In contrast, TMs do not offer any benefit if the content is brand new and no matching is possible. Therefore, TMs assume reuse in the source text—the more the better. Typically, this is the case for technical content where standardization and consistency are widespread practice.
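
A hybrid setup in which TMs and MT work together can be sketched as a simple pre-translation step: reuse a TM entry when the segment matches closely enough, and fall back to MT otherwise. This is a minimal sketch under stated assumptions. `machine_translate` is a hypothetical stand-in for a real MT engine call, the 0.85 threshold is arbitrary, and the similarity score from Python's `difflib` is only a rough proxy for the fuzzy matching that TM tools implement.

```python
from difflib import SequenceMatcher


def machine_translate(segment):
    """Hypothetical stand-in for a call to an MT engine."""
    return f"[MT] {segment}"


def pre_translate(segment, translation_memory, threshold=0.85):
    """Return (translation, origin), preferring TM matches over MT output."""
    best_source, best_score = None, 0.0
    for source in translation_memory:
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is not None and best_score >= threshold:
        origin = "TM exact" if best_score == 1.0 else "TM fuzzy"
        return translation_memory[best_source], origin
    # No sufficiently similar TM entry: this is where MT handles brand-new content.
    return machine_translate(segment), "MT"


# Hypothetical one-entry TM (English to German), for illustration only.
tm = {"Press the power button.": "Drücken Sie die Ein/Aus-Taste."}

for segment in (
    "Press the power button.",
    "Press the power button",
    "Our new campaign starts in May.",
):
    print(pre_translate(segment, tm))
```

The first two segments are served from the TM (an exact and a fuzzy match), while the third has no usable match and is routed to the MT stub. In practice, fuzzy TM matches and MT output would both still pass through human review or post-editing.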


Game Changer or Marketing Hype?

Some people might argue that MT is a game changer given the recent advancements. However, we need to be careful not to let the hype overtake optimism. According to the Merriam-Webster dictionary, a game changer is:

“A newly introduced element or factor that changes an existing situation or activity in a significant way.”

This is still somewhat vague because the word “significant” leaves a lot of room for interpretation. Moreover, MT is not equally impressive for all content and applications. So far, this inconsistent performance and reliability limit MT’s potential.

If it were a game changer, the localization industry and businesses would implement MT on a much broader scale. For now, the primary reasons for LSPs to utilize MT are cost and time improvements due to market demands. And given the current hype, MT also provides a good marketing message to attract clients and highlight innovation.

However, MT could eventually emerge as a game changer, particularly if advancements in AI and big data processing can provide the needed push. More importantly, such a push would allow MT to redefine translation delivery in a significant way for both personal and professional use. It could also help close the quality gap.
