Facebook's AI model is a leap forward in automated translation
GlobalData thematic research analyst Sarah Coop explores the potential of the social media giant’s new multilingual machine learning translation software.
Facebook has launched a multilingual machine learning translation model that challenges the status quo in its field. While previous translation models generally relied on English data as an intermediary, Facebook’s many-to-many software, called M2M-100, can translate directly between any pair of 100 languages.
The software is open-source with the model, raw data, training, and evaluation setup available on GitHub.
If it works as intended, M2M-100 is a functional product with many real-world applications, and one that other developers can build on. In a globalised world, accurate translation of a wide variety of languages is vital. It enables better communication between communities, which is essential for multinational businesses. It also allows news articles and social media posts to be translated correctly, reducing instances of misinformation.
The potential and pitfalls of automated translation
Facebook’s M2M-100 translates between 2,200 language pairs drawn from 100 languages without relying on English data as a mediator. Among its main competitors, Amazon Translate and Microsoft Translator both support significantly fewer languages than M2M-100. However, Google Translate supports 108 languages, both living and dead, having added five new languages in February 2020.
Google and Facebook’s products have other differences. Google uses BookCorpus and English Wikipedia as training data, whereas Facebook analyses the language of its users. Facebook is, therefore, more suitable for conversational translation, while Google excels at academic-style web page translation.
Google performs best when English is the target language, which correlates with the training data used. Facebook claims that its multi-directional model has no English bias, with translations functioning between 2,200 language pairs. Accurate conversational translations based on real-time data and multiple language pairs have the potential to fulfil global business needs, making Facebook a market leader.
However, natural language processing can be problematic, with semantics making it hard for algorithms to provide accurate translations. In 2017, Facebook translated the phrase “good morning” in Arabic, posted on its platform by a Palestinian user, as “attack them” in Hebrew, resulting in the poster’s arrest by Israeli police. The open-source nature of the software will help developers recognise pain points. It also encourages innovation, enabling developers to advance multilingual models in the future.
AI needs solid results to live up to the hype
GlobalData’s research suggests that years of bold proclamations by tech companies eager for publicity have resulted in artificial intelligence (AI) being considered overhyped. The reality has often fallen short of the rhetoric.
Principal Microsoft researcher Katja Hofmann argues that AI is transitioning to a new phase, in which breakthroughs occur but at a slower rate than previously suggested. The next few years will require practical uses of AI with tangible benefits, applying AI to specific use cases.
Language translation is a high-profile use case for AI due to its applications in conversational platforms such as Amazon’s Alexa, Google’s Assistant, and Apple’s Siri. As the tech giants continue to compete for superiority in the performance of their virtual assistants, Facebook’s M2M-100 will raise the stakes in AI translation software.
In an interconnected, globalised world, accurate translation is essential. Facebook has used its global community and access to large datasets to progress machine learning in translation, creating a practical, real-world use case. Allowing open access to the training data and models will help to propel future developments, moving linguistic machine learning away from a traditionally Anglo-centric model.