Language translation is important to Facebook's mission of making the world more open and connected, enabling everyone to consume posts or videos in their preferred language — all at the highest possible accuracy and speed.
Today, the Facebook Artificial Intelligence Research (FAIR) team published research results using a novel convolutional neural network (CNN) approach for language translation that achieves state-of-the-art accuracy at nine times the speed of recurrent neural systems [1]. Additionally, the FAIR sequence modeling toolkit (fairseq) source code and the trained systems are available under an open source license on GitHub so that other researchers can build custom models for translation, text summarization, and other tasks.
Why convolutional neural networks?
Originally developed by Yann LeCun decades ago, CNNs have been very successful in several machine learning fields, such as image processing. However, recurrent neural networks (RNNs) are the incumbent technology for text applications and have been the top choice for language translation because of their high accuracy.
Though RNNs have historically outperformed CNNs at language translation tasks, their design has an inherent limitation, which can be understood by looking at how they process information. Computers translate text by reading a sentence in one language and predicting a sequence of words in another language with the same meaning. RNNs operate in a strict left-to-right or right-to-left order, one word at a time. This is a less natural fit to the highly parallel GPU hardware that powers modern machine learning. The computation cannot be fully parallelized, because each word must wait until the network is done with the previous word. In comparison, CNNs can compute all elements simultaneously, taking full advantage of GPU parallelism. They therefore are computationally more efficient. Another advantage of CNNs is that information is processed hierarchically, which makes it easier to capture complex relationships in the data.
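The contrast between the two architectures can be sketched in a few lines of NumPy. This is an illustrative toy, not FAIR's model; all sizes and weight matrices here are made-up assumptions. The point is structural: the RNN loop cannot be parallelized across positions because each step consumes the previous hidden state, while the convolution reduces to a single matrix multiply over all positions at once.

```python
# Toy contrast: sequential RNN-style recurrence vs. position-parallel
# convolution. All names and sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
seq_len, dim = 6, 4
x = rng.standard_normal((seq_len, dim))   # one "sentence" of word vectors

# RNN: each hidden state depends on the previous one, so the loop
# over positions cannot be parallelized.
W, U = rng.standard_normal((dim, dim)), rng.standard_normal((dim, dim))
h = np.zeros(dim)
rnn_states = []
for t in range(seq_len):                  # strictly one word at a time
    h = np.tanh(x[t] @ W + h @ U)
    rnn_states.append(h)

# CNN: each output depends only on a fixed window of inputs, so every
# position can be computed at once (here as one matrix multiply).
k = 3                                     # kernel width
pad = np.pad(x, ((k - 1, 0), (0, 0)))     # left-pad so outputs align
windows = np.stack([pad[t:t + k].ravel() for t in range(seq_len)])
Wc = rng.standard_normal((k * dim, dim))
conv_states = np.tanh(windows @ Wc)       # all positions in parallel

print(len(rnn_states), conv_states.shape)
```

On a GPU, the single large matrix multiply in the convolutional path is exactly the kind of operation the hardware is built for, which is where the efficiency advantage comes from.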
In previous research, CNNs applied to translation had not outperformed RNNs. Nevertheless, because of the architectural potential of CNNs, FAIR began research that has led to a model design demonstrating strong CNN performance on translation. The greater computational efficiency of CNNs has the potential to scale translation and cover more of the world’s 6,500 languages.
State-of-the-art results at record speed
Our results demonstrate a new state of the art compared with RNNs [2] on widely used public benchmark data sets provided by the Conference on Machine Translation (WMT). When the CNN and the best RNN of similar size are trained in the same way, the CNN outperforms it by 1.5 BLEU (a widely used metric for judging the accuracy of machine translation) on the WMT 2014 English-French task. On WMT 2014 English-German, the improvement is 0.5 BLEU, and on WMT 2016 English-Romanian, we improve by 1.8 BLEU.
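To make the metric concrete, here is a toy illustration of how BLEU scores a candidate translation against a reference: modified n-gram precision (n-gram counts clipped by the reference) combined with a brevity penalty. This is not the official BLEU implementation, and the sentences are made up; real evaluations use tools that handle multiple references and up to 4-grams.

```python
# Toy BLEU: clipped n-gram precision plus a brevity penalty.
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # clip each candidate n-gram count by its count in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        precisions.append(overlap / max(sum(cand_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0
    # the brevity penalty discourages overly short candidates
    bp = math.exp(min(0.0, 1 - len(ref) / len(cand)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # → 1.0
print(bleu("the cat on mat", "the cat sat on the mat"))          # partial match, below 1
```

A gain of 1.5 BLEU on a benchmark like WMT English-French is considered a substantial improvement, since scores for strong systems typically differ by fractions of a point.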
One consideration with neural machine translation for practical applications is how long it takes to get a translation once we show the system a sentence. The FAIR CNN model is computationally very efficient and is nine times faster than strong RNN systems. Much research has focused on speeding up neural networks through methods such as quantizing weights or distillation, and these techniques can equally be applied to the CNN model to increase speed even further, suggesting significant future potential.
Better translation with multi-hop attention and gating
A distinguishing component of our architecture is multi-hop attention. An attention mechanism is similar to the way a person would break down a sentence when translating it: Instead of looking at the sentence only once and then writing down the full translation without looking back, the network takes repeated “glimpses” at the sentence to choose which words it will translate next, much like a human occasionally looks back at specific keywords when writing down a translation [3]. Multi-hop attention is an enhanced version of this mechanism, which allows the network to make multiple such glimpses to produce better translations. These glimpses also depend on each other. For example, the first glimpse could focus on a verb and the second glimpse on the associated auxiliary verb.
In the figure below, we show how the system reads a French phrase (encoding) and then outputs an English translation (decoding). We first run the encoder to create a vector for each French word using a CNN, and this computation happens simultaneously for all words. Next, the decoder CNN produces English words, one at a time. At every step, the attention glimpses the French sentence to decide which words are most relevant to predict the next English word in the translation. There are two so-called layers in the decoder, and the animation illustrates how the attention is done once for each layer. The strength of the green lines indicates how much the network focuses on each French word. When the network is being trained, the translation is always available, and the computation for the English words also can be done simultaneously.
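The glimpse mechanism described above can be sketched as simple dot-product attention over the encoder outputs, with one hop per decoder layer. This is a simplified illustration, not fairseq's exact formulation; the encoder vectors, decoder state, and the way hops are combined are all made-up assumptions that capture only the structure: the second glimpse is conditioned on the result of the first.

```python
# Simplified multi-hop attention: one "glimpse" per decoder layer,
# each hop conditioned on the previous one. Sizes are illustrative.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(1)
src_len, dim = 5, 8
encoder_out = rng.standard_normal((src_len, dim))  # one vector per source word
state = rng.standard_normal(dim)                   # current decoder state

for hop in range(2):                               # two decoder layers, two glimpses
    scores = encoder_out @ state                   # dot-product relevance per source word
    weights = softmax(scores)                      # how much to focus on each word
    glimpse = weights @ encoder_out                # weighted summary of the source
    state = state + glimpse                        # next hop sees the first glimpse
    print(f"hop {hop}: strongest focus on word {weights.argmax()}")
```

The attention weights play the role of the green lines in the figure: a distribution over source words indicating where the network is looking at each step.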
Another aspect of our system is gating, which controls the information flow in the neural network. In every neural network, information flows through so-called hidden units. Our gating mechanism controls exactly which information should be passed on to the next unit so that a good translation can be produced. For example, when predicting the next word, the network takes into account the translation it has produced so far. Gating allows it to zoom in on a particular aspect of the translation or to get a broader picture — all depending on what the network deems appropriate in the current context.
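A minimal sketch of the gating idea is a gated linear unit: the network produces two sets of features, and a sigmoid over one half decides, per unit, how much of the other half flows through. The sizes below are illustrative assumptions; only the split-and-gate structure reflects the mechanism described above.

```python
# Gated linear unit sketch: one half of the features gates the other.
import numpy as np

def glu(x):
    """Split the features in half; the sigmoid of one half gates the other."""
    a, b = np.split(x, 2, axis=-1)
    gate = 1.0 / (1.0 + np.exp(-b))       # sigmoid in (0, 1), per unit
    return a * gate                        # gate near 0 blocks, near 1 passes

rng = np.random.default_rng(2)
hidden = rng.standard_normal((6, 16))     # 6 positions, 16 features
gated = glu(hidden)

print(hidden.shape, "->", gated.shape)    # → (6, 16) -> (6, 8)
```

Because each gate value lies strictly between 0 and 1, the unit can smoothly interpolate between passing information on and suppressing it, which is the "zooming in" behavior described above.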
This approach is an alternative architecture for machine translation that opens up new possibilities for other text processing tasks. For example, multi-hop attention in dialogue systems allows neural networks to focus on distinct parts of the conversation, such as two separate facts, and to tie them together in order to better respond to complex questions.
[1] Convolutional Sequence to Sequence Learning. Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin. arXiv, 2017.
[2] Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey Dean. Technical Report, 2016.
[3] Neural Machine Translation by Jointly Learning to Align and Translate. Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio. International Conference on Learning Representations, 2015.
Facebook’s notorious emotional manipulation study received more online attention than any other scientific research in 2014, according to an analytics company.
The paper, “Experimental evidence of massive-scale emotional contagion through social networks”, was published in the respected US journal the Proceedings of the National Academy of Sciences in July.
It sparked outrage by revealing that Facebook had been experimenting on hundreds of thousands of unwitting users, attempting to induce an “emotional state” by selectively showing positive or negative stories in their news feeds.
The academic analytics company Altmetric suggests that the research will have done wonders for the scientists’ public engagement metrics: it ranked number one for attention out of every scientific article published in 2014.
Perhaps surprisingly, the majority of the recorded attention was on Twitter, where the article was shared 4,000 times, reaching almost 10 million people. On Facebook itself there was little reaction to the research, which was shared publicly just 344 times. However, there were likely to have been more private wall posts on Facebook, so the total cannot be determined.
The article was also mentioned in 300 news sites, 130 blogposts, 13 subreddits and even 113 Google+ profiles.
But while Facebook may be a natural topic for online attention, the rest of the top five articles are more varied – and so are their reasons for getting so much attention. Second place went to a seemingly unassuming paper in the Journal of Ethology titled “Variation in Melanism and Female Preference in Proximate but Ecologically Distinct Environments”. But a quick scan of the articles citing the paper reveals the reason for its notability: the article was published with an author’s comment left in, asking “should we cite the crappy Gabor paper here?”.
The rest of the top five at least made the list for their contents. Third place went to a study from Nature suggesting that artificial sweeteners could induce glucose intolerance, while fourth place was a breakthrough in stem-cell research also published in Nature.
And the fifth place? Research published in Frontiers in Zoology in which animal behaviourists watched dogs defecating and discovered that they were sensitive to small variations in the Earth’s magnetic field.
Euan Adie, founder of Altmetric, said: “It’s no surprise to see that the most shared articles of the year heavily mirror the media agenda, but interesting to note that on occasion online communities are drawing attention to studies that have not received a significant amount of mainstream coverage.
“For example, we had more than 2,000 tweets for a study on how gaining basic certification affected nursing confidence levels. This reached a combined following of more than 2.2 million followers, demonstrating how social media can really boost the profile of some online published studies.”