11 December 2019 10:01:03

 
Entropy, Vol. 21, Pages 1213: Improving Neural Machine Translation by Filtering Synthetic Parallel Data (Entropy)
 


Synthetic data has been shown to be effective in training state-of-the-art neural machine translation (NMT) systems. Because the synthetic data is often generated by back-translating monolingual data from the target language into the source language, it potentially contains a lot of noise, such as weakly paired sentences or translation errors. In this paper, we propose a novel approach to filter this noise from synthetic data. For each sentence pair in the synthetic data, we compute a semantic similarity score using bilingual word embeddings. By selecting sentence pairs according to these scores, we obtain better synthetic parallel data. Experimental results on the IWSLT 2017 Korean→English translation task show that, despite using much less data, our method outperforms the baseline NMT system with back-translation by up to 0.72 and 0.62 BLEU points for tst2016 and tst2017, respectively.
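The filtering step described in the abstract can be sketched in a few lines. The sketch below is illustrative only: it assumes bilingual word embeddings that are already aligned in a shared vector space (for example, MUSE-style embeddings), averages word vectors into sentence vectors, and uses a similarity threshold of 0.5. The function names and the threshold are assumptions for the example, not details taken from the paper.

    # Illustrative sketch: score back-translated sentence pairs with bilingual
    # word embeddings and keep only the pairs above a similarity threshold.
    # Assumes src_emb / tgt_emb are dicts mapping tokens to aligned vectors.
    import numpy as np

    def sentence_vector(tokens, embeddings, dim=300):
        """Average the word vectors of a sentence; unknown words are skipped."""
        vecs = [embeddings[t] for t in tokens if t in embeddings]
        if not vecs:
            return np.zeros(dim)
        return np.mean(vecs, axis=0)

    def similarity(src_tokens, tgt_tokens, src_emb, tgt_emb):
        """Cosine similarity between source and target sentence vectors."""
        u = sentence_vector(src_tokens, src_emb)
        v = sentence_vector(tgt_tokens, tgt_emb)
        denom = np.linalg.norm(u) * np.linalg.norm(v)
        return float(u @ v / denom) if denom > 0 else 0.0

    def filter_synthetic(pairs, src_emb, tgt_emb, threshold=0.5):
        """Keep only synthetic (source, target) pairs whose score clears the threshold."""
        return [(s, t) for s, t in pairs
                if similarity(s.split(), t.split(), src_emb, tgt_emb) >= threshold]

In the setting described by the abstract, the pairs would come from back-translated monolingual target-language data, and only the retained pairs would be combined with the genuine parallel corpus for NMT training.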


 
Category: Informatics, Physics
 