Rare Words in Neural Machine Translation

In the work “Continuous Learning in Neural Machine Translation using Bilingual Dictionaries”, we analysed the ability of NMT systems to translate rare terms and presented techniques to improve their ability to translate morphological variants. The proposed method is based on creating a new test set, using a different split of the training and test data that concentrates on these terms.

Ready-to-use data sets

In the paper, we split the following data sets. The splits from the paper can be downloaded here:

TED English-German
Europarl English-German
Europarl English-Czech

Own data sets

You can split your own data sets with the same method using the code in the GitHub repository.
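To give a rough idea of what a split that concentrates on rare terms can look like, here is a minimal sketch. It is not the code from the repository and may differ from the procedure in the paper: the function name `rare_word_split`, the `max_count` threshold, the whitespace tokenisation and the toy sentence pairs are all assumptions made for illustration. The sketch treats a source word as rare when it occurs in at most `max_count` sentences and moves every sentence pair containing such a word into the test set.

```python
from collections import Counter


def rare_word_split(pairs, max_count=1):
    """Illustrative split: sentence pairs containing a rare source word go to test.

    pairs: list of (source, target) sentence strings.
    max_count: a word counts as rare if it appears in at most this many
               source sentences (hypothetical threshold for this sketch).
    """
    # Count in how many source sentences each (lowercased) word appears.
    sentence_freq = Counter()
    for src, _ in pairs:
        sentence_freq.update(set(src.lower().split()))

    rare = {word for word, count in sentence_freq.items() if count <= max_count}

    train, test = [], []
    for src, tgt in pairs:
        words = set(src.lower().split())
        # Any overlap with the rare vocabulary sends the pair to the test set.
        if words & rare:
            test.append((src, tgt))
        else:
            train.append((src, tgt))
    return train, test, rare


# Toy data, invented for this example.
pairs = [
    ("the cat sleeps", "die Katze ruht"),
    ("the dog sleeps", "der Hund ruht"),
    ("the cat eats", "die Katze isst"),
    ("the dog eats", "der Hund isst"),
    ("the axolotl sleeps", "der Axolotl ruht"),
]
train, test, rare = rare_word_split(pairs, max_count=1)
print(sorted(rare))
print(len(train), len(test))
```

On real corpora, a word-form threshold like this would select a very large number of sentences, so a practical version would likely restrict the set of target terms (for example by lemma or by a fixed list). For the actual procedure, refer to the repository and the publication.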

Publication

All details of the method can be found in the publication: Niehues, J. (2021). Continuous Learning in Neural Machine Translation using Bilingual Dictionaries. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2021). Kyiv, Ukraine.

Natural language processing @ DKE, Maastricht University
