Title: An overview of the European Union’s highly multilingual parallel corpora
Citation: LANGUAGE RESOURCES AND EVALUATION vol. 48 no. 4 p. 679-707
Publisher: SPRINGER
Publication Year: 2014
JRC N°: JRC80562
ISSN: 1574-020X
URI: http://link.springer.com/article/10.1007/s10579-014-9277-0
DOI: 10.1007/s10579-014-9277-0
Type: Articles in periodicals and books
Abstract: Starting in 2006, the European Commission’s Joint Research Centre (JRC) and other European Union organisations have made available a number of large-scale highly-multilingual parallel language resources. In this article, we give a comparative overview of these resources and we explain the specific nature of each of them. This article provides answers to a number of questions, including: What are these linguistic resources? What is the difference between them? Why were they originally created and why was the data released publicly? What can they be used for and what are the limitations of their usability? What are the text types, subject domains and languages covered? How to avoid overlapping document sets? How do they compare regarding the formatting and the translation alignment? What are their usage conditions? What other types of multilingual linguistic resources does the EU have? This article thus aims to clarify what the similarities and differences between the various resources are and what they can be used for. It will also serve as a reference publication for those resources, for which a more detailed description has been lacking so far (EAC-TM, ECDC-TM and DGT-Acquis).
JRC Directorate: Space, Security and Migration

