
DCEP - Digital Corpus of the European Parliament

The paper presents a new highly multilingual sentence-aligned parallel corpus consisting of various document types and covering a wide range of subject domains. With a total of 1.37 billion words in 23 languages (253 language pairs), gathered in the course of ten years, this is the largest single release of documents by a European Union institution. Corpus statistics, required preprocessing, sentence alignment, and possible gains in statistical machine translation when adding this corpus to the previously existing ones are also considered.
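The figure of 253 language pairs is consistent with counting unordered pairs among 23 languages, 23 × 22 / 2 = 253. A minimal sketch of that arithmetic follows; the language codes are hypothetical placeholders, not the corpus's actual language list.

```python
from itertools import combinations
from math import comb

# Placeholder codes (assumption): stand-ins for the 23 corpus languages.
languages = [f"lang{i:02d}" for i in range(1, 24)]

# Each unordered pair of distinct languages is one language pair.
pairs = list(combinations(languages, 2))

# The enumerated pairs match the closed form C(n, 2) = n * (n - 1) / 2.
assert len(pairs) == comb(len(languages), 2)
print(len(pairs))
```

Counting unordered pairs treats, say, English-French and French-English as the same pair, which is the reading that yields 253.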
2014-09-24
European Language Resources Association (ELRA)
JRC87087
978-2-9517408-8-4
http://www.lrec-conf.org/proceedings/lrec2014/pdf/943_Paper.pdf
https://publications.jrc.ec.europa.eu/repository/handle/JRC87087
Items published in the JRC Publications Repository are protected by copyright, with all rights reserved, unless otherwise indicated. Additional information: https://ec.europa.eu/info/legal-notice_en#copyright-notice