JRC Publications Repository

Arbuli sunnu: a Sicilian-Italian Parallel Treebank

The Natural Language Processing (NLP) community has recently begun to engage with endangered languages and dialects, which encode culturally distinct perspectives and local knowledge. Regardless of how useful or applicable NLP tools may be for such languages, creating resources for dialects increases our knowledge of them, encourages the community to study them further, and supports the preservation of an important heritage. As part of this endeavour, we focus on Sicilian, a dialect spoken in Sicily with a rich cultural history, whose preservation is crucial to maintaining the linguistic diversity of Southern Italy. In this paper, we present the first release of a novel treebank called Sicilian3bank. To improve the usability of this resource and make it accessible to non-Sicilian speakers, every sentence is linked to its Italian translation, yielding a 1:1 parallel resource. In addition, by adopting the Universal Dependencies format, a widely used standard for treebank annotation, we pave the way for data-driven cross-linguistic research. We hope that this work can serve as a basis for further linguistic research and computational applications for the Sicilian dialect.
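The abstract names two structural properties of the resource: sentences are annotated in the Universal Dependencies (UD) format, and each Sicilian sentence is paired 1:1 with its Italian translation. A minimal Python sketch of how such a pairing could be consumed follows. It assumes, hypothetically, that the Sicilian and Italian sides are distributed as two CoNLL-U files whose sentences appear in the same order; the `sent_id` values, the toy sentence and its annotations are invented for illustration and are not taken from Sicilian3bank. The ten tab-separated token columns follow the CoNLL-U layout defined by the UD guidelines.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# The ten token columns of the CoNLL-U format used by Universal Dependencies.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


@dataclass
class Sentence:
    sent_id: str
    text: str
    tokens: list[dict[str, str]] = field(default_factory=list)


def parse_conllu(data: str) -> list[Sentence]:
    """Split CoNLL-U text into sentences: '# key = value' comments, token lines, blank-line separators."""
    sentences: list[Sentence] = []
    meta: dict[str, str] = {}
    tokens: list[dict[str, str]] = []
    for line in data.splitlines() + [""]:  # the extra "" flushes the last sentence
        if not line.strip():
            if tokens:
                sentences.append(Sentence(meta.get("sent_id", ""), meta.get("text", ""), tokens))
            meta, tokens = {}, []
        elif line.startswith("#"):
            key, sep, value = line[1:].partition("=")
            if sep:  # ignore comments without '=', e.g. '# newdoc'
                meta[key.strip()] = value.strip()
        else:
            cols = line.split("\t")
            if len(cols) != len(FIELDS):
                raise ValueError(f"expected {len(FIELDS)} columns, got {len(cols)}: {line!r}")
            tokens.append(dict(zip(FIELDS, cols)))
    return sentences


def align_1to1(source: list[Sentence], target: list[Sentence]) -> list[tuple[Sentence, Sentence]]:
    """Pair sentences by position (assumption: both sides share the same sentence order)."""
    if len(source) != len(target):
        raise ValueError(f"not 1:1: {len(source)} vs {len(target)} sentences")
    return list(zip(source, target))


# Hypothetical toy data, NOT taken from Sicilian3bank; annotations are illustrative only.
SCN = """# sent_id = scn-1
# text = U picciottu mancia.
1\tU\tu\tDET\t_\t_\t2\tdet\t_\t_
2\tpicciottu\tpicciottu\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tmancia\tmanciari\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""

ITA = """# sent_id = ita-1
# text = Il ragazzo mangia.
1\tIl\til\tDET\t_\t_\t2\tdet\t_\t_
2\tragazzo\tragazzo\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tmangia\tmangiare\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""

for scn, ita in align_1to1(parse_conllu(SCN), parse_conllu(ITA)):
    print(f"{scn.sent_id} | {scn.text} -> {ita.text}")
# prints: scn-1 | U picciottu mancia. -> Il ragazzo mangia.
```

Pairing by position is the simplest reading of "1:1 parallel"; if the released files instead share explicit identifiers across the two languages, matching on those would be more robust to reordering. Because both sides use the same UD columns, cross-lingual comparisons (for example of `upos` or `deprel` sequences) reduce to comparing the parsed token dictionaries.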
2025-12-09
CEUR-WS.ORG
JRC142844
ISSN 1613-0073 (online)
https://ceur-ws.org/Vol-4112/16_main_long.pdf
https://publications.jrc.ec.europa.eu/repository/handle/JRC142844
Items published in the JRC Publications Repository are protected by copyright, with all rights reserved, unless otherwise indicated. Additional information: https://ec.europa.eu/info/legal-notice_en#copyright-notice