An official website of the European Union How do you know?      
European Commission logo
JRC Publications Repository Menu

Semantic Text Analyser BERT-like language model for formal language understanding

cover
SeTABERTa is a new multilingual langue model pertained from scratch using various Open Access text repositories: EU legislation, research articles, EU public documents and US patents. 2/3 of training data is English. The other part of data covers EU24 languages. The model was trained on JRC Big Data Platform. The model can be fine-tuned for other tasks.
2024-03-15
European Commission
JRC137020
Language Citation
NameCountryCityType
Datasets
IDTitlePublic URL
Dataset collections
IDAcronymTitlePublic URL
Scripts / source codes
DescriptionPublic URL
Additional supporting files
File nameDescriptionFile type 
Show metadata record  Copy citation url to clipboard  Download BibTeX
Items published in the JRC Publications Repository are protected by copyright, with all rights reserved, unless otherwise indicated. Additional information: https://ec.europa.eu/info/legal-notice_en#copyright-notice