SeTABERTa is a new multilingual language model pretrained from scratch on various Open Access text repositories: EU legislation, research articles, EU public documents and US patents. Two thirds of the training data is English; the remainder covers the EU24 languages. The model was trained on the JRC Big Data Platform and can be fine-tuned for other tasks.
DAUDARAVICIUS Vidas; FRIIS-CHRISTENSEN Anders
2024-03-15
European Commission
JRC137020
https://publications.jrc.ec.europa.eu/repository/handle/JRC137020
This is a public document. You can share this publication.