An ontology-based approach for developing a harmonised data-validation tool for European cancer registration

Background: Population-based cancer registries constitute an important information source in cancer epidemiology. Studies collating and comparing data across regional and national boundaries have proved important for deploying and evaluating effective cancer-control strategies. A critical aspect of correctly comparing cancer indicators across such boundaries is ensuring a good and harmonised level of data quality, which is a primary motivator for the centralised collection of pseudonymised data. The recent introduction of the European Union's General Data Protection Regulation (GDPR) imposes stricter conditions on the collection, processing, and sharing of pseudonymised data. The new regulation triggers the need to find solutions that allow the smooth processes leading to harmonised European cancer-registry data to continue. One element in this regard would be the availability of a data-validation software tool based on a formalised depiction of the harmonised data-validation rules, allowing an eventual devolution of the data-validation process to the local level.

Results: A semantic data model was derived from the data-validation rules for harmonising cancer-data variables at the European level. The data model was encapsulated in an ontology developed using the Web Ontology Language (OWL), with the data-model entities forming the main OWL classes. The data-validation rules were added as axioms in the ontology. The reasoning function of the resulting ontology was able to trap registry-coding errors and, in some instances, to correct them.

Conclusion: Describing the European cancer-registry core data set in terms of an OWL ontology provides a tool based on a formalised set of axioms for unambiguously validating a cancer-registry’s data set according to harmonised, supra-national rules. Because the data checks are inherently linked to the data model, the approach will result in lower maintenance overheads and automatic version synchronisation and control, both important for distributed data-quality checking processes.
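To make the idea of validation rules tied to a data model more concrete, the following is a minimal plain-Python sketch. It is not OWL and not the authors' implementation: it only mimics the pattern of declaring a rule against a record model and letting a generic checker flag violations and, where the rule permits, suggest a correction. The class names, the rule, and the sex coding (1 = male, 2 = female, 9 = unknown) are assumptions made for illustration; C61 (prostate) and C53 (cervix uteri) are ICD-O-3 topography codes used here as examples of sex-specific sites.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical sex coding, assumed for illustration only.
MALE, FEMALE, UNKNOWN = 1, 2, 9

# Small illustrative subset of sex-specific ICD-O-3 topography codes.
MALE_ONLY_SITES = {"C61"}    # prostate
FEMALE_ONLY_SITES = {"C53"}  # cervix uteri


@dataclass(frozen=True)
class CaseRecord:
    """Minimal stand-in for one cancer-registry case (hypothetical model)."""
    case_id: str
    sex: int
    topography: str


@dataclass(frozen=True)
class Finding:
    """A rule violation, with an optional suggested value for `sex`."""
    rule: str
    message: str
    suggested_sex: Optional[int] = None


def sex_site_rule(record: CaseRecord) -> Optional[Finding]:
    """Flag sex/topography combinations that cannot occur together."""
    if record.topography in MALE_ONLY_SITES:
        required = MALE
    elif record.topography in FEMALE_ONLY_SITES:
        required = FEMALE
    else:
        return None  # site is not sex-specific, so the rule does not apply
    if record.sex == required:
        return None
    if record.sex == UNKNOWN:
        # The site implies the sex, so a correction can be proposed.
        return Finding(
            "sex_site",
            f"{record.case_id}: sex unknown for sex-specific site {record.topography}",
            suggested_sex=required,
        )
    # Contradictory values: report the error, but do not guess which is wrong.
    return Finding(
        "sex_site",
        f"{record.case_id}: sex {record.sex} incompatible with site {record.topography}",
    )


# Rules live next to the model, so both are versioned and shipped together.
RULES: List[Callable[[CaseRecord], Optional[Finding]]] = [sex_site_rule]


def validate(record: CaseRecord) -> List[Finding]:
    """Apply every registered rule and collect the violations."""
    findings = []
    for rule in RULES:
        finding = rule(record)
        if finding is not None:
            findings.append(finding)
    return findings
```

In an OWL ontology the same constraint would be stated as an axiom and checked by a reasoner rather than by hand-written functions; the sketch only illustrates why keeping rules and data model in one artefact simplifies distributing and versioning the checks.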
2021-01-13
BMC
JRC117512
ISSN: 2041-1480 (online)
https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-020-00233-x
https://publications.jrc.ec.europa.eu/repository/handle/JRC117512
DOI: 10.1186/s13326-020-00233-x (online)
Items published in the JRC Publications Repository are protected by copyright, with all rights reserved, unless otherwise indicated. Additional information: https://ec.europa.eu/info/legal-notice_en#copyright-notice