Patents for Industrial Pollution Prevention and Control
Study commissioned by the European Commission under contract 940 597-2020 IT
An important objective of the European Green Deal is to achieve a circular economy with zero pollution. Because of their considerable environmental impact, large industrial installations are naturally one of the main areas of focus. As one of its activities, the Joint Research Centre (JRC) of the European Union regularly compiles reference documents on Best Available Techniques (BAT), called BREFs, which give a clear picture of the state of the art in industrial pollution prevention and control in all Member States. Mapping the environmental capabilities of EU countries, to check whether they match the long-term need for clean technology implied by the European Green Deal, is crucial for the EU’s envisaged green transition. One way to achieve this is to retrieve geo-localized documents describing R&D activities, in particular patents, across the EU and the world.
In this report, we set out to build an Information Retrieval (IR) system that retrieves relevant patents from queries based on specific subsections of BREF documents. Past efforts on this front have mainly relied on bag-of-words approaches such as TF-IDF or, to leverage semantic information, on pre-trained word embeddings such as GloVe (Pennington et al., 2014). Following recent advances in the field of Natural Language Processing, we build an IR engine based on the Transformer architecture (Vaswani et al., 2017), supported by FAISS indexing (Johnson et al., 2017), and demonstrate its superiority over these legacy approaches. We train and fine-tune our model on several open-source datasets and assess its effectiveness by comparing its performance with that of the baseline approaches on a new dataset provided by JRC.
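To make the bag-of-words baseline mentioned above concrete, the following is a minimal sketch of TF-IDF retrieval with cosine similarity, written in plain Python. It is not the report's implementation and it does not reproduce the Transformer and FAISS engine. The function names (`tokenize`, `tfidf_vector`, `build_index`, `search`), the smoothed IDF formula, and the three toy patent texts are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on any non-alphanumeric character (illustrative choice).
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def tfidf_vector(tokens, idf):
    # Term frequency times IDF, L2-normalised; terms unknown to the index are ignored.
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(docs):
    # Document frequencies and a smoothed IDF (one common variant, assumed here).
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return idf, [tfidf_vector(toks, idf) for toks in tokenized]


def search(query, idf, vectors, k=3):
    # Cosine similarity reduces to a dot product because all vectors are unit length.
    q = tfidf_vector(tokenize(query), idf)
    scores = [(sum(w * v.get(t, 0.0) for t, w in q.items()), i)
              for i, v in enumerate(vectors)]
    ranked = sorted(scores, key=lambda s: (-s[0], s[1]))
    return [(i, score) for score, i in ranked[:k]]


# Hypothetical toy corpus standing in for patent abstracts.
patents = [
    "Scrubber for removing sulphur dioxide from flue gas of a combustion plant",
    "Method for knitting socks with reinforced heel",
    "Catalytic reduction of nitrogen oxides in flue gas emissions",
]
idf, vectors = build_index(patents)

# Hypothetical query standing in for a BREF subsection.
results = search("removing sulphur dioxide from flue gas", idf, vectors)
for doc_id, score in results:
    print(doc_id, round(score, 3), patents[doc_id])
```

A baseline of this kind only matches exact tokens, so a query that says "reduce" will not match a patent that says "reduction". That limitation is the motivation the abstract gives for moving to embedding-based retrieval.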
CARIAGGI Francesco
DE NOBILI Cristiano
BRATIÈRES Sébastien
2021-11-05
Publications Office of the European Union
JRC126541
ISBN 978-92-76-41923-5 (online)
OP KJ-09-21-408-EN-N (online)
https://publications.jrc.ec.europa.eu/repository/handle/JRC126541
DOI 10.2760/056875 (online)
Additional supporting files
| File name | Description | File type | |