Wilk, Marcin
[UCL]
Verriest, Anthony
[UCL]
Legay, Axel
[UCL]
Document AI is the study of the automatic processing and comprehension of business documents. It covers tasks such as reading and evaluating a document's content. Working with business documents presents several difficulties, including their many formats and intricate layouts. This thesis reviews the state-of-the-art Deep Learning models used in Document AI and the techniques employed to improve their performance. It presents a concrete case of information extraction from expenses, carried out as part of a master's thesis in collaboration with Odoo. The methodology follows the data-centric approach, an emerging method that advocates focusing more on the data instead of working heavily on the model. The Deep Learning models presented all extend the pioneering BERT model, which became a dominant Natural Language Processing (NLP) model in recent years following the breakthrough of Transformers. Such models are first pre-trained on multiple types of data and then fine-tuned on a custom dataset, in this case images of expenses. Creating this specialized training dataset is the core of the data-centric approach. The data-centric workflow is used to evaluate the performance of different models and to identify the one that performs best. The model's predictions are also checked for faults or inaccuracies in order to update and improve the image processing techniques, which includes an examination of the post-processing methods used to parse the models' predictions. The overall performance and the limitations of the data-centric approach are then discussed. Finally, the business value that the best model delivers is presented.
Bibliographic reference
Wilk, Marcin ; Verriest, Anthony. A data-centric approach to information extraction of expenses. Ecole polytechnique de Louvain, Université catholique de Louvain, 2023. Prom. : Legay, Axel.
Permanent URL
http://hdl.handle.net/2078.1/thesis:38720