Jamotton, Charlotte
[UCL]
Hainaut, Donatien
[UCL]
This article explores the application of Latent Dirichlet Allocation (LDA) to structured tabular insurance data. LDA is a probabilistic topic modelling approach initially developed in Natural Language Processing (NLP) to uncover the underlying structure of (unstructured) textual data. It was designed to represent textual documents as mixtures of latent (hidden) topics, and topics as mixtures of words. This study introduces LDA's document-topic distribution as a soft clustering tool for unsupervised learning tasks in the actuarial field. By defining each topic as a risk profile, and by treating insurance policies as documents and the modalities of categorical covariates as words, we show how LDA can be extended beyond textual data and can offer a framework to uncover underlying structures within insurance portfolios. Our experimental results and analysis highlight how the modelling of policies based on topic cluster membership, and the identification of dominant modalities within each risk profile, can give insights into the prominent risk factors contributing to higher or lower claim frequencies.
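The mapping described in the abstract (policies as documents, covariate modalities as words, topics as risk profiles) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy portfolio, the covariate names, the hyperparameters `alpha` and `beta`, the helper `fit_lda`, and the use of collapsed Gibbs sampling, which is one standard way to fit LDA and not necessarily the inference method used in the paper.

```python
import random

# Hypothetical toy portfolio; covariates and modalities are illustrative only.
policies = [
    {"age_band": "18-25", "vehicle": "sport", "region": "urban"},
    {"age_band": "18-25", "vehicle": "sport", "region": "urban"},
    {"age_band": "26-40", "vehicle": "sedan", "region": "urban"},
    {"age_band": "60+", "vehicle": "sedan", "region": "rural"},
    {"age_band": "60+", "vehicle": "sedan", "region": "rural"},
    {"age_band": "41-60", "vehicle": "sedan", "region": "rural"},
]

# Each policy becomes a "document" whose "words" are covariate=modality tokens.
docs = [[f"{k}={v}" for k, v in p.items()] for p in policies]
vocab = sorted({w for d in docs for w in d})
word_id = {w: i for i, w in enumerate(vocab)}


def fit_lda(docs, n_topics=2, alpha=0.5, beta=0.1, n_iter=200, seed=0):
    """Collapsed Gibbs sampler for LDA (hypothetical helper, not the paper's code)."""
    rng = random.Random(seed)
    V = len(vocab)
    n_dk = [[0] * n_topics for _ in docs]       # topic counts per policy
    n_kw = [[0] * V for _ in range(n_topics)]   # modality counts per topic
    n_k = [0] * n_topics                        # total tokens per topic
    z = []                                      # topic assignment of each token

    # Random initial assignment of every token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], word_id[w]
                # Remove the token from the counts, then resample its topic.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Posterior mean estimates: policy-topic (theta) and topic-modality (phi).
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in n_dk[d]]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(c + beta) / (n_k[k] + V * beta) for c in n_kw[k]]
        for k in range(n_topics)
    ]
    return theta, phi


theta, phi = fit_lda(docs)
for policy, weights in zip(policies, theta):
    print(policy, [round(x, 2) for x in weights])
```

Each row of `theta` is a soft cluster membership for one policy over the risk profiles, and each row of `phi` ranks the modalities that dominate a given profile. A real portfolio would call for a tested library implementation and a principled choice of the number of topics.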
Bibliographic reference: Jamotton, Charlotte ; Hainaut, Donatien. Latent Dirichlet Allocation for structured insurance data. LIDAM Discussion Paper ISBA ; 2024/08 (2024) 27 pages

Permanent URL: http://hdl.handle.net/2078.1/285770