Keyword Extraction for Privacy Policy Analysis Using Topic Modelling Approaches

Authors

Chen, Sijie

Publisher

University of Guelph

Abstract

Privacy policies are official documents that inform users about how service providers collect and use their data. However, such documents are often verbose and full of legal jargon, making them difficult for ordinary users to read and understand. Our research objective is to develop effective methods for extracting keywords that support the coverage and relevance analysis of privacy policies with regard to the related data practices. To this end, we extended two topic models, LDA (Latent Dirichlet Allocation) and POSLDA (Part-of-Speech LDA), with prior information about different data practices and part-of-speech (POS) classes, and compared their performance on keyword extraction from privacy policies. We used the OPP-115 dataset to optimize the topic models and to evaluate keyword extraction. Our results show that both LDA and POSLDA can extract quality keywords from privacy policies on various topics, and that POSLDA can not only distinguish the POS classes of keywords for different topics but also improve the accuracy of keyword extraction by removing customized stop words derived from the same modelling process.
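
As an illustration of what extending LDA with prior information about data practices could look like, the following is a minimal sketch rather than the thesis's implementation. It assumes one common way to inject such priors: boosting the Dirichlet prior of a few seed words in their intended topic, then running a collapsed Gibbs sampler. The function names (`seeded_lda`, `top_keywords`), the hyperparameter values, the two topics, and the toy sentences are all hypothetical; the POS-aware part of POSLDA is not modelled here.

```python
import random


def seeded_lda(docs, seed_words, n_iter=200, alpha=0.1, beta=0.01,
               boost=1.0, rng_seed=0):
    """Collapsed Gibbs sampling for LDA with a seeded topic-word prior.

    docs: list of token lists; seed_words: one list of prior keywords per topic.
    All default values are illustrative assumptions, not tuned settings.
    """
    rng = random.Random(rng_seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    K, V = len(seed_words), len(vocab)

    # Every word gets a small prior mass; seed words get extra mass in their topic.
    prior = [[beta] * V for _ in range(K)]
    for k, words in enumerate(seed_words):
        for w in words:
            if w in word_id:
                prior[k][word_id[w]] += boost
    prior_sum = [sum(row) for row in prior]

    # Random initial topic assignments and the matching count tables.
    n_dk = [[0] * K for _ in docs]      # document-topic counts
    n_kw = [[0] * V for _ in range(K)]  # topic-word counts
    n_k = [0] * K                       # tokens per topic
    z = [[rng.randrange(K) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment before resampling it.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_kw[t][v] + prior[t][v]) / (n_k[t] + prior_sum[t])
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    return vocab, n_kw, prior


def top_keywords(vocab, n_kw, prior, n=3):
    """Return the n highest-scoring words per topic (count plus prior mass)."""
    ranked = []
    for counts, eta in zip(n_kw, prior):
        scores = [c + e for c, e in zip(counts, eta)]
        order = sorted(range(len(vocab)), key=lambda v: -scores[v])
        ranked.append([vocab[v] for v in order[:n]])
    return ranked


# Toy sentences written for this sketch; they are not taken from OPP-115.
docs = [
    "we collect your email address and location".split(),
    "we collect device information and cookies".split(),
    "we share your data with third party advertisers".split(),
    "third party partners may share information".split(),
]
# Hypothetical seed words for two data-practice topics: collection and sharing.
seed_words = [["collect", "cookies"], ["share", "third", "party"]]

vocab, n_kw, prior = seeded_lda(docs, seed_words)
keywords = top_keywords(vocab, n_kw, prior)
for topic, words in enumerate(keywords):
    print(topic, words)
```

The seed prior only nudges the sampler, so on a corpus this small the extracted keywords may vary with the random seed; a realistic run would use the full policy corpus and tuned hyperparameters.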

Keywords

privacy policy, keyword extraction, topic modelling
