Publication

Automated anonymization of text documents

Name: 9683 feito.pdf
Size: 177.54 KB
Format: Adobe PDF

Authors

Dias, Francisco
Mamede, Nuno

Abstract(s)

Sharing data in the form of text is important for a wide range of activities, but it also raises privacy concerns when the data could be sensitive. Automated text anonymization is a solution for removing all the sensitive information from documents. However, this is a challenging task because of the unstructured form of textual data and the ambiguity of natural language. In this work, we present our implementation of an automated anonymization system, built with a modular structure, for documents written in Portuguese. Four anonymization methods are evaluated and compared. Two methods replace the sensitive information with artificial labels: suppression and tagging. The other two replace it with textual expressions: random substitution and generalization. The evaluation showed that the tagging and generalization methods make an anonymized text easier to read while preventing some of the semantic drift caused by removing the original information.
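
To make the four methods named in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation: the `Entity` type, the `anonymize` function, the `XXXX` mask, the `[LABEL_n]` tag format, and the two lookup tables are all hypothetical names and conventions chosen for illustration, and the sketch assumes the sensitive spans have already been detected by some earlier step.

```python
import random
from dataclasses import dataclass


@dataclass
class Entity:
    """A sensitive span that was already detected (hypothetical type)."""
    start: int   # index of the first character of the span
    end: int     # index one past the last character of the span
    label: str   # entity class, e.g. "PERSON" or "LOCATION"


# Hypothetical lookup tables; a real system would use far richer resources.
SUBSTITUTES = {"PERSON": ["Ana Costa", "Rui Lopes"], "LOCATION": ["Braga", "Faro"]}
GENERAL_TERMS = {"PERSON": "a person", "LOCATION": "a city"}


def anonymize(text, entities, method, rng=None):
    """Replace every detected span in `text` according to `method`."""
    rng = rng or random.Random(0)   # fixed seed so the sketch is repeatable
    counters = {}                   # next tag index per entity class
    tags = {}                       # (label, surface form) -> tag already assigned
    pieces, cursor = [], 0
    for ent in sorted(entities, key=lambda e: e.start):
        pieces.append(text[cursor:ent.start])      # copy the untouched text
        surface = text[ent.start:ent.end]
        if method == "suppression":
            # Artificial label: one neutral mask for everything.
            replacement = "XXXX"
        elif method == "tagging":
            # Artificial label: class name plus an index, reused for repeated mentions.
            key = (ent.label, surface)
            if key not in tags:
                counters[ent.label] = counters.get(ent.label, 0) + 1
                tags[key] = f"[{ent.label}_{counters[ent.label]}]"
            replacement = tags[key]
        elif method == "random_substitution":
            # Textual expression: another entity of the same class, picked at random.
            replacement = rng.choice(SUBSTITUTES[ent.label])
        elif method == "generalization":
            # Textual expression: a more general term for the entity's class.
            replacement = GENERAL_TERMS[ent.label]
        else:
            raise ValueError(f"unknown method: {method}")
        pieces.append(replacement)
        cursor = ent.end
    pieces.append(text[cursor:])                   # copy the tail of the text
    return "".join(pieces)
```

For the sentence "Maria lives in Lisbon." with a `PERSON` span over "Maria" and a `LOCATION` span over "Lisbon", this sketch yields "XXXX lives in XXXX." for suppression and "[PERSON_1] lives in [LOCATION_1]." for tagging. The tagged version still tells the reader what kind of entity was removed, which is in line with the readability finding reported in the abstract.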
