Data privacy : the non-interactive setting

Date

2009-05

Authors

Narayanan, Arvind, 1981-

Abstract

The Internet has enabled the collection, aggregation and analysis of personal data on a massive scale. It has also enabled the sharing of collected data in various ways: wholesale outsourcing of data warehousing, partnering with advertisers for targeted advertising, data publishing for exploratory research, etc. This has led to complex privacy questions related to the leakage of sensitive user data and mass harvesting of information by unscrupulous parties. These questions have information-theoretic, sociological and legal aspects and are often poorly understood. There are two fundamental paradigms for how the data is released: in the interactive setting, the data collector holds the data while third parties interact with the data collector to compute some function on the database. In the non-interactive setting, the database is somehow "sanitized" and then published. In this thesis, we conduct a thorough theoretical and empirical investigation of privacy issues involved in non-interactive data release. Both settings have been well analyzed in the academic literature, but simplicity of the non-interactive paradigm has resulted in its being used almost exclusively in actual data releases. We analyze several common applications including electronic directories, collaborative filtering and recommender systems, and social networks. Our investigation has two main foci. First, we present frameworks for privacy and anonymity in these different settings within which one might define exactly when a privacy breach has occurred. Second, we use these frameworks to experimentally analyze actual large datasets and quantify privacy issues. The picture that has emerged from this research is a bleak one for noninteractivity. While a surprising level of privacy control is possible in a limited number of applications, the general sense is that protecting privacy in the non-interactive setting is not as easy as intuitively assumed in the absence of rigorous privacy definitions.
While some applications can be salvaged either by moving to an interactive setting or by other means, in others a rethinking of the tradeoffs between utility and privacy that are currently taken for granted appears to be necessary.

Description

text
