Generating Synthetic Healthcare Records Using Convolutional Generative Adversarial Networks

Deep learning models have demonstrated high-quality performance in several areas such as image classification and speech processing. However, creating a deep learning model using electronic health record (EHR) data requires addressing particular privacy challenges that make this issue unique to researchers in this domain. This matter focuses attention on generating realistic synthetic data to amplify privacy. Existing methods for artificial data generation suffer from different limitations such as being bound to particular use cases. Furthermore, their generalizability to real-world problems is controversial regarding the uncertainties in defining and measuring key realistic characteristics. Henceforth, there is a need to establish insightful metrics (and to measure the validity of synthetic data), as well as quantitative criteria regarding privacy restrictions. We propose the use of Generative Adversarial Networks to help satisfy requirements for realistic characteristics and acceptable values of privacy metrics simultaneously. The present study makes several unique contributions to synthetic data generation in the healthcare domain. First, utilizing 1-D Convolutional Neural Networks (CNNs), we devise a new approach to capturing the correlation between adjacent diagnosis records. Second, we employ convolutional autoencoders to map the discrete-continuous values. Finally, we devise a new approach to measure the similarity between real and synthetic data, and a means to measure the fidelity of the synthetic data and its associated privacy risks.

Keywords

synthetic data, generative adversarial networks, healthcare, convolutional neural networks

Persistent link

http://hdl.handle.net/10919/96186

Collections

CS6604: Digital Libraries

Full item page