Extracting Compact Knowledge From Massive Data
Zhang, Dejiao
2019
Abstract
Over the past couple of decades, we have witnessed an explosion of data generated from almost every aspect of our lives. Along with such huge volumes of data come more complex models, e.g., deep neural networks (DNNs). This increase in complexity demands new trends in both the modeling and the analysis of data, among which low dimensionality and sparsity lie at the core. In this thesis, we follow this avenue to address several problems and challenges raised by modern data and models. High-dimensional data are often not uniformly distributed in the feature space; instead, they lie in the vicinity of a low-dimensional subspace. Identifying such low-dimensional structure not only gives better interpretability of the data, but also significantly reduces the storage and computation costs of algorithms that operate on the data. The second chapter of this thesis focuses on low-rank linear subspace models; in particular, we improve and analyze an efficient subspace estimation method in the context of streaming data, with emphasis on the case where the data are undersampled. On the other hand, real-world data are in general non-linear and involve much more complex dependencies, which motivates the development of DNNs. With massive amounts of data and computational power, the high capacity and hierarchical structure of DNNs allow them to learn extremely complex non-linear dependencies and features. However, the successes achieved by DNNs are marred by the inscrutability of the models, poor generalizability, and high demands on data and computational resources, especially as the size and complexity of DNNs keep increasing. To combat these challenges, we focus on two perspectives: model compression and disentangled representation learning. DNNs are often over-parameterized, with many parameters being redundant and non-critical; successfully removing these connections is expected to improve both efficiency and generalization.
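The streaming, undersampled subspace estimation discussed above can be illustrated with an incremental rank-one update in the style of GROUSE, which fits each partially observed vector within the current subspace and then rotates the basis toward the residual. This is a minimal sketch under assumed conventions; the function name, step-size handling, and constants are illustrative, not the thesis's exact algorithm.

```python
import numpy as np

def streaming_subspace_update(U, v_obs, omega, step=0.5):
    """One GROUSE-style update of an orthonormal basis U (d x k) from a
    single partially observed vector: values v_obs at indices omega.
    Illustrative sketch; names and the step size are assumptions."""
    d, k = U.shape
    U_omega = U[omega, :]
    # least-squares fit of the observed entries within the current subspace
    w, *_ = np.linalg.lstsq(U_omega, v_obs, rcond=None)
    r = np.zeros(d)
    r[omega] = v_obs - U_omega @ w      # residual, supported on omega
    p = U @ w                           # best subspace approximation
    sigma = np.linalg.norm(r) * np.linalg.norm(p)
    if sigma < 1e-12:                   # vector already lies in the subspace
        return U
    theta = step * sigma
    # rank-one rotation toward the residual direction; since r is
    # orthogonal to the columns of U, this keeps U exactly orthonormal
    direction = ((np.cos(theta) - 1.0) * p / np.linalg.norm(p)
                 + np.sin(theta) * r / np.linalg.norm(r))
    return U + np.outer(direction, w / np.linalg.norm(w))
```

Each update touches only the observed entries plus a d-by-k correction, so its cost is roughly O(|omega| k^2 + d k) per vector, which is what makes the method attractive for streaming data.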
In Chapter III, we go a step further by presenting a new method for compressing DNNs, which encourages sparsity while simultaneously identifying strongly correlated neurons and setting the corresponding weights to a common value. The ability of our method to identify correlations within the network not only helps further reduce the complexity of DNNs, but also allows us to cope with, and gain more insight into, highly correlated neurons instead of being negatively affected by them. From another perspective, many believe that the poor generalization and interpretability of DNNs could be resolved if the model could, in the setting of unsupervised learning, identify and separate the underlying explanatory factors of the data into different factors of its learned representation. Such representations are more likely to be usable across a variety of tasks, with each particular task relevant to a different subset or combination of the representation factors. In Chapter IV, we present an information-theoretic approach for jointly learning a hybrid discrete-continuous representation, where the goal is to uncover the underlying categories of the data while simultaneously separating the continuous representation into statistically independent components, each encoding a specific variation in the data.
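The Chapter III idea of simultaneously sparsifying and tying correlated neurons can be sketched as a post-hoc transform of a single weight matrix: prune the weakest rows, then force highly correlated surviving rows to share one parameter vector. This toy stand-in (the thresholds, the greedy grouping, and the function name are all assumptions) mimics the effect of the regularizer-driven training described in the thesis, not its actual method.

```python
import numpy as np

def sparsify_and_tie(W, prune_frac=0.5, corr_thresh=0.95):
    """Illustrative compression of a weight matrix W (out x in):
    (1) zero out the smallest-magnitude rows (neuron-level sparsity),
    (2) tie surviving rows whose pairwise correlation exceeds
        corr_thresh by replacing each group with its shared mean."""
    W = W.copy()
    norms = np.linalg.norm(W, axis=1)
    cutoff = np.quantile(norms, prune_frac)
    keep = norms > cutoff
    W[~keep] = 0.0                          # prune weak neurons

    idx = np.flatnonzero(keep)
    assigned = set()
    for i in idx:                           # greedy correlation grouping
        if i in assigned:
            continue
        group = [i]
        for j in idx:
            if j <= i or j in assigned:
                continue
            if np.corrcoef(W[i], W[j])[0, 1] > corr_thresh:
                group.append(j)
        mean_row = W[group].mean(axis=0)
        for g in group:
            W[g] = mean_row                 # tie: share one parameter vector
            assigned.add(g)
    return W
```

After the transform, tied rows are identical, so only one copy of each shared vector needs to be stored, which is the source of the extra compression beyond plain sparsity.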
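Objectives for hybrid discrete-continuous representations of the kind described for Chapter IV typically penalize two KL-divergence terms, one per part of the latent code. The following is a minimal sketch of those two terms only, assuming a diagonal-Gaussian continuous latent with a standard-normal prior and a categorical latent with a uniform prior; the thesis's actual information-theoretic objective may weight or decompose these terms differently.

```python
import numpy as np

def hybrid_kl(mu, logvar, cat_probs):
    """KL terms for a hybrid latent (illustrative, not the thesis's loss).
    Continuous: KL(N(mu, diag(exp(logvar))) || N(0, I)), closed form.
    Discrete:   KL(q(c) || Uniform(K)) for class probabilities q(c)."""
    kl_gauss = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
    K = cat_probs.shape[-1]
    q = np.clip(cat_probs, 1e-12, 1.0)      # avoid log(0)
    kl_cat = np.sum(q * (np.log(q) - np.log(1.0 / K)))
    return kl_gauss, kl_cat
```

Both terms vanish exactly when the posterior matches its prior, so driving them toward controlled targets is one common way to trade reconstruction quality against the independence and category structure of the code.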
Subjects
Subspace identification from streaming data
Neural network compression by simultaneous sparsification and parameter tying
Unsupervised learning of interpretable representations
Types
Thesis