Learned Encodings in SageDB
Author(s)
Cen, Lujing
Advisor
Kraska, Tim
Abstract
As the demand for data outpaces diminishing improvements in the hardware used to store and query them, we must find intelligent ways to increase database performance on existing systems. This project is focused on integrating learned encodings into SageDB, a database capable of accelerating queries by analyzing and adapting to different workloads. Encodings improve query performance through lossless compression, thereby reducing I/O time during scans. Different encoding types exhibit different characteristics depending on properties of the underlying data and the hardware on which queries are executed. We implement a variety of common encodings in SageDB and propose a learning-based approach to select the optimal encoding for a given data block by combining block-level statistics with sampling. In addition, we demonstrate how to leverage properties of encoded data along with vectorized processing units in modern CPUs to more efficiently execute queries without the need to decode every value.
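To illustrate the abstract's last point, querying encoded data without decoding every value, here is a minimal sketch. It assumes run-length encoding, chosen only for simplicity; the abstract does not name the encodings implemented in SageDB. The function names are hypothetical and this is not SageDB code.

```python
from typing import List, Tuple

# A run is (value, run_length). This layout is an assumption for illustration.
Run = Tuple[int, int]


def rle_encode(values: List[int]) -> List[Run]:
    """Losslessly compress consecutive repeats into (value, length) runs."""
    runs: List[Run] = []
    for v in values:
        if runs and runs[-1][0] == v:
            # Extend the current run instead of storing the value again.
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def rle_decode(runs: List[Run]) -> List[int]:
    """Expand runs back into the original values."""
    out: List[int] = []
    for value, length in runs:
        out.extend([value] * length)
    return out


def count_equal(runs: List[Run], target: int) -> int:
    """Count rows equal to `target` by inspecting one value per run,
    without materializing the decoded block."""
    return sum(length for value, length in runs if value == target)


block = [7, 7, 7, 7, 3, 3, 9, 9, 9, 7]
runs = rle_encode(block)
```

Here the predicate is evaluated once per run rather than once per row, so the cost of the scan depends on the number of runs in the block, not the number of values. How much this helps depends on the data: a block with few repeats yields almost as many runs as rows.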
Date issued
2021-06
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology