PhD Thesis at Universiteit van Amsterdam,
2015 - Pirk, H.
Abstract: Computer systems are not the monolithic machines they used to be. In the early days of computer science (until the late 1970s), most computer systems included exactly one component to perform a given task: one (type of) disc for persistence, one CPU for processing and one volatile RAM to hold intermediate data. Today, the architecture has developed into a heterogeneous landscape of components: discs, SSDs, RAM, NVRAM, GPUs and CPUs with a hierarchy of caches. In this thesis, we study the management of relational data in modern, i.e., asymmetric, computer systems. We explore different strategies to identify asymmetries in persistent data, map them to asymmetries in the memory landscape and, eventually, exploit them to increase query processing performance. To this end, we study memory-conscious decomposition and storage of data at different granularities: relations, vertical partitions, single attributes as well as individual bits. In the interest of conciseness, we exclude techniques that require auxiliary data structures such as indices or horizontal partitioning, which come with significant maintenance overhead.
Further, we argue that, when managing memory-resident data, the problem of optimal data placement is tightly connected to the efficiency of the query processing paradigm and therefore cannot be studied in isolation. Consequently, we also investigate the connection between storage model and processing paradigm. In the case of decomposition at partition granularity, we identify Just-in-Time compilation as the only viable query processing model. In the case of distribution at the granularity of individual bits, we develop a novel processing paradigm that efficiently exploits the asymmetries in the underlying data and memory components.