Publications by Author: Demi Guo

2019
S. Idreos, et al., “Learning Data Structure Alchemy,” Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol. 42, no. 2, pp. 46-57, 2019.Abstract

We propose a solution based on first principles and AI to the decades-old problem of data structure design. Instead of working on individual designs that each can only be helpful in a small set of environments, we propose the construction of an engine, a Data Alchemist, which learns how to blend fine-grained data structure design principles to automatically synthesize brand new data structures.

learningdatastructurealchemy.pdf
2018
S. Idreos, et al., “The Periodic Table of Data Structures,” Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol. 41, no. 3, pp. 64-75, 2018.Abstract
We describe the vision of being able to reason about the design space of data structures. 
We break this down into two questions: 1) Can we know all data structures that is possible to design?  2) Can we compute the performance of arbitrary designs on a given hardware and workload without having to implement the design or even access the target hardware?
If those challenges are possible, then an array of exciting opportunities would become feasible such as interactive what-if design to improve the productivity of data systems researchers and engineers, and informed decision making in industrial settings with regards to critical ardware/workload/data structure design issues. Then, even fully automated discovery of new data structure designs becomes possible. Furthermore, the structure of the design space itself provides numerous insights and opportunities such as the existence of design continuums that can lead to data systems with deep adaptivity, and a new understanding of the possible performance trade-offs. Given the universal presence of data structures at the very core of any data-driven field across all sciences and industries, reasoning about their design can have significant benefits, making it more feasible (easier, faster and cheaper) to adopt tailored state-of-the-art storage solutions. And this effect is going to become increasingly more critical as data keeps growing, hardware keeps changing and more applications/fields realize the transformative power and potential of data analytics.  
This paper presents this vision and surveys first steps that demonstrate its feasibility. 
periodictabledatastructures.pdf
S. Idreos, K. Zoumpatianos, B. Hentschel, M. S. Kester, and D. Guo, “The Data Calculator: Data Structure Design and Cost Synthesis From First Principles, and Learned Cost Models,” in ACM SIGMOD International Conference on Management of Data , 2018.Abstract

Data structures are critical in any data-driven scenario, but they are notoriously hard to design due to a massive design space and the dependence of performance on workload and hardware which evolve continuously. We present a design engine, the Data Calculator, which enables interactive and semi-automated design of data structures. It brings two innovations. First, it offers a set of fine-grained design primitives that capture the first principles of data layout design: how data structure nodes lay data out, and how they are positioned relative to each other. This allows for a structured description of the universe of possible data structure designs that can be synthesized as combinations of those primitives. The second innovation is computation of performance using learned cost models. These models are trained on diverse hardware and data profiles and capture the cost properties of fundamental data access primitives (e.g., random access). With these models, we synthesize the performance cost of complex operations on arbitrary data structure designs without having to: 1) implement the data structure, 2) run the workload, or even 3) access the target hardware. We demonstrate that the Data Calculator can assist data structure designers and researchers by accurately answering rich what-if design questions on the order of a few seconds or minutes, i.e., computing how the performance (response time) of a given data structure design is impacted by variations in the: 1) design, 2) hardware, 3) data, and 4) query workloads. This makes it effortless to test numerous designs and ideas before embarking on lengthy implementation, deployment, and hardware acquisition steps. We also demonstrate that the Data Calculator can synthesize entirely new designs, auto-complete partial designs, and detect suboptimal design choices.

datacalculator.pdf
2016
S. Idreos, et al., “Past and Future Steps for Adaptive Storage Data Systems: From Shallow to Deep Adaptivity,” in International Workshop on Enabling Real-Time Business Intelligence (BIRTE 2016), 2016.Abstract

 

Datasystems with adaptive storage can autonomously change their behavior by altering how data is stored and accessed. Such systems have been studied primarily for the case of adaptive indexing to auto- matically create the right indexes at the right granularity. More recently work on adaptive loading and adaptive data layouts brought even more flexibility. We survey this work and describe the need for even deeper adaptivity that goes beyond adjusting knobs in a single architecture; instead it can adapt the fundamental architecture of a data system to drastically alter its behavior. 

 

birte2016.pdf