Database Cracking and the Path Towards Auto-tuning Database Kernels


Faculty Job Talk, Spring 2013

Presentation Slides: 

Today, businesses and sciences create more data than we can store or move, let alone analyze. A fundamental problem with big data is that data management systems require extensive tuning (indexing) and installation steps; by the time we finish tuning, more data has already arrived. To make things worse, tuning a database system requires knowing what to tune for, i.e., the kind of queries we will be posing. However, in many modern big data applications, we often need to explore new data quickly, searching for interesting patterns without knowing a priori exactly what we are looking for.

Database cracking completely removes the need for index tuning in database systems. With database cracking, indices are built incrementally, adaptively, and on demand; each query is treated as advice on how data should be stored. With each incoming query, data is reorganized on the fly as part of query processing, and future queries exploit and continuously refine this knowledge. Autonomously, adaptively, and without any human administration, the database system quickly adapts to a new workload and reaches optimal performance when the workload stabilizes.
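
To make the idea concrete, here is a minimal Python sketch of cracking a single column under range queries. It is an illustration only, not the implementation presented in the talk: the class name CrackedColumn, the method names, and the half-open range [low, high) are assumptions made for this example, and the partitioning step builds temporary lists for readability where a real kernel would swap elements in place.

```python
import bisect


class CrackedColumn:
    """Hypothetical sketch of a column that is cracked by its queries."""

    def __init__(self, values):
        self.data = list(values)
        # Cracker index: pivots is kept sorted; positions[i] is the index
        # of the first element >= pivots[i]. Everything before that index
        # is < pivots[i].
        self.pivots = []
        self.positions = []

    def _crack(self, v):
        """Ensure v is a pivot and return the index where values >= v begin."""
        i = bisect.bisect_left(self.pivots, v)
        if i < len(self.pivots) and self.pivots[i] == v:
            return self.positions[i]  # an earlier query already cracked here

        # Only the piece that can contain v is touched.
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.pivots) else len(self.data)

        # Partition that piece around v (simplified: not done in place).
        piece = self.data[lo:hi]
        smaller = [x for x in piece if x < v]
        larger = [x for x in piece if x >= v]
        self.data[lo:hi] = smaller + larger

        split = lo + len(smaller)
        self.pivots.insert(i, v)
        self.positions.insert(i, split)
        return split

    def range_query(self, low, high):
        """Return all values x with low <= x < high, cracking as a side effect."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]


column = CrackedColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3])
first = column.range_query(5, 13)   # cracks the column at 5 and 13
second = column.range_query(2, 8)   # only re-partitions the affected pieces
```

In this sketch, the first query partitions the whole column, while the second only reorganizes the pieces that its bounds fall into. Each query leaves behind pivots that later queries reuse, so the column drifts towards sorted order in exactly the value ranges the workload touches.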