Database Cracking: Towards Auto-tuning Database Kernels

Publication information:

S. Idreos, “Database Cracking: Towards Auto-tuning Database Kernels”, 2010.

Abstract

Indices are heavily used in database systems in order to achieve the ultimate query processing performance. It takes a lot of time to create an index, and the system needs to reserve extra storage space to store the auxiliary data structure. When updates arrive, there is also the overhead of maintaining the index. As a result, which indices to create and when to create them has been, and still is, one of the most important research topics of the last decades.

If the workload is known up front or can be predicted, and if there is enough idle time to spare, then we can create all necessary indices a priori and exploit them when queries arrive. But what happens if we do not have this knowledge or idle time? Similarly, what happens if the workload changes often, suddenly, and in an unpredictable way? Even if we can correctly analyze the current workload, it may well be that by the time we finish our analysis and create all necessary indices, the workload pattern has changed.

Here we argue that a database system should just be given the data and queries in a declarative way, and the system should internally take care of finding not only the proper algorithms and query plans but also the proper physical design to match the workload and application needs. The goal is to remove the role of database administrators, leading to systems that can self-tune completely automatically and adapt even to dynamic environments. Database Cracking implements the first adaptive kernel that automatically adapts to the access patterns by selectively and adaptively optimizing the data set purely for the workload at hand. It continuously reorganizes input data on the fly as a side effect of query processing, using queries as advice on how data should be stored. Everything happens within operator calls during query processing and brings knowledge to the system that future operators in future queries can exploit. Essentially, the necessary indices are built incrementally as the system gains more and more knowledge about the workload needs.
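
To make the idea of reorganizing data as a side effect of queries concrete, the following is a minimal Python sketch of a cracked column answering range selections. It is an illustration under simplifying assumptions, not the implementation described in the thesis: the class name CrackedColumn, the method names, and the use of two parallel lists as the index over pivots are all hypothetical, and each piece is partitioned by copying rather than in place.

```python
import bisect


class CrackedColumn:
    """Toy cracked column (hypothetical names; not the thesis implementation)."""

    def __init__(self, values):
        self.data = list(values)  # copy of the column that queries reorganize
        self.pivots = []          # sorted pivot values seen in earlier queries
        self.positions = []       # positions[i]: first index holding a value >= pivots[i]

    def _crack(self, pivot):
        """Partition the column around pivot and return the split position."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # an earlier query already cracked here
        # Only the single piece that contains the pivot has to be touched.
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.pivots) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.data[lo:hi] = smaller + larger
        split = lo + len(smaller)
        # Remember the new boundary so that future queries can exploit it.
        self.pivots.insert(i, pivot)
        self.positions.insert(i, split)
        return split

    def range_select(self, low, high):
        """Return all values v with low <= v < high, cracking as a side effect."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]


col = CrackedColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3, 14, 11, 8, 6])
first = col.range_select(5, 10)   # cracks the whole column around 5 and 10
second = col.range_select(7, 14)  # only re-partitions two smaller pieces
```

In this sketch the first query has to scan the entire column, while the second only touches the pieces that contain its bounds. Each query leaves behind boundaries that later queries reuse, which is the sense in which the index is built incrementally.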