DB Kernels that Require No Tuning

The following text was recently posted as an expert note on ODBMS.org. Many thanks to Roberto Zicari for the invitation!

Over the years, database systems have become extremely complex. This was necessary in order to accommodate a growing set of application requirements. As a result, database architectures today are not static: each system comes with a vast array of knobs, and there are numerous tuning decisions to make that are critical for performance.

Some of the most critical decisions when setting up a db system include choosing the set of indexes or even the exact data layout (the way we store data defines how we can access it). 

Such decisions depend on the data, the queries, the hardware properties, and application/user expectations, and they are typically made manually.

If we know exactly the queries we are going to pose and the future data, if we have expertise so that we know how to tune our data system, and if we have plenty of time and compute resources to invest in performing all tuning and set-up actions, then we can have a data system which is perfectly tuned for the workload and will give close to optimal performance.

What if none of the above is true? What if some of the above is true but we cannot afford db administrators? What if our application evolves over time, requiring a different tuning set-up at different times? What if we cannot afford to wait for the system to be tuned and we just need to access our data as soon as it is generated? 

An auto-tuning database kernel requires zero or minimal tuning.

The net effect is that, with tuning out of the way, such a system allows quick access to data and, by reducing the cost of ownership, helps democratize db systems.

There is still a long way to go to fully materialize the vision of auto-tuning db kernels, but several exciting advancements have taken place. For example, our work on adaptive indexing, adaptive loading and adaptive storage shows that it is possible to skip manual tuning completely and let the system make tuning decisions on the fly as queries and data arrive. Much of this work is captured under the umbrella of data exploration systems; it is optimized for main-memory processing in column-stores and is summarized here. Also, interested readers may find a plethora of references on relevant research work by the db community in our SIGMOD 2015 tutorial and a description of future steps in the Harvard DASlab white paper.
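To make the idea of tuning "on the fly as queries and data arrive" concrete, here is a minimal Python sketch in the spirit of adaptive indexing. It is a toy illustration, not the implementation from the work mentioned above: the class name `CrackedColumn` and its methods are hypothetical, and for readability it partitions with list copies, whereas a real kernel would reorganize the column in place. No index is built up front; each range query partitions only the piece of the column it touches, so the column becomes progressively more ordered as queries arrive.

```python
import bisect


class CrackedColumn:
    """Toy adaptive index (hypothetical names): queries reorganize the column."""

    def __init__(self, values):
        self.data = list(values)  # working copy, reorganized by queries
        self.pivots = []          # sorted pivot values seen so far
        self.positions = []       # positions[i]: first index holding a value >= pivots[i]

    def _crack(self, pivot):
        """Ensure values < pivot precede values >= pivot; return the split index."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # already partitioned on this value
        # Only the piece between the two neighboring pivots needs work.
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.pivots) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.data[lo:hi] = smaller + larger
        split = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.positions.insert(i, split)
        return split

    def range_query(self, low, high):
        """Return all values v with low <= v < high (assumes low <= high)."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]


column = CrackedColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3])
print(sorted(column.range_query(5, 13)))  # [7, 9, 12]
```

The first query pays for partitioning the whole column once; later queries that fall between existing pivots only touch a smaller piece. This is the trade-off the sketch is meant to show: no upfront tuning cost, with the index refined as a side effect of the workload.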