Database Architectures for Big Data Exploration


Keynote at the French Database Conference, October 2013

Presentation Slides: 

We are now entering the era of data deluge, where the amount of data outgrows the capabilities of query processing technology. Many emerging applications, from social networks to scientific experiments, are representative examples of this deluge, where the rate at which data is produced exceeds any past experience. For example, scientific fields such as astronomy are soon expected to collect multiple terabytes of data on a daily basis, while web-based businesses such as social networks or web log analysis are already confronted with a growing stream of large data inputs. State-of-the-art database systems cannot cope with the requirements of the data deluge, as they are designed for much more static environments. A modern database system requires heavy preparation and tuning steps, which implies plenty of idle time to prepare the system and fairly complete workload knowledge, so that we know what to tune the system for. Today, though, idle time and workload knowledge are scarce resources, as application scenarios are much more dynamic and ad hoc. For example, scientists might not always know what they are looking for, and with terabytes arriving daily they might also not have the time (and resources) to invest in any set-up actions; they just want to explore the data to see if there are any interesting patterns. In this talk, we focus on a new direction of query processing for big data where data exploration becomes a first-class citizen. We will talk about database architectures tailored for scenarios where big new chunks of data arrive rapidly and we want to react quickly, i.e., with little time to spare for tuning and set-up. Such systems automatically and adaptively perform all core database actions, such as data loading, indexing and tuning, based on the running workload and with zero human input.
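To make the idea of indexing "based on the running workload" concrete, here is a minimal sketch in the spirit of database cracking, one published adaptive-indexing technique. The abstract does not name a specific technique, so this choice is an assumption, and the class and method names (`CrackedColumn`, `range_query`) are hypothetical. Nothing is sorted or indexed up front: each range query partitions only the piece of the column it touches, so the column gradually becomes more ordered in the value ranges that users actually explore.

```python
import bisect


class CrackedColumn:
    """Hypothetical column that is partitioned incrementally by range queries."""

    def __init__(self, values):
        self.values = list(values)  # reorganized in place as queries arrive
        self.pivots = []            # sorted pivot values seen in past queries
        self.bounds = []            # bounds[i]: first position with value >= pivots[i]

    def _crack(self, pivot):
        """Place values < pivot before values >= pivot; return the split position."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.bounds[i]  # this pivot was already cracked by an earlier query
        # Only the piece between the two neighboring pivots has to be touched.
        lo = self.bounds[i - 1] if i > 0 else 0
        hi = self.bounds[i] if i < len(self.bounds) else len(self.values)
        piece = self.values[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.values[lo:hi] = smaller + larger
        split = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.bounds.insert(i, split)
        return split

    def range_query(self, low, high):
        """Return all values v with low <= v < high, refining the index as a side effect."""
        start = self._crack(low)
        end = self._crack(high)
        return self.values[start:end]


column = CrackedColumn([13, 4, 55, 9, 27, 1, 42, 8])
first = sorted(column.range_query(5, 30))   # the first query cracks the whole column
second = sorted(column.range_query(9, 14))  # later queries touch only one small piece
```

The design choice worth noting is that the cost of index creation is spread across queries rather than paid in an idle tuning phase: the first query does the most work, and later queries over the same region get cheaper. A real system would partition in place and handle updates and concurrency, which this sketch leaves out.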