Enhanced Stream Processing in a DBMS Kernel

Publication information:

E. Liarou, S. Idreos, S. Manegold, and M. Kersten,
“Enhanced Stream Processing in a DBMS Kernel”, in Proceedings of the 16th International Conference on Extending Database Technology (EDBT), Genoa, Italy, 2013, pp. 501–512.

Abstract

Continuous query processing has emerged as a promising query processing paradigm with numerous applications. A recent development is the need to handle both streaming queries and typical one-time queries in the same application. For example, data warehousing can greatly benefit from the integration of stream semantics, i.e., online analysis of incoming data and combination with existing data. This is especially useful to provide low latency in data-intensive analysis in big data warehouses that are augmented with new data on a daily basis. However, state-of-the-art database technology cannot handle streams efficiently due to their “continuous” nature. At the same time, state-of-the-art stream technology is purely focused on stream applications. The research efforts are mostly geared towards the creation of specialized stream management systems built with a different philosophy than a DBMS. The drawback of this approach is the limited opportunities to exploit successful past data processing technology, e.g., query optimization techniques. For this new problem we need to combine the best of both worlds. Here we take a completely different route by designing a stream engine on top of an existing relational database kernel. This includes reuse of both its storage/execution engine and its optimizer infrastructure. The major challenge then becomes the efficient support for specialized stream features. This paper focuses on incremental window-based processing, arguably the most crucial stream-specific requirement. In order to maintain and reuse the generic storage and execution model of the DBMS, we elevate the problem at the query plan level. Proper optimizer rules, scheduling and intermediate result caching and reuse, allow us to modify the DBMS query plans for efficient incremental processing. We describe in detail the new approach and we demonstrate efficient performance even against specialized stream engines, especially when scalability becomes a crucial factor.