In this article, we survey recent research on column-oriented database systems, or column-stores, where each attribute of a table is stored in a separate file or region on storage. Such databases have seen a resurgence in recent years with a rise in interest in analytic queries that perform scans and aggregates over large portions of a few columns of a table. The main advantage of a column-store is that it can access just the columns needed to answer such queries. We specifically focus on three influential research prototypes, MonetDB, VectorWise, and C-Store. These systems have formed the basis for several well-known commercial column-store implementations. We describe their similarities and differences and discuss their specific architectural features for compression, late materialization, join processing, vectorization and adaptive indexing (database cracking).
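To make the layout difference concrete, here is a minimal sketch (an illustration with made-up data, not code from any of the surveyed systems) of the same table stored row-wise and column-wise, and of an analytic aggregate that needs only two of its attributes:

```python
# A minimal sketch contrasting the two storage layouts for the analytic
# pattern described above: an aggregate over two columns of a wide table.

rows = [  # row-store: each record is stored contiguously
    {"id": 1, "price": 9.99, "qty": 3, "region": "EU"},
    {"id": 2, "price": 4.50, "qty": 1, "region": "US"},
]

columns = {  # column-store: each attribute is stored contiguously
    "id": [1, 2],
    "price": [9.99, 4.50],
    "qty": [3, 1],
    "region": ["EU", "US"],
}

# The row-store scan drags every attribute of every record through memory,
# while the column-store reads only the two columns the query needs.
total_row = sum(r["price"] * r["qty"] for r in rows)
total_col = sum(p * q for p, q in zip(columns["price"], columns["qty"]))
assert total_row == total_col
```

The column-store variant touches only the `price` and `qty` arrays, which is exactly the scan-and-aggregate access pattern these systems are built for.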
Physical design represents one of the hardest problems for database management systems. Without proper tuning, systems cannot achieve good performance. Offline indexing creates indexes a priori, assuming good workload knowledge and idle time. More recently, online indexing monitors workload trends and creates or drops indexes online. Adaptive indexing takes another step towards completely automating the tuning process of a database system, by enabling incremental and partial online indexing. The main idea is that physical design changes continuously, adaptively, partially, incrementally and on demand while processing queries, as part of the execution operators. As such, it brings a plethora of opportunities for rethinking and improving every single corner of database system design.
We will analyze the indexing space between offline, online and adaptive indexing through several state-of-the-art indexing techniques, e.g., what-if analysis and soft indexes. We will discuss in detail adaptive indexing techniques such as database cracking, adaptive merging, sideways cracking and various hybrids that try to balance the online tuning overhead with the convergence speed to optimal performance. In addition, we will discuss how various aspects of modern database architectures, such as vectorization, bulk processing, column-store execution and storage, affect adaptive indexing. Finally, we will discuss several open research topics towards fully autonomous database kernels.
MonetDB is a state-of-the-art open-source column-store database management system targeting applications in need of analytics over large collections of data. MonetDB is actively used nowadays in health care, in telecommunications, as well as in scientific databases and in data management research, accumulating on average more than 10,000 downloads on a monthly basis. This paper gives a brief overview of the MonetDB technology as it developed over the past two decades and the main research highlights that drive the current MonetDB design and form the basis for its future evolution.
In DataCell, we design streaming functionalities in a modern relational database kernel that targets big data analytics. This includes exploitation of both its storage/execution engine and its optimizer infrastructure. We investigate the opportunities and challenges that arise with such a direction and we show that it carries significant advantages for modern applications in need of online analytics, such as web logs, network monitoring and scientific data management. The major challenge then becomes the efficient support for specialized stream features, e.g., multi-query processing and incremental window-based processing, as well as exploiting standard DBMS functionalities in a streaming environment, such as indexing.
This demo presents DataCell, an extension of the MonetDB open-source column-store for online analytics. The demo gives users the opportunity to experience the features of DataCell such as processing both stream and persistent data and performing window-based processing. The demo provides a visual interface to monitor the critical system components, e.g., how query plans transform from typical DBMS query plans to online query plans, how data flows through the query plans as the streams evolve, how DataCell maintains intermediate results in columnar form to avoid repeated evaluation of the same stream portions, etc. The demo also provides the ability to interactively set the test scenarios and various DataCell knobs.
As data collections become larger and larger, data loading is evolving into a major bottleneck. Many applications already avoid using database systems, e.g., scientific data analysis and social networks, due to the complexity and the increased data-to-query time. For such applications data collections keep growing fast, even on a daily basis, and we are already in the era of data deluge where we have much more data than we can move or store, let alone analyze.
Our contribution in this paper is the design and roadmap of a new paradigm in database systems, called NoDB, which does not require data loading while still maintaining the whole feature set of a modern database system. In particular, we show how to make raw data files a first-class citizen, fully integrated with the query engine. Through our design and lessons learned by implementing the NoDB philosophy over a modern DBMS, we discuss the fundamental limitations as well as the strong opportunities that such a research path brings. We identify performance bottlenecks specific to in situ processing, namely the repeated parsing and tokenizing overhead and the expensive data type conversion costs. To address these problems, we introduce an adaptive indexing mechanism that maintains positional information to provide efficient access to raw data files, together with a flexible caching structure.
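As a rough illustration of the positional-information idea, consider the following much-simplified sketch (it records only row start offsets over a hypothetical file, whereas the mechanism described above also tracks positions within rows and caches converted values):

```python
# A much-simplified positional-map sketch: remember row start offsets in a
# raw CSV file as a side effect of access, so later queries seek directly
# instead of re-parsing the file from the beginning.

import io

RAW = "id,price\n1,9.99\n2,4.50\n3,7.25\n"  # hypothetical raw data file

class PositionalMap:
    def __init__(self, f):
        self.f = f
        self.offsets = [len(f.readline())]  # offset of row 0, after the header

    def row(self, n):
        # Extend the map lazily, only as far as this query needs.
        while len(self.offsets) <= n:
            self.f.seek(self.offsets[-1])
            self.offsets.append(self.offsets[-1] + len(self.f.readline()))
        self.f.seek(self.offsets[n])
        return self.f.readline().rstrip("\n").split(",")

pm = PositionalMap(io.StringIO(RAW))
print(pm.row(2))  # first access parses up to row 2, recording offsets
print(pm.row(1))  # later accesses seek directly; no re-parsing
```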
Our implementation over PostgreSQL, called PostgresRaw, is able to avoid the loading cost completely, while matching the query performance of plain PostgreSQL and even outperforming it in many cases. We conclude that NoDB systems are feasible to design and implement over modern database architectures, bringing an unprecedented positive effect in usability and performance.
As data collections become larger and larger, users are faced with increasing bottlenecks in their data analysis. More data means more time to prepare the data, to load the data into the database and to execute the desired queries. Many applications already avoid using traditional database systems, e.g., scientific data analysis and social networks, due to their complexity and the increased data-to-query time, i.e., the time between getting the data and retrieving its first useful results. For many applications data collections keep growing fast, even on a daily basis, and this data deluge will only increase in the future, when we expect to have much more data than we can move or store, let alone analyze.
In this demonstration, we will showcase a new philosophy for designing database systems called NoDB. NoDB aims at minimizing the data-to-query time, most prominently by removing the need to load data before launching queries. We will present our prototype implementation, PostgresRaw, built on top of PostgreSQL, which allows for efficient query execution over raw data files with zero initialization overhead. We will visually demonstrate how PostgresRaw incrementally and adaptively touches, parses, caches and indexes raw data files autonomously and exclusively as a side-effect of user queries.
Modern business applications and scientific databases call for inherently dynamic data storage environments. Such environments are characterized by two challenging features: (a) they have little idle system time to devote to physical design; and (b) there is little, if any, a priori workload knowledge, while the query and data workload keeps changing dynamically. In such environments, traditional approaches to index building and maintenance cannot apply. Database cracking has been proposed as a solution that allows on-the-fly physical data reorganization, as a collateral effect of query processing. Cracking aims to continuously and automatically adapt indexes to the workload at hand, without human intervention. Indexes are built incrementally, adaptively, and on demand. Nevertheless, as we show, existing adaptive indexing methods fail to deliver workload-robustness; they perform much better with random workloads than with others. This frailty derives from the inelasticity with which these approaches interpret each query as a hint on how data should be stored. Current cracking schemes blindly reorganize the data within each query's range, even if that results in successive expensive operations with minimal indexing benefit.
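For readers unfamiliar with cracking, the following minimal single-column sketch (an illustration with made-up names, not the systems' actual code) shows the mechanism described above: a range select physically reorganizes the data around its own bounds as a side effect, so later queries scan ever-smaller pieces.

```python
# A minimal single-column cracking sketch: a range select partitions the
# column around its own bounds as a side effect and remembers the crack
# positions, so future queries scan ever-smaller pieces.

class CrackedColumn:
    def __init__(self, values):
        self.data = list(values)
        self.cracks = {}  # pivot value -> first index holding a value >= pivot

    def _crack(self, pivot):
        if pivot in self.cracks:
            return self.cracks[pivot]
        lo, hi = 0, len(self.data)  # locate the piece the pivot falls into
        for v, pos in sorted(self.cracks.items()):
            if v <= pivot:
                lo = pos
            else:
                hi = pos
                break
        piece = self.data[lo:hi]    # partition only that piece around the pivot
        left = [x for x in piece if x < pivot]
        right = [x for x in piece if x >= pivot]
        self.data[lo:hi] = left + right
        self.cracks[pivot] = lo + len(left)
        return self.cracks[pivot]

    def select(self, low, high):    # all values in [low, high)
        return self.data[self._crack(low):self._crack(high)]

col = CrackedColumn([13, 4, 55, 9, 27, 2, 42, 18])
print(sorted(col.select(5, 30)))   # cracks at 5 and 30 as a side effect
print(col.cracks)                  # knowledge reused by future queries
```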
In this paper, we introduce stochastic cracking, a significantly more resilient approach to adaptive indexing. Stochastic cracking also uses each query as a hint on how to reorganize data, but not blindly so; it gains resilience and avoids performance bottlenecks by deliberately applying certain arbitrary choices in its decision-making. Thereby, we bring adaptive indexing forward to a mature formulation that confers the workload-robustness previous approaches lacked. Our extensive experimental study verifies that stochastic cracking maintains the desired properties of original database cracking while at the same time it performs well with diverse realistic workloads.
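Continuing the CrackedColumn sketch above, a deliberately simplified rendering of the stochastic idea is to crack at one extra, randomly chosen pivot in addition to the query's own bounds; the paper's actual algorithms choose where and how much to reorganize far more carefully.

```python
# Extending the CrackedColumn sketch above: before answering, also crack
# at one random pivot drawn from the data. This injects the "arbitrary
# choices" mentioned above, so that even a skewed or sequentially
# advancing workload keeps cutting large unindexed pieces down.

import random

def stochastic_select(col, low, high):
    col._crack(random.choice(col.data))  # auxiliary random cut
    return col.select(low, high)         # then the usual query-driven cracks
```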
Adaptive indexing initializes and optimizes indexes incrementally, as a side effect of query processing. The goal is to achieve the benefits of indexes while hiding or minimizing the costs of index creation. However, index-optimizing side effects seem to turn read-only queries into update transactions that might, for example, create lock contention. This paper studies concurrency control in the context of adaptive indexing. We show that the design and implementation of adaptive indexing rigorously separates index structures from index contents; this relaxes the constraints and requirements during adaptive indexing compared to those of traditional index updates. Our design adapts to the fact that an adaptive index is refined continuously, and exploits any concurrency opportunities in a dynamic way. A detailed experimental analysis demonstrates that (a) adaptive indexing maintains its adaptive properties even when running concurrent queries, (b) adaptive indexing can exploit the opportunity for parallelism due to concurrent queries, (c) the number of concurrency conflicts and any concurrency administration overheads follow an adaptive behavior, decreasing as the workload evolves and adapting to the workload needs.
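A minimal sketch of the underlying observation (an illustration only, not the paper's actual design): because cracking changes the physical order of values but never the logical contents of the index, a reorganization can be protected by a short-lived structural latch on the piece it touches, rather than by a transactional lock on the data, while readers of other pieces proceed in parallel.

```python
# An illustration of separating index structure from index contents:
# repartitioning a piece changes no query result, so it can run under a
# short structural latch instead of a transactional lock.

import threading

class LatchedPiece:
    def __init__(self, values):
        self.values = list(values)
        self.latch = threading.Lock()  # structural latch, not a data lock

    def crack(self, pivot):
        with self.latch:               # held only while repartitioning
            left = [x for x in self.values if x < pivot]
            right = [x for x in self.values if x >= pivot]
            self.values[:] = left + right
            return len(left)           # boundary index of the new pieces
```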
The Blink project’s ambitious goal is to answer all Business Intelligence (BI) queries in mere seconds, regardless of the database size, with an extremely low total cost of ownership. Blink is a new DBMS aimed primarily at read-mostly BI query processing that exploits scale-out of commodity multi-core processors and cheap DRAM to retain a (copy of a) data mart completely in main memory. Additionally, it exploits proprietary compression technology and cache-conscious algorithms that reduce memory bandwidth consumption and allow most SQL query processing to be performed on the compressed data. Blink always scans (portions of) the data mart in parallel on all nodes, without using any indexes or materialized views, and without any query optimizer to choose among them. The Blink technology has thus far been incorporated into two IBM accelerator products generally available since March 2011. We are now working on the next generation of Blink, which will significantly expand the “sweet spot” of the Blink technology to much larger, disk-based warehouses and allow Blink to “own” the data, rather than copies of it.
The trend toward more online linked data is becoming stronger. Foreseeing a future where “everything” will be online and linked, we ask the critical question: what is next? We envision that managing, querying and storing large amounts of links and data is far from yet another query processing task. We highlight two distinct and promising research directions towards managing and making sense of linked data. We introduce linked views to help focus on specific link and data instances and linked history to help observe how links and data change over time.
Database management systems (DBMS) provide incredible flexibility and performance
when it comes to query processing, scalability and accuracy.
To fully exploit DBMS features, however, the user must define a schema,
load the data, tune the system for the expected workload, and answer several questions.
Should the database use a column-store, a row-store or some hybrid format?
What indices should be created?
All these questions make for a formidable and time-consuming hurdle, often
deterring new applications or imposing high cost to existing ones.
A characteristic example is that of scientific databases with huge data sets.
The prohibitive initialization cost and complexity
still forces scientists to rely on ``ancient'' tools for their data management tasks,
delaying scientific understanding and progress.
Users and applications collect their data in flat files, which have traditionally been
considered to be ``outside'' a DBMS. A DBMS wants control:
always bring all data ``inside'', replicate it and format it in its own ``secret'' way.
The problem has been recognized and current efforts extend existing systems with
abilities such as reading information from flat files
and gracefully incorporating it into the processing engine.
This paper proposes a new generation of systems where
the only requirement from the user is a link to the raw data files.
Queries can then immediately be fired without preparation steps in between.
Internally and in an abstract way, the system takes care of
selectively, adaptively and incrementally providing the proper environment
given the queries at hand. Only part of the data is loaded at any given time
and it is stored and accessed in a format suitable for the current workload.
SIGMOD has offered, since 2008, to verify the experiments published in the papers accepted at the conference. This year, we have been in charge of reproducing the experiments provided by the authors (repeatability), and of exploring changes to experiment parameters (workability). In this paper, we assess the SIGMOD repeatability process in terms of participation, review process and results. While participation is stable in terms of the number of submissions, this year we find a sharp contrast between the high participation from Asian authors and the low participation from American authors. We also find that most experiments are distributed as Linux packages accompanied by instructions on how to set up and run the experiments. We are still far from the vision of executable papers.
There is a clear need nowadays for extremely large data processing.
This is especially true in the area of scientific data management where soon we expect
data inputs in the order of multiple Petabytes.
However, current data management technology is not suitable for such data sizes.
In the light of such new database applications, we can rethink some of the strict
requirements database systems adopted in the past.
We argue that correctness is one such requirement: a critical property that is nonetheless responsible for significant performance degradation.
In this paper, we propose a new paradigm towards building database kernels
that may produce wrong but fast, cheap and indicative results.
Fast response times are an essential component of data analysis for exploratory applications;
allowing for fast queries enables
the user to develop a ``feeling'' for the data through a series of ``painless'' queries which eventually leads
to more detailed analysis in a targeted data area.
We propose a research path where a database kernel autonomously and on-the-fly
decides to reduce the processing requirements of a running query
based on workload and hardware characteristics.
This requires a complete redesign of database operators
and of the query processing strategy.
For example, typical and very common scenarios where query processing performance degrades significantly
are cases where a database operator has to spill data
to disk, or is forced to perform random access, or has to follow long linked lists, etc.
Here we ask the question: What if we simply avoid these steps, ``ignoring'' the side-effects
on the correctness of the result?
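A deliberately naive sketch of what such an operator might look like (an assumption-laden illustration; the paper argues for a redesign inside the database kernel, not a user-level function): an aggregate that stops consuming input at the point where an exact operator would spill to disk, returning an indicative answer together with how much of the data it covered.

```python
# A naive "wrong but fast" aggregate: where an exact operator would spill
# to disk once its memory budget is exceeded, this one simply stops and
# returns an indicative answer plus the fraction of the data it saw.

def indicative_sum(values, budget):
    seen = total = 0
    for v in values:
        if seen >= budget:   # the point where an exact operator would spill
            break            # ... we stop instead, trading correctness for speed
        total += v
        seen += 1
    return total, seen / max(len(values), 1)

estimate, coverage = indicative_sum(list(range(1_000_000)), budget=400_000)
print(f"indicative sum {estimate} from {coverage:.0%} of the data")
```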
Adaptive indexing is characterized by the partial creation
and refinement of the index as side effects of query execution.
Dynamic or shifting workloads may benefit from preliminary
index structures focused on the columns and specific key ranges actually queried ---
without incurring the cost of full index construction.
The costs and benefits of adaptive indexing techniques
should therefore be compared in terms of initialization costs,
the overhead imposed upon queries, and the rate at which
the index converges to a state that is fully refined
for a particular workload component.
Based on an examination of database cracking and adaptive merging,
which are two techniques for adaptive indexing,
we seek a hybrid technique that has a low initialization cost
and also converges rapidly.
We find the strengths and weaknesses of database cracking and
adaptive merging complementary.
One has a relatively high initialization cost but converges rapidly.
The other has a low initialization cost but converges relatively slowly.
We analyze the sources of their respective strengths and
explore the space of hybrid techniques.
We have designed and implemented a family of hybrid algorithms
in the context of a column-store database system.
Our experiments compare their behavior against
database cracking and adaptive merging, as well as against
both traditional full index lookup and scan of unordered data.
We show that the new hybrids significantly improve over
past methods while at least two of the hybrids
come very close to the ``ideal performance''
in terms of both overhead per query and convergence to a final state.
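For concreteness, here is a compact sketch of the adaptive-merging side of this design space (an illustration with made-up names; actual implementations operate on B-tree partitions): the column starts as cheaply created sorted runs, and each query merges only its requested key range out of every run into a growing final index, which is why this technique converges rapidly but pays a higher initialization cost.

```python
# A compact adaptive-merging sketch: sorted runs up front, then each range
# query moves its key range from every run into a fully sorted final index.

from bisect import bisect_left, insort

class AdaptiveMerge:
    def __init__(self, values, run_size=4):
        vals = list(values)
        self.runs = [sorted(vals[i:i + run_size])   # initialization cost
                     for i in range(0, len(vals), run_size)]
        self.final = []                             # merged, fully refined part

    def select(self, low, high):                    # all values in [low, high)
        moved = []
        for run in self.runs:                       # pull the range out of each run
            a, b = bisect_left(run, low), bisect_left(run, high)
            moved.extend(run[a:b])
            del run[a:b]
        for v in moved:                             # merge into the final index
            insort(self.final, v)
        a, b = bisect_left(self.final, low), bisect_left(self.final, high)
        return self.final[a:b]

am = AdaptiveMerge([13, 4, 55, 9, 27, 2, 42, 18])
print(am.select(5, 30))  # the first query pays the merge ...
print(am.select(5, 30))  # ... the repeat is answered from the final index
```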
The Blink project’s ambitious goals are to answer all Business Intelligence (BI) queries in mere seconds, regardless of the database size, with an extremely low total cost of ownership. It takes a very innovative and counter-intuitive approach to processing BI queries, one that exploits several disruptive hardware and software technology trends. Specifically, it is a new, workload-optimized DBMS aimed primarily at BI query processing, and exploits scale-out of commodity multi-core processors and cheap DRAM to retain a (copy of a) data mart completely in main memory. Additionally, it exploits proprietary compression technology and cache-conscious algorithms that reduce memory bandwidth consumption and allow most SQL query processing to be performed on the compressed data. Ignoring the general wisdom of the last three decades that the only way to scalably search large databases is with indexes, Blink always performs simple, “brute force” scans of the entire data mart in parallel on all nodes, without using any indexes or materialized views, and without any query optimizer to choose among them. The Blink technology has thus far been incorporated into two products: (1) an accelerator appliance product for DB2 for z/OS (on the “mainframe”), called the IBM Smart Analytics Optimizer for DB2 for z/OS, V1.1, which was generally available in November 2010; and (2) the Informix Warehouse Accelerator (IWA), a software-only version that was generally available in March 2011. We are now working on the next generation of Blink, called BLink Ultra, or BLU, which will significantly expand the “sweet spot” of Blink technology to much larger, disk-based warehouses and allow BLU to “own” the data, rather than copies of it.
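To illustrate the idea of performing query processing directly on compressed data, here is a toy sketch (Blink's actual encodings and scan algorithms are proprietary and far more sophisticated): dictionary-encode a column into small integer codes, translate the predicate into code space once, and then scan only the codes, never decompressing the column.

```python
# A toy "operate on compressed data" sketch: scans compare small integer
# codes instead of the original values.

def dictionary_encode(column):
    dictionary = sorted(set(column))                 # order-preserving codes
    code = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code[v] for v in column]

states = ["CA", "NY", "CA", "TX", "NY", "CA"]
dictionary, codes = dictionary_encode(states)

target = dictionary.index("CA")                      # predicate in code space
print(sum(1 for c in codes if c == target))          # -> 3, scanning codes only
```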
Indices are heavily used in database systems in order to achieve the best possible
query processing performance. It takes a lot of time to create an index, and the
system needs to reserve extra storage space for the auxiliary data structure.
When updates arrive, there is also the overhead of maintaining the index. Thus,
deciding which indices to create and when to create them has been, and still is, one
of the most important research topics of the last decades.
If the workload is known up-front or can be predicted, and if there is
enough idle time to spare, then we can a priori create all necessary indices and
exploit them when queries arrive. But what happens if we do not have this
knowledge or idle time? Similarly, what happens if the workload changes often,
suddenly and in an unpredictable way? Even if we can correctly analyze the
current workload, it may well be that by the time we finish our analysis and
create all necessary indices, the workload pattern has changed.
Here we argue that a database system should just be given the data and
queries in a declarative way and the system should internally take care of finding
not only the proper algorithms and query plans but also the proper physical
design to match the workload and application needs. The goal is to remove
the role of database administrators, leading to systems that can self-tune
completely automatically and adapt even to dynamic environments. Database
Cracking implements the first adaptive kernel that automatically adapts to the
access patterns by selectively and adaptively optimizing the data set purely for
the workload at hand. It continuously reorganizes input data on-the-fly as a
side-effect of query processing, using queries as advice on how data should
be stored. Everything happens within operator calls during query processing
and brings knowledge to the system that future operators in future queries can
exploit. Essentially, the necessary indices are built incrementally as the system
gains more and more knowledge about the workload needs.
G. Graefe, S. Idreos, H. A. Kuno, and S. Manegold, “Benchmarking Adaptive Indexing,” in Proceedings of the 2nd TPC Technology Conference on Performance Evaluation & Benchmarking (TPCTC), Singapore, 2010, pp. 169-184.
Ideally, realizing the best physical design for the current and all subsequent workloads would impact neither performance nor storage usage. In reality, workloads and datasets can change dramatically over time, and index creation impacts the performance of concurrent user and system activity. We propose a framework that evaluates the key premise of adaptive indexing -- a new indexing paradigm where index creation and re-organization take place automatically and incrementally, as a side-effect of query execution. We focus on how the incremental costs and benefits of dynamic reorganization are distributed across the workload's lifetime. We believe that measuring the costs and utility of the stages of adaptation yields relevant metrics for evaluating new query processing paradigms and comparing them to traditional approaches.
Data compression techniques such as null suppression
and dictionary compression are commonly used in today’s
database systems. In order to effectively leverage compression, it
is necessary to have the ability to efficiently and accurately
estimate the size of an index if it were to be compressed. Such an
analysis is critical if automated physical design tools are to be
extended to handle compression. Several database systems today
provide estimators for this problem based on random sampling.
While this approach is efficient, there is no previous work that
analyses its accuracy. In this paper, we analyse the problem of
estimating the compressed size of an index from the point of view
of worst-case guarantees. We show that the simple estimator
implemented by several database systems has several “good”
cases even though the estimator itself is agnostic to the internals
of the specific compression algorithm.
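A minimal sketch of a sampling-based estimator of this kind follows (an illustration: `zlib` stands in for a system's actual index compression scheme, and commercial estimators differ in details): compress a uniform random sample of the rows and scale its compressed size by the sampling fraction.

```python
# A minimal sampling-based compressed-size estimator.

import random
import zlib  # stand-in for the system's real compression codec

def estimate_compressed_size(rows, fraction=0.1, seed=0):
    random.seed(seed)
    sample = random.sample(rows, max(1, int(len(rows) * fraction)))
    compressed = zlib.compress("\n".join(sample).encode())
    return int(len(compressed) / fraction)           # scale up to the full data

rows = [f"user-{i % 500}" for i in range(100_000)]   # repetitive, compresses well
actual = len(zlib.compress("\n".join(rows).encode()))
print(estimate_compressed_size(rows), "vs actual", actual)
```

The analytical question the paper studies is how far such an estimate can be off in the worst case, given that the estimator never looks inside the compression algorithm itself.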
Stream applications have gained significant popularity over recent years, which has led to the development of specialized stream engines. These systems are designed from scratch with a different philosophy than today's database engines in order to cope with the requirements of stream applications. However, this means that they lack the power and sophisticated techniques of a full-fledged database system that exploits techniques and algorithms accumulated over many years of database research.
In this paper, we take the opposite route and design a stream engine directly on top of a database kernel. Incoming tuples are directly stored upon arrival in a new kind of system table, called a basket. A continuous query can then be evaluated over its relevant baskets as a typical one-time query, exploiting the power of the relational engine. Once a tuple has been seen by all relevant queries/operators, it is dropped from its basket. A basket can be the input to a single or to multiple similar query plans. Furthermore, a query plan can be split into multiple parts, each one with its own input/output baskets, allowing for flexible, load-sharing query scheduling. Contrary to traditional stream engines, which process one tuple at a time, this model allows batch processing of tuples, e.g., querying a basket only after x tuples arrive or after a time threshold has passed. Furthermore, we are not restricted to processing tuples in the order they arrive. Instead, we can selectively pick tuples from a basket based on the query requirements, exploiting a novel query component, the basket expression.
We investigate the opportunities and challenges that arise with such a direction and we show that it carries significant advantages. We propose a complete architecture, the DataCell, which we implemented on top of an open-source column-oriented DBMS. A detailed analysis and experimental evaluation of the core algorithms using both micro benchmarks and the standard Linear Road benchmark demonstrate the potential of this new approach.
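A minimal sketch of the basket model described above (an illustration with hypothetical names, not DataCell's implementation): tuples land in a basket, registered continuous queries are evaluated over it as ordinary one-time queries, and tuples are dropped once every query has seen them.

```python
# A minimal basket-model sketch: batch-at-a-time stream processing on top
# of ordinary query evaluation.

class Basket:
    def __init__(self):
        self.tuples = []
        self.queries = []                  # (name, plan) pairs

    def register(self, name, plan):
        self.queries.append((name, plan))

    def insert(self, batch):               # batch-at-a-time, not tuple-at-a-time
        self.tuples.extend(batch)

    def evaluate(self):
        out = {name: plan(self.tuples) for name, plan in self.queries}
        self.tuples.clear()                # seen by all queries -> drop
        return out

b = Basket()
b.register("hot", lambda ts: [t for t in ts if t > 90])   # a "basket expression"
b.register("avg", lambda ts: sum(ts) / len(ts) if ts else None)
b.insert([87, 95, 91, 60])                # evaluate only after a batch arrives
print(b.evaluate())
```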