The Blink project’s ambitious goal is to answer all Business Intelligence (BI) queries in mere seconds, regardless of the database size, with an extremely low total cost of ownership. Blink is a new DBMS aimed primarily at read-mostly BI query processing that exploits scale-out of commodity multi-core processors and cheap DRAM to retain a (copy of a) data mart completely in main memory. Additionally, it exploits proprietary compression technology and cache-conscious algorithms that reduce memory bandwidth consumption and allow most SQL query processing to be performed on the compressed data. Blink always scans (portions of) the data mart in parallel on all nodes, without using any indexes or materialized views, and without any query optimizer to choose among them. The Blink technology has thus far been incorporated into two IBM accelerator products generally available since March 2011. We are now working on the next generation of Blink, which will signiﬁcantly expand the “sweet spot” of the Blink technology to much larger, disk-based warehouses and allow Blink to “own” the data, rather than copies of it.
The trend for more online linked data becomes stronger. Foreseeing a future where “everything” will be online and linked, we ask the critical question; what is next? We envision that managing, querying and storing large amounts of links and data is far from yet another query processing task. We highlight two distinct and promising research directions towards managing and making sense of linked data. We introduce linked views to help focusing on specific link and data instances and linked history to help observe how links and data change over time.
Database management systems (DBMS) provide incredible flexibility and performance
when it comes to query processing, scalability and accuracy.
To fully exploit DBMS features, however, the user must define a schema,
load the data, tune the system for the expected workload, and answer several questions.
Should the database use a column-store, a row-store or some hybrid format?
What indices should be created?
All these questions make for a formidable and time-consuming hurdle, often
deterring new applications or imposing high cost to existing ones.
A characteristic example is that of scientific databases with huge data sets.
The prohibitive initialization cost and complexity
still forces scientists to rely on ``ancient" tools for their data management tasks,
delaying scientific understanding and progress.
Users and applications collect their data in flat files, which have traditionally been
considered to be ``outside" a DBMS. A DBMS wants control:
always bring all data ``inside", replicate it and format it in its own ``secret" way.
The problem has been recognized and current efforts extend existing systems with
abilities such as reading information from flat files
and gracefully incorporating it into the processing engine.
This paper proposes a new generation of systems where
the only requirement from the user is a link to the raw data files.
Queries can then immediately be fired without preparation steps in between.
Internally and in an abstract way, the system takes care of
selectively, adaptively and incrementally providing the proper environment
given the queries at hand. Only part of the data is loaded at any given time
and it is being stored and accessed in the format suitable for the current workload.
SIGMOD has offered, since 2008, to verify the experiments published in the papers accepted at the conference. This year, we have been in charge of reproducing the experiments provided by the authors (repeatability), and exploring changes to experiment parameters (workability). In this paper, we assess the SIGMOD repeatability process in terms of participation, review process and results. While the participation is stable in terms of number of submissions, we find this year a sharp contrast between the high participation from Asian authors and the low participation from American authors. We also find that most experiments are distributed as Linux packages accompanied by instructions on how to setup and run the experiments. We are still far from the vision of executable papers.
There is a clear need nowadays for extremely large data processing.
This is especially true in the area of scientific data management where soon we expect
data inputs in the order of multiple Petabytes.
However, current data management technology is not suitable for such data sizes.
In the light of such new database applications, we can rethink some of the strict
requirements database systems adopted in the past.
We argue that correctness is such a critical property, responsible for performance degradation.
In this paper, we propose a new paradigm towards building database kernels
that may produce wrong but fast, cheap and indicative results.
Fast response times is an essential component of data analysis for exploratory applications;
allowing for fast queries enables
the user to develop a ``feeling" for the data through a series of ``painless" queries which eventually leads
to more detailed analysis in a targeted data area.
We propose a research path where a database kernel autonomously and on-the-fly
decides to reduce the processing requirements of a running query
based on workload, hardware and
It requires a complete redesign of database operators
and query processing strategy.
For example, typical and very common scenarios were query processing performance degrades significantly
are cases where a database operator has to spill data
to disk, or is forced to perform random access, or has to follow long linked lists, etc.
Here we ask the question: What if we simply avoid these steps, ``ignoring" the side-effect
in the correctness of the result?
Adaptive indexing is characterized by the partial creation
and refinement of the index as side effects of query execution.
Dynamic or shifting workloads may benefit from preliminary
focused on the columns and specific key ranges actually queried ---
without incurring the cost of full index construction.
The costs and benefits of adaptive indexing techniques
should therefore be compared in terms of initialization costs,
the overhead imposed upon queries, and the rate at which
the index converges to a state that is fully-refined
for a particular workload component.
Based on an examination of database cracking and adaptive merging,
which are two techniques for adaptive indexing,
we seek a hybrid technique that has a low initialization cost
and also converges rapidly.
We find the strengths and weaknesses of database cracking and
adaptive merging complementary.
One has a relatively high initialization cost but converges rapidly.
The other has a low initialization cost but converges relatively slowly.
We analyze the sources of their respective strengths and
explore the space of hybrid techniques.
We have designed and implemented a family of hybrid algorithms
in the context of a column-store database system.
Our experiments compare their behavior against
database cracking and adaptive merging, as well as against
both traditional full index lookup and scan of unordered data.
We show that the new hybrids significantly improve over
past methods while at least two of the hybrids
come very close to the ``ideal performance''
in terms of both overhead per query and convergence to a final state.
The Blink project’s ambitious goals are to answer all Business Intelligence (BI) queries in mere seconds, regardless of the database size, with an extremely low total cost of ownership. It takes a very innovative and counter-intuitive approach to processing BI queries, one that exploits several disruptive hardware and software technology trends. Specifically, it is a new, workload-optimized DBMS aimed primarily at BI query processing, and exploits scale-out of commodity multi-core processors and cheap DRAM to retain a (copy of a) data mart completely in main memory. Additionally, it exploits proprietary compression technology and cache-conscious algorithms that reduce memory bandwidth consumption and allow most SQL query processing to be performed on the compressed data. Ignoring the general wisdom of the last three decades that the only way to scalably search large databases is with indexes, Blink always performs simple, “brute force” scans of the entire data mart in parallel on all nodes, without using any indexes or materialized views, and without any query optimizer to choose among them. The Blink technology has thus far been incorporated into two products: (1) an accelerator appliance product for DB2 for z/OS (on the “mainframe”), called the IBM Smart Analytics Optimizer for DB2 for z/OS, V1.1, which was generally available in November 2010; and (2) the Informix Warehouse Accelerator (IWA), a software-only version that was generally available in March 2011. We are now working on the next generation of Blink, called BLink Ultra, or BLU, which will significantly expand the “sweet spot” of Blink technology to much larger, disk-based warehouses and allow BLU to “own” the data, rather than copies of it.
Indices are heavily used in database systems in order to achieve the ultimate
query processing performance. It takes a lot of time to create an index and the
system needs to reserve extra storage space to store the auxiliary data structure.
When updates arrive, there is also the overhead of maintaining the index. This
way, which indices to create and when to create them has been and still is one
of the most important research topics over the last decades.
If the workload is known up-front or it can be predicted and if there is
enough idle time to spare, then we can a priori create all necessary indices and
exploit them when queries arrive. But what happens if we do not have this
knowledge or idle time? Similarly, what happens if the workload changes often,
suddenly and in an unpredictable way? Even if we can correctly analyze the
current workload, it may well be that by the time we finish our analysis and
create all necessary indices, the workload pattern has changed.
Here we argue that a database system should just be given the data and
queries in a declarative way and the system should internally take care of finding
not only the proper algorithms and query plans but also the proper physical
design to match the workload and application needs. The goal is to remove
the role of database administrators, leading to systems that can completely
automatically self-tune and adapt even to dynamic environments. Database
Cracking implements the first adaptive kernel that automatically adapts to the
access patterns by selectively and adaptively optimizing the data set purely for
the workload at hand. It continuously reorganizes input data on-the-fly as a
side-efect of query processing using queries as an advice of how data should
be stored. Everything happens within operator calls during query processing
and brings knowledge to the system that future operators in future queries can
exploit. Essentially, the necessary indices are built incrementally as the system
gains more and more knowledge about the workload needs.
G. Graefe, S. Idreos, H. A. Kuno, and S. Manegold, “Benchmarking Adaptive Indexing,” in Proceedings of the 2nd TPC Technology Conference on Performance Evaluation & Benchmarking (TPCTC), Singapore, 2010, pp. 169-184.Abstract
Ideally, realizing the best physical design for the current and all subsequent workloads would impact neither performance nor storage usage. In reality, workloads and datasets can change dramatically over time and index creation impacts the performance of concurrent user and system activity. We propose a framework that evaluates the key premise of adaptive indexing -- a new indexing paradigm where index creation and re-organization take place automatically and incrementally, as a side-effect of query execution. We focus on how the incremental costs and benefits of dynamic reorganization are distributed across the workload's lifetime. We believe measuring the costs and utility of the stages of adaptation are relevant metrics for evaluating new query processing paradigms and comparing them to traditional approaches.
Data compression techniques such as null suppression
and dictionary compression are commonly used in today’s
database systems. In order to effectively leverage compression, it
is necessary to have the ability to efficiently and accurately
estimate the size of an index if it were to be compressed. Such an
analysis is critical if automated physical design tools are to be
extended to handle compression. Several database systems today
provide estimators for this problem based on random sampling.
While this approach is efficient, there is no previous work that
analyses its accuracy. In this paper, we analyse the problem of
estimating the compressed size of an index from the point of view
of worst-case guarantees. We show that the simple estimator
implemented by several database systems has several “good”
cases even though the estimator itself is agnostic to the internals
of the specific compression algorithm.
Stream applications gained significant popularity over the last years that lead to the development of specialized stream engines. These systems are designed from scratch with a different philosophy than nowadays database engines in order to cope with the stream applications requirements. However, this means that they lack the power and sophisticated techniques of a full fledged database system that exploits techniques and algorithms accumulated over many years of database research.
In this paper, we take the opposite route and design a stream engine directly on top of a database kernel. Incoming tuples are directly stored upon arrival in a new kind of system tables, called baskets. A continuous query can then be evaluated over its relevant baskets as a typical one-time query exploiting the power of the relational engine. Once a tuple has been seen by all relevant queries/operators, it is dropped from its basket. A basket can be the input to a single or multiple similar query plans. Furthermore, a query plan can be split into multiple parts each one with its own input/output baskets allowing for flexible load sharing query scheduling. Contrary to traditional stream engines, that process one tuple at a time, this model allows batch processing of tuples, e.g., query a basket only after x tuples arrive or after a time threshold has passed. Furthermore, we are not restricted to process tuples in the order they arrive. Instead, we can selectively pick tuples from a basket based on the query requirements exploiting a novel query component, the basket expressions.
We investigate the opportunities and challenges that arise with such a direction and we show that it carries significant advantages. We propose a complete architecture, the DataCell, which we implemented on top of an open-source column-oriented DBMS. A detailed analysis and experimental evaluation of the core algorithms using both micro benchmarks and the standard Linear Road benchmark demonstrate the potential of this new approach.
Column-stores gained popularity as a promising physical design alternative. Each attribute of a relation is physically stored as a separate column allowing queries to load only the required attributes. The overhead incurred is on-the-fly tuple reconstruction for multi-attribute queries. Each tuple reconstruction is a join of two columns based on tuple IDs, making it a significant cost component. The ultimate physical design is to have multiple presorted copies of each base table such that tuples are already appropriately organized in multiple different orders across the various columns. This requires the ability to predict the workload, idle time to prepare, and infrequent updates.
In this paper, we propose a novel design, partial sideways cracking, that minimizes the tuple reconstruction cost in a self-organizing way. It achieves performance similar to using presorted data, but without requiring the heavy initial presorting step itself. Instead, it handles dynamic, unpredictable workloads with no idle time and frequent updates. Auxiliary dynamic data structures, called cracker maps, provide a direct mapping between pairs of attributes used together in queries for tuple reconstruction. A map is continuously physically reorganized as an integral part of query evaluation, providing faster and reduced data access for future queries. To enable flexible and self-organizing behavior in storage-limited environments, maps are materialized only partially as demanded by the workload. Each map is a collection of separate chunks that are individually reorganized, dropped or recreated as needed. We implemented partial sideways cracking in an open-source column-store. A detailed experimental analysis demonstrates that it brings significant performance benefits for multi-attribute queries.
This paper studies the problem of evaluating continuous multi-way joins on top of Distributed Hash Tables (DHTs). We present a novel algorithm, called recursive join (RJoin), that takes into account various parameters crucial in a distributed setting i.e., network traffic, query processing load distribution, storage load distribution etc. The key idea of RJoin is incremental evaluation: as relevant tuples arrive continuously, a given multi-way join is rewritten continuously into a join with fewer join operators, and is assigned continuously to different nodes of the network. In this way, RJoin distributes the responsibility of evaluating a continuous multi-way join to many network nodes by assigning parts of the evaluation of each binary join to a different node depending on the values of the join attributes. The actual nodes to be involved are decided by RJoin dynamically after taking into account the rate of incoming tuples with values equal to the values of the joined attributes. RJoin also supports sliding window joins which is a crucial feature, especially for long join paths, since it provides a mechanism to reduce the query processing state and thus keep the cost of handling incoming tuples stable. In addition, RJoin is able to handle message delays due to heavy network traffic. We present a detailed mathematical and experimental analysis of RJoin and study the performance tradeoffs that occur.
We study the continuous evaluation of conjunctive triple pattern queries over RDF data stored in distributed hash tables. In a continuous query scenario network nodes subscribe with long-standing queries and receive answers whenever RDF triples satisfying their queries are published. We present two novel query processing algorithms for this scenario and analyze their properties formally. Our performance goal is to have algorithms that scale to large amounts of RDF data, distribute the storage and query processing load evenly and incur as little network traffic as possible. We discuss the various performance tradeoffs that occur through a detailed experimental evaluation of the proposed algorithms.
S. Idreos, M. L. Kersten, and S. Manegold, “Updating a cracked database,” in Proceedings of the 27th ACM SIGMOD International Conference on Management of Data, Beijing, China, 2007, pp. 413-424.Abstract
A cracked database is a datastore continuously reorganized based on operations being executed. For each query, the data of interest is physically reclustered to speed-up future access to the same, overlapping or even disjoint data. This way, a cracking DBMS self-organizes and adapts itself to the workload.
So far, cracking has been considered for static databases only. In this paper, we introduce several novel algorithms for high-volume insertions, deletions and updates against a cracked database. We show that the nice performance properties of a cracked database can be maintained in a dynamic environment where updates interleave with queries. Our algorithms comply with the cracking philosophy, i.e., a table is informed on pending insertions and deletions, but only when the relevant data is needed for query processing just enough pending update actions are applied.
We discuss details of our implementation in the context of an open-source DBMS and we show through a detailed experimental evaluation that our algorithms always manage to keep the cost of querying a cracked datastore with pending updates lower than the non-cracked case.
S. Idreos, M. L. Kersten, and S. Manegold, “Database Cracking,” in Proceedings of the 3rd International Conference on Innovative Data Systems Research (CIDR), Asilomar, California, 2007, pp. 68-78.Abstract
Database indices provide a non-discriminative navigational infrastructure to localize tuples of interest. Their maintenance cost is taken during database updates. In this paper, we study the complementary approach, addressing index maintenance as part of query processing using continuous physical reorganization, i.e., cracking the database into manageable pieces. The motivation is that by automatically organizing data the way users request it, we can achieve fast access and the much desired self-organized behavior.
We present the first mature cracking architecture and report on our implementation of cracking in the context of a full fledged relational system. It led to a minor enhancement to its relational algebra kernel, such that cracking could be piggy-backed without incurring too much processing overhead. Furthermore, we illustrate the ripple effect of dynamic reorganization on the query plans derived by the SQL optimizer. The experiences and results obtained are indicative of a significant reduction in system complexity. We show that the resulting system is able to self-organize based on incoming requests with clear performance benefits. This behavior is visible even when the user focus is randomly shifting to different parts of the data.
We study the problem of resource discovery in the Semantic Grid. We show how to solve this problem by utilizing Atlas, a P2P system for the distributed storage and retrieval of RDF(S) data. Atlas is currently under development in project OntoGrid funded by FP6. Atlas is built on top of the distributed hash table Bamboo and supports pull and push querying scenarios. It inherits all the nice features of Bamboo (openness, scalability, fault-tolerance, resistance to high churn rates) and extends Bamboo's protocols for storing and querying RDF(S) data. Atlas is being used currently to realize the metadata service of S-OGSA in a fully distributed and scalable way. In this paper, we concentrate on the main features of Atlas and demonstrate its use for Semantic Grid resource discovery in an OntoGrid use case scenario.
Publish/subscribe systems are an alternative to query-based systems in cases where the same information is asked for over and over, and where clients want to get updated answers for the same query over a period of time. Recent publish/subscribe systems such as P2P-DIET have introduced this paradigm in the P2P context. In this chapter we built on the experience gained with P2P-DIET and the Edutella super-peer infrastructure and present a semantic publish/subscribe system supporting metadata and a query language based on RDF. We define formally the basic concepts of our system and present detailed protocols for its operation.
We study the problem of continuous relational query processing in Internet-scale overlay networks realized by distributed hash tables. We concentrate on the case of continuous two-way equi-join queries. Joins are hard to evaluate in a distributed continuous query environment because data from more than one relations is needed, and this data is inserted in the network asynchronously. Each time a new tuple is inserted, the network nodes have to cooperate to check if this tuple can contribute to the satisfaction of a query when combined with previously inserted tuples. We propose a series of algorithms that initially index queries at network nodes using hashing. Then, they exploit the values of join attributes in incoming tuples to rewrite the given queries into simpler ones, and reindex them in the network where they might be satisfied by existing or future tuples. We present a detailed experimental evaluation in a simulated environment and we show that our algorithms are scalable, balance the storage and query processing load and keep the network traffic low.
We study the problem of evaluating conjunctive queries com- posed of triple patterns over RDF data stored in distributed hash tables. Our goal is to develop algorithms that scale to large amounts of RDF data, distribute the query processing load evenly and incur little network traﬃc. We present and evaluate two novel query processing algorithms with these possibly conﬂicting goals in mind. We discuss the various tradeoﬀs that occur in our setting through a detailed experimental evaluation of the proposed algorithms.