Publications by Type: Journal Article

2012
R. Barber, et al., “Business Analytics in (a) Blink,” IEEE Data Engineering Bulletin, vol. 35, no. 1, pp. 9-14, 2012.
Abstract

The Blink project’s ambitious goal is to answer all Business Intelligence (BI) queries in mere seconds, regardless of the database size, with an extremely low total cost of ownership. Blink is a new DBMS aimed primarily at read-mostly BI query processing that exploits scale-out of commodity multi-core processors and cheap DRAM to retain a (copy of a) data mart completely in main memory. Additionally, it exploits proprietary compression technology and cache-conscious algorithms that reduce memory bandwidth consumption and allow most SQL query processing to be performed on the compressed data. Blink always scans (portions of) the data mart in parallel on all nodes, without using any indexes or materialized views, and without any query optimizer to choose among them. The Blink technology has thus far been incorporated into two IBM accelerator products generally available since March 2011. We are now working on the next generation of Blink, which will significantly expand the “sweet spot” of the Blink technology to much larger, disk-based warehouses and allow Blink to “own” the data, rather than copies of it.

BlinkDebull2012.pdf
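
The abstract above notes that most SQL processing in Blink runs directly on compressed, in-memory data. Blink's actual compression scheme is proprietary, so the toy Python sketch below only illustrates the general idea with plain dictionary encoding: the query literal is translated to its integer code once, and the scan then compares small codes without ever decompressing the column. The helper names are hypothetical.

    # Hypothetical sketch, not Blink's actual format: an equality predicate
    # evaluated directly on a dictionary-encoded, in-memory column. The
    # literal is mapped to its integer code once, so the scan compares
    # small codes and never decompresses the data.
    from array import array

    def dictionary_encode(values):
        """Assign each distinct value a small integer code."""
        dictionary = {}
        codes = array("i")
        for v in values:
            codes.append(dictionary.setdefault(v, len(dictionary)))
        return dictionary, codes

    def scan_equals(dictionary, codes, literal):
        """Return row ids where the column equals `literal`, scanning codes only."""
        code = dictionary.get(literal)
        if code is None:                # literal not in dictionary: no rows qualify
            return []
        return [rid for rid, c in enumerate(codes) if c == code]

    region_dict, region_codes = dictionary_encode(["EU", "US", "EU", "APAC", "US", "EU"])
    print(scan_equals(region_dict, region_codes, "EU"))   # [0, 2, 5]

In a scan-only engine of the kind the abstract describes, such encoded column pieces would additionally be scanned in parallel across cores and nodes, with no indexes or materialized views to choose from.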
2011
P. Bonnet, et al., “Repeatability and workability evaluation of SIGMOD 2011,” SIGMOD Record, vol. 40, pp. 45-48, 2011.
Abstract
SIGMOD has offered, since 2008, to verify the experiments published in the papers accepted at the conference. This year, we have been in charge of reproducing the experiments provided by the authors (repeatability) and of exploring changes to experiment parameters (workability). In this paper, we assess the SIGMOD repeatability process in terms of participation, review process, and results. While participation is stable in terms of the number of submissions, this year we find a sharp contrast between the high participation of Asian authors and the low participation of American authors. We also find that most experiments are distributed as Linux packages accompanied by instructions on how to set up and run the experiments. We are still far from the vision of executable papers.
SIGMOD2011Repeatability.pdf
M. L. Kersten, S. Idreos, S. Manegold, and E. Liarou, “The Researcher's Guide to the Data Deluge: Querying a Scientific Database in Just a Few Seconds,” Proceedings of the VLDB Endowment (PVLDB), vol. 4, no. 12, pp. 1474-1477, 2011.
Abstract
There is a clear need nowadays for extremely large-scale data processing. This is especially true in the area of scientific data management, where we soon expect data inputs on the order of multiple petabytes. However, current data management technology is not suitable for such data sizes. In the light of such new database applications, we can rethink some of the strict requirements database systems adopted in the past. We argue that strict correctness is one such requirement, and that it is responsible for significant performance degradation. In this paper, we propose a new paradigm for building database kernels that may produce wrong but fast, cheap, and indicative results. Fast response times are an essential component of data analysis for exploratory applications; fast queries allow the user to develop a "feeling" for the data through a series of "painless" queries, which eventually leads to more detailed analysis in a targeted data area. We propose a research path where a database kernel autonomously and on-the-fly decides to reduce the processing requirements of a running query based on workload, hardware, and environmental parameters. This requires a complete redesign of database operators and of the query processing strategy. For example, typical and very common scenarios where query processing performance degrades significantly are cases where a database operator has to spill data to disk, is forced to perform random access, or has to follow long linked lists. Here we ask the question: what if we simply avoid these steps, "ignoring" the side effects on the correctness of the result?
TheResearchersGuidetoPVLDB11.pdf
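
As a rough, hypothetical illustration of the "wrong but fast, cheap and indicative" idea sketched in the abstract above (the paper itself proposes a much broader redesign of database operators), the toy operator below refuses to spill to disk: once its hash table reaches a memory budget it stops admitting new groups and flags the result as indicative rather than exact.

    # Hypothetical toy example of an operator that trades correctness for speed:
    # a grouped count that never spills to disk. Once the hash table hits the
    # group budget, new groups are silently dropped and the result is marked
    # as indicative (approximate) instead of exact.
    def indicative_group_count(rows, key, max_groups):
        counts, exact = {}, True
        for row in rows:
            k = row[key]
            if k in counts:
                counts[k] += 1
            elif len(counts) < max_groups:
                counts[k] = 1
            else:
                exact = False            # group ignored: result is now indicative only
        return counts, exact

    rows = [{"region": r} for r in ["EU", "US", "EU", "APAC", "US", "EU"]]
    print(indicative_group_count(rows, "region", max_groups=2))
    # ({'EU': 3, 'US': 2}, False)  -- fast and indicative, but not exact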
S. Idreos, S. Manegold, H. A. Kuno, and G. Graefe, “Merging What's Cracked, Cracking What's Merged: Adaptive Indexing in Main-Memory Column-Stores,” Proceedings of the VLDB Endowment (PVLDB), vol. 4, pp. 585-597, 2011.
Abstract
Adaptive indexing is characterized by the partial creation and refinement of the index as side effects of query execution. Dynamic or shifting workloads may benefit from preliminary index structures focused on the columns and specific key ranges actually queried, without incurring the cost of full index construction. The costs and benefits of adaptive indexing techniques should therefore be compared in terms of initialization costs, the overhead imposed upon queries, and the rate at which the index converges to a state that is fully refined for a particular workload component. Based on an examination of database cracking and adaptive merging, which are two techniques for adaptive indexing, we seek a hybrid technique that has a low initialization cost and also converges rapidly. We find the strengths and weaknesses of database cracking and adaptive merging to be complementary: one has a relatively high initialization cost but converges rapidly, while the other has a low initialization cost but converges relatively slowly. We analyze the sources of their respective strengths and explore the space of hybrid techniques. We have designed and implemented a family of hybrid algorithms in the context of a column-store database system. Our experiments compare their behavior against database cracking and adaptive merging, as well as against both traditional full index lookup and scan of unordered data. We show that the new hybrids significantly improve over past methods, while at least two of the hybrids come very close to the "ideal performance" in terms of both overhead per query and convergence to a final state.
AdaptiveIndexingPVLDB11.pdf
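
For readers unfamiliar with database cracking, one of the two techniques the abstract compares, the toy Python sketch below shows its core side effect: each range query physically partitions a copied "cracker column" around its bounds, so the column converges toward a sorted state only in the key ranges actually queried. Real implementations typically keep the crack positions in a tree-based cracker index and reorganize the column in place inside a column-store; the class and method names here are illustrative only, not the paper's code.

    # Toy sketch of database cracking (illustrative names, simplified bookkeeping):
    # every range query cracks the column around its bounds as a side effect,
    # so frequently queried key ranges become progressively better organized.
    import bisect

    class CrackerColumn:
        def __init__(self, values):
            self.data = list(values)        # copied "cracker column"
            self.cracks = []                # sorted (bound, position) pairs

        def _crack(self, bound):
            """Partition the piece that may contain `bound`; return the split position."""
            keys = [b for b, _ in self.cracks]
            i = bisect.bisect_left(keys, bound)
            lo = self.cracks[i - 1][1] if i > 0 else 0
            hi = self.cracks[i][1] if i < len(self.cracks) else len(self.data)
            piece = self.data[lo:hi]
            left = [x for x in piece if x < bound]
            right = [x for x in piece if x >= bound]
            self.data[lo:hi] = left + right           # physical reorganization
            pos = lo + len(left)
            if i == len(self.cracks) or self.cracks[i][0] != bound:
                self.cracks.insert(i, (bound, pos))
            return pos

        def range_select(self, low, high):
            """Values in [low, high); cracking happens as a side effect."""
            start = self._crack(low)
            end = self._crack(high)
            return self.data[start:end]

    col = CrackerColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3])
    print(sorted(col.range_select(5, 14)))   # [7, 9, 12, 13]
    print(col.data)                          # column is now partitioned around 5 and 14

Each query refines only the pieces it touches, which is the low-initialization-cost but slower-converging behavior that the paper's hybrids balance against adaptive merging's sorted-run approach.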
2003
M. Koubarakis, C. Tryfonopoulos, S. Idreos, and Y. Drougas, “Selective information dissemination in P2P networks: problems and solutions,” ACM SIGMOD Record, Special issue on Peer-to-Peer Data Management, vol. 32, no. 3, pp. 71-76, 2003.
Abstract
We study the problem of selective dissemination of information in P2P networks. We present our work on data models and languages for textual information dissemination and discuss a related P2P architecture that motivates our efforts. We also survey our results on the computational complexity of three related algorithmic problems (query satisfiability, entailment, and filtering) and present efficient algorithms for the most crucial of these problems (filtering). Finally, we discuss the features of P2P-DIET, a super-peer system we have implemented at the Technical University of Crete, which realizes our vision and is able to support both ad-hoc querying and selective information dissemination scenarios in a P2P framework.
