Publications by Type: Journal Article

2021
K. Deeds, B. Hentschel, and S. Idreos, “Stacked Filters: Learning to Filter by Structure,” Proceedings of the VLDB Endowment, vol. 14, no. 4, pp. 600-612, 2021.
We present Stacked Filters, a new probabilistic filter that is fast and robust like query-agnostic filters (such as Bloom and Cuckoo filters), while at the same time achieving the low false positive rates and sizes of classifier-based filters (such as Learned Filters). The core idea is that Stacked Filters incorporate workload knowledge about frequently queried non-existing values. Instead of learning, they structurally incorporate that knowledge using hashing and several sequenced filter layers, indexing both data and frequent negatives. Stacked Filters can also gather workload knowledge on-the-fly and adaptively build the filter. We show experimentally that for a given memory budget, Stacked Filters achieve end-to-end query throughput up to 130x better than the best alternative for a workload, whether query-agnostic or classifier-based, depending on where the data resides (SSD or HDD).
stackedfilters_vldb2021_extended_version.pdf
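The following is a minimal sketch of the layering idea described in the abstract above: one filter layer over the positive keys, one over the frequent negatives that slip past it, and a third over the positives that collide with the second layer, so that a true key is never rejected. The class names, layer sizing, and the plain Bloom filter used here are illustrative assumptions, not the paper's construction or its optimization.

import hashlib

class SimpleBloom:
    """A minimal Bloom filter: k hash positions per item over an m-slot array."""
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))

class StackedFilter:
    """Three stacked layers: all positives (L1), the frequent negatives that
    pass L1 (L2), and the positives that also pass L2 (L3)."""
    def __init__(self, positives, frequent_negatives, m=8192, k=4):
        self.l1, self.l2, self.l3 = SimpleBloom(m, k), SimpleBloom(m // 2, k), SimpleBloom(m // 4, k)
        for key in positives:
            self.l1.add(key)
        for neg in frequent_negatives:
            if neg in self.l1:          # only store the frequent negatives that L1 fails to reject
                self.l2.add(neg)
        for key in positives:
            if key in self.l2:          # re-admit positives caught by L2, so they are never rejected
                self.l3.add(key)

    def __contains__(self, item):
        if item not in self.l1:
            return False                # definitely not in the data
        if item not in self.l2:
            return True                 # passed L1 and is not a known frequent negative
        return item in self.l3          # L2 hit: accept only if re-admitted in L3

sf = StackedFilter(positives=["key1", "key2"], frequent_negatives=["hot_miss_1", "hot_miss_2"])
print("key1" in sf)         # True: positive keys are never rejected
print("hot_miss_1" in sf)   # False: rejected here by L1; with a tighter budget, misses that pass L1 are caught by L2

The abstract's point about memory budgets corresponds to how the bits are split across the layers; the paper chooses that split (and the number of layers) via optimization rather than the fixed ratios used in this sketch.
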
2019
M. Athanassoulis, K. S. Bøgh, and S. Idreos, “Optimal Column Layout for Hybrid Workloads,” Proceedings of the Very Large Databases Endowment, vol. 12, no. 13, 2019.

Data-intensive analytical applications need to support both efficient reads and writes. However, what is usually a good data layout for an update-heavy workload is not well-suited for a read-mostly one, and vice versa. Modern analytical data systems rely on columnar layouts and employ delta stores to inject new data and updates.

We show that for hybrid workloads we can achieve close to one order of magnitude better performance by tailoring the column layout design to the data and query workload. Our approach navigates the possible design space of the physical layout: it organizes each column’s data by determining the number of partitions, their corresponding sizes and ranges, and the amount of buffer space and how it is allocated. We frame these design decisions as an optimization problem that, given workload knowledge and performance requirements, provides an optimal physical layout for the workload at hand. To evaluate this work, we build an in-memory storage engine, Casper, and we show that it outperforms state-of-the-art data layouts of analytical systems for hybrid workloads. Casper delivers up to 2.32× higher throughput for update-intensive workloads and up to 2.14× higher throughput for hybrid workloads. We further show how to make data layout decisions robust to workload variation by carefully selecting the input of the optimization.

caspervldb2020.pdf
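As a rough illustration of the kind of decision described above, the toy script below enumerates candidate partition counts for a single column and picks the one that minimizes a workload-weighted cost. The cost formulas, the constants, and the restriction to powers of two are invented for illustration; Casper's actual model also sizes per-partition buffer space and solves the resulting optimization exactly.

import math

def layout_cost(p, n_rows, reads, scans, updates):
    """Workload-weighted cost of splitting one column into p range partitions."""
    part_size = n_rows / p
    point_read = reads * (math.log2(p) + part_size)   # locate the partition, then scan inside it
    range_scan = scans * (0.5 * n_rows + 50.0 * p)    # half the column on average, plus per-partition overhead
    update = updates * 0.5 * part_size                # ripple-insert within a single partition
    return point_read + range_scan + update

def best_partition_count(n_rows, reads, scans, updates):
    candidates = [2 ** i for i in range(0, 21)]
    return min(candidates, key=lambda p: layout_cost(p, n_rows, reads, scans, updates))

# The update-heavy mix prefers many small partitions; the read/scan mix prefers fewer.
print(best_partition_count(n_rows=10_000_000, reads=100, scans=5, updates=10_000))   # 16384
print(best_partition_count(n_rows=10_000_000, reads=100, scans=5, updates=10))       # 2048
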
S. Idreos, et al., “Learning Data Structure Alchemy,” Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol. 42, no. 2, pp. 46-57, 2019.

We propose a solution based on first principles and AI to the decades-old problem of data structure design. Instead of working on individual designs that each can only be helpful in a small set of environments, we propose the construction of an engine, a Data Alchemist, which learns how to blend fine-grained data structure design principles to automatically synthesize brand new data structures.

learningdatastructurealchemy.pdf
2018
N. Dayan, M. Athanassoulis, and S. Idreos, “Optimal Bloom Filters and Adaptive Merging for LSM-Trees,” ACM Transactions on Database Systems, 2018.

In this paper, we show that key-value stores backed by a log-structured merge-tree (LSM-tree) exhibit an intrinsic trade-off between lookup cost, update cost, and main memory footprint, yet all existing designs expose a suboptimal and difficult to tune trade-off among these metrics. We pinpoint the problem to the fact that modern key-value stores suboptimally co-tune the merge policy, the buffer size, and the Bloom filters’ false positive rates across the LSM-tree’s different levels.

We present Monkey, an LSM-tree based key-value store that strikes the optimal balance between the costs of updates and lookups with any given main memory budget. The core insight is that worst-case lookup cost is proportional to the sum of the false positive rates of the Bloom filters across all levels of the LSM-tree. Contrary to state-of-the-art key-value stores that assign a fixed number of bits-per-element to all Bloom filters, Monkey allocates memory to filters across different levels so as to minimize the sum of their false positive rates. We show analytically that Monkey reduces the asymptotic complexity of the worst-case lookup I/O cost, and we verify empirically using an implementation on top of RocksDB that Monkey reduces lookup latency by an increasing margin as the data volume grows (50% − 80% for the data sizes we experimented with). Furthermore, we map the design space onto a closed-form model that enables adapting the merging frequency and memory allocation to strike the best trade-off among lookup cost, update cost and main memory, depending on the workload (proportion of lookups and updates), the dataset (number and size of entries), and the underlying hardware (main memory available, disk vs. flash). We show how to use this model to answer what-if design questions about how changes in environmental parameters impact performance and how to adapt the design of the key-value store for optimal performance.

monkeytods.pdf
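The core insight in the abstract above lends itself to a small worked example. Under the standard approximation that a Bloom filter with b bits per entry has a false positive rate of about exp(-b * ln(2)^2), minimizing the sum of false positive rates for a fixed total number of bits gives each level an FPR proportional to its number of entries, so the large last level gets the fewest bits per entry. The sketch below compares that allocation with uniform bits-per-entry; the closed form and the capping at 1.0 are simplifications of the paper's analysis, and the constants are made up for the example.

import math

LN2_SQ = math.log(2) ** 2   # a filter with b bits per entry has FPR ~ exp(-b * ln(2)^2)

def uniform_fpr_sum(level_sizes, total_bits):
    bits_per_entry = total_bits / sum(level_sizes)        # same bits/entry at every level
    return len(level_sizes) * math.exp(-bits_per_entry * LN2_SQ)

def monkey_fpr_sum(level_sizes, total_bits):
    # Optimal allocation: FPR_i = c * n_i, with c chosen so the bit budget is exactly spent.
    n_total = sum(level_sizes)
    ln_c = -(total_bits * LN2_SQ + sum(n * math.log(n) for n in level_sizes)) / n_total
    return sum(min(1.0, math.exp(ln_c) * n) for n in level_sizes)

levels = [10 ** 4, 10 ** 5, 10 ** 6, 10 ** 7]   # 4 levels with size ratio 10
budget = 5 * sum(levels)                        # ~5 bits per entry overall
print(round(uniform_fpr_sum(levels, budget), 3))   # ~0.362
print(round(monkey_fpr_sum(levels, budget), 3))    # ~0.13: same memory, noticeably fewer false positives

Since worst-case lookup I/O is proportional to this sum, the gap between the two numbers is the kind of improvement the abstract refers to, before the model is extended to merge policy and buffer size.
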
S. Idreos, et al., “The Periodic Table of Data Structures,” Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol. 41, no. 3, pp. 64-75, 2018.
We describe the vision of being able to reason about the design space of data structures. 
We break this down into two questions: 1) Can we know all data structures that it is possible to design? 2) Can we compute the performance of arbitrary designs on a given hardware and workload without having to implement the design or even access the target hardware?
If those challenges can be met, then an array of exciting opportunities would become feasible, such as interactive what-if design to improve the productivity of data systems researchers and engineers, and informed decision making in industrial settings with regard to critical hardware/workload/data structure design issues. Then, even fully automated discovery of new data structure designs becomes possible. Furthermore, the structure of the design space itself provides numerous insights and opportunities, such as the existence of design continuums that can lead to data systems with deep adaptivity, and a new understanding of the possible performance trade-offs. Given the universal presence of data structures at the very core of any data-driven field across all sciences and industries, reasoning about their design can have significant benefits, making it more feasible (easier, faster and cheaper) to adopt tailored state-of-the-art storage solutions. And this effect is going to become increasingly critical as data keeps growing, hardware keeps changing and more applications/fields realize the transformative power and potential of data analytics.
This paper presents this vision and surveys first steps that demonstrate its feasibility. 
periodictabledatastructures.pdf
R. Borovica, S. Idreos, A. Ailamaki, M. Zukowski, and C. Fraser, “Smooth Scan: Robust Access Path Selection without Cardinality Estimation,” The International Journal on Very Large Databases (VLDBJ), 2018.

Query optimizers depend heavily on statistics representing column distributions to create good query plans. In many cases, though, statistics are outdated or non-existent, and the process of refreshing statistics is very expensive, especially for ad-hoc workloads on ever bigger data. This results in suboptimal plans that severely hurt performance. The core of the problem is the fixed decision on the type of physical operators that comprise a query plan.

This paper makes a case for continuous adaptation and morphing of physical operators throughout their lifetime, by adjusting their behavior in accordance with the observed statistical properties of the data at runtime. We demonstrate the benefits of the new paradigm by designing and implementing an adaptive access path operator called Smooth Scan, which morphs continuously within the space of index access and full table scan. Smooth Scan behaves similarly to an index scan for low selectivity; if selectivity increases, however, Smooth Scan progressively morphs its behavior toward a sequential scan. As a result, a system with Smooth Scan requires no optimization decisions on the access paths up front. Additionally, by depending only on the result distribution and eschewing statistics and cardinality estimates altogether, Smooth Scan ensures repeatable execution across multiple query invocations. Smooth Scan implemented in PostgreSQL demonstrates robust, near-optimal performance on micro-benchmarks and real-life workloads, while being statistics-oblivious at the same time.

smoothscan.pdf
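To make the morphing idea above concrete, here is a much-simplified sketch: the operator starts by probing individual slots like an index scan and, once the selectivity it observes crosses a threshold, switches to sweeping whole pages, while making sure no result is produced twice. The two-mode switch, the threshold, and the page layout are illustrative assumptions; the paper morphs more gradually (per page and per region) and also covers the I/O-level mechanisms.

def smooth_scan(index_rids, pages, predicate, morph_threshold=0.05):
    """index_rids: (page_no, slot) pairs the index reports as possible matches.
    pages: list of pages, each a list of tuples. Yields each matching tuple once."""
    probed = set()        # (page_no, slot) pairs already examined individually
    swept = set()         # pages already read in full after morphing
    produced = examined = 0
    for page_no, slot in index_rids:
        if page_no in swept or (page_no, slot) in probed:
            continue
        selectivity = produced / examined if examined else 0.0
        if selectivity > morph_threshold:
            # Morph toward a sequential scan: matches are frequent, so reading
            # the whole page amortizes the page fetch over many results.
            swept.add(page_no)
            for s, tup in enumerate(pages[page_no]):
                if (page_no, s) in probed:
                    continue              # never produce the same result twice
                examined += 1
                if predicate(tup):
                    produced += 1
                    yield tup
        else:
            # Index-scan behaviour: fetch just the one slot the index points to.
            probed.add((page_no, slot))
            examined += 1
            tup = pages[page_no][slot]
            if predicate(tup):
                produced += 1
                yield tup

pages = [[(i, i % 7) for i in range(p * 10, p * 10 + 10)] for p in range(100)]
rids = [(i // 10, i % 10) for i in range(0, 1000, 2)]              # index hints: every even row
print(len(list(smooth_scan(rids, pages, lambda t: t[1] == 0))))    # 143: pages get swept once selectivity looks high
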
2016
K. Zoumpatianos, S. Idreos, and T. Palpanas, “ADS: The Adaptive Data Series Index,” The Very Large Databases Journal (VLDBJ), vol. 25, no. 6, pp. 843-866, 2016.

Numerous applications continuously produce big amounts of data series, and in several time critical scenarios analysts need to be able to query these data as soon as they become available. This, however, is not currently possible with the state-of-the-art indexing methods and for very large data series collections. In this paper, we present the first adaptive indexing mechanism, specifically tailored to solve the problem of indexing and querying very large data series collections. We present a detailed design and evaluation of our method using approximate and exact query algorithms with both synthetic and real datasets. Adaptive indexing significantly outperforms previous solutions, gracefully handling large data series collections, reducing the data-to-query delay: by the time state-of-the-art indexing techniques finish indexing 1 billion data series (and before answering even a single query), our method has already answered 3 × 10^5 queries.

vldbj16-ads.pdf
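A toy sketch of the query-driven idea behind the abstract: cheap summaries for all series are built up front, but raw series are loaded (and, in the real system, index leaves refined) only for the regions that queries actually touch. The one-value summary, the bucket granularity, and the class names are assumptions for illustration; ADS uses iSAX summaries, a tree structure, and both approximate and exact search.

import random
from collections import defaultdict

class AdaptiveSeriesIndex:
    """Summaries for every series are inserted up front; raw series are pulled
    in lazily, only for the buckets that queries touch."""
    def __init__(self, series_on_disk, buckets=16):
        self.disk = series_on_disk                    # id -> series; stands in for the raw file
        self.buckets = buckets
        self.summary = {sid: self._coarse(s) for sid, s in series_on_disk.items()}
        self.materialized = defaultdict(dict)         # bucket -> {id: series}, filled on demand

    def _coarse(self, series):
        # One-value "summary": the mean, discretized (iSAX keeps far more detail).
        mean = sum(series) / len(series)
        return max(0, min(self.buckets - 1, int((mean + 1.0) / 2.0 * self.buckets)))

    def approx_nearest(self, query):
        bucket = self._coarse(query)
        if not self.materialized[bucket]:             # the first query here pays the load, later ones do not
            self.materialized[bucket] = {sid: self.disk[sid]
                                         for sid, b in self.summary.items() if b == bucket}
        candidates = self.materialized[bucket] or self.disk   # empty bucket: fall back to everything
        def dist(s):
            return sum((a - b) ** 2 for a, b in zip(query, s))
        return min(candidates, key=lambda sid: dist(candidates[sid]))

random.seed(0)
data = {i: [random.uniform(-1, 1) for _ in range(32)] for i in range(1000)}
index = AdaptiveSeriesIndex(data)
print(index.approx_nearest([0.1] * 32))   # only the matching bucket's series were read from "disk"
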
2015
K. Zoumpatianos, S. Idreos, and T. Palpanas, “RINSE: Interactive Data Series Exploration with ADS+,” Proceedings of the Very Large Databases Endowment (PVLDB), vol. 8, no. 12, 2015.

Numerous applications continuously produce big amounts of data series, and in several time critical scenarios analysts need to be able to query these data as soon as they become available. An adaptive index data structure, ADS+, which is specifically tailored to solve the problem of indexing and querying very large data series collections has been recently proposed as a solution to this problem. The main idea is that instead of building the complete index over the complete data set up-front and querying only later, we interactively and adaptively build parts of the index, only for the parts of the data on which the users pose queries. The net effect is that instead of waiting for extended periods of time for the index creation, users can immediately start exploring the data series. In this work, we present a demonstration of ADS+; we introduce RINSE, a system that allows users to experience the benefits of the ADS+ adaptive index through an intuitive web interface. Users can explore large datasets and find patterns of interest, using nearest neighbor search. They can draw queries (data series) using a mouse, or touch screen, or they can select from a predefined list of data series. RINSE can scale to large data sizes, while drastically reducing the data-to-query delay: by the time state-of-the-art indexing techniques finish indexing 1 billion data series (and before answering even a single query), adaptive data series indexing can already answer 300K queries.

rinsevldb15.pdf
S. Idreos, “DASlab: The Data Systems Laboratory at Harvard SEAS,” ACM SIGMOD Record, 2015.

DASlab is a new laboratory at the Harvard School of Engineering and Applied Sciences (SEAS). The lab was formed in January 2014 when Stratos Idreos joined Harvard SEAS. DASlab currently consists of 3 PhD students, 1 postdoctoral researcher and 9 undergraduate researchers while it is set to double its graduate student population in the next one to two years. The lab is part of a growing community of systems and computer science researchers at Harvard; computer science faculty is scheduled to grow by 50% in the next few years.

The main focus of DASlab is on designing data systems that (a) make it easy to extract knowledge out of increasingly diverse and growing data sets and (b) can stand the test of time.

daslab.pdf
I. Alagiannis, R. Borovica, M. Branco, S. Idreos, and A. Ailamaki, “NoDB: Efficient Query Execution on Raw Data Files,” Communications of the ACM, Research Highlights, 2015.

As data collections become larger and larger, users are faced with increasing bottlenecks in their data analysis. More data means more time to prepare and to load the data into the database before executing the desired queries. Many applications already avoid using database systems, e.g., scientific data analysis and social networks, due to the complexity and the increased data-to-query time, i.e., the time between getting the data and retrieving its first useful results. For many applications data collections keep growing fast, even on a daily basis, and this data deluge will only increase in the future, where it is expected to have much more data than what we can move or store, let alone analyze.

We here present the design and roadmap of a new paradigm in database systems, called NoDB, which does not require data loading while still maintaining the whole feature set of a modern database system. In particular, we show how to make raw data files a first-class citizen, fully integrated with the query engine. Through our design and lessons learned by implementing the NoDB philosophy over a modern DBMS, we discuss the fundamental limitations as well as the strong opportunities that such a research path brings. We identify performance bottlenecks specific to in situ processing, namely the repeated parsing and tokenizing overhead and the expensive data type conversion. To address these problems, we introduce an adaptive indexing mechanism that maintains positional information to provide efficient access to raw data files, together with a flexible caching structure. We conclude that NoDB systems are feasible to design and implement over modern DBMS, bringing an unprecedented positive effect in usability and performance.

nodb-cacm.pdf
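To ground the "positional information" mentioned in the abstract, here is a minimal sketch of a positional map over a CSV file: byte offsets are collected as a side effect of the first pass, so later row accesses seek directly to the data instead of re-parsing everything before it. The class and method names are illustrative; the actual PostgresRaw prototype tracks per-attribute positions, caches parsed values, and adapts what it indexes to the queries it sees.

import csv
import io

class RawFileTable:
    """Queries run directly on the raw file; the first pass builds a positional
    map of row offsets, so later accesses seek instead of re-parsing."""
    def __init__(self, fileobj):
        self.f = fileobj
        self.row_offsets = []          # the positional map (row 0 is the CSV header)

    def _ensure_positional_map(self):
        if self.row_offsets:
            return
        self.f.seek(0)
        while True:
            pos = self.f.tell()
            if not self.f.readline():
                break
            self.row_offsets.append(pos)

    def get_row(self, row_no):
        self._ensure_positional_map()
        self.f.seek(self.row_offsets[row_no])       # jump straight to the row; nothing before it is re-tokenized
        return next(csv.reader([self.f.readline()]))

raw = io.StringIO("id,name,score\n1,alice,9\n2,bob,7\n3,carol,8\n")
table = RawFileTable(raw)
print(table.get_row(2))    # ['2', 'bob', '7'] -- fetched without parsing the whole file again
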
2014
J.-G. Lee, et al., “Joins on Encoded and Partitioned Data,” Proceedings of the Very Large Databases Endowment (PVLDB), 2014.

Compression has historically been used to reduce the cost of storage, I/Os from that storage, and buffer pool utilization, at the expense of the CPU required to decompress data every time it is queried. However, significant additional CPU efficiencies can be achieved by deferring decompression as late in query processing as possible and performing query processing operations directly on the still-compressed data. In this paper, we investigate the benefits and challenges of performing joins on compressed (or encoded) data. We demonstrate the benefit of independently optimizing the compression scheme of each join column, even though join predicates relating values from multiple columns may require translation of the encoding of one join column into the encoding of the other. We also show the benefit of compressing “payload” data other than the join columns “on the fly,” to minimize the size of hash tables used in the join. By partitioning the domain of each column and defining separate dictionaries for each partition, we can achieve even better overall compression as well as increased flexibility in dealing with new values introduced by updates. Instead of decompressing both join columns participating in a join to resolve their different compression schemes, our system performs a light-weight mapping of only qualifying rows from one of the join columns to the encoding space of the other at run time. Consequently, join predicates can be applied directly on the compressed data. We call this procedure encoding translation. Two alternatives of encoding translation are developed and compared in the paper. We provide a comprehensive evaluation of these alternatives using product implementations of each on the TPC-H data set, and demonstrate that performing joins on encoded and partitioned data achieves both superior performance and excellent compression.

joinspvldb2014.pdf
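A small sketch of the encoding-translation step described above: each join column keeps its own dictionary, and rather than decompressing both sides, codes from one side are mapped into the other side's code space so the hash join probes with still-encoded integers. The dictionaries, the absence of domain partitioning, and the function names are simplifications made up for the example, not the paper's design.

def dict_encode(values):
    dictionary = {v: code for code, v in enumerate(sorted(set(values)))}
    return [dictionary[v] for v in values], dictionary

def encoded_hash_join(left_codes, left_dict, right_codes, right_dict):
    # Translate left codes into the right column's code space (only for values
    # present on both sides), then build and probe with raw integer codes.
    translate = {lc: right_dict[v] for v, lc in left_dict.items() if v in right_dict}
    build = {}
    for pos, rc in enumerate(right_codes):
        build.setdefault(rc, []).append(pos)
    for lpos, lc in enumerate(left_codes):
        rc = translate.get(lc)
        if rc is not None:
            for rpos in build.get(rc, []):
                yield lpos, rpos          # result as row positions; payloads stay encoded

orders, orders_dict = dict_encode(["cust_a", "cust_b", "cust_a", "cust_d"])
custs, custs_dict = dict_encode(["cust_a", "cust_b", "cust_c"])
print(list(encoded_hash_join(orders, orders_dict, custs, custs_dict)))
# [(0, 0), (1, 1), (2, 0)] -- cust_d is never decoded; translation simply drops it
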
G. Graefe, F. Halim, S. Idreos, H. A. Kuno, S. Manegold, and B. Seeger, “Transactional support for adaptive indexing,” The International Journal on Very Large Databases (VLDBJ), vol. 23, no. 2, pp. 303-328, 2014.

Adaptive indexing initializes and optimizes indexes incrementally, as a side effect of query processing. The goal is to achieve the benefits of indexes while hiding or minimizing the costs of index creation. However, index-optimizing side effects seem to turn read-only queries into update transactions that might, for example, create lock contention. This paper studies concurrency control and recovery in the context of adaptive indexing. We show that the design and implementation of adaptive indexing rigorously separates index structures from index contents; this relaxes constraints and requirements during adaptive indexing compared to those of traditional index updates. Our design adapts to the fact that an adaptive index is refined continuously and exploits any concurrency opportunities in a dynamic way. A detailed experimental analysis demonstrates that (a) adaptive indexing maintains its adaptive properties even when running concurrent queries, (b) adaptive indexing can exploit the opportunity for parallelism due to concurrent queries, (c) the number of concurrency conflicts and any concurrency administration overheads follow an adaptive behavior, decreasing as the workload evolves and adapting to the workload needs.

transactionsadaptiveindexing_vldbj_2014.pdf
C. Tryfonopoulos, S. Idreos, M. Koubarakis, and P. Raftopoulou, “Distributed Large-Scale Information Filtering,” Transactions on Large-Scale Data- and Knowledge-Centered Systems XIII, Lecture Notes in Computer Science, vol. 13, pp. 91-122, 2014.

We study the problem of distributed resource sharing in peer-to-peer networks and focus on the problem of information filtering. In our setting, subscriptions and publications are specified using an expressive attribute-value representation that supports both the Boolean and Vector Space models. We use an extension of the distributed hash table Chord to organise the nodes and store user subscriptions, and utilise efficient publication protocols that keep the network traffic and latency low at filtering time. To verify our approach, we evaluate the proposed protocols experimentally using thousands of nodes, millions of user subscriptions, and two different real-life corpora. We also study three important facets of the load-balancing problem in such a scenario and present a novel algorithm that manages to distribute the load evenly among the nodes. Our results show that the designed protocols are scalable and efficient: they achieve expressive information filtering functionality with low message traffic and latency.

tlsdkcs_2014.pdf
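The snippet below is a compact illustration of the DHT-based placement the abstract describes: each subscription is stored at the ring node responsible for one of its terms, and a publication is routed only to the nodes responsible for its distinct terms, where it is matched against the subscriptions stored there. The ring size, the single-anchor placement rule, and the purely conjunctive matching are simplifications made up for this sketch; the paper's Chord-based protocols handle Boolean and Vector Space subscriptions and load balancing.

import hashlib
from collections import defaultdict

RING_BITS = 16

def ring_id(key):
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % (2 ** RING_BITS)

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.subscriptions = defaultdict(list)   # term -> [(subscriber, full term set)]

class FilteringNetwork:
    def __init__(self, node_ids):
        self.nodes = sorted((Node(i) for i in node_ids), key=lambda n: n.node_id)

    def _successor(self, key_id):
        # Chord's rule: the first node clockwise from the key id (wrapping around the ring).
        for node in self.nodes:
            if node.node_id >= key_id:
                return node
        return self.nodes[0]

    def subscribe(self, subscriber, terms):
        anchor = min(terms)                       # store the subscription once, under a single term
        self._successor(ring_id(anchor)).subscriptions[anchor].append((subscriber, set(terms)))

    def publish(self, terms):
        matched, terms = set(), set(terms)
        for term in terms:                        # contact only the nodes responsible for these terms
            node = self._successor(ring_id(term))
            for subscriber, wanted in node.subscriptions.get(term, []):
                if wanted <= terms:               # conjunctive match: every subscribed term appears
                    matched.add(subscriber)
        return matched

net = FilteringNetwork([ring_id(f"node{i}") for i in range(8)])
net.subscribe("alice", ["storage", "adaptive"])
net.subscribe("bob", ["streaming"])
print(net.publish(["adaptive", "storage", "indexing"]))   # {'alice'}
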
2013
D. Abadi, P. Boncz, S. Harizopoulos, S. Idreos, and S. Madden, “The Design and Implementation of Modern Column-Oriented Database Systems,” Foundations and Trends in Databases, vol. 5, no. 3, pp. 197-280, 2013.

In this article, we survey recent research on column-oriented database systems, or column-stores, where each attribute of a table is stored in a separate file or region on storage. Such databases have seen a resurgence in recent years with a rise in interest in analytic queries that perform scans and aggregates over large portions of a few columns of a table. The main advantage of a column-store is that it can access just the columns needed to answer such queries. We specifically focus on three influential research prototypes, MonetDB, VectorWise, and C-Store. These systems have formed the basis for several well-known commercial column-store implementations. We describe their similarities and differences and discuss their specific architectural features for compression, late materialization, join processing, vectorization and adaptive indexing (database cracking).

columnstores.pdf
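As a tiny illustration of two of the architectural features surveyed above (columnar storage and late materialization), the snippet below evaluates predicates column-at-a-time over just the columns they touch, carries intermediate results as row positions, and fetches the payload column only for the surviving rows. The arrays and names are illustrative, not any particular system's internals.

# The "table": each attribute lives in its own array (a separate file or region on storage).
sales_price = [12.0, 99.5, 7.25, 42.0, 13.5]
sales_region = ["eu", "us", "eu", "apac", "eu"]
sales_customer = ["c1", "c2", "c3", "c4", "c5"]

# 1. Evaluate the first predicate on the single column it touches -> matching positions.
positions = [i for i, p in enumerate(sales_price) if p > 10.0]

# 2. Refine with a second single-column predicate, still working only on positions.
positions = [i for i in positions if sales_region[i] == "eu"]

# 3. Late materialization: touch the payload column only for the surviving rows.
print([(sales_customer[i], sales_price[i]) for i in positions])   # [('c1', 12.0), ('c5', 13.5)]
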
2012
S. Idreos, “Cracking Big Data,” ERCIM News, Special Theme: Big Data, 2012. Publisher's Version
S. Idreos, F. Groffen, N. Nes, S. Manegold, S. K. Mullender, and M. L. Kersten, “MonetDB: Two Decades of Research in Column-oriented Database Architectures,” IEEE Data Engineering Bulletin, vol. 35, no. 1, pp. 40-45, 2012.
MonetDB is a state-of-the-art open-source column-store database management system targeting applications in need of analytics over large collections of data. MonetDB is actively used nowadays in health care, in telecommunications as well as in scientific databases and in data management research, accumulating on average more than 10,000 downloads on a monthly basis. This paper gives a brief overview of the MonetDB technology as it developed over the past two decades and the main research highlights which drive the current MonetDB design and form the basis for its future evolution.
MonetDebull2012.pdf
E. Liarou, S. Idreos, S. Manegold, and M. L. Kersten, “MonetDB/DataCell: Online Analytics in a Streaming Column-Store,” Proceedings of the Very Large Databases Endowment (PVLDB), vol. 5, no. 12, pp. 1910-1913, 2012.
In DataCell, we design streaming functionalities in a modern relational database kernel which targets big data analytics. This includes exploitation of both its storage/execution engine and its optimizer infrastructure. We investigate the opportunities and challenges that arise with such a direction and we show that it carries significant advantages for modern applications in need of online analytics such as web logs, network monitoring and scientific data management. The major challenge then becomes the efficient support for specialized stream features, e.g., multi-query processing and incremental window-based processing as well as exploiting standard DBMS functionalities in a streaming environment such as indexing. This demo presents DataCell, an extension of the MonetDB open-source column-store for online analytics. The demo gives users the opportunity to experience the features of DataCell such as processing both stream and persistent data and performing window based processing. The demo provides a visual interface to monitor the critical system components, e.g., how query plans transform from typical DBMS query plans to online query plans, how data flows through the query plans as the streams evolve, how DataCell maintains intermediate results in columnar form to avoid repeated evaluation of the same stream portions, etc. The demo also provides the ability to interactively set the test scenarios and various DataCell knobs.
DataCellVldb2012.pdf
I. Alagiannis, R. Borovica, M. Branco, S. Idreos, and A. Ailamaki, “NoDB in Action: Adaptive Query Processing on Raw Data,” Proceedings of the Very Large Databases Endowment (PVLDB), vol. 5, no. 12, pp. 1942-1945, 2012.
As data collections become larger and larger, users are faced with increasing bottlenecks in their data analysis. More data means more time to prepare the data, to load the data into the database and to execute the desired queries. Many applications already avoid using traditional database systems, e.g., scientific data analysis and social networks, due to their complexity and the increased data-to-query time, i.e. the time between getting the data and retrieving its first useful results. For many applications data collections keep growing fast, even on a daily basis, and this data deluge will only increase in the future, where it is expected to have much more data than what we can move or store, let alone analyze. In this demonstration, we will showcase a new philosophy for designing database systems called NoDB. NoDB aims at minimizing the data-to-query time, most prominently by removing the need to load data before launching queries. We will present our prototype implementation, PostgresRaw, built on top of PostgreSQL, which allows for efficient query execution over raw data files with zero initialization overhead. We will visually demonstrate how PostgresRaw incrementally and adaptively touches, parses, caches and indexes raw data files autonomously and exclusively as a side-effect of user queries.
NoDBvldb2012.pdf
F. Halim, S. Idreos, P. Karras, and R. H. C. Yap, “Stochastic Database Cracking: Towards Robust Adaptive Indexing in Main-Memory Column-Stores,” Proceedings of the Very Large Databases Endowment (PVLDB), vol. 5, no. 6, pp. 502-513, 2012.
Modern business applications and scientific databases call for inherently dynamic data storage environments. Such environments are characterized by two challenging features: (a) they have little idle system time to devote on physical design; and (b) there is little, if any, a priori workload knowledge, while the query and data workload keeps changing dynamically. In such environments, traditional approaches to index building and maintenance cannot apply. Database cracking has been proposed as a solution that allows on-the-fly physical data reorganization, as a collateral effect of query processing. Cracking aims to continuously and automatically adapt indexes to the workload at hand, without human intervention. Indexes are built incrementally, adaptively, and on demand. Nevertheless, as we show, existing adaptive indexing methods fail to deliver workload-robustness; they perform much better with random workloads than with others. This frailty derives from the inelasticity with which these approaches interpret each query as a hint on how data should be stored. Current cracking schemes blindly reorganize the data within each query's range, even if that results in successive expensive operations with minimal indexing benefit. In this paper, we introduce stochastic cracking, a significantly more resilient approach to adaptive indexing. Stochastic cracking also uses each query as a hint on how to reorganize data, but not blindly so; it gains resilience and avoids performance bottlenecks by deliberately applying certain arbitrary choices in its decision-making. Thereby, we bring adaptive indexing forward to a mature formulation that confers the workload-robustness previous approaches lacked. Our extensive experimental study verifies that stochastic cracking maintains the desired properties of original database cracking while at the same time it performs well with diverse realistic workloads.
StochasticCrackingPVLDB12.pdf
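The sketch below contrasts plain cracking with the stochastic variant the abstract argues for: both reorganize the column only around queried ranges (one quicksort-style partition step per bound), but the stochastic version additionally cracks the touched piece at a random value, so that, for example, a sorted sequence of queries cannot leave one huge unindexed piece behind. The bookkeeping (a sorted list of piece boundaries) and the single extra random cut are simplifications of the paper's algorithms (DDC, DDR, MDD1R).

import bisect
import random

class CrackedColumn:
    def __init__(self, values, stochastic=False):
        self.data = list(values)
        self.bounds = [0, len(self.data)]      # piece boundaries (indexes into self.data)
        self.pivots = []                       # pivot values, aligned with the inner boundaries
        self.stochastic = stochastic

    def _piece_for(self, value):
        i = bisect.bisect_right(self.pivots, value)     # which piece could hold `value`
        return self.bounds[i], self.bounds[i + 1]

    def _crack(self, lo, hi, pivot):
        # One quicksort-style partition step over data[lo:hi], done as a side effect of a query.
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        self.data[lo:hi] = smaller + [v for v in piece if v >= pivot]
        cut = lo + len(smaller)
        if cut not in self.bounds:
            bisect.insort(self.bounds, cut)
            bisect.insort(self.pivots, pivot)
        return cut

    def range_query(self, low, high):
        if self.stochastic:
            s, e = self._piece_for(low)
            if e - s > 2:                      # stochastic step: one extra cut at a random value
                self._crack(s, e, self.data[random.randrange(s, e)])
        s, e = self._piece_for(low)
        cut_low = self._crack(s, e, low)
        s, e = self._piece_for(high)
        cut_high = self._crack(s, e, high)
        return self.data[cut_low:cut_high]     # values in [low, high) are now contiguous

random.seed(1)
values = list(range(10_000))
random.shuffle(values)
column = CrackedColumn(values, stochastic=True)
print(len(column.range_query(100, 200)))      # 100 qualifying values
print(len(column.bounds) - 1)                 # how many pieces the column has been cracked into so far
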
G. Graefe, F. Halim, S. Idreos, H. A. Kuno, and S. Manegold, “Concurrency Control for Adaptive Indexing,” Proceedings of the Very Large Databases Endowment (PVLDB), vol. 5, pp. 656-667, 2012.

Adaptive indexing initializes and optimizes indexes incrementally, as a side effect of query processing. The goal is to achieve the benefits of indexes while hiding or minimizing the costs of index creation. However, index-optimizing side effects seem to turn read-only queries into update transactions that might, for example, create lock contention. This paper studies concurrency control in the context of adaptive indexing. We show that the design and implementation of adaptive indexing rigorously separates index structures from index contents; this relaxes the constraints and requirements during adaptive indexing compared to those of traditional index updates. Our design adapts to the fact that an adaptive index is refined continuously, and exploits any concurrency opportunities in a dynamic way. A detailed experimental analysis demonstrates that (a) adaptive indexing maintains its adaptive properties even when running concurrent queries, (b) adaptive indexing can exploit the opportunity for parallelism due to concurrent queries, (c) the number of concurrency conflicts and any concurrency administration overheads follow an adaptive behavior, decreasing as the workload evolves and adapting to the workload needs.

ConcurrencyControlForAdaptiveIndexingPVLDB2012.pdf
