P. Bonnet et al., “Repeatability and workability evaluation of SIGMOD 2011,” SIGMOD Record, vol. 40, pp. 45-48, 2011.
Abstract: SIGMOD has offered, since 2008, to verify the experiments published in papers accepted at the conference. This year, we were in charge of reproducing the experiments provided by the authors (repeatability) and exploring changes to experiment parameters (workability). In this paper, we assess the SIGMOD repeatability process in terms of participation, review process, and results. While participation is stable in terms of number of submissions, this year we find a sharp contrast between the high participation from Asian authors and the low participation from American authors. We also find that most experiments are distributed as Linux packages accompanied by instructions on how to set up and run the experiments. We are still far from the vision of executable papers.
M. L. Kersten, S. Idreos, S. Manegold, and E. Liarou, “The Researcher's Guide to the Data Deluge: Querying a Scientific Database in Just a Few Seconds,” Proceedings of the VLDB Endowment (PVLDB), vol. 4, no. 12, pp. 1474-1477, 2011.
Abstract: There is a clear need nowadays for extremely large data processing.
This is especially true in the area of scientific data management where soon we expect
data inputs in the order of multiple Petabytes.
However, current data management technology is not suitable for such data sizes.
In light of such new database applications, we can rethink some of the strict
requirements database systems adopted in the past.
We argue that strict correctness is one such requirement, and that enforcing it is responsible for significant performance degradation.
In this paper, we propose a new paradigm towards building database kernels
that may produce wrong but fast, cheap and indicative results.
Fast response times are an essential component of data analysis for exploratory applications:
allowing for fast queries enables
the user to develop a "feeling" for the data through a series of "painless" queries, which eventually leads
to more detailed analysis in a targeted data area.
We propose a research path where a database kernel autonomously and on-the-fly
decides to reduce the processing requirements of a running query
based on workload, hardware and
environmental parameters.
This requires a complete redesign of database operators
and of the query processing strategy.
For example, typical and very common scenarios where query processing performance degrades significantly
are cases where a database operator has to spill data
to disk, is forced to perform random access, or has to follow long linked lists.
Here we ask the question: what if we simply avoid these steps, "ignoring" the side effect
on the correctness of the result?
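The trade the abstract proposes, fast, cheap, and indicative results instead of exact ones, can be illustrated with a toy sketch (purely illustrative, not the paper's design; the function name and sampling budget are our own assumptions): an aggregate operator that samples a bounded number of values instead of scanning the whole column, so it never needs to spill or chase long pointer chains, at the price of an approximate answer.

```python
import random

def approx_avg(values, budget=1000, seed=None):
    """Indicative (possibly wrong) average of `values`.

    Instead of scanning everything, examine at most `budget` values.
    Small inputs are answered exactly; large inputs are sampled, so
    the result is approximate but cheap to compute.
    """
    if len(values) <= budget:
        return sum(values) / len(values)
    rng = random.Random(seed)          # seed only for reproducibility
    sample = rng.sample(values, budget)
    return sum(sample) / budget
```

A kernel following the paper's vision would make this decision autonomously, based on workload and hardware conditions, rather than via an explicit budget parameter.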
S. Idreos, S. Manegold, H. A. Kuno, and G. Graefe, “Merging What's Cracked, Cracking What's Merged: Adaptive Indexing in Main-Memory Column-Stores,” Proceedings of the VLDB Endowment (PVLDB), vol. 4, pp. 585-597, 2011.
Abstract: Adaptive indexing is characterized by the partial creation
and refinement of the index as side effects of query execution.
Dynamic or shifting workloads may benefit from preliminary
index structures
focused on the columns and specific key ranges actually queried ---
without incurring the cost of full index construction.
The costs and benefits of adaptive indexing techniques
should therefore be compared in terms of initialization costs,
the overhead imposed upon queries, and the rate at which
the index converges to a state that is fully refined
for a particular workload component.
Based on an examination of database cracking and adaptive merging,
which are two techniques for adaptive indexing,
we seek a hybrid technique that has a low initialization cost
and also converges rapidly.
We find the strengths and weaknesses of database cracking and
adaptive merging complementary.
One has a relatively high initialization cost but converges rapidly.
The other has a low initialization cost but converges relatively slowly.
We analyze the sources of their respective strengths and
explore the space of hybrid techniques.
We have designed and implemented a family of hybrid algorithms
in the context of a column-store database system.
Our experiments compare their behavior against
database cracking and adaptive merging, as well as against
both traditional full index lookup and scan of unordered data.
We show that the new hybrids significantly improve over
past methods while at least two of the hybrids
come very close to the "ideal performance"
in terms of both overhead per query and convergence to a final state.
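Database cracking, one of the two adaptive-indexing techniques compared above, can be sketched minimally as follows (the class and its data structures are our own illustration, not the paper's column-store implementation): each range query partitions the column around its predicate bounds as a side effect, so subsequent queries scan progressively smaller, better-organized pieces.

```python
import bisect

class CrackedColumn:
    """Toy database-cracking sketch: queries crack the column in place."""

    def __init__(self, values):
        self.data = list(values)
        self._vals = []   # sorted boundary values already cracked on
        self._pos = []    # data[:_pos[i]] < _vals[i] <= data[_pos[i]:]

    def _crack(self, v):
        """Partition the piece containing v around v; return split point."""
        i = bisect.bisect_left(self._vals, v)
        if i < len(self._vals) and self._vals[i] == v:
            return self._pos[i]                    # already cracked here
        lo = self._pos[i - 1] if i > 0 else 0
        hi = self._pos[i] if i < len(self._vals) else len(self.data)
        piece = self.data[lo:hi]
        left = [x for x in piece if x < v]         # three-way-ish split
        right = [x for x in piece if x >= v]
        self.data[lo:hi] = left + right
        split = lo + len(left)
        self._vals.insert(i, v)
        self._pos.insert(i, split)
        return split

    def range_query(self, low, high):
        """Return values v with low <= v < high; cracks as a side effect."""
        a = self._crack(low)
        b = self._crack(high)
        return self.data[a:b]
```

After a few queries the column is partially sorted exactly around the key ranges actually requested, which is the "index as a side effect of query execution" idea; adaptive merging and the paper's hybrids differ in how aggressively each query refines these pieces.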