The Log-Structured Merge-Bush & the Wacky Continuum

Citation:

N. Dayan and S. Idreos, “The Log-Structured Merge-Bush & the Wacky Continuum,” in ACM SIGMOD International Conference on Management of Data, 2019.
wackyandthebush.pdf1.52 MB

Abstract:

Data-intensive key-value stores based on the Log-Structured Merge-Tree are used in numerous modern applications ranging from social media and data science to cloud infrastructure. We show that such designs exhibit an intrinsic contention be- tween the costs of point reads, writes and memory, and that this trade-off deteriorates as the data size grows. The root of the problem is that in all existing designs, the capacity ratio between any pair of levels is fixed. This causes write cost to increase with the data size while yielding exponentially diminishing returns for point reads and memory.

We introduce the Log-Structured Merge-Bush (LSM-Bush), a new data structure that sets increasing capacity ratios between adjacent pairs of smaller levels. As a result, smaller levels get lazier by gathering more runs before merging them. By using a doubly-exponential ratio growth rate, LSM-bush brings write cost down from O(log N ) to O(log log N ), and it can trade this gain to either improve point reads or memory. Thus, it enables more scalable trade-offs all around.

We further introduce Wacky, a design continuum that includes LSM-Bush as well as all state-of-the-art merge policies, from laziest to greediest, and can assume any of them within a single implementation. Wacky encompasses a vast space of performance properties, including ones that favor range reads, and it can be searched analytically to find the design that performs best for a given workload in practice.