Abstract:
We are now entering the era of data deluge, where the
amount of data outgrows the capabilities of query processing technology.
Many emerging applications, from social networks to scientific experiments,
are representative examples of this deluge, where the rate at which data is
produced exceeds any past experience. For example, scientific fields such
as astronomy are soon expected to collect multiple terabytes of data on a daily
basis, while web-based businesses such as social networks or web log
analysis are already confronted with a growing stream of large data inputs.
Therefore, there is a clear need for efficient big data query processing to
enable businesses and sciences to evolve into the new era of data
deluge.
In this chapter, we focus on a new direction of query processing for big data
where data exploration becomes a first-class citizen. Data exploration is
necessary when large new chunks of data arrive rapidly and we want to react
quickly, i.e., with little time to spare for tuning and set-up. In particular, our
discussion focuses on database systems technology, which for several
decades has been the predominant data processing tool.
We introduce the concept of data exploration and discuss a series of early
techniques from the database community towards building database systems
tailored for big data exploration, namely adaptive indexing, adaptive
loading, and sampling-based query processing.
These directions focus on reconsidering fundamental assumptions and on
designing next generation database architectures for the big data era.