Big Data Exploration


S. Idreos, “Big Data Exploration,” in Big Data Computing, Taylor and Francis, 2013.


We are entering the era of the data deluge, in which the amount of data outgrows the capabilities of query processing technology. Many emerging applications, from social networks to scientific experiments, are representative examples of this deluge: the rate at which they produce data exceeds any past experience. For example, scientific fields such as astronomy are soon expected to collect multiple terabytes of data on a daily basis, while web-based businesses such as social networks or web log analysis are already confronted with a growing stream of large data inputs. There is therefore a clear need for efficient big data query processing to enable businesses and sciences to evolve into this new era. In this chapter, we focus on a new direction of query processing for big data in which data exploration becomes a first-class citizen. Data exploration is necessary when large new chunks of data arrive rapidly and we want to react quickly, i.e., with little time to spare for tuning and set-up. Our discussion focuses on database systems technology, which for several decades has been the predominant data processing tool. We introduce the concept of data exploration and discuss a series of early techniques from the database community toward building database systems tailored for big data exploration, namely adaptive indexing, adaptive loading, and sampling-based query processing. These directions focus on reconsidering fundamental assumptions and on designing next-generation database architectures for the big data era.