S. Idreos, “Big Data Exploration,” in Big Data Computing, Taylor and Francis, 2013.
Abstract: We are now entering the era of data deluge, where the
amount of data outgrows the capabilities of query processing technology.
Many emerging applications, from social networks to scientific experiments,
are representative examples of this deluge, where the rate at which data is
produced exceeds any past experience. For example, scientific fields such
as astronomy are soon expected to collect multiple terabytes of data on a daily
basis, while web-based businesses such as social networks or web
log analysis are already confronted with a growing stream of large data inputs.
Therefore, there is a clear need for efficient big data query processing to
enable the evolution of businesses and sciences to the new era of data
deluge.
In this chapter, we focus on a new direction of query processing for big data
where data exploration becomes a first-class citizen. Data exploration is
necessary when new big chunks of data arrive rapidly and we want to react
quickly, i.e., with little time to spare for tuning and set-up. In particular, our
discussion focuses on database systems technology, which for several
decades has been the predominant data processing tool.
In this chapter, we introduce the concept of data exploration and we discuss a
series of early techniques from the database community towards the direction
of building database systems which are tailored for big data exploration, i.e.,
adaptive indexing, adaptive loading and sampling-based query processing.
These directions focus on reconsidering fundamental assumptions and on
designing next generation database architectures for the big data era.
BigDataExploration.pdf
S. Idreos and E. Liarou, “dbTouch: Analytics at your Fingertips,” in Proceedings of the 7th International Conference on Innovative Data Systems Research (CIDR), Asilomar, California, 2013.
Abstract: As we enter the era of data deluge,
turning data into knowledge has become the major challenge across most sciences and businesses that deal with data.
In addition, as we increase our ability to create data, more and more people are confronted
with data management problems on a daily basis in numerous aspects of everyday life.
A fundamental need is data exploration through interactive tools, i.e.,
being able to quickly and effortlessly determine data and patterns of interest.
However, modern database systems have not been designed with data exploration and usability in mind;
they require users with expert knowledge and skills,
while they react in a strict and monolithic way to every user request, resulting in correct answers but slow response times.
In this paper, we introduce the vision of a new generation of data management systems, called dbTouch;
our vision is to enable interactive and intuitive data exploration via database kernels
which are tailored for touch-based exploration.
No expert knowledge is needed.
Data is represented in a visual format, e.g., a column shape for an attribute or a fat rectangle shape for a table,
while users can touch those shapes and interact/query with gestures as opposed to firing complex SQL queries.
The system does not try to consume all data; instead it analyzes only parts of the data at a time,
continuously refining the answers and continuously reacting to user input.
Every single touch on a data object can be seen as a request to run an operator or a collection of operators
over part of the data. Users react to running results and continuously adjust the data exploration -
they continuously determine the data to be processed next
by adjusting the direction and speed of a gesture, i.e., a collection of touches;
the database system no longer has control over the data flow.
We discuss the various benefits that dbTouch systems bring for data analytics as well as the
new and unique challenges for database research in combination with touch interfaces.
In addition, we provide an initial architecture, implementation and evaluation (and demo) of a dbTouch prototype
on iOS for iPad.
dbTouchCIDR13.pdf
E. Liarou, S. Idreos, S. Manegold, and M. L. Kersten, “Enhanced Stream Processing in a DBMS Kernel,” in Proceedings of the 16th International Conference on Extending Database Technology (EDBT), Genoa, Italy, 2013, pp. 501-512.
Abstract: Continuous query processing has emerged as
a promising query processing paradigm with numerous applications.
A recent development is the need to handle both streaming queries and typical one-time queries
in the same application. For example, data warehousing can greatly benefit from the
integration of stream semantics, i.e., online analysis of incoming data and combination with existing data.
This is especially useful to provide low latency in data-intensive analysis
in big data warehouses that are augmented with new data on a daily basis.
However, state-of-the-art database technology cannot handle streams efficiently due to their
"continuous" nature.
At the same time, state-of-the-art stream technology is purely focused on stream applications.
The research efforts are mostly geared towards the creation of specialized stream management systems
built with a different philosophy than a DBMS.
The drawback of this approach is the limited opportunity to exploit
successful past data processing technology, e.g., query optimization techniques.
For this new problem we need to combine the best of both worlds.
Here we take a completely different route by designing a
stream engine on top of an existing relational database kernel.
This includes reuse of both its storage/execution engine and its optimizer
infrastructure. The major challenge then becomes the efficient support
for specialized stream features.
This paper focuses on incremental window-based processing, arguably the most crucial
stream-specific requirement.
In order to maintain and reuse the generic storage and execution model of the DBMS,
we elevate the problem to the query plan level.
Proper optimizer rules, scheduling, and intermediate result caching and reuse
allow us to modify the DBMS query plans for efficient incremental processing.
We describe in detail the new approach and we demonstrate
efficient performance even against specialized stream engines, especially when scalability
becomes a crucial factor.
DataCellEdbt2013.pdf