Everyone will be a data scientist

As I start a new adventure as a faculty member at the Harvard School of Engineering and Applied Sciences, I am also starting a new blog. This blog will focus mainly on data systems research and academic life.

The main driving force in my research is my belief that "everyone will be a data scientist": data will be everywhere, and we have to rethink research goals and priorities to make it easy to navigate a data-driven world. How far are we from a future where a data system sits in the critical path of nearly every aspect of our lives? Not far, I think. As our ability to collect and share data keeps growing and more and more data-driven applications emerge, such a future becomes increasingly real. This means that more and more businesses, scientific fields, and even everyday tasks will rely on data management solutions. Ideally, scientists, businesses, and others would be able to focus on the creative part of their work without worrying about how to store, access, and move data. But as things stand today, database systems are too complex. Setting up and maintaining a state-of-the-art database system for a given application is something of an art, practiced by highly paid database administrators (the kind that small businesses and scientific labs cannot afford). Even choosing which data system to use is a major challenge: different systems excel in different scenarios, and as data-driven scenarios become increasingly dynamic, no single system is good enough.

My research aims to design the backbone of the data systems of the future: systems that non-experts can use easily, yet that still provide a rich set of features and adapt to usage and workload patterns. The slide above is from a recent talk I gave at the Gong Show of the 2013 High Performance Transaction Systems (HPTS) workshop; it humorously depicts the need for a new class of data systems that require minimal tuning and expertise to use.

In short, future data systems should simply work, even as hardware, queries, and data workloads evolve. Individuals should not need a PhD in databases to harness the power of data analytics, and businesses should not need to spend a fortune on an army of database administrators.

I am also a firm believer that data management research (and computer science in general) can play a critical role in the scientific breakthroughs of the future.