

What do we mean when we talk about Big Data?

Thu Dec 15, 2016

426 Words
Posted In: big data, research, social science, computational social science


A blog post about this article provides the following definition of big data:

“High volume data that frequently combines highly structured administrative data actively collected by public sector organisations with continuously and automatically collected structured and unstructured real-time data that are often passively created by public and private entities through their internet.”

The article is behind a paywall, but the blog post lays out its argument clearly. The authors seem chiefly concerned with how the term *big data* is used by researchers who come mostly from a background of working with public sector data.

My takeaways from the blog post are:

* Public sector use of the term *Big Data* sometimes diverges from what the term means in the private sector.
* Real-time data collection *could* be a vice in the public sector.
* Digital exhaust data is only coincidentally useful for answering public policy questions; given that, is it suitable for such purposes at all?
* The ethics of using this data are unclear.
* Big Data in its present form is representative neither of our full lives nor of all citizens.
* There remains great potential, but excitement about it must be tempered by an understanding of the current, inherent limitations of this resource.

I think these are all reasonable positions to take at the moment. However, the definition leaves open how we should interpret what *high volume* means.

A position I’m coming to about big data is that it’s mostly about how comfortable you feel with the data: one person’s big data is another’s batch job. What the explosion of data has created is an increase in the number of occasions on which a particular researcher hits the limits of what is technically possible for them at that moment. Setting aside all the questions about what is in the underlying data, and how good a fit it is for the research question being asked, what I find very exciting is this: the journey of gaining the capacity to work with the data you think is big today will create a cohort of researchers who are unafraid to deal with whatever may be big for them tomorrow. In this way we create an environment of fantastically skilled researchers, potentially better placed to tackle hard problems than researchers are today.