Misguided article. It’s not about how big the data is or how good the algorithms that process it are, but about where it comes from and what value it contains. Data is best when it comes from the individual, as that’s the only source of context and understanding. Big data is a red herring; customer-enhanced and shared data is the real deal. Unfortunately, that’s not on the horizon any time soon… first of all, the organisations getting excited about big data would need to free their customers’ data. And that’s going to be hard work.
“What are we to conclude from these three areas — all of them problems with fine, highly motivated minds focused on them? To me, they suggest that the randomness inherent in human behavior is the limiting factor to consumer modeling success. Marginal gains can perhaps be made thanks to big data, but breakthroughs will be elusive as long as human behavior remains inconsistent, impulsive, dynamic, and subtle.”
In other words, it takes two to tango.
A couple of things have happened in the last few years that have made me realise I’m rapidly turning into an “Old Fart”, because I’ve seen them come around before. The first is touch interfaces (to which I say “if you don’t remember the ergonomics issues with light pens in the ’70s and ‘gorilla arm’, look them up before putting touchscreens on laptops or other things less readily adjustable than tablets”), and the second is Big Data.
While it wasn’t called “Big Data” back then, I can’t help but think of AI research in the ’80s involving neural networks and pattern matching in image recognition. The problem with neural networks was that – except possibly for those built on Adaptive Resonance Theory – you couldn’t “lift the lid” during the training phase to examine or modify what they were actually learning. So you could end up with a neural network that was supposed to identify a tank concealed in a wood from a photograph, but which ignored the tank entirely and instead became very good at distinguishing between different kinds of tree.
So, applying a similarly jaded and cynical view of Big Data, thoughts like these arise:
* data mining to identify patterns. Are the patterns really there, once both random and systematic errors in data gathering and measurement are taken into account? Are the patterns likely to be down to sample size (you can still get regression to the mean in large samples, for example)? Is the data sufficiently unbiased that patterns derived from it could be generalised (and if so, how far could the generalisation go)? Do the patterns “make sense”? Working back from a set of correlated but not obviously related effects to a mutual cause is a highly fraught and speculative exercise, but would the result of such an exercise be realistic?
* data reduction (MapReduce, etc.). Can you be sure that the data being removed is truly insignificant?
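The regression-to-the-mean point in the first bullet is easy to demonstrate with a small simulation. This is a minimal Python sketch under an assumed, hypothetical model: each observed score is a stable underlying value plus independent measurement noise of equal variance. The function name `simulate_regression_to_mean` and its parameters are illustrative only, not taken from any real system. It picks the top 10% of 100,000 individuals on one measurement and then re-measures them:

```python
import random
import statistics


def simulate_regression_to_mean(n=100_000, top_fraction=0.1, seed=42):
    """Hypothetical model: observed score = stable true value + independent noise.

    Returns the mean of the top group's first and second measurements.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    true_values = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # Two independent noisy measurements of the same underlying values
    first = [t + rng.gauss(0.0, 1.0) for t in true_values]
    second = [t + rng.gauss(0.0, 1.0) for t in true_values]

    # Select the individuals who scored highest on the first measurement only
    k = int(n * top_fraction)
    top = sorted(range(n), key=lambda i: first[i], reverse=True)[:k]

    first_mean = statistics.fmean(first[i] for i in top)
    second_mean = statistics.fmean(second[i] for i in top)
    return first_mean, second_mean


if __name__ == "__main__":
    before, after = simulate_regression_to_mean()
    print(f"top group, first measurement:  {before:.2f}")
    print(f"top group, second measurement: {after:.2f}")
```

Under these assumptions the selected group’s second score falls to roughly half of its first, even with 100,000 records, because part of what put them in the top group was noise that does not repeat. A larger sample makes the estimate more precise; it does not make the effect go away.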