Cam Davidson-Pilon has an excellent blog post about 21st Century Problems. In it, he offers one of the best explanations of the promise of Big Data I’ve come across:
21st Century problems are statistical problems
Statistical problems describe the space we haven’t explored yet. Statistical problems are not new: they are likely as old as deterministic problems. What is new is our ability to solve them. Spearheaded by the (constantly increasing) tidal wave of data, practitioners are able to solve new problems otherwise thought impossible. Consider the development of a spellchecker: in a deterministic approach, an algorithm for spell checking would have needed to incorporate context and complicated ideas from the language’s grammar (I shudder at the nested if statements), unique only up to that language; whereas a statistical approach can be written in under 20 lines. The difference between the two approaches is that the latter has taken advantage of the presence of a large corpus of text, a very lenient assumption.
This isn’t another big data article, but it’s hard to underestimate, let alone imagine, what we will be doing with these casual data sets. Fields like medicine, that previously relied on small sample sizes to make important one-size-fits-all decisions, will evolve into a very personal affair. By investigating traffic data, dynamic solutions can be built that mimic past successes. Aided by machine learning, specifically recommendation engines, companies can invoke desires never previously thought about in our minds. Ideas like multi-armed bandits will motivate UI and AI development.
Consider Big Data in that context and suddenly it’s a far more powerful (and complex) idea than what a few whitepapers might have you believe.
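To make the spellchecker contrast in the quote concrete, here is a minimal sketch of a frequency-based corrector, in the spirit of the well-known approach popularized by Peter Norvig’s spelling-corrector essay. It is not taken from Cam’s post. The tiny `CORPUS` string and the names `WORD_COUNTS`, `edits1` and `correct` are assumptions for illustration; a real corrector would count words from a large body of text.

```python
import re
from collections import Counter

# Toy corpus (an assumption for illustration); a real corrector would
# count words from a large body of text.
CORPUS = "the quick brown fox jumps over the lazy dog the dog barks at the fox"
WORD_COUNTS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return every string one delete, swap, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    swaps = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + swaps + replaces + inserts)


def correct(word):
    """Return the most frequent known word within one edit, else the input."""
    word = word.lower()
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    return max(candidates, key=WORD_COUNTS.get) if candidates else word
```

With this toy corpus, `correct("teh")` returns `"the"` and an unknown string such as `"zzzz"` comes back unchanged. No grammar rules appear anywhere: the word counts do the work, which is why the quality of the result depends almost entirely on the size of the corpus.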
Also, for contrast, note how he describes most of the great technological accomplishments of the 20th century:
The technological challenges, and achievements, of the 20th Century handed society powerful tools. Technologies like nuclear power, airplanes & automobiles, the digital computer, radio, internet and imaging technologies to name only a handful. Each of these technologies had disrupted the system, and each can be argued to be Black Swans (à la Nassim Taleb). In fact, for each technology, one could find a company killed by it, and a company that made its billions from it.
What these technologies have in common is that they are all deterministic engineering solutions. By that, I mean they have been created by techniques in mathematics, physics and engineering: often being modeled in a mathematical language, guided by physics’ calculus and constrained and brought to life by engineering. I argue that these types of problems, of modeling deterministically, are problems that our fathers had the luxury of solving.
Very smart analysis, and one I haven’t read before. Check out Cam’s whole post.