[Q]uantifying the amount of information that exists in the world is hard. What is clear is that there is an awful lot of it, and it is growing at a terrific rate (a compound annual 60%) that is speeding up all the time. The flood of data from sensors, computers, research labs, cameras, phones and the like surpassed the capacity of storage technologies in 2007.
90% of the data in the world today has been created in the last two years alone. Some estimate that data production will be 44 times greater in 2020 than it was in 2009. Others estimate an additional 2.5 quintillion bytes of data is being generated every day.
Estimates show that the amount of data in the world doubles every two years. Should this trend continue, by 2020 there would be 50 times the amount of data as existed in 2011.
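The growth figures quoted above come from different sources and rest on different assumptions, so it can help to see what each one implies arithmetically. The following is a minimal sketch, not taken from any of the cited reports. The function names are hypothetical, and the year spans (2011 to 2020, 2009 to 2020) are simply read off the sentences above.

```python
def growth_from_doubling(years, doubling_period):
    """Total growth factor if the quantity doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)


def growth_from_annual_rate(years, annual_rate):
    """Total growth factor under a compound annual growth rate (0.60 means 60%)."""
    return (1 + annual_rate) ** years


def implied_annual_rate(total_factor, years):
    """Compound annual rate that yields `total_factor` growth over `years` years."""
    return total_factor ** (1 / years) - 1


if __name__ == "__main__":
    # Doubling every two years, applied over the nine years from 2011 to 2020.
    print(f"doubling every 2 years, 9 years: {growth_from_doubling(9, 2):.1f}x")

    # A compound annual rate of 60%, applied over the same nine years.
    print(f"60% per year, 9 years: {growth_from_annual_rate(9, 0.60):.1f}x")

    # Annual rate implied by 44-fold growth over the eleven years from 2009 to 2020.
    print(f"44x over 11 years: {implied_annual_rate(44, 11):.1%} per year")
```

Under these assumptions, doubling every two years gives roughly 23-fold growth over nine years, a 60% compound annual rate gives roughly 69-fold growth over the same span, and 44-fold growth over eleven years corresponds to about 41% per year, which is close to the rate implied by doubling every two years. The spread shows how sensitive long-range projections are to the assumed rate and baseline year.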
Big data is big in two different senses. It is big in the quantity and variety of data that are available to be processed. And, it is big in the scale of analysis (termed "analytics") that can be applied to those data, ultimately to make inferences and draw conclusions. By data mining and other kinds of analytics, non‐obvious and sometimes private information can be derived from data that, at the time of their collection, seemed to raise no, or only manageable, privacy issues. Such new information, used appropriately, may often bring benefits to individuals and society. Even in principle, however, one can never know what information may later be extracted from any particular collection of big data, both because that information may result only from the combination of seemingly unrelated data sets, and because the algorithm for revealing the new information may not even have been invented at the time of collection.
What really matters about big data is what it does. Aside from how we define big data as a technological phenomenon, the wide variety of potential uses for big data analytics raises crucial questions about whether our legal, ethical, and social norms are sufficient to protect privacy and other values in a big data world. Unprecedented computational power and sophistication make possible unexpected discoveries, innovations, and advancements in our quality of life. But these capabilities, most of which are not visible or available to the average consumer, also create an asymmetry of power between those who hold the data and those who intentionally or inadvertently supply it.
The same data and analytics that provide benefits to individuals and society if used appropriately can also create potential harms — threats to individual privacy according to privacy norms both widely shared and personal. For example, large‐scale analysis of research on disease, together with health data from electronic medical records and genomic information, might lead to better and timelier treatment for individuals but also to inappropriate disqualification for insurance or jobs. GPS tracking of individuals might lead to better community‐based public transportation facilities, but also to inappropriate use of the whereabouts of individuals.
Part of the challenge, too, lies in understanding the many different contexts in which big data comes into play. Big data may be viewed as property, as a public resource, or as an expression of individual identity. Big data applications may be the driver of America's economic future or a threat to cherished liberties. Big data may be all of these things.
A common framework for characterizing big data relies on the "three Vs," the volume, velocity, and variety of data, each of which is growing at a rapid rate as technological advances permit the analysis and use of this data in ways that were not possible previously.
Velocity is the speed with which companies can accumulate, analyze, and use new data. Technological improvements allow companies to harness the predictive power of data more quickly than ever before, sometimes instantaneously.
Variety means the breadth of data that companies can analyze effectively. Companies can now combine very different, once unlinked, kinds of data — either on their own or through data brokers or analytics firms — to infer consumer preferences and predict consumer behavior, for example.
Together, the three Vs allow for more robust research and correlation. Previously, finding a representative data sample sufficient to produce statistically significant results could be very difficult and expensive. Today, the scope and scale of data collection enable cost-effective, substantial research of even obscure or mundane topics (e.g., the amount of foot traffic in a park at different times of day).
Gartner, "The Importance of 'Big Data': A Definition."
Steve Lohr, "New U.S. Research Will Aim at Flood of Digital Data," N.Y. Times (Mar. 29, 2012).
James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh & Angela Hung Byers, McKinsey Global Institute, "Big Data: The Next Frontier for Innovation, Competition and Productivity," Executive Summary 1 (May 2011).
"Data, Data Everywhere, A Special Report on Managing Information," The Economist (Feb. 25, 2010).
"Dealing with Data," Science (special issue) (Feb. 11, 2011).
Robert Kirkpatrick, "Beyond Targeted Ads: Big Data for a Better World" (2012).
Jules Polonetsky & Omer Tene, "Privacy and Big Data: Making Ends Meet," 66 Stan. L. Rev. Online 25 (2013).
Edith Ramirez, "The Privacy Challenges of Big Data: A View from the Lifeguard's Chair," Keynote Address by FTC Chairwoman Edith Ramirez (Technology Policy Institute Aspen Forum) (Aug. 19, 2013).