Thoughts About Big Data
by Bill Wells, MS, CISSP, CISM, CISA, CRISC, CIPP/IT
Businesses have begun to formulate and adopt strategies for leveraging what the industry has dubbed “Big Data”. Ask people to define Big Data and you get a variety of answers. One definition that has emerged in recent months appears to have some staying power: Big Data is data characterized by its variety, volume and velocity.
At the risk of pointing out the painfully obvious, Big Data’s volume dimension describes the vast bulk of information generated today. The challenge of this dimension lies in finding an effective way to store all that information.
The variety dimension of Big Data reflects the fact that not all data comes in a readily usable format. In one setting Big Data looks like a collection of MS Office documents, in another like an XML data stream, and in yet another like a structured relational database. The mix of these formats, and many others, constrains the ability to perform meaningful analytics against the data as a whole.
The velocity dimension refers to how quickly data is generated. Practically every web page, application and database has been instrumented with data collection mechanisms, producing everything from real-time data streams to web-based click streams to counts of how often a particular book was purchased. The rate at which we create data today challenges companies’ ability to consume it.
Put these three dimensions together and you end up with a lot of data coming at you in a variety of formats, at a speed that quickly overwhelms analytic engines’ ability to categorize, process and store it. So now we know what Big Data looks like, but how do companies put it to use?
Two poster-child examples of Big Data usage are Amazon.com and Facebook. When you buy a book on Amazon, one area of the website shows you what other people who bought that book also purchased. Facebook uses the “Like” feature and information from people’s profiles to present more relevant advertisements. More broadly, by setting and then inspecting the browser “cookies” placed when you visit a website, companies are able to tailor the services they offer.
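The “also purchased” feature can be approximated with simple co-occurrence counting: for every order that contains a given book, tally the other items in that order. The sketch below is a minimal illustration, assuming order histories are available as lists of hypothetical item IDs; it is not Amazon’s actual method, which is not described here and is certainly far more elaborate.

```python
from collections import Counter


def also_bought(orders, item, top_n=3):
    """Rank items by how many orders they share with `item`.

    `orders` is a list of baskets, each an iterable of item IDs.
    Illustrative co-occurrence counting only (an assumption), not any
    retailer's real recommendation algorithm.
    """
    co_counts = Counter()
    for basket in orders:
        items = set(basket)  # ignore duplicate copies within one order
        if item in items:
            # Credit every other item that appeared in the same order.
            co_counts.update(items - {item})
    # Sort by count (descending), then by item ID so ties are deterministic.
    ranked = sorted(co_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]


# Hypothetical order history; each inner list is one customer's order.
sample_orders = [
    ["dune", "foundation", "neuromancer"],
    ["dune", "foundation"],
    ["dune", "hyperion"],
    ["foundation", "neuromancer"],
]

if __name__ == "__main__":
    print(also_bought(sample_orders, "dune"))  # ['foundation', 'hyperion', 'neuromancer']
```

Even this toy version shows why volume matters: the counts only become meaningful once there are many orders to aggregate.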
Now add the idea of metadata to this concept and the usefulness of Big Data expands dramatically. Metadata is data that describes other data. A database record that holds name and address information, for example, is also described by the number of times it was accessed, who accessed it and when. By examining patterns in metadata, one can quickly begin to predict the most probable future behaviors.
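To make the metadata idea concrete, here is a minimal sketch that summarizes a hypothetical access log of (record ID, user, timestamp) tuples. The function names and sample data are invented for illustration, and the “prediction” is deliberately naive: it simply reports the hour of day at which a record was most often accessed.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical access-log metadata: (record_id, user, timestamp).
# None of these fields contains the customer's name or address.
sample_log = [
    ("cust-42", "alice", datetime(2013, 5, 6, 9, 15)),
    ("cust-42", "alice", datetime(2013, 5, 7, 9, 40)),
    ("cust-42", "bob", datetime(2013, 5, 7, 14, 5)),
    ("cust-7", "bob", datetime(2013, 5, 8, 11, 0)),
]


def summarize_access(log):
    """Describe each record by how often, by whom and at what hours it was accessed."""
    summary = defaultdict(lambda: {"count": 0, "users": set(), "hours": Counter()})
    for record_id, user, ts in log:
        entry = summary[record_id]
        entry["count"] += 1
        entry["users"].add(user)
        entry["hours"][ts.hour] += 1
    return dict(summary)


def likeliest_access_hour(summary, record_id):
    """Naive prediction: the hour of day at which the record was accessed most often."""
    hours = summary[record_id]["hours"]
    # Ties go to the earlier hour so the result is deterministic.
    return min(hours, key=lambda h: (-hours[h], h))


if __name__ == "__main__":
    stats = summarize_access(sample_log)
    print(stats["cust-42"]["count"], sorted(stats["cust-42"]["users"]))  # 3 ['alice', 'bob']
    print(likeliest_access_hour(stats, "cust-42"))  # 9
```

Note that the summary never reads the name and address fields themselves; the pattern emerges entirely from data about the data.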
Tying all of this together for those of us in the information risk, assurance and auditing world, Big Data places new responsibilities on companies as they embark on their Big Data journeys. Carelessness with Big Data can lead to disclosures that make the exposure of a Social Security or credit card number look like small potatoes by comparison: imagine revealing someone’s geolocation history, shopping habits and health care treatments.
Are the data aggregation and analytics systems sufficiently protected? Do appropriate access and segregation of duties controls exist? Where we were once satisfied with knowing who accessed what and when, we now need to start understanding why. Metadata about phone calls and emails, for example, could be used to identify forthcoming mergers and acquisitions. What seems like a relatively benign data collection activity can suddenly turn into a privacy controls nightmare when that data is aggregated and analyzed under a Big Data analytics model.