One of the most common arguments in the area of Business Intelligence today concerns the level of detail at which a company should keep its data for the purpose of informational analysis. There are many reasons for this argument, and perhaps the three most pertinent factors are:
- What data is available
- The limitations of current technology to support differing volumes of data
- Existing and pending data protection laws and ownership issues
The most restrictive of the above list is the third, which is governed by the fact that differing legislation in different countries regulates both the amount of data one can hold over history and the level of detail of this data, most especially if it can be used to track the behavior of individuals. This issue of privacy is of utmost importance and will have ramifications across many aspects of decision-making.
The second issue in our list is one of technology, and it’s fair to say that current technology now lets us store and query huge (if not limitless) amounts of data. The pertinent discussion should center not on ‘can we do it?’ but rather on what business value can be obtained from differing ‘levels’ of data, and this subject is the major theme of this post.
The remaining issue, what data is available, is obviously an important factor simply because you can’t keep and use what you don’t have in the first place. In fact, understanding the gap between the data that is available and the data that is required is a key objective in requirement definition. It must primarily be a business-led activity, not only in identifying important and missing data items, but also in putting a value on capturing them.
Nowadays, of course, companies are collecting data from many sources, but looking back to what might soon be considered ‘the good old days’, things were a little easier. We had internally generated data and the Data Warehouse. Some of these warehouses were in fact BIG, especially at telco companies, so it’s worth looking at what this data is.
Perhaps more data is collected when you use your mobile (or wireline) telephone than with any other sort of transaction. The Call Detail Record (CDR), which is generated by the telephone switches on (broadly speaking) a per-call basis, is a great source of information for the telephone company in terms of analytical potential. For example, for mobile calls, the CDR contains:
- calling number
- called number
- time of call
- duration of call
- location of caller and called party
- termination codes
Imagine 20 million subscribers each making 30 calls a day, as well as browsing the web and sending SMS messages, and it’s probably right to think that commercial ‘big data’ started here.
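To put rough numbers on that, here is a minimal Python sketch. The field names are my own labels for the CDR items listed above, not a real switch format, and the 200 bytes per record is purely an assumed figure for illustration. It also shows the simplest possible step from one ‘level’ of data to another: rolling per-call detail up to a per-caller total.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallDetailRecord:
    """One CDR with the items listed above (field names are illustrative)."""
    calling_number: str
    called_number: str
    start_time: datetime
    duration_seconds: int
    caller_location: str
    called_location: str
    termination_code: str


def daily_cdr_count(subscribers: int, calls_per_subscriber: int) -> int:
    """Voice CDRs generated per day, assuming one CDR per call."""
    return subscribers * calls_per_subscriber


def total_seconds_by_caller(records):
    """Roll call-level detail up to one total duration per calling number."""
    totals = defaultdict(int)
    for record in records:
        totals[record.calling_number] += record.duration_seconds
    return dict(totals)


# Assumption: average stored size of one CDR. Real sizes vary by switch and format.
ASSUMED_BYTES_PER_CDR = 200

cdrs_per_day = daily_cdr_count(subscribers=20_000_000, calls_per_subscriber=30)
bytes_per_day = cdrs_per_day * ASSUMED_BYTES_PER_CDR

print(f"{cdrs_per_day:,} CDRs per day")
print(f"{bytes_per_day / 1e9:.0f} GB per day at the assumed record size")
```

That is 600 million records a day from voice calls alone, before counting web sessions or SMS. The roll-up function is the other side of the argument: once you summarize, the table gets far smaller, but you can no longer answer questions about individual calls.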
Also, I want to let folks know that my colleague Bill Schmarzo, the dean of Big Data, will be speaking at the TDWI conference in Chicago on May 8th. The title of his presentation is “Navigating the Road from Business Intelligence to Data Science” and he will be presenting from 11:00 to noon.