Securing Big Data – The Scoop on Hadoop (Part 2)
In the last post, we discussed big data's relevance and growing adoption in the enterprise through a security lens, touching on one of the several critical security concerns this maturing technology presents. Today, I'll flesh out the additional areas that we as security leaders need to be aware of.
Let’s start with the areas where we have existing solutions:
- As nodes in the Hadoop cluster store data, that information needs to be protected at rest. Local data encryption via current-day SAN technology or direct-attached storage is viable from both a performance and an administrative perspective.
- Access to node data by local administrators is a fundamental issue for any distributed cluster environment. Good control and design are the fix, but this is less a technology solution and more a matter of people and process.
- Node and API authentication are concerns due to the limitations and inherent vulnerabilities in the out-of-the-box Kerberos implementation. Rogue node insertion and stolen-ticket attacks are at the top of the list of holes that can be exploited.
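Rogue node insertion, for example, can be partially mitigated with an explicit host allowlist (Hadoop supports this notion through its dfs.hosts include file). A minimal sketch of the idea, using hypothetical node names in place of a real NameNode's live-node report:

```python
# Sketch: flag cluster nodes that are not on an approved allowlist.
# The hostnames below are hypothetical; in a real cluster the allowlist
# would come from something like Hadoop's dfs.hosts include file and the
# registered nodes from the NameNode's live-node report.

def find_rogue_nodes(registered_nodes, allowlist):
    """Return registered nodes that were never approved."""
    approved = set(allowlist)
    return sorted(n for n in registered_nodes if n not in approved)

allowlist = ["dn01.corp.example", "dn02.corp.example", "dn03.corp.example"]
registered = ["dn01.corp.example", "dn02.corp.example", "dn99.evil.example"]

for node in find_rogue_nodes(registered, allowlist):
    print("ALERT: unapproved node registered:", node)
```

This only closes the naive insertion path; it does nothing against a stolen ticket presented from an approved host, which is why the authentication gap needs a deeper fix.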
Now let’s move to the area where we need to look to new solutions to solve the big data security issue.
- Near real-time logging, monitoring, filtering, blocking, and auditing for all access types and transitions are further functions that are not part of current-day Hadoop distributions. For logging and auditing, several factors create challenges, including policy modeling, appropriate access definitions and profiles, and integration with the existing SIEM. To effectively monitor, filter, and block, the solution must tie directly to those usage profiles and have fully automated links to network, storage, and logical security controls. Manual adjustment of security control point tolerances has to go the way of the dodo bird, and fast. Lastly, we have to consider that the data size and transaction / event volume are at a scale exponentially greater than anything we have dealt with before.
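To make the monitoring piece concrete, here is a minimal sketch of filtering audit events for two simple conditions: denied accesses and unusually high per-user event volume. The log lines are hypothetical examples loosely modeled on the HDFS NameNode audit log format, and the volume threshold is made up; a production solution would stream this into the SIEM with automated control hooks rather than a Python loop:

```python
import re
from collections import Counter

# Parse HDFS-style audit lines and flag (1) denied accesses and
# (2) users whose event count exceeds a threshold. Both the sample
# log lines and the threshold below are illustrative assumptions.
AUDIT_RE = re.compile(
    r"allowed=(?P<allowed>\w+)\s+ugi=(?P<user>\S+).*?"
    r"cmd=(?P<cmd>\S+)\s+src=(?P<src>\S+)"
)

def scan(lines, volume_threshold=2):
    alerts = []
    per_user = Counter()
    for line in lines:
        m = AUDIT_RE.search(line)
        if not m:
            continue
        per_user[m["user"]] += 1
        if m["allowed"] == "false":
            alerts.append(f"DENIED: {m['user']} {m['cmd']} {m['src']}")
    for user, count in per_user.items():
        if count > volume_threshold:
            alerts.append(f"VOLUME: {user} generated {count} events")
    return alerts

logs = [
    "10:00:01 INFO FSNamesystem.audit: allowed=true ugi=alice ip=/10.0.0.5 cmd=open src=/data/a dst=null",
    "10:00:02 INFO FSNamesystem.audit: allowed=false ugi=mallory ip=/10.0.0.9 cmd=delete src=/data/secret dst=null",
    "10:00:03 INFO FSNamesystem.audit: allowed=true ugi=alice ip=/10.0.0.5 cmd=open src=/data/b dst=null",
    "10:00:04 INFO FSNamesystem.audit: allowed=true ugi=alice ip=/10.0.0.5 cmd=open src=/data/c dst=null",
]
for alert in scan(logs):
    print(alert)
```

Even this toy version hints at the scale problem: every rule must run against an event stream orders of magnitude larger than a traditional warehouse produces.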
80%+ of today’s environments across the Fortune 1000 have a current breach underway… they just don’t know it yet
There are several tactical solutions for the issues called out here but there are two overriding architectural concerns that require a long term solution. The first is an issue of scale given the sheer volume of information that must be secured across each of the facets described above. The second is a legacy issue of existing breaches; said differently, 80%+ of today’s environments across the Fortune 1000 have a current breach underway… they just don’t know it yet. This drastic growth of big data information stores both inside and outside of the typical perimeters will amplify that breach percentage.
If you are interested in how EMC is solving this all-too-real issue for its customers, tune in to my next post, where we’ll discuss using big data to secure big data.
At the risk of repeating myself… trust me, this is not your father’s data warehouse.