In most ways, Data Governance of Big Data is not different from normal Data Governance. The benefits are the same. The reasons for doing it are the same. And, mostly, what needs to be done is the same.
What is different about Big Data Governance is that it covers more data types, needs more sophisticated tools, and depends critically on more metadata.
First of all, Big Data Governance requires performing governance over many different types of data, not just what’s in relational databases. Certainly, the scope needs to include non-relational databases and unstructured data and documents. This in itself may require new tools to deal with these other technologies.
Secondly (and maybe this should be first because it is about data volumes), more sophisticated tools are needed to assess and profile data. Big Data volumes are beyond a humanly manageable scale, so the traditional approach of profiling and managing data primarily through observation becomes infeasible.
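To make the idea of tool-based profiling concrete, here is a minimal Python sketch of what an automated profiler computes instead of a person eyeballing rows. It is an illustration only: the function name `profile_records` and the record layout are assumptions, and a real Big Data profiler would run distributed and use approximate counting rather than holding distinct values in memory.

```python
from collections import Counter, defaultdict


def profile_records(records):
    """Return simple per-field statistics for an iterable of dict records.

    Hypothetical helper for illustration; not taken from any specific tool.
    """
    total = 0
    fields = set()                 # every field name seen in any record
    present = Counter()            # field -> count of non-null values
    types = defaultdict(Counter)   # field -> type name -> count
    distinct = defaultdict(set)    # field -> distinct values (exact; fine for a sketch)

    for record in records:
        total += 1
        for field, value in record.items():
            fields.add(field)
            if value is None:
                continue
            present[field] += 1
            types[field][type(value).__name__] += 1
            # repr() keeps unhashable values (lists, dicts) countable
            distinct[field].add(repr(value))

    return {
        field: {
            # records where the field is absent or null
            "missing": total - present[field],
            "types": dict(types[field]),
            "distinct": len(distinct[field]),
        }
        for field in fields
    }
```

A field whose profile shows more than one type, or a high missing count, is a candidate for a governance rule, and that finding comes from the tool rather than from manual inspection.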
Thirdly, collecting and documenting metadata becomes critical in order to automate as many of the Data Governance activities as possible. This item is tied to the one above: more sophisticated tools can help infer metadata about the relationships between the data, and metadata is required to automate the monitoring activities.
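As a rough illustration of why metadata is a precondition for automated monitoring, the Python sketch below checks records against a small metadata dictionary. Everything here is assumed for the example: the field names, the `FIELD_METADATA` structure, and the `find_violations` function are hypothetical and do not represent any particular governance product.

```python
# Hypothetical metadata: expected type and whether the field is mandatory.
FIELD_METADATA = {
    "customer_id": {"type": int, "required": True},
    "email": {"type": str, "required": True},
    "age": {"type": int, "required": False},
}


def find_violations(record, metadata):
    """Return a list of human-readable rule violations for one record."""
    violations = []
    for field, rules in metadata.items():
        value = record.get(field)
        if value is None:
            # Absent and null are treated the same in this sketch.
            if rules["required"]:
                violations.append(f"{field}: missing required value")
            continue
        if not isinstance(value, rules["type"]):
            violations.append(
                f"{field}: expected {rules['type'].__name__}, "
                f"got {type(value).__name__}"
            )
    return violations
```

The check itself is trivial; the point is that it can only run unattended because the expectations are written down as metadata rather than held in someone's head.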
In summary, the strategic reasons for doing Data Governance remain the same, as does the way a Big Data organization is structured, but how the Data Governance of Big Data is actually performed may be very different.