I originally posted this blog in May 2013 on another blog site but it’s still relevant.
Drivers of Big Data Governance
What are the major drivers behind Big Data Governance? Both Big Data and Data Governance are very hot topics. Most organizations are implementing a Data Governance program, but these programs tend to be focused on governing the data in relational databases even though unstructured data vastly outnumbers structured data and the projections of ever increasing volumes seems overwhelming.
Governing unstructured data in documents is another way of referencing “Enterprise Content Management (ECM),” an area of Data Management with mature tools. Management at organizations is frustrated with the current implementations of many of their data management programs, including Enterprise Content Management. Frustration with results from data management programs such as Data Warehousing, Business Intelligence, and Master Data Management, has been the primary driver of the implementation of Data Governance programs at many of these organizations. Similarly, organizations are not getting the expected results from their Enterprise Content Management programs, frequently instantiated with Microsoft SharePoint , because they have not included sufficient Data Governance in the implementations. As has been always the case, implementing a technology solution without sufficient business process involvement, rarely ends with the anticipated benefits.
Although the cost of storage has gone down dramatically, the rise in the amount of data that organizations wish to retain has risen even faster. Governance is needed to establish the retention policies on all types of data (what data to store, when to archive, when to delete) in order to manage the cost of the storage of all this data and to enable important information to be retrieved. IT management is particularly focused on cost of data storage.
All parts of an organization would like to be able to retrieve their desired information from stored data but the legal department is particularly focused on this driver. Legal would like to be able to point to their data retention policies and say definitively that certain data is no longer retained or to be able to find data quickly and minimize the cost and duration of the search for data requested by regulatory bodies.
Tagging and filing unstructured data to aid in later retrieval and determination must move from manual to automated processes because the volumes involved in Big Data are beyond human capabilities to manage and hand crafted solutions are no longer effective, if they ever were. Data analysis also needs to move beyond abilities to analyze data either structured (business intelligence) or unstructured (search) to utilize cross data types search and analysis. See my blog on Master Data in Big Data Management for more on some ideas for linking together structured and unstructured data and automation techniques.