Big Data Governance – Part 1 – Why Do You Govern Data Outside of Databases?

April 12, 2016

I originally posted this on another blog site in May 2013, but it's still relevant.

Drivers of Big Data Governance

What are the major drivers behind Big Data Governance? Both Big Data and Data Governance are very hot topics. Most organizations are implementing a Data Governance program, but these programs tend to focus on governing the data in relational databases, even though unstructured data vastly outnumbers structured data and projections of ever-increasing volumes seem overwhelming.

Governing unstructured data in documents is another way of describing "Enterprise Content Management (ECM)," an area of Data Management with mature tools. Yet management at many organizations is frustrated with the current implementations of their data management programs, including Enterprise Content Management. Frustration with the results of data management programs such as Data Warehousing, Business Intelligence, and Master Data Management has been the primary driver for implementing Data Governance programs at many of these organizations. Similarly, organizations are not getting the expected results from their Enterprise Content Management programs, frequently instantiated with Microsoft SharePoint, because those implementations did not include sufficient Data Governance. As has always been the case, implementing a technology solution without sufficient business process involvement rarely delivers the anticipated benefits.

Although the cost of storage has gone down dramatically, the amount of data that organizations wish to retain has risen even faster. Governance is needed to establish retention policies for all types of data (what data to store, when to archive, when to delete), both to manage the cost of storing all this data and to make sure important information can be retrieved. IT management is particularly focused on the cost of data storage.

All parts of an organization would like to be able to retrieve the information they need from stored data, but the legal department is particularly focused on this driver. Legal would like to be able to point to the organization's data retention policies and say definitively that certain data is no longer retained, or else to find requested data quickly, minimizing the cost and duration of searches for data requested by regulatory bodies.

Tagging and filing unstructured data to aid later retrieval and retention decisions must move from manual to automated processes, because the volumes involved in Big Data are beyond human capabilities to manage, and hand-crafted solutions are no longer effective, if they ever were. Data analysis also needs to move beyond analyzing either structured data (business intelligence) or unstructured data (search) to searching and analyzing across data types. See my blog post on Master Data in Big Data Management for some ideas on linking structured and unstructured data and on automation techniques.
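To make the move from manual to automated tagging concrete, here is a minimal rule-based sketch in Python. The tag names, trigger terms, and the functions `tag_document` and `tag_corpus` are hypothetical. Real auto-classification tools more often rely on trained classifiers or entity extraction than on keyword rules, so treat this only as an illustration of tagging a whole collection with no human in the loop.

```python
import re

# Hypothetical tagging rules: each tag maps to a pattern of trigger terms.
# A production system would more likely use a trained classifier; these
# keyword rules only illustrate automated, repeatable tagging.
TAG_RULES = {
    "contract": re.compile(r"\b(agreement|contract|party|parties)\b", re.IGNORECASE),
    "invoice": re.compile(r"\b(invoice|amount due|remit)\b", re.IGNORECASE),
    "hr": re.compile(r"\b(salary|employee|performance review)\b", re.IGNORECASE),
}


def tag_document(text: str) -> set[str]:
    """Return every tag whose pattern matches somewhere in the text."""
    return {tag for tag, pattern in TAG_RULES.items() if pattern.search(text)}


def tag_corpus(docs: dict[str, str]) -> dict[str, set[str]]:
    """Tag a whole collection, keyed by document id, with no manual step."""
    return {doc_id: tag_document(text) for doc_id, text in docs.items()}


if __name__ == "__main__":
    docs = {
        "doc-1": "This Agreement is made between the parties below.",
        "doc-2": "Invoice #123: amount due within 30 days.",
        "doc-3": "Minutes of the weekly status meeting.",
    }
    for doc_id, tags in tag_corpus(docs).items():
        # Documents with no tags are candidates for review or new rules.
        print(doc_id, sorted(tags))
```

Documents that come back with no tags are a useful signal in their own right: they show where the rules (or a classifier) need to be extended.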



People who Tweet about Data Management

April 30, 2012

Data Management & Architecture

Karen Lopez @datachick

Neil Raden @NeilRaden

Robin Bloor @robinbloor

M. David Allen @mdavidallen

Sue Geuens @suegeuens

Mehmet Orun @DataMinstrel

Alec Sharp @alecsharp

Loretta Mahon Smith @silverdata

Eva Smith @datadeva

Corine Jasonius @DataGenie

Peter Aiken @paiken

Tony Shaw @tonyshaw

Glenn Thomas @Warduke

Bonnie O’Neil @bonnieoneil

Rob Paller @RobPaller

Pete Rivett @rivettp

Charles T. Betz @CharlesTBetz

Tracie Larsen @RelatedStuff

Wayne Eckerson @weckerson

Julian Keith Loren @jkloren

Christophe @mydatanews

Steve Francia @spf13

Gorm Braavig @gormb

Jim Finwick @jimfinwick

Alexej Freund @alexej_freund

Corinna Martinez @Futureatti

Data Quality

Jim Harris @ocdqblog – blog

David Loshin @davidloshin – blog

Rich Murnane @murnane

Daragh O Brien @daraghobrien

Jacqueline Roberts @JackieMRoberts

Steve Tuck @SteveTuck

Vish Agashe @VishAgashe

Julian Schwarzenbach @jschwa1

Henrik L. Sorensen @hlsdk

MDM and Data Governance

Jill Dyche @jilldyche – blog

Charles Blyth @charlesblyth

Steve Sarsfield @stevesarsfield – blog

Dan Power @dan_power

Philip Tyler @tylep0

Business Intelligence and Analytics

Marcus Borba @marcusborba

Tamara Dull @tamaradull

Claudia Imhoff @Claudia_Imhoff – blog

Scott Wallask @BI_expert

Peter Thomas @PeterJThomas – blog

Barney Finucane @bfinucane

Matt Winkleman @mattwinkleman

Stray_Cat @Stray_Cat

Brett2point0 @Brett2point0

Risk Management

Peter Went @Bank_Risk

Joshua Corman @joshcorman

Michael Rasmussen @GRCPundit

Nenshad Bardoliwalla @nenshad

Gary Byrne @GRCexpert

Helmut Schindlwick @Schindwick

Technology Companies and Data Organizations

Oracle @Oracle

DAMA International @DAMA_I

McKinsey on BT @mck_biztech

SmartData Collective @SmartDataCo

DataFlux InSight @Datafluxinsight

Gartner @Gartner_inc

TDWI @TDWI

Scientific Computing @SciCom

Wearecloud @wearecloud

CloudCamp @cloudcamp

Panorama Software @PanoramaSW

Data Hole @datahole

BI Knowledge Base @biknowledgebase

EnterpriseArchitects @enterprisearchitects

DataQualityPro.com @dataqualitypro

RSA Archer eGRC @ArcherGRC

Exobox @Exobox_Security

EA_Consultant @EA_Consultant

Cloudbook @cloudbook

ID Experts @idexperts

IAIDQ @iaidq

EMC Forum @EMCForums

Data Junkies @datajunkies

True Finance Data @truefinancedata

Madam @TheMDMNetwork

IBM Initiate @IBMInitiate

Accelus_GRC @PaisleyGRC

DQ Asia Pacific @DQAsiaPacific

Data Guide @DataGuide

PCI PA-DSS Data @DataAssurant

DataFlux Corporation @DataFlux


What is different about Big Data Governance?

December 21, 2011

In most ways, Data Governance of Big Data is not different from normal Data Governance.  The benefits are the same.  The reasons for doing it are the same.  And, mostly, what needs to be done is the same.

What is different about Big Data Governance is that it covers more data types, it requires more sophisticated tools, and metadata becomes critical.

First of all, Big Data Governance requires governing many different types of data, not just what's in relational databases. The scope certainly needs to include non-relational databases, unstructured data, and documents. This in itself may require new tools to deal with these other technologies.

Secondly (and maybe this should be first, because it is about data volumes), more sophisticated tools are needed to assess and profile data. Big Data volumes are beyond a humanly manageable scale, and the traditional approaches of profiling and managing data primarily through observation become infeasible.
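As a minimal sketch of automated profiling, assuming records arrive as Python dictionaries, the hypothetical `profile_records` function below computes statistics an analyst would otherwise gather by inspection: missing values, distinct values, and the mix of data types in each column. It runs in memory on one machine; at Big Data volumes the same statistics would have to be computed by a distributed or sampling-based tool, which this sketch does not attempt.

```python
from collections import Counter
from typing import Any, Iterable


def profile_column(values: Iterable[Any]) -> dict[str, Any]:
    """Compute basic profile statistics for one column of values."""
    values = list(values)
    # Treat None and empty strings as missing.
    present = [v for v in values if v is not None and v != ""]
    type_counts = Counter(type(v).__name__ for v in present)
    return {
        "count": len(values),
        "null_count": len(values) - len(present),
        # Assumes values are hashable (str, int, float, date, ...).
        "distinct_count": len(set(present)),
        "type_counts": dict(type_counts),
        # More than one type in a column is a common data-quality warning sign.
        "mixed_types": len(type_counts) > 1,
    }


def profile_records(records: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Profile every column that appears in any record; absent keys count as missing."""
    columns = sorted({key for rec in records for key in rec})
    return {col: profile_column(rec.get(col) for rec in records) for col in columns}


if __name__ == "__main__":
    customers = [
        {"customer_id": 1, "email": "a@example.com", "zip": "02139"},
        {"customer_id": 2, "email": None, "zip": 2139},
        {"customer_id": 3, "email": "c@example.com"},
    ]
    for column, stats in profile_records(customers).items():
        print(column, stats)
```

In the sample data the `zip` column is flagged with mixed types (a string and an integer), the kind of issue that is easy to miss by eye in a large data set.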

Thirdly, collecting and documenting metadata becomes critical in order to automate as much of the Data Governance activity as possible. This is tied to the previous point: more sophisticated tools can help infer metadata about the relationships between data, and metadata is required to automate monitoring activities.

In summary, the strategic reasons for doing Data Governance remain the same, as does the way the Data Governance organization for Big Data is structured, but how the Data Governance of Big Data is actually performed may be very different.


Data Governance Certification or Data Stewardship Certification?

December 6, 2011

The Data Management Association (DAMA) is now offering a Data Governance Certification as an option within its Certified Data Management Professional (CDMP) credential. This is a natural extension, since the Data Governance test already existed under the current certification process and the new certification merely requires a specific configuration of test modules.

But what does Data Governance certification mean, and is it really what is needed? DAMA's Data Governance certification is based, to a great extent, on the Data Governance practice area described in the DAMA Data Management Body of Knowledge (DMBOK), published in 2009. That practice area focuses on best practices for a Data Governance program and organization: what activities it should perform, what tools it should use, and what roles and responsibilities should be present.

But do we need to certify that people know how to set up a Data Governance program? Or should we focus on what the people who perform Data Governance for an organization, the Data Stewards, should be doing? Certifying Data Stewards may not be something that should be done generically. Instead, an organization may want to certify that its identified Data Stewards know the agreed standard operating procedures for Data Stewards in that particular organization.

In summary, a Data Governance certification that identifies individuals who are familiar with how, in general, a Data Governance organization should be created and operated makes sense. It makes more sense for an organization to certify its Data Stewards on the particular processes unique to that organization.


If the Data Quality got better but no one measured …

November 2, 2011

There is an old philosophical question: "If a tree fell in the forest but no one heard it, did it make a noise?" The question rests on the fact that every time we've seen a tree fall in the past it has made a noise; but if no one heard this one fall, then maybe this one time it didn't, and you couldn't prove it either way.

Metrics and measures are central to certain areas of Data Management, such as Data Governance, Master Data Management, and especially Data Quality. You can't demonstrate that the quality of data improved unless you measure it. You can't report the benefit of your program unless you measure it. And showing improvement means that you need to measure both before and after in order to calculate the improvement.

Senior executives want to know what value a technology investment brings them, and the ways to show value are increased revenue, lowered cost, and reduced risk (which can include regulatory compliance). Without reporting financial benefit to management, few organizations are willing to support ongoing improvement projects over multiple years. It is also important to report both the financial benefit achieved so far and the opportunities that remain: management is very happy to declare success and terminate the program unless you are also reporting what remains to be done.


When Technology Leads – The Tail Wagging the Dog

September 27, 2011

There is a great temptation to implement technology because it is "cool", but for decades business and technology strategists (as well as most people in both business and technology) have recognized that, unless your business is selling technology, technology should be implemented in support of business goals. Sometimes technology innovations can provide entirely new ways of performing business services and allow business differentiation. In fact, there is a movement toward developing technology strategy in collaboration with business strategy, rather than after it.

There are also some business functions, critical to business operations and performed by every organization, where in practice the technology organization tends to lead. One such area is "Business Continuity": preparing for emergencies and business disruptions. This is a business responsibility that cannot simply be delegated to the technology organization, yet it requires significant specialized expertise and in practice tends to be developed mostly by highly trained technologists. The part of Business Continuity that deals with recovering data and computer systems is called "Disaster Recovery" and is a core technology operations capability. So technology organizations tend to provide most of the resources to help business areas develop, test, and implement Business Continuity plans. In practice, the tail wags the dog.

Best practice holds that Data Governance and Data Quality programs should be led by business managers, not IT, but key aspects of these programs cannot readily be accomplished without technology support. The key skills these functions require, process improvement and data analysis, are found most frequently in technology organizations. Data Governance and Data Quality initiatives frequently get started in IT, but they tend to be much more successful when led from business areas.


Driving Unstructured Data Management

May 25, 2011

While preparing my presentation on Unstructured Data Management for the Data Governance Conference, I have noticed how much of the focus seems to be on tagging unstructured data (email, documents) with expiration dates to help manage its huge and geometrically growing volumes. Attention seems to be much more on managing the end of the life cycle for unstructured data than for structured data.

In my experience, too, the legal department's goal seems to be eliminating as much data in the organization as possible. This seems counterintuitive, since most people interact with the legal department when it tells them not to delete anything. But legal departments would prefer that the organization have policies that remove old documents and email altogether, so that when a court asks them to produce documents they can say that company policy is to get rid of documents of that age (with specific exceptions, including what the organization is legally required to retain) and not have to embark on an expensive search project. If the requested documents fall within the company's retention period, then Legal wants them organized for efficient search and retrieval.
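A minimal sketch of such a policy in Python, assuming each document carries a category, a creation date, and a legal-hold flag. The categories, retention periods, and the functions `expiration_date` and `disposition` are hypothetical and not taken from any records-management product. The point it illustrates is that legal exceptions override the expiration date, and documents with no applicable policy are routed to a person rather than deleted.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical retention periods per document category (years approximated as 365 days).
RETENTION_DAYS = {"email": 3 * 365, "invoice": 7 * 365, "draft": 365}


@dataclass
class Document:
    doc_id: str
    category: str
    created: date
    # Set when the organization is required to keep the document,
    # e.g. a litigation hold or a regulatory retention requirement.
    legal_hold: bool = False


def expiration_date(doc: Document) -> Optional[date]:
    """Tag a document with the date its retention period ends, if a policy exists."""
    days = RETENTION_DAYS.get(doc.category)
    return None if days is None else doc.created + timedelta(days=days)


def disposition(doc: Document, today: date) -> str:
    """Decide what to do with a document under the retention policy."""
    if doc.legal_hold:
        return "retain (legal hold)"  # exceptions override expiration
    expires = expiration_date(doc)
    if expires is None:
        return "review (no policy)"  # route to a person rather than guess
    return "delete" if today >= expires else "retain"


if __name__ == "__main__":
    today = date(2016, 4, 12)
    docs = [
        Document("e-1", "email", date(2012, 1, 1)),
        Document("e-2", "email", date(2012, 1, 1), legal_hold=True),
        Document("i-1", "invoice", date(2012, 1, 1)),
        Document("m-1", "memo", date(2015, 6, 1)),
    ]
    for d in docs:
        print(d.doc_id, expiration_date(d), disposition(d, today))
```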


Data Quality Reports in a Data Governance Program

May 4, 2011

There are two types of Data Quality reports that are regularly produced for Data Governance: reports of data out of compliance with business rules, and statistics on data out of compliance with business rules, including whether the data has gotten better or worse since the previous report.

I suppose there is also a third: the report of metrics on the Data Governance program itself.
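The two regular Data Quality reports can be sketched as follows, assuming business rules are expressed as Python predicates over records. The `Rule` class, the functions `exceptions_report` and `compliance_stats`, and the sample rules are all hypothetical. The first function lists each non-compliant record; the second summarizes compliance per rule and compares it with the previous run's percentage to say whether the data got better or worse.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

Record = dict[str, Any]


@dataclass
class Rule:
    name: str
    check: Callable[[Record], bool]  # True when the record complies with the rule


def exceptions_report(records: list[Record], rules: list[Rule]) -> list[tuple[str, Record]]:
    """Report 1: every record that breaks a business rule, with the rule's name."""
    return [(rule.name, rec) for rule in rules for rec in records if not rule.check(rec)]


def compliance_stats(
    records: list[Record],
    rules: list[Rule],
    previous: Optional[dict[str, dict[str, Any]]] = None,
) -> dict[str, dict[str, Any]]:
    """Report 2: per-rule compliance statistics and the trend since the previous run."""
    previous = previous or {}
    total = len(records)
    stats: dict[str, dict[str, Any]] = {}
    for rule in rules:
        passed = sum(1 for rec in records if rule.check(rec))
        pct = 100.0 * passed / total if total else 100.0
        prior = previous.get(rule.name, {}).get("pct_compliant")
        if prior is None:
            trend = "n/a"  # no earlier measurement to compare against
        elif pct > prior:
            trend = "better"
        elif pct < prior:
            trend = "worse"
        else:
            trend = "unchanged"
        stats[rule.name] = {"failed": total - passed, "pct_compliant": pct, "trend": trend}
    return stats


if __name__ == "__main__":
    rules = [
        Rule("email_present", lambda r: bool(r.get("email"))),
        Rule("zip_is_5_digits", lambda r: isinstance(r.get("zip"), str)
             and len(r["zip"]) == 5 and r["zip"].isdigit()),
    ]
    customers = [
        {"id": 1, "email": "a@example.com", "zip": "02139"},
        {"id": 2, "email": "", "zip": "2139"},
        {"id": 3, "email": "c@example.com", "zip": "10001"},
        {"id": 4, "email": None, "zip": "94105"},
    ]
    last_run = {"email_present": {"pct_compliant": 40.0},
                "zip_is_5_digits": {"pct_compliant": 80.0}}
    for name, rec in exceptions_report(customers, rules):
        print("FAIL", name, rec["id"])
    print(compliance_stats(customers, rules, previous=last_run))
```

Storing each run's statistics and passing them in as `previous` is what provides the before-and-after measurement needed to demonstrate that data quality actually improved.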