thoughtsondata

My thoughts on Data Management
  • About April Reeve
 

Master Data in Big Data Management

April 3, 2013

Currently, most data management activities are segregated by data type: documents are kept in one type of file repository, emails in another, structured data in databases, and so on. One of the goals of big data management, and much of its value, is being able to analyze data across these repositories. But how do we link the data together? A big part of the answer, I believe, is master data. Master data is the data describing the important things in the organization: customers, products, employees, organizational structure, financial reporting structure, etc.

For example, people with appropriate access in the organization should be able to view not only a customer’s name, addresses, and other demographic information, but also emails to, from, and concerning the customer, documents related to them, audio recordings of any calls to customer service, and video of the customer visiting the organization’s offices. All the organizational information about a customer can be made appropriately available to customer service to support a customer inquiry, to identify additional products in which the customer might be interested, or to predict likely future behavior.

Standard business intelligence tools can be used to find and connect information about a customer located in databases. Tools that search text can be used to find information related to a customer in document and email repositories, either because these items contain text with the customer’s identifying information or because someone has tagged the item with the customer’s id. Similarly, audio and video files can be searched for the customer’s likeness or tagged manually with customer information. Links to a customer can be made at the time the information is stored or dynamically when a query is made about the customer. Tagging files and documents with customer identifiers can be performed automatically or manually. The ability to attach the customer information automatically is critical to big data management, since the volume of data is usually beyond what humans can manage, and we need to move away from the concept of manually crafted metadata.
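To make the automatic tagging idea concrete, here is a minimal sketch in Python. It assumes the simplest possible matching rule (a case-insensitive substring search for known identifying strings). The `CUSTOMER_MASTER` table, the `Document` class, and the `tag_document` function are all hypothetical names invented for illustration, and a real system would need much more careful matching.

```python
from dataclasses import dataclass, field

# Hypothetical master data: customer id -> identifying strings to look for.
CUSTOMER_MASTER = {
    "C001": ["Jane Doe", "jane.doe@example.com"],
    "C002": ["Acme Corp", "billing@acme.example"],
}


@dataclass
class Document:
    """An unstructured item (email, memo, transcript) plus its customer tags."""
    name: str
    text: str
    customer_tags: set = field(default_factory=set)


def tag_document(doc, master=CUSTOMER_MASTER):
    """Attach the id of every customer whose identifying text appears in doc.

    Matching is a naive case-insensitive substring test (an assumption made
    to keep the sketch short).
    """
    lowered = doc.text.lower()
    for customer_id, identifiers in master.items():
        if any(ident.lower() in lowered for ident in identifiers):
            doc.customer_tags.add(customer_id)
    return doc


# Tag an email at the time it is stored.
email = tag_document(
    Document("email-17.txt", "From: jane.doe@example.com\nPlease update my address.")
)
print(sorted(email.customer_tags))
```

The same function could be run when an item is stored or later in bulk over an existing repository; the tag set is the automatically produced metadata.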

And so, if the data in databases is called “structured” and carries keys associated with the organization’s master data, then we can integrate it with the “unstructured” data in files, documents, and emails by tagging the unstructured data with the same key master data information, automatically or manually, at storage time or at query time.
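As a rough illustration of that integration, the sketch below builds a single view of a customer from keyed database rows plus unstructured items. Items tagged at storage time are matched by customer id; untagged items are matched dynamically at query time by searching their text for the customer’s name. All of the tables and the `customer_view` function are hypothetical, and the in-memory lists stand in for real repositories.

```python
# Hypothetical master data and structured records keyed by customer id.
CUSTOMERS = {"C001": {"name": "Jane Doe"}, "C002": {"name": "Acme Corp"}}
ORDERS = [
    {"customer_id": "C001", "order": "A-100", "amount": 250.0},
    {"customer_id": "C002", "order": "A-101", "amount": 75.5},
]

# Unstructured items: some were tagged at storage time, some were not.
ITEMS = [
    {"name": "call-0412.wav", "tags": {"C001"}, "text": ""},
    {"name": "memo.txt", "tags": set(), "text": "Follow up with Jane Doe on renewal."},
    {"name": "contract.pdf", "tags": {"C002"}, "text": ""},
]


def customer_view(customer_id):
    """Collect structured rows and unstructured items for one customer."""
    name = CUSTOMERS[customer_id]["name"]
    orders = [o for o in ORDERS if o["customer_id"] == customer_id]
    items = [
        item["name"]
        for item in ITEMS
        # Storage-time link (tag) or query-time link (text search).
        if customer_id in item["tags"] or name.lower() in item["text"].lower()
    ]
    return {"customer": name, "orders": orders, "items": items}


view = customer_view("C001")
print(view["items"])
```

The master data key is what makes the join possible: without a shared customer id, the database rows and the files have nothing in common to link on.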


Tagged: Big Data, Data Governance, Data Management, Master Data Management, Metadata
Posted by thoughtsondata


The End of the Data Life Cycle – Data Archiving and Removal

February 2, 2012

What we don’t currently emphasize greatly in Data Management is the end of the data life cycle, when data is archived and, possibly later, deleted. In the past this was because we wanted to keep as much data available as our technical solutions could store; if we couldn’t store all of it, the oldest data would usually just be deleted after backup. Now, in the era of Big Data, the amount of data being produced is growing exponentially, and the ability to move data aside (archive it) and retrieve it later is well developed.

Ultimately, all data in an organization (structured and unstructured) needs to be tagged with Metadata that indicates when the data is to be archived, when it is to be deleted, and who is responsible for the data, including who ultimately approves its deletion.
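One way to picture this kind of metadata is as a small record attached to each data set. The sketch below is only an illustration: the `RetentionMetadata` fields, the action names, and the rule that deletion waits for the owner’s approval are assumptions chosen to mirror the paragraph above, not a standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionMetadata:
    """Hypothetical life cycle metadata attached to a data set."""
    owner: str                       # who is responsible and approves deletion
    archive_on: date                 # when the data should be archived
    delete_on: date                  # when the data becomes eligible for deletion
    deletion_approved: bool = False  # set once the owner signs off


def lifecycle_action(meta, today):
    """Return the action the metadata calls for on the given day."""
    if today >= meta.delete_on:
        # Past the deletion date, but nothing is deleted without approval.
        return "delete" if meta.deletion_approved else "await_approval"
    if today >= meta.archive_on:
        return "archive"
    return "retain"


meta = RetentionMetadata(
    owner="finance-data-steward",
    archive_on=date(2015, 1, 1),
    delete_on=date(2020, 1, 1),
)
print(lifecycle_action(meta, date(2016, 6, 1)))
```

A scheduled job could evaluate `lifecycle_action` across all tagged data, which is what makes the tagging useful at volumes no person could review by hand.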

Although Data Archiving itself is an interesting topic, it isn’t particularly sophisticated with regard to its data movement aspects, since the data being archived is usually not transformed. Data Archiving is usually treated as a topic under Data Backup and Recovery. Archiving data assumes that the data is moved to a less costly (and possibly less accessible) platform from which it can be retrieved or accessed in the future, either brought back to the original application for access or possibly accessed from the archive environment.

Regulations governing Pharmaceutical firms require that data from an application being shut down be archived along with the application and the hardware on which it runs. This is a very smart approach to recoverability, since a simple data backup is not recoverable if the hardware that the application ran on is no longer available. Less strict regulation may simply require that the data itself be archived. Standard data archiving capabilities associated with data backup will move the data to offline storage, but bringing the data back into the operational application can be problematic if the structure or schema of the data in the application changes after the data is archived.
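One possible guard against the schema problem is to record a fingerprint of the schema when the data is archived and compare it with the application’s current schema before attempting a restore. The sketch below is an assumption about how that might look; the manifest layout and the function names are hypothetical.

```python
import hashlib
import json


def schema_fingerprint(columns):
    """Hash a schema given as a mapping of column name -> type name."""
    canonical = json.dumps(sorted(columns.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def can_restore(manifest, current_columns):
    """True only if the archived schema matches the application's schema today."""
    return manifest["schema_fingerprint"] == schema_fingerprint(current_columns)


# Schema at archive time, stored in a manifest alongside the archived data.
archived_columns = {"cust": "TEXT", "amt": "DECIMAL"}
manifest = {
    "application": "orders",
    "schema_fingerprint": schema_fingerprint(archived_columns),
}

# The application later adds a column, so a direct restore is no longer safe.
current_columns = {"cust": "TEXT", "amt": "DECIMAL", "channel": "TEXT"}
print(can_restore(manifest, current_columns))
```

A mismatch does not say how to fix the restore, only that the archived data cannot simply be loaded back as-is.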

In cases where the archived data need not be brought back into the operational application environment but only needs to be accessible, it may be decided to transform the data being archived into a common technical format. For example, all of the data in the archive environment might be stored in the same database management system.

It is critical that data be archived along with its associated metadata explaining the meaning of the data and its history. If the archived data may have to be accessed independently of the application from which it came, it may be best to use a data structure or storage solution that can both keep the data with its metadata and allow flexible and changing data structures. An XML type of data structure would allow a changing data structure and associated metadata, and would still allow queries across data that had been archived from the same application, even if the source data structures had changed.
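A small sketch of what such an XML archive could look like follows. The element names, attributes, and the `amounts_for_customer` function are hypothetical; the point is only that each record carries its own metadata, that a later record can add a field, and that one query still works across both versions.

```python
import xml.etree.ElementTree as ET

# Hypothetical archive: two records from the same application, archived
# before and after the source structure gained a "channel" field.
ARCHIVE_XML = """
<archive application="orders">
  <record schema_version="1" archived="2010-03-01">
    <metadata>
      <field name="cust" meaning="customer identifier"/>
      <field name="amt" meaning="order amount in USD"/>
    </metadata>
    <data><cust>C001</cust><amt>250.00</amt></data>
  </record>
  <record schema_version="2" archived="2012-02-02">
    <metadata>
      <field name="cust" meaning="customer identifier"/>
      <field name="amt" meaning="order amount in USD"/>
      <field name="channel" meaning="sales channel, added in version 2"/>
    </metadata>
    <data><cust>C001</cust><amt>75.50</amt><channel>web</channel></data>
  </record>
</archive>
"""


def amounts_for_customer(xml_text, customer_id):
    """Return order amounts for a customer across all schema versions."""
    root = ET.fromstring(xml_text.strip())
    amounts = []
    for record in root.findall("record"):
        data = record.find("data")
        if data.findtext("cust") == customer_id:
            amounts.append(float(data.findtext("amt")))
    return amounts


print(amounts_for_customer(ARCHIVE_XML, "C001"))
```

Because the query only touches the fields it needs, the extra `channel` element in the newer record does not get in the way.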

Tagged: Data Archiving, Data Management, Metadata
Posted by thoughtsondata

