Currently, most data management activities are segregated by data type: documents are kept in one type of file repository, emails in another, structured data in databases, etc. One of the goals, and much of the value, of big data management is the ability to analyze data across these repositories. But how do we link the data together? A big part of the answer, I believe, is master data. Master data is the data describing the important things in the organization: customers, products, employees, organizational structure, financial reporting structure, etc.
For example, people with appropriate access in the organization should be able to view not only a customer’s name, addresses, and other demographic information, but also emails to, from, and concerning the customer, documents related to them, audio recordings of any calls to customer service, and video of the customer visiting the organization’s offices. All the organizational information about a customer can be made appropriately available to customer service to support a customer inquiry, to identify additional products in which the customer might be interested, or to predict likely future behavior.
Standard business intelligence tools can be used to find and connect information about a customer located in databases. Tools that search text can be used to find information related to a customer in document and email repositories, either because these items contain text with the customer’s identifying information or because someone has tagged them with the customer’s ID. Similarly, audio and video files can be searched for the customer’s likeness or tagged manually with customer information. Tagging files and documents with customer identifiers can be performed automatically or manually, and links to a customer can be made either at the time the information is stored or dynamically when a query is made about the customer. The ability to attach customer information automatically is critical to big data management: the volume of data is usually beyond what people can manage by hand, so we need to move away from the concept of manually crafted metadata.
And so, if the data in databases is called “structured” because it carries keys associated with the organization’s master data, then we can integrate that data with the “unstructured” data in files, documents, and emails by tagging the unstructured data with the key master data information, automatically or manually, at storage time or at query time.
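To make the idea concrete, here is a minimal sketch in Python of tagging unstructured items with master data keys and then joining them to structured records. Everything in it is a hypothetical illustration: the `Customer` and `Document` classes, the sample customers, orders, and documents, and the matching rule are all assumptions, not a description of any particular product. In particular, the matching is a naive case-insensitive search for a customer's name or email address; a real system would likely need proper entity resolution to cope with misspellings, nicknames, and customers who share a name.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Customer:
    """A master data record (hypothetical shape)."""
    customer_id: str
    name: str
    email: str


@dataclass
class Document:
    """An unstructured item plus the master data keys it has been tagged with."""
    doc_id: str
    text: str
    customer_ids: set = field(default_factory=set)


def tag_document(doc, customers):
    """Tag a document with the ID of every customer whose name or email
    appears in its text. Naive substring matching, for illustration only."""
    lowered = doc.text.lower()
    for customer in customers:
        if customer.name.lower() in lowered or customer.email.lower() in lowered:
            doc.customer_ids.add(customer.customer_id)
    return doc


def customer_view(customer_id, orders, documents):
    """Combine structured rows and tagged documents for one customer."""
    return {
        "orders": [o for o in orders if o["customer_id"] == customer_id],
        "documents": [d.doc_id for d in documents if customer_id in d.customer_ids],
    }


# Hypothetical sample data.
CUSTOMERS = [
    Customer("C001", "Ada Lopez", "ada@example.com"),
    Customer("C002", "Ben Okoro", "ben@example.com"),
]
ORDERS = [  # "structured" data, already keyed by customer_id
    {"order_id": "O-1", "customer_id": "C001"},
    {"order_id": "O-2", "customer_id": "C002"},
]
DOCUMENTS = [  # "unstructured" data, untagged on arrival
    Document("email-17", "From: ada@example.com\nMy order arrived damaged."),
    Document("memo-3", "Call notes: Ben Okoro asked about renewal; cc Ada Lopez."),
    Document("memo-4", "Quarterly facilities report."),
]

# Tag at storage time; the same function could instead be run at query time.
for document in DOCUMENTS:
    tag_document(document, CUSTOMERS)

VIEW = customer_view("C001", ORDERS, DOCUMENTS)
print(VIEW)
```

In this sketch the customer ID plays the same role for documents that a foreign key plays for database rows, which is what lets a single lookup return both kinds of data.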