Data management in an organization is focused on getting data to its data consumers (whether human or application). Whereas the goal of data quality and data governance is trusted data, the goal of data integration is available data – getting data to the data consumers in the format that is right for them.
My new book on Data Integration has been published and is now available: “Managing Data in Motion: Data Integration Best Practice Techniques and Technologies”. Of course, the first part of a book on data management techniques has to answer the question of why an organization should invest time, effort, and money. The drivers for data integration solutions are very compelling.
Supporting Data Conversion
One very common need for data integration techniques arises when copying or moving data from one application or data store to another, either when replacing an application in the portfolio or when seeding the data needed for an additional application implementation. The data must be reformatted to suit the new application’s data store, in both its technical format and its semantic business meaning.
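As a minimal sketch of what that reformatting can involve (the field names and code values below are hypothetical, not taken from the book), a conversion routine typically has to translate both the technical structure and the business meaning of each record:

```python
# Hypothetical conversion of a legacy customer record into a new application's format.
# Both the technical format (field names, types) and the semantics (code values) change.

LEGACY_STATUS_TO_NEW = {"A": "ACTIVE", "I": "INACTIVE", "P": "PROSPECT"}  # semantic mapping

def convert_customer(legacy: dict) -> dict:
    return {
        "customer_name": legacy["CUST_NM"].strip().title(),   # technical reformatting
        "status": LEGACY_STATUS_TO_NEW[legacy["STAT_CD"]],    # business-meaning translation
        "created_date": legacy["CRT_DT"],                     # carried over unchanged
    }

print(convert_customer({"CUST_NM": "ACME CORP ", "STAT_CD": "A", "CRT_DT": "2012-06-01"}))
# {'customer_name': 'Acme Corp', 'status': 'ACTIVE', 'created_date': '2012-06-01'}
```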
Managing the Complexity of Data Interfaces by Creating Data Hubs – MDM, Data Warehouses & Marts, Hub & Spoke
This, I think, is the most compelling reason for an organization to have an enterprise data integration strategy and architecture: hubs of data significantly simplify the problem of managing the data flowing between the applications in an organization. The number of potential interfaces between applications grows with the square of the number of applications. Thus, an organization with one thousand applications could have as many as half a million interfaces if every application had to talk to every other one. By using hubs of data, an organization brings the potential number of interfaces down to just a linear function of the number of applications.
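The arithmetic behind those numbers is simple to sketch (a hypothetical comparison, assuming every pair of applications might otherwise need an interface):

```python
# Potential interface counts: point-to-point versus hub-and-spoke.
# Illustrative arithmetic only; real portfolios never build every possible interface.

def point_to_point_interfaces(n_apps: int) -> int:
    """Every pair of applications could need an interface: n * (n - 1) / 2."""
    return n_apps * (n_apps - 1) // 2

def hub_and_spoke_interfaces(n_apps: int) -> int:
    """Each application needs only its own interface to and from the hub."""
    return n_apps

for n in (10, 100, 1000):
    print(n, point_to_point_interfaces(n), hub_and_spoke_interfaces(n))
# 1,000 applications -> 499,500 potential point-to-point interfaces, versus 1,000 via a hub
```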
Master Data Management hubs are created to provide a central place for all applications in an organization to get their Master Data. Similarly, Data Warehouses and Data Marts give an organization one place to obtain all the data it needs for reporting and analysis.
Data hubs that are not visible to the human data consumers of the organization can also be used to significantly simplify the natural complexity of data interfaces. If data is formatted into a common data format for that type of data as it leaves the application where it was updated, then applications updating data only need to reformat it into one format, instead of a different format for every application that needs it. Applications that need to receive the updated data only need to reformat it from the one common format into their own format. This data integration architecture is called the “hub and spoke” approach. The structure of the common data format that all applications pass their data to and from is called the “canonical model.” Applications that want a certain kind of data “subscribe” to that data, and applications that provide a certain kind of data are said to “publish” it.
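A minimal sketch of that publish-and-subscribe pattern (hypothetical names and records; a real implementation would sit on messaging middleware rather than in-process callbacks) might look like this:

```python
# Hub-and-spoke sketch: publishers post records in one canonical format,
# and each subscriber reformats from that single format into its own needs.
from collections import defaultdict

subscribers = defaultdict(list)  # topic -> list of handler functions

def subscribe(topic, handler):
    subscribers[topic].append(handler)

def publish(topic, canonical_record):
    # The publishing application has already mapped its data to the canonical model.
    for handler in subscribers[topic]:
        handler(canonical_record)

# Two hypothetical consuming applications subscribe to canonical customer records.
subscribe("customer", lambda rec: print("Billing app loads:", rec["customer_id"]))
subscribe("customer", lambda rec: print("CRM app loads:", rec["name"]))

publish("customer", {"customer_id": "C-100", "name": "Acme Corp", "status": "ACTIVE"})
```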
Integrating Vendor Packages with an Organization’s Application Portfolio
Current best practice is to buy vendor packages rather than develop custom applications whenever possible. This exacerbates the data integration problem, because each of these vendor packages has its own master data that has to be integrated with the organization’s master data, and each has to send or receive transactional data for consolidated reporting and analytics.
Sharing Data Among Applications and Organizations
Some data just naturally needs to flow between applications to support the operational processes of the organization. These days, that flow of data usually needs to happen in real time or near real time, and it makes sense to solve the requirements across the enterprise, or across the applications that support the data supply chain, rather than developing an independent solution for each application.
Archiving Data
The life cycle for data may not match the life cycle for the application in which it resides. Some data may get in the way if retained in the active operational application and some data may need to be retained after an application is retired, even if the data is not being migrated to another application. All enterprises should have an enterprise archiving solution available where data can be housed and from which it can still be retrieved, even if the application from which it was taken no longer exists.
Moving data out of an application data store and restructuring it for an enterprise archiving solution is an important data integration function.
Leveraging Externally Available Data
A great deal of data is now available from government and other sources external to a company’s own, both for free and for a fee. To leverage the value of what is available, the external data needs to be made available to the data consumers who can use it, in an appropriate format. The data now available is so vast and arrives so fast that it may not be warranted to store or persist it within the organization; instead, techniques such as data virtualization and streaming can be used, or the data can be left in external cloud solutions and accessed there.
Integrating Structured and Unstructured Data
New tools and techniques allow analysis of unstructured data such as documents, web sites, social media feeds, audio, and video. The analysis carries the greatest meaning when structured data (found in databases) can be integrated with the unstructured data types listed above. Data integration techniques and new technologies such as data virtualization servers enable the integration of structured and unstructured data.
Supporting Operational Intelligence and Management Decision Support
Using data integration to leverage big data includes not just mashing different types of data together for analysis, but also being able to use data streams with that big data analysis to trigger alerts and even automated actions. Example use cases exist in every industry, but some of the ones we are all aware of include monitoring for credit card fraud and recommending products.
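A minimal, hypothetical sketch of that pattern (the rule and threshold below are invented for illustration) applies a check to each event as it streams past and triggers an alert, or an automated action, rather than waiting for a batch report:

```python
# Hypothetical streaming rule: flag card transactions that look anomalous
# against a simple customer profile built from historical (big) data analysis.

customer_profile = {"C-100": {"avg_amount": 80.0}}  # stand-in for an analytic model

def trigger_alert(event: dict) -> None:
    print("Possible fraud:", event)  # could equally be an automated action, e.g. hold the card

def on_transaction(event: dict) -> None:
    profile = customer_profile.get(event["customer_id"], {"avg_amount": 0.0})
    if event["amount"] > 10 * profile["avg_amount"]:
        trigger_alert(event)

for event in [{"customer_id": "C-100", "amount": 45.0},
              {"customer_id": "C-100", "amount": 2500.0}]:
    on_transaction(event)  # only the second transaction raises an alert
```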
Drivers for Managing Data Integration – from Data Conversion to Big Data
April 25, 2013

People who Tweet about Data Management
April 30, 2012

Data Management & Architecture
Karen Lopez @datachick
Neil Raden @NeilRaden
Robin Bloor @robinbloor
M. David Allen @mdavidallen
Sue Geuens @suegeuens
Mehmet Orun @DataMinstrel
Alec Sharp @alecsharp
Loretta Mahon Smith @silverdata
Eva Smith @datadeva
Corine Jasonius @DataGenie
Peter Aiken @paiken
Tony Shaw @tonyshaw
Glenn Thomas @Warduke
Bonnie O’Neil @bonnieoneil
Rob Paller @RobPaller
Pete Rivett @rivettp
Charles T. Betz @CharlesTBetz
Tracie Larsen @RelatedStuff
Wayne Eckerson @weckerson
Julian Keith Loren @jkloren
Christophe @mydatanews
Steve Francia @spf13
Gorm Braavig @gormb
Jim Finwick @jimfinwick
Alexej Freund @alexej_freund
Corinna Martinez @Futureatti
Data Quality
Jim Harris @ocdqblog – blog
David Loshin @davidloshin – blog
Rich Murnane @murnane
Daragh O Brien @daraghobrien
Jacqueline Roberts @JackieMRoberts
Steve Tuck @SteveTuck
Vish Agashe @VishAgashe
Julian Schwarzenbach @jschwa1
Henrik L. Sorensen @hlsdk
MDM and Data Governance
Jill Dyche @jilldyche – blog
Charles Blyth @charlesblyth
Steve Sarsfield @stevesarsfield – blog
Dan Power @dan_power
Philip Tyler @tylep0
Business Intelligence and Analytics
Marcus Borba @marcusborba
Tamara Dull @tamaradull
Claudia Imhoff @Claudia_Imhoff – blog
Scott Wallask @BI_expert
Peter Thomas @PeterJThomas – blog
Barney Finucane @bfinucane
Matt Winkleman @mattwinkleman
Stray_Cat @Stray_Cat
Brett2point0 @Brett2point0
Risk Management
Peter Went @Bank_Risk
Joshua Corman @joshcorman
Michael Rasmussen @GRCPundit
Nenshad Bardoliwalla @nenshad
Gary Byrne @GRCexpert
Helmut Schindlwick @Schindwick
Technology Companies and Data Organizations
Oracle @Oracle
DAMA international @DAMA_I
McKinsey on BT @mck_biztech
SmartData Collective @SmartDataCo
DataFlux InSight @Datafluxinsight
Gartner @Gartner_inc
TDWI @TDWI
Scientific Computing @SciCom
Wearecloud @wearecloud
CloudCamp @cloudcamp
Panorama Software @PanoramaSW
Data Hole @datahole
BI Knowledge Base @biknowledgebase
EnterpriseArchitects @enterprisearchitects
DataQualityPro.com @dataqualitypro
RSA Archer eGRC @ArcherGRC
Exobox @Exobox_Security
EA_Consultant @EA_Consultant
Cloudbook @cloudbook
ID Experts @idexperts
IAIDQ @iaidq
EMC Forum @EMCForums
Data Junkies @datajunkies
True Finance Data @truefinancedata
Madam @TheMDMNetwork
IBM Initiate @IBMInitiate
Accelus_GRC @PaisleyGRC
DQ Asia Pacific @DQAsiaPacific
Data Guide @DataGuide
PCI PA-DSS Data @DataAssurant
DataFlux Corporation @DataFlux
If the Data Quality got better but no one measured …
November 2, 2011

There is an old philosophical question: “If a tree fell in the forest but no one heard it, did it make a noise?” The basis of the question is that every time we’ve seen a tree fall in the past it has made a noise, but if no one heard it fall then maybe this one time it didn’t … though you couldn’t prove it either way.
Metrics and measures are centrally important to certain areas of Data Management such as Data Governance, Master Data Management, and especially Data Quality. You can’t demonstrate that the quality of data improved unless you measure it. You can’t report the benefit of your program unless you measure it. And showing improvement means that you need to measure both before and after in order to calculate the improvement.
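A minimal sketch of that before-and-after arithmetic (a hypothetical completeness rule and made-up numbers) shows why both measurements are needed:

```python
# Hypothetical data quality metric: completeness of a mandatory field,
# measured before and after a cleanup effort so the improvement can be reported.

def completeness(records, field):
    populated = sum(1 for r in records if r.get(field))
    return populated / len(records)

before = [{"email": "a@x.com"}, {"email": ""}, {"email": None}, {"email": "b@x.com"}]
after  = [{"email": "a@x.com"}, {"email": "c@x.com"}, {"email": None}, {"email": "b@x.com"}]

before_score = completeness(before, "email")             # 0.50
after_score = completeness(after, "email")               # 0.75
print(f"Improvement: {after_score - before_score:.0%}")  # Improvement: 25%
```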
Senior executives in organizations want to know what value a technology investment brings them, and the ways to show value are increased revenue, lowered cost, and reduced risk (which can include regulatory compliance). Without financial benefit reported to management, few organizations are willing to support ongoing improvement projects for multiple years. It is also important to report both what the financial benefit has been and what additional opportunities remain – management is very happy to declare success and terminate the program unless you are also reporting what remains to be done.
Architecting MDM for Reporting versus Real-time Processing
June 16, 2011

In recent discussions, Joseph Dossantos pointed out to me that the differences between architecting an MDM solution for Reporting, such as for a Data Warehouse, and architecting one for real-time transaction processing go beyond the choice of batch versus real-time Data Integration. Although a batch ETL solution may be appropriate for integrating the source and target systems with a Master Data hub, it is insufficient for updating and accessing Master Data being used in transaction processing. For real-time Data Integration it is better to use an Enterprise Service Bus (ESB) and/or Service Oriented Architecture (SOA).
However, there are other differences in the architectural solution for real-time MDM. The common functions of MDM, such as matching and deduplication, also need to be architected for real-time use, and the response to information requests needs to be instantaneous. Master Data for Reporting flows from source to hub to target to report (see Inmon’s Corporate Information Factory), but for transaction processing all of these capabilities must be able to happen in any order or simultaneously.
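As a minimal sketch of the difference (the matching rule below is hypothetical and far simpler than a real MDM engine), in a real-time hub the match-and-deduplicate decision has to happen synchronously, inside the transaction, rather than in a nightly batch run:

```python
# Hypothetical synchronous MDM lookup: an incoming transaction must be matched
# to an existing master record, or create one, before the transaction proceeds.

master_hub = {}  # normalized name -> golden record

def normalize(name: str) -> str:
    return " ".join(name.lower().split())

def match_or_create(customer_name: str) -> dict:
    key = normalize(customer_name)
    if key in master_hub:                 # simplistic exact-match rule
        return master_hub[key]            # deduplicated: reuse the golden record
    record = {"master_id": f"M-{len(master_hub) + 1}", "name": customer_name}
    master_hub[key] = record
    return record

print(match_or_create("Acme Corp"))    # creates master record M-1
print(match_or_create("ACME  corp"))   # matches and returns the existing M-1
```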