The big transformation that we’re all dealing with in technology today is virtualization. There are many aspects to virtualization: infrastructure, systems, organization, office, applications. When you search the internet for “data virtualization,” most of the references concern business intelligence and data warehousing uses. In part 2 of this blog I will talk about data virtualization and transaction processing.
Back in the day, when I used to build data warehouses (the 1990s and later), there was a concept called the “federated data warehouse,” where the data in the logical warehouse was physically separated, either as multiple instances with the same schema or as different types of data in different locations. The thought was that the data would be physically separate but brought together in real time for reporting. We also used to call these “data warehouses that don’t work.” After all, the reason we created data warehouses in the first place was that we needed to instantiate the consolidated data in order to get reasonable response times when reporting on millions of records. No, really, the response time on these “federated data warehouse” systems used to be many minutes or more.
Now, however, the technologies involved have made huge leaps in capability. The vendors have put thousands and thousands of man-hours into making real-time integration and reporting work. Many techniques enable these capabilities: specialized software and hardware (data appliances), query optimization, distributed processing, and hybrid solutions that sit between pure virtualization and instantiation. Specialized tuning is still necessary, and the fastest solutions still involve instantiating the consolidated data in a central place.
Ultimately, having to run a project every time new data needs to be physically incorporated into the data warehouse isn’t responsive enough to the business need for information. It is better to have a short-term solution that allows new data to be incorporated quickly and then, if there is a continued need for that data and you want to speed up the response, possibly integrate it into the physical data warehouse.
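To make that "virtual first, physical later" idea concrete, here is a minimal sketch. It uses SQLite's `ATTACH DATABASE` purely as a stand-in for a real data virtualization product, and every name in it (`east`, `west`, `sales`, `all_sales`, `all_sales_physical`) is hypothetical and chosen for illustration. Two regional databases are first exposed through a view that is evaluated at query time, and that view is later copied into a physical table when faster or more stable reporting is wanted.

```python
import sqlite3

# The same report is run against either the virtual view or the physical copy.
REPORT = "SELECT product, SUM(amount) FROM {source} GROUP BY product ORDER BY product"


def build_regions():
    """Create two separate in-memory databases standing in for regional warehouses."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH of ':memory:' creates its own independent database.
    conn.execute("ATTACH DATABASE ':memory:' AS east")
    conn.execute("ATTACH DATABASE ':memory:' AS west")
    sample = {
        "east": [("widget", 100.0), ("gadget", 50.0)],
        "west": [("widget", 70.0)],
    }
    for region, rows in sample.items():
        conn.execute(f"CREATE TABLE {region}.sales (product TEXT, amount REAL)")
        conn.executemany(f"INSERT INTO {region}.sales VALUES (?, ?)", rows)
    return conn


def create_virtual_view(conn):
    """Short-term solution: combine the regions at query time, copying nothing."""
    # SQLite only lets TEMP views reference tables in other attached databases.
    conn.execute(
        """
        CREATE TEMP VIEW all_sales AS
        SELECT 'east' AS region, product, amount FROM east.sales
        UNION ALL
        SELECT 'west' AS region, product, amount FROM west.sales
        """
    )


def materialize(conn):
    """Longer-term solution: instantiate the consolidated data as a physical table."""
    conn.execute("CREATE TABLE main.all_sales_physical AS SELECT * FROM all_sales")


def report(conn, source):
    """Run the sales-by-product report against the named view or table."""
    return conn.execute(REPORT.format(source=source)).fetchall()


if __name__ == "__main__":
    conn = build_regions()
    create_virtual_view(conn)
    print(report(conn, "all_sales"))           # virtual: reads the regions live
    materialize(conn)
    print(report(conn, "all_sales_physical"))  # physical: reads the snapshot
```

The trade-off is visible in the sketch: the view always reflects the current regional data but does the combining work on every query, while the physical table answers from a single place but only reflects the data as of the moment it was built.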
The problems now being solved with data virtualization for business intelligence include real-time integration of multiple regional instances of a data warehouse, integration of data of different types and kinds, and integration of data warehouse data with big data and cloud data. This enables business intelligence and analytical solutions that respond much faster to business requests, without always having to instantiate all of the data to be analyzed in a single, central enterprise data warehouse.