Data Virtualization – Part 2 – Data Caching

Another type of Data Virtualization, less frequently discussed than the Business Intelligence use case (see my previous blog), has to do with keeping data in the computer’s memory, or as close to it as possible, in order to dramatically speed up both data access and update.

A simplistic way of thinking about relative retrieval times is this: if fetching data from memory takes on the order of nanoseconds, fetching the same data from disk takes on the order of milliseconds, several orders of magnitude longer. Depending on the infrastructure configuration, retrieving data over a LAN or from the internet may be ten to a thousand times slower still. If I load my most heavily used data into memory in advance, or into something that behaves like memory, then my processing of that data should be sped up by multiple orders of magnitude. Using solid state disk for heavily used data can achieve access and update response times approaching those of data held in memory. Computer memory and solid state drives, while not as cheap as traditional disk, are certainly far less expensive than they used to be.
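To make that comparison concrete, here is a minimal Java sketch of my own (not taken from any particular product) that times a single lookup from an in-memory list against reading the same data set back from a file on disk. It is a rough illustration rather than a rigorous benchmark; the exact ratio will vary with hardware and operating-system caching, but the disk path typically comes out orders of magnitude slower.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class LatencyComparison {
    public static void main(String[] args) throws Exception {
        // Build a data set of 100,000 strings held in memory.
        List<String> inMemory = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            inMemory.add("value-" + i);
        }

        // Write the same strings to a temporary file on disk, one per line.
        Path file = Files.createTempFile("values", ".txt");
        Files.write(file, inMemory);

        // Time fetching one entry from the in-memory list.
        long start = System.nanoTime();
        String fromMemory = inMemory.get(42_000);
        long memoryNanos = System.nanoTime() - start;

        // Time fetching the same entry by reading the file back from disk.
        start = System.nanoTime();
        String fromDisk = Files.readAllLines(file).get(42_000);
        long diskNanos = System.nanoTime() - start;

        System.out.printf("memory: %,d ns   disk: %,d ns   ratio: roughly %,d to 1%n",
                memoryNanos, diskNanos, diskNanos / Math.max(memoryNanos, 1));
    }
}
```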

Memory caching of data can be done with traditional databases and sequential processing, and response-time improvements of multiple orders of magnitude can be achieved. However, really spectacular performance is possible if we combine memory caching with parallel computing and databases designed around data caching, such as GemFire. This does require developing systems with these newer technologies and approaches in order to take advantage of parallel processing and optimized data caching, but the results can be blazingly fast.
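As a sketch of the basic idea, the cache-aside pattern below keeps a fast in-memory map in front of a slower backing store. The SlowStore interface and its simulated lookup cost are hypothetical stand-ins for a database or remote service, and the plain ConcurrentHashMap is only an illustration; a product such as GemFire layers distribution, parallelism, and persistence on top of this same core idea using its own APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheAsideExample {

    // Hypothetical stand-in for a slow backing store such as a relational database.
    interface SlowStore {
        String load(String key);
    }

    // Cache-aside: check memory first, fall back to the slow store on a miss,
    // then remember the result so later reads are served from memory.
    static class CachingReader {
        private final Map<String, String> cache = new ConcurrentHashMap<>();
        private final SlowStore store;

        CachingReader(SlowStore store) {
            this.store = store;
        }

        String get(String key) {
            return cache.computeIfAbsent(key, store::load);
        }

        // On update, refresh the in-memory copy; a real system would also
        // write through to the backing store here.
        void put(String key, String value) {
            cache.put(key, value);
        }
    }

    public static void main(String[] args) {
        // Simulate a store where every load costs about 5 milliseconds.
        SlowStore store = key -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "value-for-" + key;
        };

        CachingReader reader = new CachingReader(store);

        long t1 = System.nanoTime();
        reader.get("customer:42");           // miss: pays the simulated store cost
        long miss = System.nanoTime() - t1;

        long t2 = System.nanoTime();
        reader.get("customer:42");           // hit: served from memory
        long hit = System.nanoTime() - t2;

        System.out.printf("miss: %,d ns   hit: %,d ns%n", miss, hit);
    }
}
```

The first read pays the full cost of the slow store; every subsequent read of the same key is answered from memory, which is the effect the caching approach described above is after.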
