What we don’t currently emphasize much in Data Management is the end of the data life cycle, when data is archived and, possibly later, deleted. In the past, this was because we wanted as much data available as our technical solutions could store; if we couldn’t store all of it, the oldest data was usually just deleted after being backed up. Now, in the era of Big Data, the amount of data being produced is growing exponentially, and the techniques for moving data aside (archiving it) and retrieving it again are well developed.
Ultimately, all data in an organization (structured and unstructured) needs to be tagged with Metadata that indicates when the data is to be archived, when it is to be deleted, and who is responsible for it, including who must ultimately approve its deletion.
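As a rough illustration, the lifecycle tags described above might be modeled as a small record attached to each data set. This is a minimal sketch under stated assumptions: the `RetentionTag` class, its field names, and the rule that the data owner is the one who approves deletion are hypothetical choices for illustration, not a standard schema.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RetentionTag:
    """Hypothetical lifecycle Metadata attached to one data set."""
    dataset: str
    archive_on: date        # date the data should move to the archive
    delete_on: date         # date the data becomes eligible for deletion
    data_owner: str         # person or role responsible for the data
    deletion_approved_by: Optional[str] = None  # set once deletion is approved

    def __post_init__(self) -> None:
        # Assumption: data is always archived before it becomes deletable.
        if self.delete_on < self.archive_on:
            raise ValueError("delete_on must not precede archive_on")


def lifecycle_action(tag: RetentionTag, today: date) -> str:
    """Return the lifecycle step that is due for the tagged data set."""
    if today >= tag.delete_on:
        # Deletion proceeds only after the responsible owner has approved it.
        if tag.deletion_approved_by == tag.data_owner:
            return "delete"
        return "await deletion approval"
    if today >= tag.archive_on:
        return "archive"
    return "keep active"


tag = RetentionTag("orders_2015", date(2020, 1, 1), date(2025, 1, 1), "finance-data-owner")
print(lifecycle_action(tag, date(2019, 6, 1)))   # keep active
print(lifecycle_action(tag, date(2022, 6, 1)))   # archive
print(lifecycle_action(tag, date(2025, 6, 1)))   # await deletion approval
approved = replace(tag, deletion_approved_by="finance-data-owner")
print(lifecycle_action(approved, date(2025, 6, 1)))  # delete
```

Keeping the approver as an explicit field, rather than deleting automatically when the date passes, reflects the point that a named person remains accountable for the final deletion decision.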
Although Data Archiving itself is an interesting topic, its data movement aspects aren’t particularly sophisticated, since the data being archived is usually not transformed. Data Archiving is usually treated as a subtopic of Data Backup and Recovery. Archiving data assumes that the data is moved to a less costly (and possibly less accessible) platform from which it can later be retrieved, either by bringing it back into the original application or by accessing it directly in the archive environment.
Regulations governing pharmaceutical firms require that when an application is shut down, its data must be archived along with the application and the hardware on which it runs. This is a very smart approach to recoverability, since a simple data backup may be impossible to restore if the hardware it ran on is no longer available. Less strict regulations may require only that the data itself be archived. Standard data archiving capabilities associated with data backup will move the data to offline storage, but bringing that data back into the operational application can be problematic if the application’s data structure or schema has changed since the data was archived.
Where the archived data need not be brought back into the operational application environment but only needs to remain accessible, the organization may decide to transform the data being archived into a common technical format, for example by storing all the data in the archive environment in the same database management system.
It is critical that data be archived along with its associated metadata explaining the meaning of the data and its history. If the archived data may have to be accessed independently of the application from which it came, it may be best to use a data structure or storage solution that can both keep the data together with its metadata and accommodate data structures that change over time. An XML type of data structure, for example, would accommodate changing data structures and associated metadata while still allowing queries across data archived from the same application, even if the source data structures had changed.
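To make the XML idea concrete, here is a minimal sketch using Python’s standard `xml.etree.ElementTree`. The archive layout (`batch`, `metadata`, `field`, `record` elements and the `schemaVersion` and `concept` attributes) is a hypothetical format invented for illustration. Each batch carries its own field metadata, and the query uses that metadata to find the right element names, so records archived before and after a schema change can be summed together.

```python
import xml.etree.ElementTree as ET

# Hypothetical archive from one source application. The field was renamed
# between schema versions, but each batch documents its own structure.
ARCHIVE_XML = """
<archive source="order-entry">
  <batch schemaVersion="1" archivedOn="2015-03-31">
    <metadata>
      <field name="cust" concept="customer" meaning="Customer identifier"/>
      <field name="amt" concept="total" meaning="Order total in USD"/>
    </metadata>
    <record><cust>C001</cust><amt>120.00</amt></record>
    <record><cust>C002</cust><amt>75.50</amt></record>
  </batch>
  <batch schemaVersion="2" archivedOn="2018-06-30">
    <metadata>
      <field name="customerId" concept="customer" meaning="Customer identifier (renamed from cust)"/>
      <field name="total" concept="total" meaning="Order total in USD"/>
      <field name="channel" concept="channel" meaning="Sales channel, added in version 2"/>
    </metadata>
    <record><customerId>C001</customerId><total>40.00</total><channel>web</channel></record>
  </batch>
</archive>
"""


def field_names(batch: ET.Element) -> dict[str, str]:
    """Map each business concept to the element name used in this batch."""
    return {f.get("concept"): f.get("name") for f in batch.findall("metadata/field")}


def totals_by_customer(xml_text: str) -> dict[str, float]:
    """Sum order totals per customer across every schema version in the archive."""
    root = ET.fromstring(xml_text.strip())
    totals: dict[str, float] = {}
    for batch in root.findall("batch"):
        names = field_names(batch)  # resolve names from the batch's own metadata
        for record in batch.findall("record"):
            customer = record.findtext(names["customer"])
            amount = float(record.findtext(names["total"]))
            totals[customer] = totals.get(customer, 0.0) + amount
    return totals


print(totals_by_customer(ARCHIVE_XML))  # {'C001': 160.0, 'C002': 75.5}
```

Because the mapping from meaning to element name lives inside the archive rather than in the retired application, the query keeps working even after that application and its original schema are gone.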