"Storage is cheap!" you might say. Well, disk storage has certainly become
cheaper over time, and all sorts of products on the market, such as virtual
tape libraries and megastorage appliances, take advantage of the price drop.
But keep in mind that more disk storage carries costs beyond the vendor's price
tag: power to run and cool it, space to house it, and IT administrators to
oversee it. You have to factor in all of these costs. Products that provide
single-instance storage (SIS) and data deduplication can substantially reduce
the real expense of so-called cheap storage.
You may not realize that you have some of the necessary ingredients to take
advantage of deduplication within your Microsoft server environment. But you do.
First, let's be clear on what deduplication is: the process of eliminating
redundant data, either in the storage repository or in network traffic. You can
deduplicate at the object (file) level, which is also called "single
instancing," or at the block (subfile) level. Block-level deduplication
typically saves much more space, because it can also eliminate the identical
portions of files that are not exact copies of each other.
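To make the file-level versus block-level distinction concrete, here is a minimal sketch. It assumes duplicates are identified by SHA-256 hashes and that files are cut into fixed-size 4KB blocks; real products may use other hashing and chunking schemes (some use variable-size chunks), and the function names are hypothetical. Two copies of a document plus one copy with a single changed byte are fed to both approaches.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: fixed-size 4KB blocks, for illustration only


def file_level_bytes(files):
    """Bytes stored when only whole, identical files are deduplicated."""
    unique = {}
    for data in files:
        # Identical files share a hash, so each unique file is counted once.
        unique.setdefault(hashlib.sha256(data).hexdigest(), len(data))
    return sum(unique.values())


def block_level_bytes(files, block_size=BLOCK_SIZE):
    """Bytes stored when identical fixed-size blocks are deduplicated."""
    unique = {}
    for data in files:
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            # Each unique block is stored once, even across different files.
            unique.setdefault(hashlib.sha256(block).hexdigest(), len(block))
    return sum(unique.values())


def sample_files():
    # A 16KB deterministic "document" whose four 4KB blocks all differ.
    original = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest()
                        for i in range(512))
    # The same document with only its last byte changed.
    edited = original[:-1] + bytes([original[-1] ^ 0xFF])
    return [original, original, edited]


if __name__ == "__main__":
    files = sample_files()
    print("raw:", sum(len(f) for f in files))        # 49152
    print("file-level:", file_level_bytes(files))    # 32768
    print("block-level:", block_level_bytes(files))  # 20480
```

File-level deduplication removes only the exact copy, so the edited version is stored in full. Block-level deduplication also recognizes that three of the edited file's four blocks already exist and stores just the one changed block.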
Data is naturally duplicated through mass distribution or data processing needs.
Most IT organizations maintain multiple copies of the same file in different
repositories, or several versions of files that are still being worked on. In
addition, backup applications create and retain multiple copies of files so
that they are available for recovery. Backup processes have contributed greatly
to the proliferation of data in the datacenter.
Consider a simple scenario: an e-mail with a 10MB video attached is sent to 100
people. If the e-mail platform doesn't have SIS capabilities and the backup
product doesn't have a deduplication feature, you are looking at backing up 1GB
of data (100 copies of 10MB, which takes space, time, and money) rather than a
single 10MB instance.
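The e-mail scenario can be modeled with a toy single-instance store. This is a hypothetical sketch, not how any particular e-mail or backup product works: it assumes each attachment is identified by its SHA-256 hash, kept once, and referenced from every recipient's mailbox. It uses decimal megabytes so that 100 x 10MB comes out to the article's 1GB.

```python
import hashlib


class SingleInstanceStore:
    """Toy SIS: each unique attachment is kept once; mailboxes hold references."""

    def __init__(self):
        self.blobs = {}      # content hash -> attachment bytes (stored once)
        self.refcounts = {}  # content hash -> number of references to it
        self.mailboxes = {}  # recipient -> list of content hashes

    def deliver(self, recipient, attachment):
        key = hashlib.sha256(attachment).hexdigest()
        if key not in self.blobs:
            self.blobs[key] = attachment  # first copy: actually store the bytes
        self.refcounts[key] = self.refcounts.get(key, 0) + 1
        self.mailboxes.setdefault(recipient, []).append(key)

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())


MB = 1_000_000  # decimal megabyte, so 100 x 10MB = 1GB as in the text

if __name__ == "__main__":
    video = b"\x00" * (10 * MB)  # stand-in for the 10MB video attachment
    store = SingleInstanceStore()
    for n in range(100):
        store.deliver(f"user{n}@example.com", video)  # hypothetical addresses
    print(f"without SIS: {100 * len(video) // MB} MB")    # 1000 MB
    print(f"with SIS: {store.stored_bytes() // MB} MB")   # 10 MB
```

The reference count records how many mailboxes still point at the attachment; a real system would need something like it to know when the single stored copy can safely be deleted.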