by Bruce Kornfeld, Vice President of Marketing — October 14, 2009
Earlier today, SearchStorage.com, one of the most influential storage technology media outlets in the world, published an article asking, “Is data deduplication right for your primary storage infrastructure?” We at Compellent think this is an interesting and important question, because some end users have come to believe that deduplication will solve all (or at least most) of their capacity issues, and the industry has responded in kind. Just look at EMC’s recent acquisition of Data Domain.
The issues the article raises about deduplication are important ones to consider before implementing the technology. For instance, dedupe can be highly efficient for backup storage (because backups are 90 percent repetitive from day to day), but its efficiency drops when it is applied to primary storage. Furthermore, as the article points out, dedupe in primary storage can reduce the performance of the entire system, especially for applications where data changes constantly, such as transactional databases. If data from live, online applications is deduped, it must first be “un-deduplicated,” or “reconstituted,” before it can be read, which adds considerable time and computing effort to obtain what amounts to very little storage savings. In our discussions with end users, it’s hard to find organizations actually using deduplication for active, primary applications.
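To make the backup-versus-primary contrast concrete, here is a minimal sketch of fixed-block deduplication. It is not how any particular product works: the 4 KiB block size, the SHA-256 fingerprints, and the `DedupeStore` class are all hypothetical. Seven full backups of a 100-block data set in which 90 blocks repeat each day are compared against seven writes of constantly changing data where no block repeats. The `read` method shows the "reconstitution" step: every read has to walk a list of fingerprints and reassemble the data.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size in bytes


class DedupeStore:
    """Toy fixed-block deduplicating store (illustrative only)."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}  # fingerprint -> unique block, stored once
        self.logical_bytes = 0              # bytes clients asked us to write

    def write(self, data: bytes) -> list[str]:
        """Store data; return its 'recipe', the ordered list of block fingerprints."""
        recipe = []
        for off in range(0, len(data), BLOCK_SIZE):
            chunk = data[off:off + BLOCK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            self.blocks.setdefault(fp, chunk)  # duplicate blocks are not stored again
            recipe.append(fp)
        self.logical_bytes += len(data)
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        # Reconstitution: each read walks the recipe and reassembles the data
        # from shared blocks, an extra lookup step on the I/O path.
        return b"".join(self.blocks[fp] for fp in recipe)

    def ratio(self) -> float:
        """Logical bytes written divided by physical bytes stored."""
        physical = sum(len(b) for b in self.blocks.values())
        return self.logical_bytes / physical if physical else 0.0


def make_data(block_ids) -> bytes:
    """Deterministic test data: each id maps to a distinct 4 KiB block."""
    return b"".join(
        hashlib.sha256(str(i).encode()).digest() * (BLOCK_SIZE // 32) for i in block_ids
    )


def backup_day(day: int) -> bytes:
    # Full backup of 100 blocks: 90 unchanged from day to day, 10 new each day.
    return make_data(list(range(90)) + list(range(90 + 10 * day, 100 + 10 * day)))


backups = DedupeStore()
recipes = [backups.write(backup_day(d)) for d in range(7)]

primary = DedupeStore()
for d in range(7):
    # Constantly changing primary data, modeled as never repeating a block.
    primary.write(make_data(range(100 * d, 100 * d + 100)))

print(f"backup dedupe ratio:  {backups.ratio():.3f}")   # 4.375
print(f"primary dedupe ratio: {primary.ratio():.3f}")   # 1.000
```

The backup store keeps 160 unique blocks for 700 written, while the primary store saves nothing yet still pays the fingerprinting and reconstitution cost on every write and read.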
In reality, a comprehensive strategy for reducing redundant data doesn’t have to center on deduplication technology. For instance, space-efficient snapshot technology like our Data Instant Replay takes snapshots of changed data at regular intervals (as frequently as every 15 minutes) and stores them on the SAN. Because they capture only changed blocks of data, these replays are relatively small, so customers can store a virtually unlimited number of them while using minimal storage capacity and with no impact on system performance. In the event of data loss, admins can easily roll back to a replay and recover from that point in time with just a few mouse clicks. Many customers also use replays to test new applications without putting entire servers or storage systems at risk. Organizations using these technologies can be much more efficient still when the SAN doesn’t require space to be pre-allocated for replays, which is another benefit of our Dynamic Block Architecture.
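The idea that snapshots stay small because they capture only changed blocks can be illustrated with a minimal sketch. This is not Data Instant Replay's implementation; the `SnapshotVolume` class, its in-memory block map, and the linear-history rollback policy are hypothetical. Each snapshot records references to just the blocks written since the previous one, and a rollback rebuilds the point-in-time image by layering those deltas in order.

```python
class SnapshotVolume:
    """Toy block volume whose snapshots record only blocks changed since the last one."""

    def __init__(self) -> None:
        self.current: dict[int, bytes] = {}          # live image: block index -> contents
        self.dirty: set[int] = set()                 # blocks written since the last snapshot
        self.snapshots: list[dict[int, bytes]] = []  # one delta per snapshot

    def write(self, idx: int, data: bytes) -> None:
        self.current[idx] = data
        self.dirty.add(idx)

    def snapshot(self) -> int:
        # Keep references to the changed blocks only; unchanged blocks are
        # already covered by earlier snapshots, so nothing is pre-allocated.
        self.snapshots.append({i: self.current[i] for i in self.dirty})
        self.dirty.clear()
        return len(self.snapshots) - 1

    def rollback(self, sid: int) -> None:
        # Rebuild the point-in-time image by layering deltas 0..sid in order.
        image: dict[int, bytes] = {}
        for delta in self.snapshots[: sid + 1]:
            image.update(delta)
        self.current = image
        self.dirty.clear()
        # Keep history linear: later snapshots no longer describe this timeline.
        del self.snapshots[sid + 1:]


vol = SnapshotVolume()
for i in range(1000):
    vol.write(i, b"v0")
s0 = vol.snapshot()         # first snapshot holds all 1000 written blocks
vol.write(7, b"v1")
vol.write(8, b"v1")
s1 = vol.snapshot()         # second snapshot holds only the 2 changed blocks
vol.write(7, b"corrupted")  # simulated data loss after s1
vol.rollback(s1)
print(len(vol.snapshots[s0]), len(vol.snapshots[s1]), vol.current[7])  # 1000 2 b'v1'
```

A production system would track blocks on disk and share them between snapshots rather than holding Python objects, but the space argument is the same: snapshot size grows with the rate of change, not with the size of the volume.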
Another way to achieve significant storage efficiency is to deploy automated tiered storage at the block level, which is available in our Data Progression software. Data Progression automatically moves inactive blocks of data within active volumes from high-performance, high-cost tiers of storage (15K Fibre Channel drives or SSDs) down to lower-performance, lower-cost tiers (SATA drives). Along the way, Data Progression also migrates these inactive blocks from RAID 10 to RAID 5. The overall result is fewer disk drives, more high-performance disks available for active data, and reduced growth costs, since added capacity typically comes from lower-cost SATA drives.
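A minimal sketch of an age-based demotion policy shows why moving inactive blocks from RAID 10 to RAID 5 saves raw capacity. It is not Data Progression's actual policy: the tier names, the 12-day inactivity threshold, and the 4+1 RAID 5 layout are assumptions chosen for illustration. RAID 10 mirrors every block, so it consumes twice the raw capacity of the data it holds, while a 4+1 RAID 5 set consumes 1.25 times.

```python
from dataclasses import dataclass

FAST = "tier1-raid10"  # e.g. 15K Fibre Channel or SSD, mirrored
CHEAP = "tier3-raid5"  # e.g. SATA, parity-protected

# Raw capacity consumed per unit of data: RAID 10 mirrors every block (2x);
# RAID 5 is assumed here to use a 4+1 layout (1.25x).
RAW_OVERHEAD = {FAST: 2.0, CHEAP: 1.25}

INACTIVE_AFTER_HOURS = 12 * 24  # hypothetical demotion threshold


@dataclass
class Block:
    tier: str
    last_access_hour: float


def demote_inactive(blocks: list[Block], now_hour: float) -> int:
    """Move blocks idle past the threshold from the fast tier to the cheap tier."""
    moved = 0
    for blk in blocks:
        if blk.tier == FAST and now_hour - blk.last_access_hour >= INACTIVE_AFTER_HOURS:
            blk.tier = CHEAP  # the block also changes RAID level as it moves
            moved += 1
    return moved


def raw_capacity(blocks: list[Block]) -> float:
    """Raw capacity consumed, in units of one block's worth of data."""
    return sum(RAW_OVERHEAD[b.tier] for b in blocks)


# One active volume: 20 recently used blocks and 80 untouched for weeks,
# all initially on the fast tier.
volume = [Block(FAST, 990.0) for _ in range(20)] + [Block(FAST, 0.0) for _ in range(80)]
before = raw_capacity(volume)
moved = demote_inactive(volume, now_hour=1000.0)
after = raw_capacity(volume)
print(moved, before, after)  # 80 200.0 140.0
```

In this example, demotion cuts raw capacity by 30 percent and leaves the fast tier holding only the 20 active blocks. Those are the two effects the paragraph above describes: fewer drives overall, and more high-performance disk free for active data.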
So with all of the buzz around dedupe making its way onto the wish lists of data center admins everywhere, it’s important to remember that dedupe isn’t the only answer for reducing storage needs and increasing storage efficiency in the data center. In case you’re wondering, if customers want to use dedupe for their backups with our SAN, we partner with CommVault for source-based dedupe and work with ExaGrid for target-based dedupe.