Global Deduplication Can Offset Enterprise Storage Growth Rates


It's almost impossible to pick up a trade rag or read an article on the web about data protection without some mention of deduplication. Some of that is deliberate: editors know that users are googling for the word "deduplication", and putting DEDUPLICATION in big bold letters in the title or text of an article helps draw readers to their site. Yet these discussions, while relevant, overlook mid- to long-term data management requirements.

In fact, there was a great debate last fall at SNW among some of the deduplication vendors over the best way to deduplicate data. While entertaining, this debate solved little. Most users care about deduplication only insofar as it affects their ability to successfully back up the data in their environment. However, most of these discussions focus on solving users' short-term requirements and do not take into consideration some of the longer-term problems. Unfortunately, hidden long-term data management costs lurk for those who unwittingly adopt the wrong deduplication appliances.

Deduplicating appliances have gained mindshare with users because they make disk as cheap as, or cheaper than, tape by delivering data reduction ratios of 15:1 or more while expediting backups, which solves users' short-term backup problems. However, when selecting a deduplication product, companies also need to consider how well it will serve them in the long term.

For example, does the deduplication appliance extend the benefits of deduplication beyond the current appliance and, if it does, how does it do it? The capability to globally deduplicate data is very powerful, but most deduplicating storage appliances deduplicate only within a single appliance. If the ceiling for performance or storage capacity is reached, one must bring in a new appliance. The new appliance cannot reference data already stored on the old one, so deduplication starts from scratch, even if the data on the initial appliance(s) has already been deduplicated. The result is silos of deduplicated data.
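The silo effect described above can be illustrated with a minimal sketch. It assumes fixed-size chunking and SHA-256 fingerprints purely for illustration; the function names, chunk size, and sample data are hypothetical and do not describe how HYDRAstor or any other product works internally.

```python
import hashlib

def chunk_fingerprints(data, chunk_size=4):
    """Split data into fixed-size chunks and fingerprint each one."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]

def chunks_stored_siloed(appliances):
    """Each appliance keeps its own index, so it cannot see the others' chunks."""
    total = 0
    for backups in appliances:
        index = set()  # private to this appliance
        for data in backups:
            index.update(chunk_fingerprints(data))
        total += len(index)
    return total

def chunks_stored_global(appliances):
    """One shared index spans every backup, wherever it lands."""
    index = set()
    for backups in appliances:
        for data in backups:
            index.update(chunk_fingerprints(data))
    return len(index)

# Hypothetical data: a production data set backed up twice to the first
# appliance, and a test copy (one chunk changed) backed up to a second one.
production = b"AAAABBBBCCCCDDDD"
test_copy = b"AAAABBBBCCCCEEEE"
appliances = [[production, production], [test_copy]]

print(chunks_stored_siloed(appliances))  # 4 unique chunks per silo
print(chunks_stored_global(appliances))  # shared chunks are stored once
```

In this toy case the second appliance stores the three chunks it shares with the first all over again, while a single global index stores them once.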

Is there an adverse impact from adding new silos of deduplicated data year after year? That really depends, but companies I talk to are experiencing year-over-year data growth rates of 50% or more. Recently, I spoke to a colleague at my previous employer, and he said the amount of storage there has continued to double yearly. While we may believe data growth rates will slow down, anecdotal evidence suggests that they have not.

I bring up this example from my previous employer because, as disk prices continue to drop, companies are bringing in more storage to store more data, often to make copies of existing data for other purposes such as testing, development, data mining and eDiscovery. But as companies back up this repurposed data to new appliances, they cannot take advantage of the deduplication benefits that existing appliances provide.
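The growth rates mentioned above compound quickly. As a rough illustration, the sketch below projects capacity from a hypothetical 100 TB starting point over three years; the starting size and time span are assumptions chosen only to show the arithmetic, not figures from any company.

```python
def projected_capacity_tb(start_tb, annual_growth, years):
    """Compound growth: capacity after `years` at a fixed annual growth rate."""
    return start_tb * (1 + annual_growth) ** years

# Hypothetical starting point of 100 TB, projected three years out.
print(projected_capacity_tb(100, 0.5, 3))  # 50% growth per year
print(projected_capacity_tb(100, 1.0, 3))  # doubling every year
```

At 50% annual growth the data more than triples in three years, and at a yearly doubling it grows eightfold, which is why an appliance that looks adequately sized today can hit its ceiling sooner than expected.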

Siloed deduplication offers some benefits during initial use, but forward-looking companies realize that global deduplication fully capitalizes on deduplication technology to provide the lowest-TCO solution that meets business data requirements. Few products are available that support global deduplication, but new products like the NEC HYDRAstor can deliver it because of their storage architecture. Because HYDRAstor is based on a grid storage architecture, it can scale performance and capacity independently, allowing IT to deploy one system instead of many. This architecture enables it to globally deduplicate data without running up against the performance and capacity limitations that current appliance-based products encounter, limitations that force IT to deploy multiple instances and can easily create a management nightmare. Global deduplication also lowers total storage costs since the amount of data stored is minimized.

Deduplication is a cost-effective justification for introducing disk into the backup process. However, companies must recognize that in order to fully leverage the benefits of deduplication and keep it from becoming just another management pain point a few years down the road, they need to look beyond the problems deduplication solves to the new problems it creates when they have many silos of deduplicated data. As companies consider their mid- to long-term business issues, emerging features like global deduplication and grid storage architectures merit special consideration in the data deduplication management process.


Entry Sponsorship

This entry is sponsored by NEC HYDRAstor

About NEC HYDRAstor Blog

    HYDRAstor is a grid storage platform that addresses today's storage challenges through its "community of smart nodes." Comprised of self-aware, self-healing industry-standard servers with no single point of failure and no central resource bottleneck, HYDRAstor greatly enhances the flexibility of the storage environment while reducing infrastructure complexity and management overhead.