Managing Massive Data Growth


A Combination of Data Efficiency Technologies Provides Ways to Optimize Primary Storage Capacity and Performance

It’s no secret that data growth is expected to remain rampant for many years to come. According to the InformationWeek “State of Enterprise Storage 2014” survey, IT is dealing with yearly growth of 25% or more at nearly one-third of all companies. Furthermore, budgets are strained, with 1 in 4 respondents stating that they lack the funds simply to meet demand.

As a result, IT directors around the globe are struggling with decisions about how to handle both primary and secondary data storage. Ideally, they need to store and manage data in a way that consumes the least space with little to no impact on performance. With real-world budgets in play, optimizing performance via high-priced flash-based solutions will remain out of reach for most. Reducing storage needs can therefore be an integral part of the equation for most organizations.

Data reduction technologies like deduplication, compression, and thin provisioning can reduce data sets by 25-90% and are designed to offset growth by storing more data per storage device.  Provided that IT administrators consider data type and the functionality of each technology, these technologies can provide considerable benefits.
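To make the offset between growth and reduction concrete, the arithmetic can be sketched as follows. This is only an illustration: the 100 TB starting point, the three-year horizon, and the function name physical_capacity_tb are assumptions chosen for the example, and the model treats growth as simple compounding with a flat reduction rate.

```python
def physical_capacity_tb(logical_tb, annual_growth, years, reduction):
    """Physical capacity needed after `years` of compound growth.

    `reduction` is the fraction of data removed by data reduction
    (0.25 to 0.90 in the range quoted above). All inputs are
    illustrative assumptions, not measured values.
    """
    grown = logical_tb * (1 + annual_growth) ** years
    return grown * (1 - reduction)


if __name__ == "__main__":
    # Hypothetical starting point: 100 TB growing 25% per year.
    for reduction in (0.0, 0.25, 0.50, 0.90):
        needed = physical_capacity_tb(100, 0.25, 3, reduction)
        print(f"reduction {reduction:.0%}: {needed:.1f} TB after 3 years")
```

Under these assumptions, a 50% reduction rate is enough to keep physical capacity below the original 100 TB even after three years of 25% growth.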

Data deduplication works by replacing duplicate data across many files with references to a single shared copy. The percentage of organizations using deduplication increased from 38% in 2011 to 55% in 2014. On average, more than half of the total volume of a company’s data is in the form of redundant copies. At many organizations, deduplication can reduce the quantity of data stored by a factor of more than 25 for some data types. Storing less data requires fewer hardware resources, which in turn consume less energy.

However, not every data set or environment is suitable for deduplication. When used for data sets with large amounts of static data, it can yield significant storage savings. When used for the wrong type of data, it will cause performance issues. IT administrators need to understand how specific data sets will respond to deduplication and use it only where the benefits exceed the costs. Deduplication is particularly effective with unstructured data sets (such as home directories and department shares), virtual machines and application services, virtual desktops, and test and development environments.
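The idea of replacing duplicates with references to a single shared copy can be sketched in a few lines. This is a minimal illustration rather than how any particular storage product works: the fixed 4 KB block size, the use of SHA-256 as the fingerprint, and the names deduplicate, restore, and dedup_ratio are all assumptions made for the example.

```python
import hashlib


def deduplicate(files, block_size=4096):
    """Store each unique fixed-size block once.

    `files` maps a file name to its bytes. Returns (store, recipes):
    store maps a block fingerprint to the single shared copy, and
    recipes maps each file name to its ordered list of fingerprints.
    """
    store = {}
    recipes = {}
    for name, data in files.items():
        digests = []
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)  # keep only the first copy
            digests.append(digest)           # the file keeps a reference
        recipes[name] = digests
    return store, recipes


def restore(name, store, recipes):
    """Rebuild a file by following its references into the store."""
    return b"".join(store[d] for d in recipes[name])


def dedup_ratio(files, store):
    """Logical bytes written divided by physical bytes kept."""
    logical = sum(len(data) for data in files.values())
    physical = sum(len(block) for block in store.values())
    return logical / physical
```

In this sketch, two files that share most of their blocks occupy little more physical space than one, which is why largely static, repetitive data sets such as virtual desktop images tend to benefit most.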

Data compression is a process in which algorithms are used to encode a single block of data to reduce its total physical size, thus providing storage savings. As with deduplication, data compression has been well integrated into backup systems for many years. Now those benefits are available for primary storage systems. In fact, a recent survey revealed that roughly 33% of IT administrators are benefitting from data compression on the primary side. Space savings from primary storage compression have been estimated at 15 to 30%.

As with data deduplication, compressing data has potential performance pitfalls, and IT administrators need to understand how to best utilize it for maximum efficiency.   Benefits of compression are most often associated with relational databases, including online transaction processing (OLTP), decision support systems (DSS), and data warehouses. Savings diminish with unstructured and encrypted data sets. A key factor for success is the number of compression algorithms provided by the storage platform.
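The difference between compressible and incompressible data can be seen with a small experiment. The sketch below uses the zlib module from the Python standard library purely as a stand-in for whatever algorithm a storage platform provides; the sample record, the use of random bytes to approximate encrypted data, and the helper name compression_savings are assumptions for illustration.

```python
import os
import zlib


def compression_savings(block, level=6):
    """Fraction of space saved by compressing one block of data."""
    compressed = zlib.compress(block, level)
    return 1 - len(compressed) / len(block)


if __name__ == "__main__":
    # Repetitive, database-like records (hypothetical sample data).
    records = b"order_id=1001;status=SHIPPED;" * 2000
    # Random bytes approximate encrypted data, which has no patterns
    # left for a compressor to exploit.
    encrypted_like = os.urandom(len(records))

    print(f"structured records: {compression_savings(records):.1%} saved")
    print(f"encrypted-like:     {compression_savings(encrypted_like):.1%} saved")
```

The repetitive records shrink dramatically, while the random bytes do not shrink at all and may even grow slightly because of format overhead. Real-world data falls between these two extremes.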

The final strategy, thin provisioning, is not technically a data reduction technology, but it does provide an efficient, on-demand storage consumption model. In the past, servers were allocated storage based on anticipated requirements. To avoid performance issues if these limits were exceeded, storage was normally overprovisioned. Thin provisioning allocates storage on a just-enough, just-in-time basis by centrally controlling capacity and allocating space only as applications require it. Thus you can allocate space for an application whose data storage needs you expect to grow in the future, but power only the storage that is currently in use. A recent survey revealed that 39% of IT administrators use thin provisioning today, up from 28% in 2012.
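The just-enough, just-in-time model described above can be sketched as a shared pool that hands out physical blocks only when a volume is first written to. This is a simplified illustration: the ThinPool class, its method names, and the block counts are hypothetical, and a real array would also handle reclamation, alerts, and metadata.

```python
class ThinPool:
    """A shared pool that backs volumes with physical blocks on demand."""

    def __init__(self, physical_blocks):
        self.physical_blocks = physical_blocks
        self.volumes = {}  # name -> {"virtual_blocks": int, "mapped": set}

    def create_volume(self, name, virtual_blocks):
        # Creating a volume advertises capacity but consumes nothing yet.
        self.volumes[name] = {"virtual_blocks": virtual_blocks, "mapped": set()}

    def provisioned_blocks(self):
        return sum(v["virtual_blocks"] for v in self.volumes.values())

    def used_blocks(self):
        return sum(len(v["mapped"]) for v in self.volumes.values())

    def write(self, name, block_index):
        volume = self.volumes[name]
        if not 0 <= block_index < volume["virtual_blocks"]:
            raise IndexError("write beyond the volume's advertised size")
        if block_index not in volume["mapped"]:
            # First write to this block: take physical space from the pool.
            if self.used_blocks() >= self.physical_blocks:
                raise RuntimeError("pool exhausted: add physical capacity")
            volume["mapped"].add(block_index)


if __name__ == "__main__":
    pool = ThinPool(physical_blocks=100)
    pool.create_volume("db", virtual_blocks=200)
    pool.create_volume("mail", virtual_blocks=200)
    for i in range(10):
        pool.write("db", i)
    print(pool.provisioned_blocks(), "blocks promised,",
          pool.used_blocks(), "blocks actually consumed")
```

The pool in this example promises four times its physical capacity, which only works as long as actual consumption is monitored and capacity is added before the pool runs out.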

In the end, the ultimate goal of data efficiency is to remain transparent to the user while providing tangible benefits like managing growth and reducing overall storage costs. When implemented simultaneously, these three technologies produce peak results. If used appropriately, they will enable organizations to repurpose data center resources and add decades of new life to resource-constrained data centers.