Today, primary storage is the mainstay of the data center. For the most part, there is no real tiering of data. Data is created, modified and stored on primary storage, and seldom deleted or archived. When more space is needed, more capacity is bought, and modern storage systems have made adding drives an increasingly simple task. All of this additional storage, and the data that soon finds a home there, must be protected, managed, powered and, of course, paid for.

The first challenge is that while capacity has continued to grow, the technologies that underlie primary storage, Fibre Channel and SAS mechanical hard drives, have seen relatively small performance increases over the last few years. The typical answer for applications that demand high storage I/O performance has been to build storage systems with very high drive counts: the more drives acting together, the better the I/O performance, or so the rationale goes. The problem is that this further pushes up the cost of primary storage and leads to additional wasted capacity, since all the spindles needed for performance are only partially filled with data.

Enter the Solid State Disk (SSD). In the past, SSD was relegated to solving only the most extreme storage I/O problems. Its cost has now come into a range that more data centers can afford, and it may make sense for more than just the highest-I/O applications. Imagine the benefits if more data could be served from fast SSD technology. Customer-facing applications would see dramatically improved responsiveness and reduced lag compared to mechanical storage. For those applications, storage performance problems would be eliminated, as would the time spent configuring and reconfiguring array setups for optimal performance. Backup windows for that data, which are affected as much by the time it takes to walk the file system and read the data as by the time to actually write the backup, would be greatly reduced. Power is another benefit: the per-I/O power consumption of SSD is significantly lower than that of mechanical storage. Overall, to the extent that more SSD can be added to the storage mix, the results are positive.

The challenge, however, is that as inexpensive as SSD has become, it is still too expensive to hold all active data, so "active" needs a tighter definition: data that is currently being accessed or has been accessed within the last few days or weeks. Under the 80/20 rule, as it is often applied to data, 80% of data has typically not been accessed within the last year; only 20% has. Narrow the window to the last week or the last few days, and the percentage of data being accessed drops to 5% or less. This means that even a 100TB data center should only need about 5TB of SSD, and the remaining 95% of data could be migrated to a cost-effective secondary disk tier.
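The tier-sizing arithmetic above can be sketched in a few lines. This is a minimal illustration of the article's reasoning, not a sizing tool; the 5% active fraction and 100TB total are the article's illustrative figures, and the function name is hypothetical.

```python
def tier_sizes(total_tb: float, active_fraction: float) -> tuple[float, float]:
    """Split total capacity into (ssd_tb, secondary_tb) given the
    fraction of data considered 'active' (recently accessed)."""
    ssd_tb = total_tb * active_fraction           # active data lives on SSD
    secondary_tb = total_tb - ssd_tb              # the rest moves to the secondary tier
    return ssd_tb, secondary_tb

# The article's example: 100TB total, 5% active
ssd, secondary = tier_sizes(100.0, 0.05)
print(f"SSD tier: {ssd:.1f} TB, secondary tier: {secondary:.1f} TB")
# SSD tier: 5.0 TB, secondary tier: 95.0 TB
```

The point of the exercise is how small the fast tier can be: even generous definitions of "active" leave the overwhelming majority of capacity on the cheaper tier.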

SSD can fix the performance-related cost problems of primary storage. Secondary disk storage tiers, like those from Permabit, can likewise fix the costs of capacity, retention and data protection. The goal going forward is to move data from active storage, now on SSD, to secondary storage the moment it becomes inactive.

The first hurdle in this aggressive migration strategy, of course, is that the sooner data is moved off its primary location, the greater the chance it will need to be recalled from secondary storage. That means the secondary storage tier must respond quickly to access requests and serve data back to the user nearly as fast as if it were on primary storage. In other words, it too must be disk-based.

The secondary disk tier must balance data safety against its primary mission of dramatically reducing storage costs. To do this, it must be able to use SATA-based hard drive technology safely and reliably. SATA drives provide the performance needed to service frequent recalls of semi-active data while being significantly more cost-effective per GB than traditional primary storage platforms. Beyond simply using economical SATA drives, the tier should also leverage data optimization technologies like deduplication and compression. The goal should be a tier that breaks the $1 per GB effective price barrier while still maintaining reasonable access performance.
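The "effective price" idea works because optimization divides the raw cost per GB by the combined reduction ratio. The sketch below illustrates that arithmetic; the $/GB figure and the 4:1 deduplication and 2:1 compression ratios are assumptions for illustration, not Permabit's numbers.

```python
def effective_cost_per_gb(raw_cost_per_gb: float,
                          dedupe_ratio: float,
                          compression_ratio: float) -> float:
    """Cost per logical GB actually stored, after deduplication and
    compression shrink the physical footprint."""
    return raw_cost_per_gb / (dedupe_ratio * compression_ratio)

# Illustrative only: $3/GB raw SATA system, 4:1 dedupe, 2:1 compression
cost = effective_cost_per_gb(3.00, 4.0, 2.0)
print(f"effective cost: ${cost:.2f}/GB")
# effective cost: $0.38/GB, well under the $1/GB barrier
```

Even a system whose raw hardware costs sit above $1/GB can land under the barrier once optimization is factored in, which is why the effective, not raw, price is the meaningful metric for this tier.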

The second hurdle is to make sure that this secondary tier, now playing a much more active role in day-to-day operations, protects the data it houses and keeps it available. A secondary disk tier needs to recover from hard drive failures faster and more reliably than traditional primary storage. Because it leverages high-capacity SATA drives, which have a somewhat higher failure rate and certainly longer rebuild times on RAID recoveries, it must deploy something other than RAID for data protection. For example, Permabit uses RAIN-EC to provide protection that is substantially more reliable than RAID 6 with the larger drives found today, and significantly faster in the rebuild process.

The final hurdle is one of scale. The secondary tier can also be considered the growth tier, since the primary storage tier, now SSD-based, will grow much more slowly. Growth belongs on the secondary tier because its storage costs are significantly lower. Scale for this tier should be defined as the ability to add hundreds, if not thousands, of TBs of capacity to a single storage system. Spreading capacity across multiple systems to reach scale would significantly increase the associated management and overhead costs. Instead, this tier should be a single system that scales out as capacity demands dictate, ideally a clustered or grid storage solution where nodes of different sizes can be added and managed as a single entity.

Storage managers are at a crossroads. They can continue to conduct business as usual, adding more and more capacity to the primary storage tier and hoping that someday it will be cost-effective, adequately protected and manageable. Or they can act, moving to a strategy that leverages new technology to meet all of these challenges. By aggressively implementing a more functional secondary storage platform, and by augmenting and eventually replacing their primary storage with SSD, storage managers can reduce costs and improve data availability while increasing system performance.

George Crump, Senior Analyst

This Article Sponsored by Permabit
