Thin Provisioning Basics
The above tools (compression, deduplication and archiving) address the “20% problem”: data that consumes storage but is old and no longer accessed. This type of waste is best addressed by moving the older data to secondary or archive storage. Technologies like integrated data movement can migrate this data to secondary storage automatically and transparently to the user. Systems like those from 3PAR, with Dynamic Optimization for tiered environments or the Nearline for Online solution, which supports online applications in 100% SATA-based environments, can provide good cost optimization for the “20% problem.”
However, the biggest issue for cost-optimizing storage in the data center is the 75% of capacity that is allocated but not used. This is free space purposely left available to account for growth of the application as it is deployed and used. There is nothing in it to compress, deduplicate or archive.
Allocated but unused storage is not a new problem. It has been a storage challenge since the days when there was only direct-attached storage, and it was one of the driving factors behind the birth of the storage area network (SAN). SANs, however, were designed specifically to solve a slightly different problem: unallocated storage that was not easily available to the hosts that needed it. They did not address the problem of allocated-but-not-used capacity.
Most traditional SANs can allocate capacity dynamically as storage demand dictates. As more SANs added this capability, the next challenge became the operating system’s ability to recognize that the storage assigned to it had been expanded. For several years now most operating systems have supported dynamically expanding volumes, yet the majority of capacity remains allocated but unused. The reason is simple: even though the tools and OS support exist, expanding a volume is a multi-step manual process that risks application disruption and cannot keep pace with the rapidly changing data center. Operations staff are measured on uptime, so the decision is made to over-allocate storage to a specific server and tolerate the wasted capacity rather than be constantly interrupted to expand storage volumes.
To effectively address “the 75% problem,” new and disruptive technology was required, technology that could potentially threaten the revenue streams of the giant storage companies. Thin provisioning is designed to allocate physical capacity only as an application needs it, that is, when it writes. It eliminates the complexity and risk of dynamically expanding storage and the manual intervention required to make that work. The storage administrator continues to assign capacity based on projected requirements, and the operating system running on the connected host believes it has that capacity assigned to it. The difference is that physical capacity is not consumed until it is actually written to. This allows the available storage to be over-provisioned and the storage assets to be used to the fullest. Since the OS already thinks it has the storage assigned to it, there is no additional work for the OS or the administrator as the storage system consumes more disk capacity. It just works.
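The allocate-on-write behavior described above can be illustrated with a small sketch. This is a toy model, not how any particular array implements thin provisioning; the class names `ThinPool` and `ThinVolume`, the block granularity, and the sizes are all assumptions made for illustration.

```python
class ThinPool:
    """Hypothetical pool of physical blocks shared by several thin volumes."""

    def __init__(self, physical_blocks):
        self.physical_blocks = physical_blocks
        self.used_blocks = 0

    def take_block(self):
        # The only point at which physical capacity is consumed.
        if self.used_blocks >= self.physical_blocks:
            raise RuntimeError("pool exhausted: physical capacity must be added")
        self.used_blocks += 1


class ThinVolume:
    """Hypothetical volume that advertises more blocks than it has mapped."""

    def __init__(self, pool, virtual_blocks):
        self.pool = pool
        self.virtual_blocks = virtual_blocks  # the size the host believes it has
        self.mapped = {}                      # virtual block number -> data

    def _check(self, block_no):
        if not 0 <= block_no < self.virtual_blocks:
            raise IndexError("block is outside the volume's advertised size")

    def write(self, block_no, data):
        self._check(block_no)
        if block_no not in self.mapped:
            self.pool.take_block()            # first write to this block
        self.mapped[block_no] = data          # rewrites consume nothing new

    def read(self, block_no):
        self._check(block_no)
        # Blocks that were never written read back as zeros.
        return self.mapped.get(block_no, b"\x00")


# 500 blocks are promised to hosts, but only 100 physical blocks exist.
pool = ThinPool(physical_blocks=100)
oracle = ThinVolume(pool, virtual_blocks=300)
exchange = ThinVolume(pool, virtual_blocks=200)

oracle.write(0, b"a")
oracle.write(0, b"b")     # rewrite of an already-mapped block
exchange.write(7, b"c")
print(pool.used_blocks)   # 2
```

The host-visible size (`virtual_blocks`) never changes, which is why no volume expansion step is needed as usage grows; only the pool's `used_blocks` counter moves.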
For example, in a traditional storage system, if you have an Oracle server that will need 3 TB, an Exchange server that will need 2 TB and a NAS head that will need 4 TB, you would need to order a 9 TB system plus the extra overhead associated with your RAID protection. In a thin-provisioned system you need only the capacity required at that moment. Sticking with the 25% figure from the various surveys, this means you would need 2.25 TB. Factor in a little growth up front and you could comfortably get by with less than 4 TB, a savings of 5 TB. Extend this example to a real-world data center and a high-end storage platform with dozens or hundreds of servers, and the savings are likely to be very significant.
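The arithmetic in this example can be written out as a short calculation. The figures come from the example above; the function name `sizing` and the choice of exactly 4 TB as the thin purchase (the upper bound mentioned in the text) are assumptions for illustration, and RAID overhead is left out.

```python
# Capacity each server is provisioned with, in TB (from the example above).
provisioned_tb = {"Oracle": 3.0, "Exchange": 2.0, "NAS head": 4.0}

WRITTEN_FRACTION = 0.25   # the surveys' "25% actually written" figure
THIN_PURCHASE_TB = 4.0    # assumed purchase: written data plus some growth room


def sizing(provisioned, written_fraction, thin_purchase_tb):
    """Compare a traditional purchase with a thin-provisioned one."""
    total = sum(provisioned.values())
    return {
        "traditional_tb": total,                       # must buy everything promised
        "written_tb": total * written_fraction,        # capacity in use today
        "deferred_tb": total - thin_purchase_tb,       # capacity not bought up front
        "overprovision_ratio": total / thin_purchase_tb,
    }


result = sizing(provisioned_tb, WRITTEN_FRACTION, THIN_PURCHASE_TB)
print(result)
# {'traditional_tb': 9.0, 'written_tb': 2.25, 'deferred_tb': 5.0, 'overprovision_ratio': 2.25}
```

The over-provisioning ratio (9 TB promised against 4 TB installed) is the number an administrator would watch as more volumes are added to the same pool.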
Thin provisioning makes it possible to purchase a new storage system with significantly less overall capacity than your current system. There is the obvious upfront advantage of purchasing the latest technology for less than the current system cost. There is also the ongoing payback of powering significantly less storage. If, in the above example, powering that 5 TB of storage can be delayed for even a year, the power savings can be significant. Again, moving the example into a larger data center could mean not powering 40 or 50 TB, a huge power savings.
The primary concern with thin provisioning centers on over-provisioning and what happens if the storage administrator actually runs out of storage. Clearly, thin provisioning and the over-provisioning of storage require that the storage administrator pay attention to the reports and warnings the systems provide. That reporting and alerting is plentiful and includes trending to project when the system will run out of storage space. As a result, there are very few cases of users suddenly running out of disk space because of thin provisioning.
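The kind of trending mentioned above can be approximated with a straight-line fit over past usage samples. The sketch below is only an illustration of the idea, not any vendor's reporting logic; the function name `days_until_full`, the sample data, and the assumption of linear growth are all hypothetical.

```python
def days_until_full(samples, capacity_tb):
    """Project how many days after the last sample the pool fills up.

    samples is a list of (day, used_tb) pairs. A least-squares line is
    fitted through them and extended to capacity_tb. Returns None when
    usage is flat or shrinking, since no fill date can be projected.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_day = sum(day for day, _ in samples) / n
    mean_used = sum(used for _, used in samples) / n
    sxx = sum((day - mean_day) ** 2 for day, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover more than one day")
    sxy = sum((day - mean_day) * (used - mean_used) for day, used in samples)
    slope = sxy / sxx                      # TB consumed per day
    if slope <= 0:
        return None
    intercept = mean_used - slope * mean_day
    full_day = (capacity_tb - intercept) / slope
    return max(0.0, full_day - samples[-1][0])


# Hypothetical monthly readings for a 4 TB pool, growing 0.25 TB a month.
history = [(0, 2.25), (30, 2.5), (60, 2.75), (90, 3.0)]
remaining = days_until_full(history, capacity_tb=4.0)
print(round(remaining))   # 120

# Warn when the projection falls inside an assumed procurement lead time.
LEAD_TIME_DAYS = 45
if remaining is not None and remaining < LEAD_TIME_DAYS:
    print("order capacity now")
```

Real usage rarely grows in a straight line, so a projection like this is a prompt to look closer rather than a guarantee; comparing it against the time needed to obtain and install new capacity is what makes it actionable.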
A sound analogy is the insurance model. If 1,000 people are insured, only some small percentage of them will make claims in any given year. In a highly scalable storage array supporting 1,000 volumes, even with some volumes “making claims” on the capacity they have been allocated, the array as a whole has ample buffer capacity to make good on the capacity growth requested while substantially reducing the total capacity that would otherwise have been required.
Even if built-in warnings and alerts are ignored, there are steps that can be taken. First, older snapshot copies can be retired. If the solution has integrated data movement, older data can be moved to SATA drives, which potentially have more available capacity. Another important ingredient is understanding how long it will take the storage vendor to ship additional capacity and how long the organization’s procurement process takes. To guard against a worst-case scenario, some organizations have purchased a small amount of “stand-by” capacity that can be deployed quickly if capacity limits are reached sooner than expected.
Some naysayers claim that thin provisioning should not be necessary and that the OS and applications should auto-provision on their own. The reality is that they do not, and the storage problem needs to be solved now. Thin provisioning is a powerful tool for controlling the cost and growth of storage. Combined with an integrated data movement technology and an archive strategy, it allows for permanent control over primary storage.
Monday, January 5, 2009
Much attention is being paid to primary storage optimization through compression, deduplication and archiving, but none of these techniques will work on one of the biggest consumers of primary storage: allocated but unused storage. Various studies indicate that as much as 75% of the capacity in a medium-sized or larger data center is allocated but not written. Of the remaining 25% that is in use, 80% of that capacity and its data (about 20% of the total capacity in a medium-sized or larger data center) has not been accessed in the last 90 days and is eligible for some form of data reduction technique.