The Problem


As with the problem we discussed in our article on "Converting from Fat Volumes to Thin Provisioning", thin provisioning works well on new volumes and new data. But whether you are converting old volumes to thin volumes or trying to keep current volumes thin, most storage system manufacturers have not advanced their technology to address the challenges of transient data, that is, data written and deleted within a short period of time. Keeping thin volumes thin is particularly problematic for this class of applications, which means thin provisioning offers them little value.


For example, at some point in the life cycle of a file system there is likely to be ongoing deletion of data. This can come from normal workflow or from a specific archive decision to move older data to less expensive media. In either case, the problem is that a typical file system does not actually delete data when a file is removed; it only designates the data as eligible to be overwritten. The typical storage system has no awareness that the space used by the file system has been released. As a result, the corresponding storage volume remains “over provisioned”.
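
The gap described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `ThinVolume` and `ToyFileSystem` classes, the 4-blocks-per-page figure, and the file names are all hypothetical and do not represent any vendor's implementation. It shows that a delete changes only the file system's own bookkeeping, so the array's allocated page count does not shrink.

```python
class ThinVolume:
    """Toy thin volume: a backing page is allocated on the first write to it."""
    PAGE_BLOCKS = 4  # hypothetical number of file system blocks per backing page

    def __init__(self):
        self.allocated_pages = set()

    def write(self, block):
        # The array only ever sees writes, so this is the only place it learns anything.
        self.allocated_pages.add(block // self.PAGE_BLOCKS)


class ToyFileSystem:
    """Toy file system: deleting a file only marks its blocks as free."""

    def __init__(self, volume):
        self.volume = volume
        self.files = {}           # file name -> list of block numbers
        self.free_blocks = set()  # blocks eligible to be overwritten
        self.next_block = 0

    def create(self, name, n_blocks):
        blocks = list(range(self.next_block, self.next_block + n_blocks))
        self.next_block += n_blocks
        for block in blocks:
            self.volume.write(block)
        self.files[name] = blocks

    def delete(self, name):
        # No I/O reaches the volume here, so the array is never told.
        self.free_blocks.update(self.files.pop(name))


vol = ThinVolume()
fs = ToyFileSystem(vol)
fs.create("a.dat", 8)
fs.create("b.dat", 8)
pages_before = len(vol.allocated_pages)
fs.delete("a.dat")
pages_after = len(vol.allocated_pages)
print(pages_before, pages_after, len(fs.free_blocks))
```

In this model the file system knows that eight blocks are free again, but the volume still holds the same number of allocated pages as before the delete.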


The impact is that volumes with highly transient data, even if the data turns over infrequently, quickly negate the capacity efficiency advantages of a thin volume. Thus far, such applications have been excluded from taking advantage of thin provisioning. To circumvent this issue, storage system suppliers need to develop a handshake with file systems to gain visibility into file deletion operations so that reclamation processes can occur simultaneously. Essentially, file system and storage manufacturers need to co-develop open APIs that enable active communication between their respective platforms.


Once an open standard is established for supporting communications between file systems and storage platforms, broader utilization of this capability can take place. It is important to note, however, that with integration and automation between file systems and thin provisioning-capable arrays, some use cases will require a significant amount of processing from the storage system's software to determine space reclamation opportunities. Storage system manufacturers will need to address this potential issue by limiting when reclamation can happen or by developing custom technology to handle the load. For example, 3PAR has moved thin provisioning functions to customized silicon, which will allow for more aggressive and continuous reclamation processes and hence greater storage efficiencies, while limiting the actual performance impact on the storage subsystem.


Symantec and 3PAR have demonstrated a good example of active collaboration towards keeping volumes thin. These two companies partnered to develop a thin API which will enable dynamic reclamation by supporting effective communication between the file system and the storage system. Part of the communication will entail file system tracking of the block-level storage maps being presented by the disk subsystem. Because of this communication, Veritas Storage Foundation can periodically report file system modifications back to the 3PAR array as files are marked for deletion. The Symantec Thin Reclamation API uses industry-standard commands to communicate with thin volumes in the array to automate capacity reclamation. Both companies are active in the standards body and are working with other industry partners on this. Currently, this API is aligned with one of the T10 proposals and effectively complements it (in a standards-compliant way) to provide a scalable, manageable, automated solution that will auto-tune to hardware platforms that support the standard.


Maintaining thin capability does not result in a set of proprietary enhancements that need to be separately maintained by the vendors as they upgrade their systems. It is accomplished through a series of industry standard communications via SCSI.


The communication process starts when the file system determines whether it is hosted on a thinly provisioned volume. Volume labeling allows the storage manufacturer to identify a volume as thick or thin, and capacity is then reclaimed through the Thin Reclamation API, which leverages a standard SCSI command.


After the initial handshake, the file system will communicate to the array, on a periodic basis, which blocks on the file system contain deleted data so the array can reclaim that space. The array then “reclaims” those block ranges; in the case of 3PAR, this can be done at a granularity of 16 KB pages. At this point the blocks are available for reuse without consuming additional capacity from the storage pool.
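
To make the page granularity concrete, here is a minimal sketch of how freed byte ranges might map onto 16 KB pages. It assumes that only pages lying entirely inside freed space can be returned to the pool, which is an illustrative assumption and not a description of 3PAR's actual algorithm. The function names `merge_ranges` and `reclaimable_pages` and the sample offsets are hypothetical.

```python
PAGE_SIZE = 16 * 1024  # 16 KB page granularity, as cited for 3PAR


def merge_ranges(ranges):
    """Merge overlapping or adjacent (start, end) byte ranges; end is exclusive."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def reclaimable_pages(freed_ranges, page_size=PAGE_SIZE):
    """Return the indices of pages that are entirely covered by freed ranges."""
    pages = []
    for start, end in merge_ranges(freed_ranges):
        first = -(-start // page_size)  # round the start up to a page boundary
        last = end // page_size         # round the end down to a page boundary
        pages.extend(range(first, last))
    return pages


# Hypothetical freed extents reported by the file system, in bytes.
freed = [(0, 8192), (8192, 40960), (50000, 70000), (81920, 98304)]
pages = reclaimable_pages(freed)
print(pages)
```

The first two extents are adjacent and merge into one range that fully covers pages 0 and 1. The third extent straddles a page boundary without covering any whole page, so under this assumption it yields nothing until neighboring blocks are also freed. The last extent is page-aligned and frees exactly one page.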


This capability alone makes it possible to support transient data applications with thin provisioning. Traditionally, transient data applications were committed to a fat volume strategy. Now, as the result of a stay-thin feature, deleted data results in newly released capacity, which is instantly made available for reuse. Consequently, storage managers can extend the benefits of reclamation to a new class of applications and make freed capacity available to any host in the data center. These capabilities make an investment in storage systems with a true stay-thin capability more easily justifiable, since it helps address more than just the immediate capacity challenges of primary storage.


Today the stay thin capability is available with file systems that leverage Symantec Storage Foundation 5.0 on UNIX and Linux.


As the API standard becomes ratified, other file system vendors are expected to add this capability as well.


Thin provisioning is yet another function that has to be managed by the storage array to be truly effective. The value of companies like 3PAR migrating thin processing to dedicated silicon within the array is that thin applications can be enabled without degrading performance within the disk subsystem. Thin in silicon becomes even more indispensable as thin operations extend beyond the initial tasks of creating thinly provisioned volumes to more compute-intensive operations like thin migrations of data and the continuous monitoring of storage for freed-up blocks of disk space. If array manufacturers don't design special components to manage all of the advanced functionality possible with thin provisioning, users may find that thin functionality has to be disabled within the array to ensure adequate performance.


Thin provisioning has evolved to enable IT users to start thin, migrate to thin and, most importantly, stay thin. Storage systems that have thin “built in” enable users to enjoy all the benefits of thin provisioning without suffering a performance trade-off.