The OpenStorage movement now delivers enterprise-class capabilities, thanks to the well-proven Solaris kernel and the revolutionary, now-mature ZFS file system, which provides unified storage capabilities on today's high-performance, industry-standard hardware. While organizations such as Wikipedia already use ZFS as part of OpenSolaris, the challenge for broad adoption has been harnessing the raw potential of this storage foundation into complete solutions that are truly useful to corporate data centers.


Companies like Nexenta are leading the way, building complete solutions on ZFS and OpenSolaris while keeping them truly open: they work with hardware and integrator partners to deliver solutions that run on industry-standard hardware, rather than the Sun hardware customers are required to purchase when buying ZFS-based storage from Sun.


These companies expand the scope of ZFS beyond a mostly Sun-supported effort by cultivating a broad ecosystem of open source projects, system integrator partners and top-tier support vendors, backed by considerable in-house development and engineering capability that has broken new ground, particularly in the management of storage for virtualized environments.


ZFS is still at the core, however, and it’s important to understand what ZFS is and what it can do.


The Challenge with Current File Systems


Ten years ago the focus was on optimizing storage for efficient database performance. While that is still an objective, attention has turned to the greater challenge of unstructured data and the millions of files created every day. As a result, most file systems are growing at an exponential rate, and the cost of keeping up with this expansion, in terms of both CAPEX and OPEX, is putting enormous strain on already stretched data centers. A new paradigm is needed to manage this growth.


The CAPEX side is obvious: without storage efficiency improvements from technologies like compression, thin provisioning, automated data placement and, eventually, deduplication, the costs to power and cool this storage are going to overwhelm the data center. Furthermore, there is an unsustainable chasm between the price at which manufacturers sell raw storage and processing power and the price the dominant storage vendors charge for performance and capacity; this chasm is largely due to the customer lock-in enforced by those vendors' technologies and business models.


More subtle is the effect these rapidly growing file systems have on OPEX and on the storage managers responsible for them.


Managing a rapidly expanding environment with a proliferation of file-based data requires adding volumes, migrating users, or growing and shrinking volumes. Yet today's dominant file systems impose limits on file size and on the number of items in a volume, and those limits are encountered all too often. When they are hit, users must either add a layer of volume management software to bridge them or manually reconfigure their storage. In either case, CAPEX and OPEX increase. Most file systems in use today were designed 10 to 15 years ago, when far less data was stored and the processors used to address and manage that data were 800 to 900 times less powerful than today's. Compromises that were reasonable at the time, given limited requirements and constrained processor power, are now costing enterprises countless hours and dollars.


An increasingly important driver of the growth of file-based data is the shift toward virtualized server environments. After all, a virtual machine is a file with a set of associated storage. While it is possible to provide storage for virtualized environments via block-level protocols, doing so in the face of the accelerating proliferation of virtual machines promises to further overwhelm already stretched storage administrators. As a result, there is a clear trend toward using NFS for VMware and other virtualization storage.


Finally, the need to manage separate protocols (iSCSI, Fibre Channel, NFS, CIFS) across differing storage platforms puts pressure on both CAPEX and OPEX. Managing these protocols separately requires additional switching infrastructure, additional storage and, of course, personnel to manage the different silos.


Modernizing the Filesystem


Most file systems in use today were created long before we had shared storage, TB-sized drives, 10Gb networks and, of course, server virtualization. What's needed is a modern file system for modern data storage challenges. A growing number of companies believe that file system is ZFS. Here's why:


Unified Storage


ZFS can support NFS, CIFS and iSCSI, and companies can take ZFS and extend it; some, for example, now offer Fibre Channel support. This means a storage manager can select a single storage platform without worrying about which protocol to use if the original protocol changes or another is added later. Most storage projects start out with a specific protocol in mind but eventually need a new one, which is one reason so many different storage platforms end up in the data center. For example, a project can start out as CIFS for Windows file sharing, later add Fibre Channel for clustered Exchange support and then still need NFS for easier VMware deployment. All of this can now be done from a single storage platform.
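

As a rough illustration of how that flexibility surfaces at the command line, the sketch below uses the standard zfs utility to share one dataset over NFS and CIFS and to expose a block volume over iSCSI. The pool and dataset names are hypothetical, and the legacy shareiscsi property shown assumes an OpenSolaris build that still includes the older iSCSI target rather than COMSTAR.

    # share one dataset over NFS and CIFS from the same pool
    zfs create tank/projects
    zfs set sharenfs=on tank/projects
    zfs set sharesmb=on tank/projects

    # carve a 100GB block volume out of the same pool and expose it over iSCSI
    zfs create -V 100G tank/exchange_lun
    zfs set shareiscsi=on tank/exchange_lun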


Data Management Flexibility


ZFS can provide unlimited snapshots and built-in RAID support with improved versions of RAID 5 and RAID 6. Unlike most traditional file systems, which have a hard limit on snapshots and force users to be mindful of snapshot reserve space, ZFS allows enough snapshots to provide a rollback point down to the second, if needed. And because of the way ZFS is designed, it actually performs better with snapshots turned on.
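

For context, this is what snapshot creation and rollback look like with the stock zfs commands; the dataset and snapshot names are placeholders.

    # take a point-in-time snapshot of a dataset (near-instant, no reserve space to size)
    zfs snapshot tank/home@monday_0915

    # list existing snapshots
    zfs list -t snapshot

    # roll the dataset back to that point in time
    zfs rollback tank/home@monday_0915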


The snapshot feature can be extended for use in a DR environment as the basis of asynchronous mirroring. Nexenta has added synchronous mirroring over IP, which complements the asynchronous mirroring capabilities of ZFS, as well as search capabilities for retrieving data from snapshots.
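

Asynchronous replication of this kind is built on ZFS's send and receive facilities; a minimal sketch, assuming a placeholder DR host and dataset names, looks like this.

    # seed the DR copy with a full stream of the first snapshot
    zfs send tank/data@snap1 | ssh dr-host zfs receive backup/data

    # thereafter ship only the changes between snapshots
    zfs send -i tank/data@snap1 tank/data@snap2 | ssh dr-host zfs receive backup/data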


ZFS also provides both thin provisioning and storage pooling. Gone are the days of creating LUNs and carving up volumes. A storage pool is assigned to a group of NexentaStor servers with capacity minimums and maximums set. Then storage is allocated as it is needed.
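

To make the pooling and thin-provisioning point concrete, the commands below build a pool from a handful of placeholder disks and then allocate space on demand with quotas, reservations and a sparse (thinly provisioned) volume.

    # build one pool from whatever disks are available (device names are placeholders)
    zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0

    # datasets draw from the shared pool; caps and floors replace fixed volume sizes
    zfs create tank/projects
    zfs set quota=500G tank/projects
    zfs set reservation=100G tank/projects

    # a sparse volume presents 2TB to the host but consumes space only as data is written
    zfs create -s -V 2T tank/thin_lun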


This pooling of storage does not have to be on the same class of hardware. It can be solid state disk (SSD), SAS, Fibre Channel or SATA, and it can all be tightly integrated in the data path. Additionally, data blocks can be automatically moved between tiers of storage as they age. This allows you to keep your most active data set on SSD while keeping aging data on a less expensive tier. With the rapidly dropping cost of flash-based SSDs, this approach may quickly render Fibre Channel and SAS drives obsolete.
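

One way ZFS blends an SSD into a SATA pool today is as a dedicated read cache and intent-log device, so that hot blocks are served from flash while bulk capacity stays on inexpensive disks; the device names below are placeholders.

    # add an SSD as a second-level read cache (L2ARC) for the pool
    zpool add tank cache c3t0d0

    # add another SSD as a separate intent-log device to accelerate synchronous writes
    zpool add tank log c3t1d0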


For example, there are storage server products available from providers like Intel and Xyratex that have 48 storage bays as well as PCI-E slots. Imagine taking one of these systems, installing 48 1TB SATA drives, and then adding one of Texas Memory Systems' PCI-E SSD cards, with 480GB of capacity, for less than $15,000. The result would be a system that delivers the most active data set almost instantly, yet could store years' worth of information cost-effectively.


Storage Optimization


Part of ZFS's storage optimization comes from the storage pools and thin provisioning described above. Going further, a ZFS volume can also be compressed. Although deduplication won't be available for a few months, a case can be made that compression is far more valuable on primary storage. For deduplication to have value, there must be duplicate data to begin with, and on most primary storage tiers that's usually not the case. Compression, however, works on all data types, duplicate or unique, and in many cases delivers an overall space savings. Finally, compression, unlike deduplication, typically causes minimal performance loss, making its use on real-time systems acceptable.
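

Enabling compression is a per-dataset property change and takes effect for newly written data; the dataset names below are placeholders, and the measured ratio will of course vary with the data.

    # enable the default (lightweight) compression on an active dataset
    zfs set compression=on tank/data

    # use heavier gzip compression on an archive tier where CPU cost matters less
    zfs set compression=gzip tank/archive

    # check how much space compression is actually saving
    zfs get compressratio tank/data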


An exception to the deduplication payoff is when the storage is used for virtual machine images. ZFS, however, has a potentially better way to optimize this storage, and it comes without a performance impact: clones. Clones allow a snapshot to be mounted and reused in read/write fashion. While this has value in test/dev environments, where it really shines is in dealing with virtual server images. Instead of storing the same master image 100 times over, store it once, clone it and mount those clones to their respective virtual machines. The net result can be a significant increase in utilization.
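

A sketch of that workflow with the standard commands, using hypothetical dataset names: snapshot the golden master once, then stamp out writable clones that consume new space only as each VM diverges from the master.

    # freeze the golden master image
    zfs snapshot tank/vm/master@gold

    # create writable clones of it, one per virtual machine
    zfs clone tank/vm/master@gold tank/vm/vm01
    zfs clone tank/vm/master@gold tank/vm/vm02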


For example, Nexenta has developed ‘VM Data Center’, which includes the ability to use a VM template to provision hundreds of identical VMs in under a minute, a feat that is not possible in non-ZFS-based solutions, including VMware’s vSphere itself. And those identical VMs take up little more than the space of a single VM.


Data Availability


Finally, even the most feature-rich file system would be useless without a way to make data highly available. Once again, ZFS delivers a strong answer, and companies like Nexenta have enhanced it even further. It is also more important than ever that something change at the file system level: uncorrectable bit error rates have stayed roughly constant, with a bad sector occurring as frequently as once every 8TB read. This is a problem for the storage manager because, while the UBER has stayed static, the capacity per drive and the number of drives in the data center have grown rapidly. As a result, both silent corruption (caused by bit errors) and noisy corruption (caused by tightly packed drive configurations) have become more prevalent.


ZFS uses checksums stored in the parent block pointer of the data set to provide validation of the entire I/O path, protecting from both silent and noisy faults. This also protects the data from a host of potential drive-level issues, such as bit rot, phantom writes, misdirected reads and writes, DMA parity errors, driver bugs and accidental overwrite.
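

Because every block is verified against its checksum on read, corruption can be detected and repaired in place, and administrators can also walk the entire pool proactively. The commands below are the standard ones, with placeholder pool and dataset names.

    # optionally strengthen the per-block checksum on a dataset
    zfs set checksum=sha256 tank/data

    # read and verify every block in the pool, repairing from redundancy where possible
    zpool scrub tank

    # report any checksum errors found and which devices produced them
    zpool status -v tank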


Finally, ZFS has self-healing mirrors and RAID-Z. With self-healing mirrors, if a corrupt block is read from one side of a mirror, ZFS validates that the second copy is good, returns that good copy to the application and rewrites it over the bad block on the original side, automatically keeping data readable.


In RAID-Z, ZFS uses a dynamic stripe width to maximize drive utilization as makes sense for each data block. All stripes are full-stripe writes, which eliminates the need for read-modify-write operations as well as for NVRAM (used by NetApp's Data ONTAP), keeping performance high and cost low. RAID-Z can use either single or double parity (the double-parity form being the equivalent of RAID 6), and it improves resiliency by detecting and correcting silent data corruption, with checksums driving combinatorial reconstruction. At a high level, putting special-purpose processors, i.e. RAID controllers, close to the disk is no longer useful now that general-purpose processors are hundreds of times more powerful than they were when RAID was developed, and an airtight software RAID solution plus true end-to-end data integrity checks is available as part of ZFS.
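

For reference, a single- or double-parity RAID-Z pool is created directly from raw disks, with no hardware RAID controller in the path; the device names below are placeholders.

    # single-parity RAID-Z across five disks
    zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0

    # or double-parity (RAID 6 equivalent) across six disks
    zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0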


In short, ZFS makes inexpensive disks safe and, combined with the SSD PCI-E cards described above, can deliver optimal performance. Companies like Nexenta that are building storage applications on the foundation of ZFS gain a distinct advantage by not having to develop and test every aspect of file system minutiae. Instead, they can focus on extending the foundation to make it more usable to a broad cross-section of data centers.


With storage demand booming, especially for file-based storage, a focus on the manageability of the potentially high-performance, ZFS-based NexentaStor is paying off for Nexenta with rapid customer and partner adoption.

George Crump, Senior Analyst