Using SATA storage as an intermediate backup target before moving data to tape for long-term storage is typically one of the first uses considered for this drive technology. However, even with SATA's lower cost and higher per-drive capacity compared to Fibre Channel or SAS drives, it is still very expensive to configure a system that can initially store several months, if not years, of backups. This reality became a sticking point in the early days of disk backup and slowed its adoption.


To improve the adoption rate and the value of SATA disk backup, technologies were developed to increase effective capacity, most commonly data deduplication and compression. Data deduplication analyzes data as it is being stored and stores only unique segments of data. Since most backups, especially full backups, contain highly redundant segments, a great deal of capacity can be saved. Compression works on all data and typically improves storage efficiency by 50% or more. Combined, the two technologies often yield a 12X reduction in required capacity and, as a result, make SATA more viable for medium-term storage of backed-up data. Adding these two technologies to SATA arrays resulted in the creation of disk backup appliances.
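To make the description of deduplication and compression more concrete, here is a minimal Python sketch. It assumes fixed-size 4 KB segments identified by SHA-256 hashes and zlib compression; these are illustrative choices rather than how any particular appliance or backup application works (many commercial products use variable-size chunking, for example). The names CHUNK_SIZE, dedupe_and_compress, restore and reduction_ratio are hypothetical.

```python
import hashlib
import zlib

# Hypothetical fixed segment size; real products often use variable-size chunking.
CHUNK_SIZE = 4096


def dedupe_and_compress(backups, chunk_size=CHUNK_SIZE):
    """Store each unique segment once, compressed.

    Returns (store, recipes): store maps a segment hash to its compressed bytes,
    and each recipe is the ordered list of hashes needed to rebuild one backup.
    """
    store = {}
    recipes = []
    for data in backups:
        recipe = []
        for i in range(0, len(data), chunk_size):
            segment = data[i:i + chunk_size]
            digest = hashlib.sha256(segment).hexdigest()
            if digest not in store:  # only segments not seen before are written
                store[digest] = zlib.compress(segment)
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes


def restore(store, recipe):
    """Rebuild one backup by decompressing its segments in order."""
    return b"".join(zlib.decompress(store[d]) for d in recipe)


def reduction_ratio(backups, store):
    """Logical bytes protected divided by physical bytes actually stored."""
    logical = sum(len(b) for b in backups)
    physical = sum(len(c) for c in store.values())
    return logical / physical


if __name__ == "__main__":
    # Two hypothetical "full" backups; the second changes a single 4 KB segment.
    full_1 = b"".join(f"record {i:06d}: customer data\n".encode() for i in range(20000))
    full_2 = full_1[:8192] + b"X" * 4096 + full_1[12288:]
    backups = [full_1, full_2]
    store, recipes = dedupe_and_compress(backups)
    assert restore(store, recipes[1]) == full_2
    print(f"{reduction_ratio(backups, store):.1f}x reduction")
```

Because compression is applied only to the unique segments left after deduplication, the two ratios multiply: as a hypothetical example, 6:1 deduplication combined with 2:1 compression would yield the 12X reduction mentioned above. Actual ratios depend heavily on the data and on how many full backups are retained.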


Disk backup appliances that could compress and deduplicate data led to the broader adoption of disk as a backup target. They allowed organizations to store months' worth of backup data and eventually added the ability to replicate that data to an off-site facility. The challenge, though, was that the organization had to estimate all the capacity it would need and pay for it upfront. If the organization grew faster than estimated, a whole new system would typically have to be purchased. Another challenge was that deduplication and replication happened without the backup software's knowledge. In most cases, planning and special steps were required to use the replicated data at the remote site.


The data deduplication and compression technologies that were once the sole domain of backup appliances are now being added to most backup software applications. At the same time, a growing number of backup applications are gaining the ability to write directly to cloud storage via the cloud storage providers' APIs. By leveraging these two capabilities, organizations can minimize the amount of data that crosses both their internal and external networks, as well as automatically replicate the data to an off-site facility. While data deduplication appliances allowed data centers to store more data on disk and to replicate that data off-site, they did not, apart from reducing capacity requirements, lower the investment cost of the backup infrastructure. In fact, in most cases they increased that cost. Backup software that provides data reduction and supports cloud storage as a backup target can lower both operational costs and capital expenditures.


At an operational level, using cloud storage as a backup target with a backup application that integrates deduplication should reduce the time spent on system management. As noted earlier, without software integration the deduplication appliance must be managed independently of the backup application. The backup software may not even know that the SATA-based backup appliance is replicating its jobs, and special reconfiguration may be required to use those replicas in the event of a disaster. By contrast, an application with this integration is in full control: it knows which backup data sets are stored where and can select the most appropriate data set for a given restore requirement. This software is optimized for transferring data to a cloud storage provider and may even leverage some of the cloud storage provider's capabilities, as we discuss in our recent article on how ISVs can leverage the cloud.


Cloud storage as a backup target may represent significant cost savings. There is no on-site appliance to purchase upfront, and capacity can be allocated by the cloud storage provider as needed. With no equipment to maintain, there is further opportunity for savings. There may still be the expense of a local disk cache to hold a full backup and an appropriate number of incremental backups, but that cost should be minimal; in many cases, a RAID setup inside the backup server may suffice.


Using the cloud as a disk backup target also means there is no need to purchase and maintain a second backup target appliance at the remote site. Most disk backup systems require a storage device at each location plus a larger central repository to which all the backup jobs replicate. In fact, there may not even be a need for a remote facility. Cloud storage providers like Iron Mountain, for example, can house the entire data recovery operation within their facilities, reducing costs even further.


At a functional level, a smaller SATA array is typically paired with cloud storage to serve as a weekly cache of backup data. This cache becomes the recovery source for full system rebuilds from the latest copy of data. In most cases, however, as soon as the next backup job runs, the prior job or two becomes obsolete and is no longer used for full system recovery. Future requests for that job will almost always be for a specific file or group of files, driven largely by a particular need such as legal discovery or the restart of a similar project. These smaller, more targeted transfers are ideally suited for delivery from a cloud storage provider.
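The division of labor described above, together with the idea that an integrated backup application knows which data sets are stored where, can be sketched in Python. This is a hypothetical illustration rather than any backup product's logic: the BackupCopy record, the "local_cache" and "cloud" location labels, and choose_restore_source are invented names. The sketch assumes the catalog picks the newest job run on or before the requested date and prefers the on-site cache copy when one still exists.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BackupCopy:
    job_id: str     # hypothetical job identifier
    run_date: date  # when the backup job ran
    location: str   # hypothetical labels: "local_cache" or "cloud"


def choose_restore_source(catalog, as_of):
    """Return the copy to restore from for the newest job run on or before `as_of`."""
    eligible = [c for c in catalog if c.run_date <= as_of]
    if not eligible:
        raise LookupError(f"no backup job ran on or before {as_of}")
    newest = max(c.run_date for c in eligible)
    copies = [c for c in eligible if c.run_date == newest]
    # Prefer the on-site cache (fast, suited to full system rebuilds); otherwise fall
    # back to the cloud copy, which typically serves smaller file-level restores.
    for copy in copies:
        if copy.location == "local_cache":
            return copy
    return copies[0]


if __name__ == "__main__":
    catalog = [
        BackupCopy("full-w1", date(2024, 1, 7), "cloud"),  # aged out of the cache
        BackupCopy("full-w2", date(2024, 1, 14), "local_cache"),
        BackupCopy("full-w2", date(2024, 1, 14), "cloud"),
    ]
    print(choose_restore_source(catalog, date(2024, 1, 15)))  # latest job, from the cache
    print(choose_restore_source(catalog, date(2024, 1, 10)))  # older job, from the cloud
```

Keeping this selection in the backup application's catalog, rather than in a separate appliance, is what lets the software route a full rebuild to the local cache and a file-level request for an older job to the cloud without manual reconfiguration.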


Selecting cloud storage providers like Iron Mountain instead of using local SATA-based arrays for backup storage can reduce both capital and operational costs as well as reduce some of the management burden.

Iron Mountain is a client of Storage Switzerland

George Crump, Senior Analyst

- The Backup Consideration