Systems such as those from Permabit Technology have tackled the scale requirement by building a cluster of storage nodes. When more capacity is needed, the administrator simply adds another node. The system automatically recognizes the new node and starts using the new capacity. These nodes also provide a redundant architecture that’s leveraged to maintain data availability. An individual node, or even two nodes simultaneously, can fail without any data loss or loss of access to that data.


Secondly, the system can leverage the clustered storage architecture to provide an advanced form of data protection. This is critical because as drive capacities continue to expand, especially with 2 TB drives now coming into the mainstream, traditional RAID 5 and even RAID 6 configurations begin to reach their practical limits for system recovery time without data loss or corruption.


The challenge with traditional RAID is the time it takes to return a storage system to a fully working and redundantly protected state. If a drive fails, most RAID technologies will start a rebuild process as soon as a global spare can be identified or the failed drive is replaced. With high-capacity 1 TB and 2 TB drives, this rebuild can take many hours, and in some cases the rebuild time can approach days. During this rebuild time the archive data, the organization’s only copy of that data, is totally exposed. With RAID 5, if a second drive fails or a read error occurs during the rebuild, that data may be lost forever, and the longer the rebuild runs, the more likely such a failure becomes. RAID 6 tolerates a second failure, but with extremely long rebuild times the chances of a third drive failure occurring before the process completes increase, because larger configurations typically involve more drives. While the chances of that happening may be relatively low, the mere risk of 100% loss of this sole copy of data has to be concerning. A more effective, fail-safe form of protection is needed.
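To put rough numbers on the rebuild window, the arithmetic can be sketched as drive capacity divided by sustained rebuild throughput. The function name and the throughput figures below are assumptions chosen for illustration, not measurements from any particular array; the lower rates stand in for a system that keeps serving normal I/O while it rebuilds.

```python
def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours needed to rewrite one whole drive at a sustained rebuild rate."""
    capacity_mb = capacity_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return capacity_mb / rebuild_mb_per_s / 3600


# Assumed rebuild rates in MB/s (illustrative only).
for rate in (100, 50, 10):
    print(f"2 TB at {rate} MB/s: {rebuild_hours(2, rate):.1f} hours")
```

Under these assumed rates a 2 TB drive takes between roughly five and a half hours and more than two days to rebuild, which is the exposure window described above.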


The archive system can address this challenge in one of two ways: through a mirror or by using an advanced form of RAID. In smaller implementations, mirroring is a simple alternative. While there is a higher ‘capacity cost’ with a mirror’s second copy, the redundancy allows for rapid recovery. For most smaller implementations, the archive starts out small enough that the capacity given up to the mirror is an acceptable trade.


In larger environments, the mirror’s one-for-one copy of all the data can become cost prohibitive, which is why RAID is often chosen instead. However, traditional RAID doesn’t address the data risk issue, as simply rebuilding RAID sets when errors are detected is not an effective way to increase data integrity. The alternative is to use an advanced form of RAID. Permabit, for example, uses a technology called RAIN-EC (Redundant Array of Independent Nodes) that breaks data into multiple chunks and distributes them across separate drives located on separate storage nodes. If a node fails, the remaining chunks can be reassembled to present that data. In fact, two nodes (each node today contains four drives) could fail and the data would still be intact. The effect is a protection algorithm that is much more robust than the parity offered by RAID and that delivers greater redundancy and faster recoveries.
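The idea of spreading chunks across nodes so that any two nodes can fail can be sketched as a placement and survivability check. This is not Permabit’s implementation and it does not compute any coded chunks; the 4 + 2 chunk layout, the node names, and the function names are all assumptions made for illustration.

```python
from itertools import combinations

DATA_CHUNKS = 4    # assumption: chunks required to reassemble the data
PARITY_CHUNKS = 2  # assumption: extra coded chunks, matching the two-node tolerance
NODES = [f"node{i}" for i in range(6)]  # hypothetical node names


def place_chunks(nodes, total_chunks):
    """Put each chunk on a different node so one node failure costs at most one chunk."""
    if total_chunks > len(nodes):
        raise ValueError("need at least one node per chunk")
    return {chunk_id: nodes[chunk_id] for chunk_id in range(total_chunks)}


def is_recoverable(placement, failed_nodes, data_chunks):
    """Data survives if at least `data_chunks` chunks sit on healthy nodes."""
    surviving = [c for c, node in placement.items() if node not in failed_nodes]
    return len(surviving) >= data_chunks


placement = place_chunks(NODES, DATA_CHUNKS + PARITY_CHUNKS)

# Every possible pair of failed nodes still leaves enough chunks.
two_ok = all(
    is_recoverable(placement, set(pair), DATA_CHUNKS)
    for pair in combinations(NODES, 2)
)
# Losing any three nodes leaves too few chunks in this layout.
three_ok = all(
    is_recoverable(placement, set(triple), DATA_CHUNKS)
    for triple in combinations(NODES, 3)
)
print(two_ok, three_ok)
```

Because each chunk lives on its own node, tolerance to node failures equals the number of extra chunks, which is why this assumed layout survives any two failures but not three.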


While failure of nodes and drives is fairly obvious, what may be more concerning is a ‘silent data loss’ situation. In this scenario, a drive degrades to the point where it hasn’t actually failed, but data on the drive has been corrupted. How can this type of corruption be detected? With traditional systems, corruption is discovered only when that data is read. If there are multiple copies of this data spread out on disk and tape, then recovery may be possible from one of those devices. But keeping redundant copies of data adds cost and defeats the purpose of the archive in the first place.


An archive system can eliminate this concern by leveraging a technology that’s very well known, just not for data protection: deduplication. Archive systems commonly use deduplication to optimize storage capacity. The deduplication algorithm generates a signature for each segment of data that is written to it. This signature is, for practical purposes, unique to that data segment. If the signature appears again, the second copy of the data is not written to disk; instead, a reference is made to the segment already stored under that signature and space is saved.
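A minimal sketch of signature-based deduplication, assuming SHA-256 as the signature function and an in-memory dictionary as the store. The class and method names are hypothetical, and the article does not say which algorithm any particular product uses.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: one stored copy per unique segment."""

    def __init__(self):
        self.segments = {}  # signature -> segment bytes
        self.refcount = {}  # signature -> number of references to it

    def write(self, segment: bytes) -> str:
        signature = hashlib.sha256(segment).hexdigest()
        if signature not in self.segments:
            self.segments[signature] = segment  # first copy: store the data
        self.refcount[signature] = self.refcount.get(signature, 0) + 1
        return signature  # the caller keeps this as its reference

    def read(self, signature: str) -> bytes:
        return self.segments[signature]


store = DedupStore()
first = store.write(b"quarterly report")
second = store.write(b"quarterly report")  # duplicate: nothing new is stored
other = store.write(b"meeting notes")
print(len(store.segments), store.refcount[first])
```

Three writes leave only two stored segments; the duplicate write just adds a reference to the existing signature.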


The archive can use this signature to protect the data it’s storing as well as to optimize space, because the signature also makes it possible to verify the data contained on disk. Periodically, the system reruns the algorithm on the data segments that it has stored. The resulting signature should be the same every time the algorithm is run. If it is not, there has been some sort of corruption. Because the system can regenerate data through its RAIN protection strategy or from a mirror, the data corruption can be ‘repaired’ and the information can be salvaged.
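The periodic verification described above can be sketched as a scrub pass that recomputes each signature and repairs mismatches from a second copy. SHA-256, the function names, and the `replica` dictionary (standing in for a mirror or for data regenerated from surviving chunks) are assumptions for illustration.

```python
import hashlib


def signature(segment: bytes) -> str:
    return hashlib.sha256(segment).hexdigest()


def scrub(primary: dict, replica: dict) -> list:
    """Recompute each signature; repair mismatches from the replica if it verifies."""
    repaired = []
    for sig, segment in list(primary.items()):
        if signature(segment) == sig:
            continue  # segment still matches its stored signature
        good = replica.get(sig)
        if good is not None and signature(good) == sig:
            primary[sig] = good  # overwrite the corrupted copy
            repaired.append(sig)
    return repaired


data = b"archived record"
sig = signature(data)
primary = {sig: b"archived rec0rd"}  # simulated silent corruption
replica = {sig: data}                # hypothetical second copy
print(scrub(primary, replica) == [sig], primary[sig] == data)
```

The replica is itself verified before it is trusted, so a scrub never replaces one corrupted copy with another.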


Replication is a cornerstone of a fully optimized archive strategy. It provides a second, managed copy of this ‘original copy’ of data. It protects not only against a site failure but also against other forms of data loss, even if all the other protection steps fail. Still, because this is a single managed copy, it adheres to the same data retention strategy as the original archive copy.


Deduplication is again leveraged to enable a WAN-efficient data replication strategy. In this form of replication, only changed blocks that the target system at the disaster recovery (DR) site does not already hold are transferred, even if the source data is coming from multiple sites. For example, three sites could be replicating to a single DR site. When one of the primary sites prepares to send its recently changed or added data, any data that already exists at the remote site is not sent. This method not only reduces the amount of data that has to be stored at a DR site, it also reduces the amount of WAN bandwidth required.
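The three-site example can be sketched as follows, assuming SHA-256 signatures and a dictionary standing in for the DR site’s store. The site contents and function names are made up for illustration; in a real deployment the membership test would be a signature lookup across the WAN rather than a local dictionary check.

```python
import hashlib


def signature(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


def replicate(source_chunks, target_store: dict) -> int:
    """Send only chunks whose signature the target lacks; return bytes sent."""
    sent = 0
    for chunk in source_chunks:
        sig = signature(chunk)
        if sig not in target_store:  # stand-in for a remote signature lookup
            target_store[sig] = chunk
            sent += len(chunk)
    return sent


dr_site = {}
site_a = [b"shared template", b"site A ledger"]
site_b = [b"shared template", b"site B ledger"]
site_c = [b"shared template", b"site A ledger", b"site C ledger"]
sent_bytes = [replicate(site, dr_site) for site in (site_a, site_b, site_c)]
print(sent_bytes, len(dr_site))
```

The first site pays to send the shared chunk; the later sites send only what the DR site has not yet seen, so both the stored total and the bytes on the wire shrink.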


For disk archiving to deliver on the promise of reducing storage and protection costs, customers have to feel confident in its ability to securely and reliably house a single copy of data. Protecting the integrity of archive data long term is critical to making sure that this copy is not lost.

George Crump, Senior Analyst

This Article Sponsored by Permabit