Albireo is Permabit’s high-performance data optimization software that’s driving deduplication’s move into primary storage. Albireo’s sub-file process operates out of the data read path, using content-aware segmentation and efficient indexing technologies to enable zero impact performance for users, even as storage scales into the petabytes. As an OEM-embedded solution, Albireo is being integrated into storage systems from a number of tier-one primary storage vendors, including BlueArc, LSI and Xiotech Corporation.



Big performance


In recent testing, Albireo delivered performance of almost 6GB/s for a single node and over 77GB/s in a multi-node cluster (using a 64KB chunk size and hardware SHA-256 hashing). The system sustained that throughput as the capacity of the node grew to its limit. Permabit accomplished this through content-aware intelligence, which optimizes the deduplication process for specific application data types, and a fundamentally better indexing architecture.


Index efficiency is the cornerstone of deduplication performance. Typically, the index or hash table grows as unique blocks are identified and their hash keys are created and stored. For traditional deduplication systems, this index has to stay entirely in RAM in order to maintain adequate performance. This means that most deduplication solutions are effectively bounded, both in performance and scale, by the capacity of system memory. Albireo’s Delta Indexing technology compresses this index data and allows portions of it to be efficiently written to disk, reducing the memory consumption of each key to under 4 bytes. This allows each GB of system RAM to represent 1TB of input data (assuming a 4KB data block size) while avoiding disk access in nearly all cases, yielding an average lookup latency of under 10 microseconds. Further efficiency is gained through two Permabit technologies called Sparse Indexing and LRU Discard, which together reduce memory consumption by another 40x. The resulting index efficiency is dramatic, with each 1 GB of RAM in an Albireo system indexing 40TB of input data at a 4KB block size, or 640TB using a 64KB block size.
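The capacity figures above follow from simple arithmetic, which the short sketch below works through. It assumes binary units (1 GB = 2^30 bytes, 1 TB = 2^40 bytes) and rounds the "under 4 bytes" per key up to exactly 4, so the results are lower bounds. The function and parameter names are illustrative only and are not part of any Permabit interface.

```python
GIB = 2**30  # bytes in 1 GB of RAM (binary units assumed)
TIB = 2**40  # bytes in 1 TB of input data (binary units assumed)


def tb_indexed_per_gb_ram(block_size_bytes, bytes_per_key=4, extra_reduction=1):
    """Return the TB of input data that 1 GB of index RAM can represent."""
    keys_per_gb = GIB // bytes_per_key              # index entries that fit in 1 GB
    blocks_covered = keys_per_gb * extra_reduction  # blocks covered after further savings
    return blocks_covered * block_size_bytes / TIB


delta_only_4k = tb_indexed_per_gb_ram(4 * 1024)                      # 4-byte keys, 4KB blocks
reduced_4k = tb_indexed_per_gb_ram(4 * 1024, extra_reduction=40)     # plus the further 40x
reduced_64k = tb_indexed_per_gb_ram(64 * 1024, extra_reduction=40)   # same, 64KB blocks

print(delta_only_4k, reduced_4k, reduced_64k)  # 1.0 40.0 640.0
```

Moving from a 4KB to a 64KB block size multiplies the coverage by 16 because each key then stands for 16 times as much data, which is where the 640TB figure comes from.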


Deduplication is implemented differently by various manufacturers. ‘Inline’ deduplication parses data into blocks as it’s ingested, creating hash keys and making comparisons with existing keys in real-time. This implementation may be the one most directly impacted by the amount of available RAM. ‘Post-process’ deduplication also parses data into blocks, but stores them in a cache ‘landing zone’ in real-time and then runs the deduplication process as a second step. This method has some advantages in terms of shortening the effective ‘dedupe window,’ since host servers are ‘released’ from the deduplication process as soon as the data is cached. ‘Parallel’ deduplication combines these two methods, caching the input data blocks but starting the deduplication process on them immediately. As a software solution, Albireo can be integrated into OEM systems that use any of these deduplication techniques.
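The difference between the inline and post-process approaches can be sketched in a few lines. This is a minimal illustration, not Albireo’s implementation: it assumes fixed-size 4KB blocks rather than content-aware segmentation, keeps the whole index in an in-memory dictionary, and uses hypothetical names (DedupStore, write_inline, write_post_process, run_post_process) throughout.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size blocks; an assumption made for this sketch


def split_blocks(data, size=BLOCK_SIZE):
    """Parse a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupStore:
    def __init__(self):
        self.index = {}         # hash key -> position of the unique block in self.blocks
        self.blocks = []        # unique blocks actually stored
        self.landing_zone = []  # blocks cached for later post-process deduplication

    def _dedupe(self, block):
        key = hashlib.sha256(block).digest()
        if key not in self.index:  # unseen content: store it and index it
            self.index[key] = len(self.blocks)
            self.blocks.append(block)
        return self.index[key]

    def write_inline(self, data):
        """Inline: hash and compare every block as the data is ingested."""
        return [self._dedupe(b) for b in split_blocks(data)]

    def write_post_process(self, data):
        """Post-process, step one: only cache the blocks, then release the host."""
        self.landing_zone.extend(split_blocks(data))

    def run_post_process(self):
        """Post-process, step two: deduplicate everything in the landing zone."""
        refs = [self._dedupe(b) for b in self.landing_zone]
        self.landing_zone.clear()
        return refs


data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096  # four blocks, two distinct

inline = DedupStore()
inline_refs = inline.write_inline(data)

post = DedupStore()
post.write_post_process(data)
post_refs = post.run_post_process()

print(inline_refs, len(inline.blocks))  # [0, 0, 1, 0] 2
```

Both paths end up storing the same two unique blocks; what changes is when the hashing and index lookups happen relative to the host’s write. A parallel design would start the second step on cached blocks right away instead of waiting, which this single-threaded sketch does not attempt to show.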



Scalable performance


Primary storage scales, and for deduplication to be an integral part of that storage system, it too must scale, maintaining performance in the process. Albireo’s new GX technology allows it to run in a clustered configuration of up to 16 nodes and maintain performance as it grows. The chart below shows the results of an Albireo system tested in configurations of 1 to 16 nodes, running hardware SHA-256 hashing and using a 64KB block size. Performance is linear as the system scales to over 77,000 MB/s.

Briefing Report

Eric Slack, Senior Analyst

Permabit is a client of Storage Switzerland

Zero impact performance


Unlike other solutions, the Albireo technology doesn’t require the deduplication process to be run in reverse to ‘rehydrate’ the data as it’s being read. This allows the deduplication engine to sit completely out of the read path, unhooking deduplication performance from read performance. Consequently, data conditions that can cause user performance degradation on traditional deduplication engines don’t affect Albireo users. This means that while a heavily loaded system may see reduced deduplication ratios, it won’t slow down users’ performance.
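One way to picture a deduplication engine that sits out of the read path is a volume whose reads are served purely from the storage system’s own block map, with the hash index consulted only on writes. The sketch below is a toy model built on that assumption; VolumeSketch and its fields are hypothetical and do not describe Albireo’s actual interfaces or data structures.

```python
import hashlib


class VolumeSketch:
    """Toy volume: the block map serves reads; the dedup index is used only on writes."""

    def __init__(self):
        self.physical = {}      # physical address -> block contents
        self.block_map = {}     # logical address -> physical address (storage system's map)
        self.dedup_index = {}   # hash key -> physical address (deduplication engine's state)
        self.index_lookups = 0  # counts how often the dedup index is consulted

    def write(self, logical_addr, block):
        key = hashlib.sha256(block).digest()
        self.index_lookups += 1
        phys = self.dedup_index.get(key)
        if phys is None:  # new content: store it once and remember its hash
            phys = len(self.physical)
            self.physical[phys] = block
            self.dedup_index[key] = phys
        self.block_map[logical_addr] = phys  # duplicates simply share an address

    def read(self, logical_addr):
        # No hashing, no index lookup, no rehydration: just follow the block map.
        return self.physical[self.block_map[logical_addr]]


vol = VolumeSketch()
vol.write(0, b"x" * 4096)
vol.write(1, b"y" * 4096)
vol.write(2, b"x" * 4096)  # duplicate of logical block 0

lookups_after_writes = vol.index_lookups
contents = [vol.read(addr) for addr in (0, 1, 2)]

print(len(vol.physical), vol.index_lookups == lookups_after_writes)  # 2 True
```

In this model the index lookup count does not change during reads, so a slow or busy index could delay deduplication but not the read itself.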


Implementing deduplication in a primary storage environment is also different from implementing it for backup. The real-time nature of primary storage requires that the deduplication engine be integrated into the storage controller itself rather than deployed as an appliance. The Albireo Scalable Data Reduction architecture is a software solution that’s integrated into OEM storage systems to provide data optimization and effective data reduction in a primary storage environment. Regardless of the deployment option (inline, post-process or parallel deduplication), Albireo stays out of the read path, and because it’s software integrated into an OEM storage system, it doesn’t write or alter data on disk. This allows the OEM to maintain control over the data write path and assure data integrity. For more information on the data integrity of deduplication, see the Storage Switzerland article “Making Primary Storage Deduplication Safe”.



Storage Switzerland’s Take


It took a little out-of-the-box thinking, or more accurately, “out of the read path” thinking, but Albireo has made deduplication ready for prime time and primary storage. Permabit did this by removing the performance impact of deduplication through a couple of innovations. First, they made a number of technology improvements to the fundamental efficiency of the indexing process. Second, they recognized that the issue wasn’t so much the speed of the deduplication process itself as the speed of the storage using that deduplication. Taking the deduplication engine out of the read path uncoupled storage performance (specifically, read performance) from deduplication performance. And as a software implementation, Albireo has enabled storage OEMs to add deduplication to primary storage systems.