Whereas deduplication technologies deliver tangible benefits to traditional enterprise storage applications, imaging and video repositories have little data redundancy, rendering those solutions largely ineffective. To solve the problem, some organizations have explored other lossy and lossless compression options, including consumer-grade compression applications as well as solutions that compress files into proprietary formats on disk. For purveyors of large image collections these approaches create specific challenges. Point software solutions can present integration, scalability, and manageability problems. Proprietary compression requires that when data leaves the confines of the "system" it be re-inflated for transport, or that files be decoded at an application server or at the end-user PC. Inside a single data center this is rarely a problem because rehydration remains transparent to the applications and users consuming that data. However, for large image collections that are accessed by many users in multiple facilities or via the Internet, deploying a reader application at every endpoint is nearly impossible.


As a result, these organizations have had mixed results with these applications and have instead found that the only realistic option is to continue scaling the storage environment, despite the problems that entails. What is needed for these large image repositories is a compression solution that keeps the data in its native format, so that no special reader application is required.


In the past the only way to reduce the size of an image's data footprint was to reduce the quality of that image, known as lossy compression. This technique often raises concerns because of visible quality degradation in the images. An alternative method is emerging that may appease both the storage managers and the users of large image collections; one that is still lossy at its core but compresses in such a way that it is almost impossible for the human eye to notice the difference. In essence, it is a visually lossless image reduction that we can describe as native format optimization (NFO).


Companies like Ocarina Networks are bringing capabilities like NFO to market, which allow a file to be reduced in size in a visually lossless manner. As is the case with lossy image reduction, the size of the image is reduced by reducing its quality. However, that quality reduction is done below the human eye's ability to detect the difference. Even though this method can achieve savings of 50% or more, detecting quality degradation or compression artifacts is extremely difficult, even when comparing against the original pixel by pixel with a magnifying glass.
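One way to sanity-check a claim of visually lossless reduction is to compare the optimized file against the original with a perceptual similarity metric. The short Python sketch below uses SSIM from scikit-image for that purpose; the file names are hypothetical and this is only an illustration, not a description of Ocarina's own verification process.

    # Illustrative check only: compare an original image with its optimized
    # counterpart using SSIM, a perceptual similarity metric. Scores near 1.0
    # indicate the two images are visually indistinguishable.
    from skimage.io import imread
    from skimage.metrics import structural_similarity as ssim

    original = imread("photo_original.jpg")     # hypothetical file names
    optimized = imread("photo_optimized.jpg")

    score = ssim(original, optimized, channel_axis=-1)  # compare across color channels
    print(f"SSIM: {score:.4f}")  # values above roughly 0.99 suggest no visible difference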


The goal is for the data set to be reduced by 30% to 50% while the individual images suffer no visible quality loss. This allows for an end-to-end space savings. The impact is not only a reduction in the storage capacity consumed by the images themselves but also a savings in the bandwidth needed for data transfer, as well as a savings when backing up or replicating that data. Bandwidth is of particular concern, especially for photo sharing sites, which are often spending more on connectivity than they are on storage.
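To put rough numbers on that end-to-end effect, the following back-of-the-envelope Python sketch assumes a 40% reduction on a hypothetical 100 TB collection with two backup copies and 20 TB of monthly transfer; all figures are illustrative, not vendor benchmarks.

    # Illustrative arithmetic only: estimate end-to-end savings from an
    # assumed 40% native format reduction (all inputs are hypothetical).
    capacity_tb = 100          # primary image capacity in TB
    reduction = 0.40           # assumed NFO reduction ratio
    backup_copies = 2          # backup/replica copies of the data set
    monthly_transfer_tb = 20   # monthly download/replication traffic in TB

    saved_primary = capacity_tb * reduction
    saved_copies = saved_primary * backup_copies
    saved_transfer = monthly_transfer_tb * reduction

    print(f"Primary storage saved:        {saved_primary:.0f} TB")
    print(f"Backup/replica storage saved: {saved_copies:.0f} TB")
    print(f"Monthly transfer saved:       {saved_transfer:.0f} TB")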


In addition to the benefits of reduced storage capacity and recurring bandwidth costs, there are also distinct performance benefits from NFO. First, because the data is in its native format, the user of that data does not have to wait while some form of inflate process occurs. Second, since the data is smaller, it takes less time for the image to load for viewing or processing. Faster download times for end-users directly improve site responsiveness and usability, which are critical to end-user loyalty.


How is this accomplished? As stated earlier, NFO uses visually lossless compression to reduce the size of the image without affecting its perceived quality, by human eye standards. NFO can take advantage of the fact that the human eye is less sensitive to high-frequency color variations. It is also more sensitive to brightness variation than to color variation. In addition, the eye is programmed to pay more attention to certain portions of an image, which means that objects in the background can be stored at a lower resolution with no perceived impact. The eye is also more sensitive to motion than to texture, so the detail of texture can be reduced. Finally, the eye perceives overall image quality based mostly on the lowest-quality areas, so the rest of the image can be baselined against that level and further reduced. Video provides even more opportunities for optimization in the areas of motion compensation, keyframe placement, and other techniques that align the targeted media with the sensitivities of the human visual system.
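As one concrete illustration of trading color detail for space, the Python sketch below applies simple chroma subsampling with the Pillow library: brightness is kept at full resolution while the color channels are stored at quarter resolution. This is only an example of the general perceptual principle described above, not a description of Ocarina's actual optimizers, and the file names are hypothetical.

    # Illustrative example of exploiting the eye's lower sensitivity to color
    # detail: keep brightness (Y) at full resolution, downsample color (Cb, Cr).
    from PIL import Image

    img = Image.open("photo.jpg").convert("YCbCr")   # hypothetical input file
    y, cb, cr = img.split()

    # Store chroma at quarter resolution, then scale back up for viewing.
    w, h = img.size
    cb_small = cb.resize((w // 2, h // 2)).resize((w, h))
    cr_small = cr.resize((w // 2, h // 2)).resize((w, h))

    subsampled = Image.merge("YCbCr", (y, cb_small, cr_small)).convert("RGB")
    subsampled.save("photo_subsampled.jpg", quality=90)

To most viewers the result is indistinguishable from the original, even though the color channels now carry only a quarter of the data.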


NFO is the second prong in a two-pronged attack on image capacity requirements. The first is using technologies like Ocarina Networks' content-aware deduplication and compression, which allow specific optimizers to extract maximum lossless data reduction from image collections and are ideal for data within or between data centers. Adding the option to also optimize data in its native format when the distribution costs of that data become significant is the ideal second prong. Storage managers can apply specific data optimization policies based on their use case.
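A storage manager's policy logic might look something like the hypothetical Python sketch below, which routes data-center-bound data to lossless content-aware reduction and externally distributed data to NFO; the names and structure are invented for illustration and do not reflect Ocarina's actual policy engine.

    # Hypothetical policy sketch: choose an optimization method per use case.
    optimization_policies = {
        # Data that stays within or between data centers: maximum lossless reduction.
        "data_center": {"method": "content_aware_dedupe_and_compression", "native_format": False},
        # Data distributed to many users or sites: keep files directly readable.
        "distribution": {"method": "native_format_optimization", "native_format": True},
    }

    def select_policy(externally_accessed: bool) -> dict:
        """Pick a policy based on whether the data leaves the data center."""
        return optimization_policies["distribution" if externally_accessed else "data_center"]

    print(select_policy(externally_accessed=True))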

George Crump, Senior Analyst

An Alternative to Deduplication for Large Image Collections

This Article Sponsored by Ocarina Networks