Unlike other forms of storage efficiency, which focus on storing more data in the same space with technologies like compression or deduplication, Big Data storage efficiency should focus on making sure that minimal storage capacity is wasted within the volumes themselves. Essentially, that means making sure there is less free space. Free space cannot be optimized by deduplication and compression because there are no files to optimize; it is simply captive storage. The second key efficiency objective is to make sure that the storage system requires minimal IT staff to manage. If IT staff has to be added as fast as capacity is added to the Big Data project, then the potential gains from Big Data analysis are lost to the cost of additional staff.


The key to delivering this high level of efficiency is the ability to provide a single volume across the entire storage system. Of course, delivering a single large volume in a Big Data environment means that the storage system has to support a very large volume size, 1PB or more, without suffering a performance loss. Volumes of this size, especially with high-file-count Big Data, also need advanced metadata handling capabilities because of the number of files they will manage.


The reality, though, is that many storage systems and file systems cannot support a volume of this size, nor can they support the number of files that a Big Data environment would require per volume. The user is then left having to carve multiple volumes out of existing storage and then, through application coding or even a manual process, perform analytics across those volumes. Complexity instantly increases and efficiency decreases. It also increases the chance for errors to creep in.
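To make the captive-storage point concrete, here is a minimal sketch in Python. The volume sizes, usage figures, and function names are all hypothetical, chosen only for illustration; the sketch ignores protection overhead and does not describe how any particular product allocates space. It shows that free space scattered across separate volumes can be too fragmented to hold a new data set, even though the same total free space in one volume would be enough.

```python
# Hypothetical example: four separate 250 TB volumes versus one 1,000 TB volume.
# All numbers are illustrative assumptions, not measurements.

def fits_in_any_volume(free_tb, dataset_tb):
    """True if at least one individual volume has room for the whole data set."""
    return any(free >= dataset_tb for free in free_tb)

def fits_in_single_pool(free_tb, dataset_tb):
    """True if the combined free space, treated as one volume, has room."""
    return sum(free_tb) >= dataset_tb

capacities_tb = [250, 250, 250, 250]
used_tb = [200, 180, 220, 210]

# Free space left behind in each separately carved volume.
free_tb = [cap - used for cap, used in zip(capacities_tb, used_tb)]

new_dataset_tb = 100

print("Free per volume (TB):", free_tb)
print("Total free (TB):", sum(free_tb))
print("Fits in any one volume:", fits_in_any_volume(free_tb, new_dataset_tb))
print("Fits in a single pool:", fits_in_single_pool(free_tb, new_dataset_tb))
```

In this made-up case, 190 TB is free in total, yet no single volume can take the 100 TB data set, so that free space is effectively captive until data is moved by hand or more capacity is purchased.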


Big Data is a balance between having enough data stored to mine it successfully and the cost of storing that data. The more data that can be stored, the more effective and accurate the mining process will be, but attention has to be given to keeping the cost of storing that growing data set under control. With a single-volume approach, like the one Isilon takes with OneFS, users can make sure that captive, unused storage is kept to an absolute minimum, as is administration time. In short, more analytics can be run across a broader spectrum of data, making the Big Data project more profitable and more accurate.

George Crump, Senior Analyst

Isilon is a client of Storage Switzerland