Performance vs Capacity

Capacity and performance are the two primary characteristics of a storage system, and the two things that drive upgrades and replacement. Capacity is pretty straightforward: when you’re full, it’s time to buy more storage. Performance is a measure of how fast a system can accept or produce a given data set or file, and is typically reported as “transfer rate” and “I/Os per second”, or simply “IOPS”. It’s just as important as capacity, since a system with available space that can’t read or write data in an acceptable timeframe is about as useful as a storage system that’s full. Compared with capacity, though, performance is a more difficult metric to determine, since it’s a function of time and users or applications don’t access data at a constant rate. This makes performance a moving target, and makes it harder to determine when additional performance is needed. The first step is to understand how a storage system handles read and write operations.
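The two reporting metrics are tied together by the size of each I/O: transfer rate is roughly IOPS multiplied by I/O size. Here is a minimal sketch of that relationship, with a hypothetical function name and made-up example figures rather than numbers from any real system:

```python
def transfer_rate_mb_per_s(iops, io_size_kb):
    """Approximate transfer rate implied by an IOPS figure at a given I/O size."""
    # KB per second divided by 1024 gives MB per second.
    return iops * io_size_kb / 1024


# Same (assumed) 200 IOPS, two different I/O sizes.
small_io = transfer_rate_mb_per_s(200, 4)
large_io = transfer_rate_mb_per_s(200, 64)

print(f"200 IOPS at 4 KB:  {small_io:.2f} MB/s")
print(f"200 IOPS at 64 KB: {large_io:.2f} MB/s")
```

The same IOPS number can correspond to very different transfer rates, which is why neither figure says much on its own without the I/O size.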

A storage system, whether it’s a single disk drive or a large array, includes a controller, which manages the I/O process, and a storage ‘staging area’, or cache. The cache is typically made up of very fast (and expensive) RAM devices, which can hold frequently accessed data objects and eliminate the need to access the disk drive when reading that data. Both the speed of the controller and the size of the cache can affect the observed performance of the storage system. For the sake of this discussion, we’re going to ignore the latency added by the controller and assume that reads and writes are served by the disk, not the cache.

Reading a file from a disk drive system involves a number of steps, each of which adds to the time required to complete the read operation. For most systems, this time is made up of “seek time” and “transfer time”. For simplicity, we’ll assume seek time covers the time required to move the heads to the desired cylinder or track on the platter, plus rotational delay. Transfer time is the time required to actually read data from contiguous locations on the disk. In other words, seek time describes how long it takes to find the physical blocks of the file that needs to be read, and transfer time describes how long it takes to actually move the bytes of data. This means that each read I/O operation includes an amount of transfer time that is fixed for a given file size, and a variable amount of seek time, based on how spread out the file is across the platters of the disk drives used to store it.
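The seek-plus-transfer model described above can be sketched in a few lines of Python. This is only a rough illustration under the same simplifications as the text (no controller latency, no cache). The function names and the drive figures (8 ms seek including rotational delay, 100 MB/s sustained transfer) are assumptions chosen for the example, not measurements of any particular drive.

```python
def read_time_ms(file_kb, seek_ms, transfer_mb_per_s):
    """Estimated time to service one read: seek time plus transfer time."""
    # Transfer time grows with file size: KB -> MB, then MB / (MB/s) -> s -> ms.
    transfer_ms = (file_kb / 1024) / transfer_mb_per_s * 1000
    return seek_ms + transfer_ms


def reads_per_second(file_kb, seek_ms, transfer_mb_per_s):
    """Reads one drive could complete per second if serviced back to back."""
    return 1000 / read_time_ms(file_kb, seek_ms, transfer_mb_per_s)


# Hypothetical drive: 8 ms seek (including rotational delay), 100 MB/s transfer.
small_file = reads_per_second(file_kb=4, seek_ms=8, transfer_mb_per_s=100)
large_file = reads_per_second(file_kb=1024, seek_ms=8, transfer_mb_per_s=100)

print(f"4 KB reads: {small_file:.0f} per second")
print(f"1 MB reads: {large_file:.0f} per second")
```

With these assumed figures, a 4 KB read is almost entirely seek time, while a 1 MB read spends more time transferring data than seeking. For small files, then, the number of reads per second is governed mostly by seek time.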

Writes are a little different, since most disk arrays include RAID, which essentially spreads each write out across several disk spindles, with each spindle adding another disk write operation. For this reason, writes are slower than reads, and most systems include separate specs for each type of I/O (read performance and write performance) or a combined spec with a note about the percentage distribution of reads and writes used. For most storage activity, reads outnumber writes, usually by a wide margin. Regardless, since most of these reads (and writes) involve files that change on a regular basis, are not extremely large, or both, IOPS is the dominant performance spec, not transfer rate.
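One common rule of thumb for this effect is to treat each host write as costing several back-end disk operations (often called a write penalty) and to weight that cost by the read/write mix. The sketch below assumes that simple model. The function name, the 1,000 raw IOPS figure and the 70/30 mix are hypothetical, and the penalty of 2 corresponds to simple mirroring, where every write lands on two disks.

```python
def host_iops(raw_disk_iops, read_fraction, write_penalty):
    """Estimate host-visible IOPS from the raw IOPS of the disks behind an array.

    Each read is assumed to cost one disk operation and each write
    `write_penalty` disk operations (an assumed, simplified model).
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    write_fraction = 1.0 - read_fraction
    disk_ops_per_host_io = read_fraction + write_fraction * write_penalty
    return raw_disk_iops / disk_ops_per_host_io


# Hypothetical array: 1,000 raw disk IOPS, mirrored (2 disk writes per host write).
read_only = host_iops(1000, read_fraction=1.0, write_penalty=2)
mixed = host_iops(1000, read_fraction=0.7, write_penalty=2)

print(f"100% reads:       {read_only:.0f} IOPS")
print(f"70/30 read/write: {mixed:.0f} IOPS")
```

Under these assumptions, the same disks deliver noticeably fewer host I/Os once writes enter the mix, which is why a combined spec is only meaningful alongside the read/write percentage it was measured with.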

Since IOPS are very dependent on data type, application and other factors, storage manufacturers don’t always readily provide them. But users should know how to calculate a meaningful IOPS number for a given storage system, and understand how to determine the IOPS ‘appetite’ of their most critical applications. In the next article, we’ll examine IOPS in more detail and look at the concept of “workloads” and how they impact storage management.

Eric Slack, Senior Analyst
