Why you should always use DRAID

Hello friends - I have been a bit focused on remote copy lately, so this week I would like to shake things up and offer a simple explanation of why you should always choose Distributed RAID (DRAID) over Traditional RAID (TRAID) in Spectrum Virtualize products.

Traditional RAID

This is what we typically refer to as RAID in the storage industry as a whole. We aggregate drives and either stripe data or mirror it across multiple physical drives. In the Spectrum Virtualize platforms we generally support:
  • RAID 0 - Striping data across drives for maximum performance with no redundancy
    • Since all drives hold unique data, we only write once providing best write performance
    • Does not allow for any drives in the array to fail without data loss
  • RAID 1 - Mirroring data across drives (normally 2) to provide redundancy
    • Writes the same data set across a pair of drives requiring data to be written twice
    • Allows for one drive in the mirrored set to fail without losing data
  • RAID 5 - Striping data across drives with single parity for redundancy
    • Stripes data across all the drives in the array and writes one parity block per stripe
    • Allows a single drive in the array to fail without data loss
  • RAID 6 - Striping data across drives with dual parity for redundancy
    • Stripes data across all the drives in the array and writes two parity blocks per stripe
    • Allows two drives in the array to fail without data loss
  • RAID 10 - Mirrors data across two RAID 0 sets consisting of half the drives in the array
    • Can theoretically survive the failure of up to half the drives in the array, as long as no two failed drives hold the same mirrored data
    • Since data is only written twice, mathematically this provides better write performance than RAID 5 or 6
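The capacity trade-off between these levels can be sketched in a few lines. This is a simplified illustration assuming identical drives with no spares or metadata overhead; the function name and figures are mine, not from any Spectrum Virtualize tool:

```python
# Hypothetical helper: usable capacity for the RAID levels above,
# given n identical drives of a given size (TB). Simplified model only.
def usable_capacity(level: str, n_drives: int, drive_tb: float) -> float:
    if level == "raid0":
        return n_drives * drive_tb           # no redundancy, all space usable
    if level == "raid1":
        return drive_tb                      # a mirrored pair stores one copy's worth
    if level == "raid5":
        return (n_drives - 1) * drive_tb     # one drive's worth spent on parity
    if level == "raid6":
        return (n_drives - 2) * drive_tb     # two drives' worth spent on parity
    if level == "raid10":
        return (n_drives // 2) * drive_tb    # every block written twice
    raise ValueError(f"unknown RAID level: {level}")

for level in ("raid0", "raid5", "raid6", "raid10"):
    tb = usable_capacity(level, 8, 4.0)
    print(f"{level}: {tb:.1f} TB usable from 8 x 4 TB drives")
```

Note how RAID 10 pays for its write performance with capacity: half the drives in the array hold duplicate data.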
These RAID levels all work as they would in most other storage controllers, allowing a dedicated spare drive to be used to rebuild the array in the event of a drive failure. Having an adequate number of spare drives in the system is key to maintaining uptime, as drives will eventually fail. The rebuild operation reads from all remaining member drives in the array and writes to the single spare drive. This means:
  • You will only ever be able to rebuild as fast as a single drive can write the data
  • One or more drives sit idle in the system, taking up space without offering any other benefit
  • For spinning disks, sitting idle for extended periods may lead to ramp-up failure stopping the spare drive from serving its intended purpose
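The first bullet is worth quantifying. A minimal sketch of the single-spare bottleneck, using assumed figures (a 12 TB nearline drive sustaining roughly 100 MB/s of rebuild writes, with no host I/O competing):

```python
# Sketch: a traditional rebuild is bottlenecked by the write speed of the
# single spare drive, no matter how many members are read in parallel.
def traid_rebuild_hours(drive_tb: float, spare_write_mbps: float) -> float:
    drive_mb = drive_tb * 1_000_000   # TB -> MB, decimal units
    seconds = drive_mb / spare_write_mbps
    return seconds / 3600

# Assumed figures, not measurements: 12 TB drive, ~100 MB/s sustained writes.
print(f"{traid_rebuild_hours(12, 100):.1f} hours")  # well over a day, best case
```

In practice the rebuild also competes with host I/O, so real-world times are longer still.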

In comes DRAID

DRAID takes a different approach to organizing data and using spares. In terms of how data is written to the disks, you have the option of DRAID 5 or DRAID 6, and these work just like standard RAID 5 and 6 in terms of organizing and writing out data. The change is that instead of relying on a spare drive that sits idle in the system, we integrate that drive into the array. To maintain redundancy, we reserve the capacity of one (or more) drives and divide it evenly among all the drives in the array. While the basic redundancy will match RAID 5 or 6 depending on your choice, the array will have a number of 'rebuild areas' which it can use to restore that redundancy the instant a drive fails, without involving any drives outside the existing RAID set. This means:
  • Rebuilds are a read from all, write to all operation that is significantly faster than TRAID
  • No idle drives in the system mean you get to use all the drives you paid for
  • No idle drives remove the risk of ramp-up failures
  • For any FlashCore Module arrays, physical capacity is automatically reserved in the form of rebuild areas which may be used in the event you prematurely run out of physical space.
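The "read from all, write to all" point is what makes the difference so dramatic. A rough sketch, under the assumption that aggregate rebuild bandwidth scales with the number of surviving members (the per-drive throughput figure is mine, and real rebuilds also throttle to protect host I/O):

```python
# Sketch: a DRAID rebuild reconstructs the missing data into rebuild areas
# spread across every surviving member, so to a first approximation the
# rebuild time shrinks with the width of the array.
def draid_rebuild_hours(drive_tb: float, per_drive_mbps: float, n_drives: int) -> float:
    drive_mb = drive_tb * 1_000_000   # TB -> MB, decimal units
    aggregate_mbps = per_drive_mbps * (n_drives - 1)  # surviving members
    return drive_mb / aggregate_mbps / 3600

# Assumed figures: 12 TB drives at ~100 MB/s per member. A 36-drive DRAID
# finishes in about an hour what a single spare would take over a day to write.
print(f"{draid_rebuild_hours(12, 100, 36):.2f} hours")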
Hopefully these points have helped to sell you on using DRAID for all of your RAID arrays. However, when using DRAID you should be aware that there are two operations that take place to restore redundancy to the array instead of one. First we have the rebuild, which takes relatively little time to complete when compared to TRAID. Then, when the failed physical drive is finally replaced, we perform what is called a 'copy back' operation. This performs a bit-for-bit copy of the data that was written into the rebuild area onto the new drive so it can be substituted into the array. This frees up the rebuild area for use should another drive fail.

The last bit of information I would like to leave you with is to consider the size of your drives when picking between DRAID 5 and 6. Drives are growing rather large in capacity for all classes of storage, and the larger the drive, the longer the rebuild takes and the wider the window in which a second failure could cause data loss. With drive capacities growing and density an increasing concern in the storage industry, I recommend using DRAID 6 for all classes of storage to maintain the highest array-level redundancy possible.

I hope you all found this helpful and informative. If you have any questions or concerns please leave a comment, ask me on Twitter @fincherjc, or reach out to me on Linkedin.


  1. Hi,
    I'm building a backup solution using an IBM V5000 with 36 x 12TB 7.2K disks and will be using DRAID6. The backup server will be connected using a 10Gbit ethernet card. My question is: are 36 disks enough to saturate 10Gbit bandwidth?

    thanks for advice

    1. Standard link efficiency is 80%, so your 10Gb port should net a max of 1,000MB/s

      Your standard 7.2k drive, assuming a 100% sequential stream of 256k writes, should get about 100MB/s (perhaps more on some newer nearline drives that come with drive cache)

      With DRAID, all 36 drives will be actively used, so you should be able to get about 3600MB/s as a max. In theory, this will saturate nearly 4 of your 10Gb ports if all my assumptions about your workload hold true and no network bottlenecks are in place.
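The arithmetic in this reply can be checked directly. All figures below are the assumptions stated above (80% link efficiency, ~100 MB/s per 7.2k drive), not measurements:

```python
# Checking the reply's numbers: 10Gbit port throughput vs 36-drive DRAID.
port_gbit = 10
link_efficiency = 0.80                               # assumed, per the reply
port_mbps = port_gbit * 1000 / 8 * link_efficiency   # Gbit -> MB/s -> 1000 MB/s

drives = 36
per_drive_mbps = 100                                 # sequential 256k writes, assumed
array_mbps = drives * per_drive_mbps                 # 3600 MB/s aggregate

ports_saturated = array_mbps / port_mbps
print(f"{array_mbps} MB/s fills {ports_saturated:.1f} x 10Gb ports")
```

So a single 10Gb link, not the disk array, would be the bottleneck here.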

  2. Hi - what would your advice be on an initial array size, taking into account possibly adding drives in the future? E.g. we are to install a system with an initial 16x 4.8TB FCMs, and the default DRAID6 setting would be 10+P+Q. Is it worth forcing the array to be 13+P+Q to get an extra bit of space - bearing in mind we may then add the further 8x FCMs in the future?
    Many thanks!

    1. In an FCM environment I've noticed that many clients have trouble tracking capacity usage, as there is a lot of abstraction between FCM compression and the other data reduction technologies that might be implemented in a given environment. Because of this, I personally prefer and recommend deploying full enclosures at a time wherever possible (24 drives at once) over adding drives on demand.

      In terms of specific array settings, the GUI is tuned to give the best price/performance/lifespan recommendation based on what is available in the box, and I would lead with that.
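For a rough feel of what the wider geometry in the question is worth, here is a much-simplified capacity model I put together (the function and formula are illustrative only: it assumes one reserved rebuild area, ignores array metadata, and ignores FCM compression, all of which change the real numbers the GUI would show):

```python
# Simplified DRAID 6 usable-capacity model: capacity equal to the rebuild
# areas is reserved, and each stripe spends two strips on P and Q parity.
def draid6_usable_tb(n_drives: int, data_width: int, drive_tb: float,
                     rebuild_areas: int = 1) -> float:
    data_fraction = data_width / (data_width + 2)     # data vs data+P+Q
    return (n_drives - rebuild_areas) * data_fraction * drive_tb

# The two geometries from the question: 16x 4.8TB FCMs.
print(f"10+P+Q: {draid6_usable_tb(16, 10, 4.8):.1f} TB")
print(f"13+P+Q: {draid6_usable_tb(16, 13, 4.8):.1f} TB")
```

Under these assumptions the wider stripe buys a couple of TB, but as noted above, the GUI's recommendation also weighs performance and drive lifespan, not just raw space.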

