Why you should always use DRAID

Hello friends - I have been a bit focussed on remote copy lately so this week I would like to shake things up a bit and provide a simple explanation as to why you should always use Distributed RAID over Traditional RAID in Spectrum Virtualize products.

Traditional RAID

This is what we typically refer to as RAID in the storage industry as a whole. We aggregate drives and either stripe data or mirror it across multiple physical drives. In the Spectrum Virtualize platforms we generally support:
  • RAID 0 - Striping data across drives for maximum performance with no redundancy
    • Since all drives hold unique data, we only write once providing best write performance
    • Does not allow for any drives in the array to fail without data loss
  • RAID 1 - Mirroring data across drives (normally 2) to provide redundancy
    • Writes the same data set across a pair of drives requiring data to be written twice
    • Allows for one drive in the mirrored set to fail without losing data
  • RAID 5 - Striping data across drives with a single parity strip per stripe for redundancy
    • Stripes data across all the drives in the array and writes one parity strip for each stripe
    • Allows for a single drive in the array to fail without data loss
  • RAID 6 - Striping data across drives with two parity strips per stripe for redundancy
    • Stripes data across all the drives in the array and writes two parity strips for each stripe
    • Allows for two drives in the array to fail without data loss
  • RAID 10 - Mirrors data across two RAID 0 sets consisting of half the drives in the array
    • Provides a theoretical redundancy of up to half the drives in the array, as long as no two of the failed drives hold the same mirrored data
    • Since data is only written twice, mathematically this provides better write performance than RAID 5 or 6
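
To put rough numbers on the list above, here is a small Python sketch that works out usable capacity and the number of drive failures each level is guaranteed to survive. The function name `raid_summary` and the 8 x 4 TB example are my own illustration, not anything from the product, and the math ignores formatting overhead.

```python
def raid_summary(level, drives, drive_tb):
    """Return (usable_tb, guaranteed_failures) for a traditional RAID array.

    Hypothetical helper for illustration only; it ignores metadata overhead.
    """
    if level == 0:
        return drives * drive_tb, 0          # all capacity, no redundancy
    if level == 1:
        return drive_tb, drives - 1          # every drive holds the same data
    if level == 5:
        return (drives - 1) * drive_tb, 1    # one drive's worth of parity
    if level == 6:
        return (drives - 2) * drive_tb, 2    # two drives' worth of parity
    if level == 10:
        # Half the capacity is mirror copies. Only one failure is guaranteed
        # to be survivable; more can be survived if they hit different pairs.
        return (drives // 2) * drive_tb, 1
    raise ValueError(f"unsupported RAID level: {level}")


for level in (0, 5, 6, 10):
    usable, failures = raid_summary(level, drives=8, drive_tb=4.0)
    print(f"RAID {level}: {usable} TB usable, survives {failures} failure(s)")
```

Note that none of these numbers include the dedicated spare drive, which sits outside the array.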
These RAID levels all work as they would in most other storage controllers, allowing a dedicated spare drive to be used to rebuild the array in the event of a drive failure. Having an adequate number of spare drives in the system is key to maintaining uptime, as drives will eventually fail. The rebuild operation reads from all remaining member drives in the array and writes to the single spare drive. This means:
  • You will only ever be able to rebuild as fast as a single drive can write the data
  • One or more drives sit idle in the system taking space without offering other benefit
  • For spinning disks, sitting idle for extended periods may lead to ramp-up failure stopping the spare drive from serving its intended purpose
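
If you have never seen why a rebuild has to read from every remaining drive, a minimal sketch of single-parity reconstruction may help. This uses plain XOR parity, which is the textbook RAID 5 approach. The helper name `xor_strips` and the tiny 4-byte strips are assumptions of mine for illustration and say nothing about how the product lays out data internally.

```python
from functools import reduce


def xor_strips(strips):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


# One stripe spread over three data drives plus one parity drive.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_strips(data)

# Drive 1 fails. Reading every surviving strip (data and parity) and
# XOR-ing them together gives back the lost strip, which is then
# written to the single spare drive.
survivors = [data[0], data[2], parity]
rebuilt = xor_strips(survivors)

assert rebuilt == data[1]
```

Every stripe needs a read from all survivors but only one write, so the spare drive's write speed is what paces the rebuild.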

In comes DRAID

Distributed RAID (DRAID) takes a different approach to organizing data and using spares. For the data layout, you have the option of DRAID 5 or DRAID 6. These each work just like the standard RAID 5 and 6 in terms of organizing and writing out data. The change is that instead of relying on a spare drive that sits idle in the system, we integrate that drive into the array. To maintain redundancy, we reserve the capacity of one (or more) drives and divide it evenly among all the drives in the array. While the basic redundancy will match RAID 5 or 6 depending on your choice, the array will have a number of 'rebuild areas' which it can use to start restoring that redundancy as soon as a drive fails, without involving any drives outside the existing RAID set. This means:
  • Rebuilds are a read-from-all, write-to-all operation that is significantly faster than a traditional RAID (TRAID) rebuild
  • No idle drives in the system mean you get to use all the drives you paid for
  • No idle drives remove the risk of ramp-up failures
  • For any FlashCore Module arrays, physical capacity is automatically reserved in the form of rebuild areas which may be used in the event you prematurely run out of physical space.
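
To get a feel for the first point, here is a deliberately rough back-of-the-envelope model in Python. It only looks at the write side of the rebuild: traditional RAID funnels every write into one spare, while DRAID spreads the same writes across the surviving drives. The function names, the 8 TB drive size, the 200 MB/s write rate, and the 24-drive array are all hypothetical figures of mine. Real rebuilds are also limited by reads, host I/O, and stripe geometry, so treat the result as an upper bound on the speed-up rather than a prediction.

```python
def traid_rebuild_hours(drive_tb, write_mb_s):
    """Write-side rebuild time when all writes land on a single spare drive."""
    return drive_tb * 1_000_000 / write_mb_s / 3600


def draid_rebuild_hours(drive_tb, write_mb_s, drives):
    """Write-side rebuild time when writes are spread over the surviving drives.

    Simplifying assumption: the rebuild areas are spread perfectly evenly and
    nothing else (reads, host I/O) slows the drives down.
    """
    return traid_rebuild_hours(drive_tb, write_mb_s) / (drives - 1)


traid = traid_rebuild_hours(drive_tb=8, write_mb_s=200)
draid = draid_rebuild_hours(drive_tb=8, write_mb_s=200, drives=24)
print(f"TRAID: {traid:.1f} h, DRAID (24 drives): {draid:.1f} h")
```

The useful takeaway is the shape of the math: the traditional rebuild time does not improve as you add drives, while the distributed one does.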
Hopefully these points have helped to sell you on using DRAID for all of your RAID arrays. However, when using DRAID you should be aware that two operations take place to restore redundancy to the array instead of one. First we have the rebuild, which takes relatively little time to complete when compared to TRAID. Then, when the physical drive that failed is finally replaced, we perform what is called a 'copy back' operation. This performs a bit-for-bit copy of the data that was written into the rebuild area to the new drive so that it can be substituted into the array. That in turn frees up the rebuild area for use in the event another drive were to fail.

The last bit of information I would like to leave you with is to consider the size of your drives when picking between DRAID 5 and 6. Drives are growing rather large in capacity for all classes of storage, and the larger the drive, the longer it takes to write out its data during a rebuild, which means the array spends longer exposed to a further drive failure. With capacity and density both on the rise across the storage industry, I recommend using DRAID 6 for all classes of storage to maintain the highest array-level redundancy possible.

I hope you all found this helpful and informative. If you have any questions or concerns, please leave a comment, ask me on Twitter @fincherjc, or reach out to me on LinkedIn.
