Why you shouldn't provision 100% of your flash storage

First, a quick introduction to flash hardware:
Flash storage (solid state drives, flash cards, flash modules, etc.) is organized into NAND dice. Each die is made up of planes, each plane is made up of blocks, and each block is made up of pages. Pages are in turn made up of "words". In the flash products I work with, there are typically 256 pages per block, 1024 blocks per plane, and 2 planes per die.
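To put rough numbers on that hierarchy, here is a small Python sketch that multiplies out the figures quoted above. The constant names are mine, and the geometry is just the one I typically see; other parts will differ, and I deliberately leave out page size because it varies by product.

```python
# Geometry figures quoted in this post; real parts vary by vendor and generation.
PAGES_PER_BLOCK = 256
BLOCKS_PER_PLANE = 1024
PLANES_PER_DIE = 2

# A die holds every block in every one of its planes.
blocks_per_die = BLOCKS_PER_PLANE * PLANES_PER_DIE

# Each of those blocks holds a fixed number of pages.
pages_per_die = PAGES_PER_BLOCK * blocks_per_die

print(f"blocks per die: {blocks_per_die}")  # 2048
print(f"pages per die:  {pages_per_die}")   # 524288
```

The point to take away is the ratio: a page is the unit that gets written, but a block (256 pages here) is the unit that gets erased.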

When the data in a flash page changes, the page is not overwritten in place. Instead, the drive performs a read-modify-write: the original page is read, the changed data is merged in, and the result is written to a new page. The original page is then marked as garbage.

When enough pages in a block are marked as garbage, a process called garbage collection moves the remaining valid pages to another block and then erases the original block. All flash drives are over-provisioned, meaning they carry more physical capacity than the advertised space, so that this process has room to run in the background without the host noticing.
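The overwrite and garbage collection behavior described above can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions: the `Block` class, the 4-page block size, and the `overwrite` and `garbage_collect` helpers are all hypothetical names of mine, and a real flash controller is far more involved (wear leveling, mapping tables, victim selection, and so on).

```python
FREE, VALID, GARBAGE = "free", "valid", "garbage"

class Block:
    """Toy flash block: a handful of pages, each free, valid, or garbage."""

    def __init__(self, pages_per_block=4):  # tiny size, for illustration only
        self.pages = [FREE] * pages_per_block
        self.data = [None] * pages_per_block

    def program(self, payload):
        """Write payload into the first free page and return its index."""
        if FREE not in self.pages:
            raise RuntimeError("no free pages left in this block")
        idx = self.pages.index(FREE)
        self.pages[idx] = VALID
        self.data[idx] = payload
        return idx

    def invalidate(self, idx):
        """Mark a page as garbage; it cannot be reused until the block is erased."""
        self.pages[idx] = GARBAGE

    def erase(self):
        """Erase works on the whole block, never on a single page."""
        self.pages = [FREE] * len(self.pages)
        self.data = [None] * len(self.data)

def overwrite(old_block, old_idx, new_payload, target_block):
    """Out-of-place update: write the new data elsewhere, then mark the old page garbage."""
    new_idx = target_block.program(new_payload)
    old_block.invalidate(old_idx)
    return new_idx

def garbage_collect(victim, spare):
    """Copy the still-valid pages out of the victim block, then erase it."""
    moved = 0
    for state, payload in zip(victim.pages, victim.data):
        if state == VALID:
            spare.program(payload)
            moved += 1
    victim.erase()
    return moved

a, b, spare = Block(), Block(), Block()
for payload in ["w", "x", "y", "z"]:
    a.program(payload)

# Update three of the four pages in block a; the new copies land in block b.
overwrite(a, 0, "w2", b)
overwrite(a, 1, "x2", b)
overwrite(a, 2, "y2", b)

# Block a now holds 3 garbage pages and 1 valid page, so it is a good victim.
moved = garbage_collect(a, spare)
print(moved, a.pages)
```

Notice that garbage collection needs a spare block with free pages to copy into before it can erase anything. That spare room is exactly what runs short on a nearly full drive.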

Now getting to the point:

All of this means that when a flash drive gets really full, it has fewer free pages available for overwrites and fewer spare blocks for garbage collection to work with. The result is that the drive gets slower as it approaches 100% full. If this happens to a single flash drive, the storage controller will typically fail the drive (proactively or otherwise).

However, if there are multiple flash drives in a RAID array, odds are several drives in the array will fill up at the same time (because data gets striped across them). If this happens, it is likely that multiple flash drives will start showing performance problems and eventually fail, possibly taking the array offline. This could be catastrophic, especially as flash becomes more widely deployed in enterprise environments.

To help mitigate this type of problem, I normally recommend provisioning only 80% of the physical flash that you have in the system. This helps ensure there are always enough free pages to maintain stable performance on the drive until it reaches its natural end of life. The calculation is pretty simple for traditional storage arrays that don't feature any space-saving technologies:

4 drives x 2TB each in RAID 10 = 4TB usable; 4TB x 80% = 3.2TB of space to allocate to hosts.
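The same arithmetic as a small Python helper, in case you want to plug in your own numbers. The function name and parameters are hypothetical, and it only covers the simple case with no compression or deduplication; `raid_efficiency` is the fraction of raw capacity left after RAID overhead (0.5 for RAID 10, since every drive is mirrored).

```python
def provisionable_tb(drive_count, drive_tb, raid_efficiency, provision_fraction=0.80):
    """Capacity (in TB) to allocate to hosts under the 80% rule of thumb.

    Assumes no compression or other space-saving features are in play.
    """
    usable_tb = drive_count * drive_tb * raid_efficiency
    return usable_tb * provision_fraction

# The example above: 4 x 2TB drives in RAID 10 (half the raw capacity is mirror copies).
print(provisionable_tb(drive_count=4, drive_tb=2.0, raid_efficiency=0.5))  # 3.2
```

The 80% default is my own recommendation from this post, not a vendor-mandated figure, so adjust `provision_fraction` to suit your environment.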

However, as is the case with the new IBM FlashSystem 9100 and the FS900 AE3 model, this calculation can become quite difficult when it comes to flash controllers that also do compression. Thankfully IBM has documented some of the calculations we can use.

For FS900 AE3, IBM has published some best practices on the subject here.

The FlashSystem 9100 is a bit more complex. If you are using the industry-standard drives in the system, then the array-level calculation is quite simple. However, you also have the option to use IBM's proprietary FlashCore Modules, which do compression at the drive level just like the FS900 AE3. For those, I would use similar provisioning rules.

If you are unsure what your compression ratio might be, you can use IBM's Comprestimator to check your data's compressibility.

If you do find yourself in a position where your flash storage system is behind an IBM Spectrum Virtualize system and is out of space, IBM has put out the following support tip full of various remediation strategies based on your configuration.

If you have any questions, please feel free to comment, find me on LinkedIn, or follow me on Twitter.

