What exactly is DRP

Hello friends - better late than never. There has been a lot of excitement about using Data Reduction Pools (DRP). Today I would like to take a moment to provide a semi-brief explanation of what DRP is and how exactly it works. DRP is designed to take full advantage of various data reduction technologies. This is achieved by redesigning how data in a pool is managed and maintained. Let's first discuss what makes DRP different from the "legacy" standard pools.

DRP Volumes

Journal Volume 

The Journal volume works similarly to a filesystem journal in that it can be used to recover the pool in disastrous scenarios such as a T3 Recovery. This volume only processes sequential writes, and is only read from to perform recovery actions. This volume is hidden from the user.

Customer Data Volume

The Customer Data Volume consumes ~98% of the pool capacity and stores all data written into the pool. All writes to this volume are sequential 256K I/O, while reads out of this volume will typically be random. The reason for this is that all write I/O is aggregated into 256K chunks and destaged, while all I/O is addressed at 8K granularity, so we are capable of servicing small/random reads without having to read in excessive (unneeded) data. If data in this volume gets over-written, the data is not actually over-written in place. Instead, the original data is read in, the over-write data is laid on top of it, and the result is destaged to a new extent. The original data area is then flagged as garbage in the Reverse Lookup Volume. This volume is hidden from the user.
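The write/read pattern described above can be sketched as a small log-structured store. This is a minimal Python illustration, not IBM's implementation; the class and field names (`CustomerDataVolume`, `directory`, `open_chunk`) are hypothetical, and garbage tracking is omitted for brevity.

```python
# Sketch of DRP's log-structured write path: incoming 8K grains are appended
# to an open 256K chunk and destaged sequentially, while a directory records
# each grain's location so reads can stay small and random.
CHUNK_SIZE = 256 * 1024
GRAIN_SIZE = 8 * 1024
GRAINS_PER_CHUNK = CHUNK_SIZE // GRAIN_SIZE  # 32 grains per destaged chunk

class CustomerDataVolume:
    def __init__(self):
        self.chunks = []        # destaged 256K chunks (written sequentially)
        self.open_chunk = []    # grains waiting to be aggregated
        self.directory = {}     # logical grain address -> (chunk index, slot)

    def write_grain(self, lba, data):
        # Over-writes never update in place: the grain always lands in the
        # open chunk, and the directory is re-pointed to the new location.
        self.open_chunk.append(data)
        self.directory[lba] = (len(self.chunks), len(self.open_chunk) - 1)
        if len(self.open_chunk) == GRAINS_PER_CHUNK:
            self.chunks.append(self.open_chunk)  # one sequential 256K destage
            self.open_chunk = []

    def read_grain(self, lba):
        # Reads are random: fetch just the 8K grain that was asked for.
        chunk_idx, slot = self.directory[lba]
        if chunk_idx < len(self.chunks):
            return self.chunks[chunk_idx][slot]
        return self.open_chunk[slot]
```

Because every write is an append, destages to the mdisks are always full sequential 256K I/Os, while the 8K directory granularity keeps random reads cheap.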

Reverse Lookup Volume

The Reverse Lookup Volume is where a lot of the magic happens. This volume keeps track of extent capacity usage and, more importantly, the holes that are created by deletions and over-writes. A Garbage Collection process references data in this volume to decide which extents can and should be cleaned up. This volume is hidden from the user.

Directory Volume 

Directory volumes hold the metadata that tracks user data written into the pool and where in the Customer Data Volume that data is actually stored. Directory volumes are the volumes you actually see in lsvdisk and in the volumes section of the GUI. Additionally, performance metrics for volumes are tied to the individual directory volumes for ease of troubleshooting.

Now that that is out of the way, let's discuss some of the operations running in a DRP and how I/O moves through a DRP (from upper to lower cache).

DRP IO Flow and Maintenance

Compressed Writes

DRP compression is NOT the same as IBM Real Time Compression (RACE / RtC). Instead of dealing with a fixed output (and indexed) size of 32 KB, DRP compression uses a fixed input size for I/O. This provides more consistent performance and makes it easier to handle random read requests. We then aggregate the compressed chunks (normally ~4K after compression) into a 256K I/O to destage down through lower cache to the mdisks. For simplicity, here is a diagram of compressed writes in DRP shamelessly stolen from the redbook.
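The fixed-input-size idea can be sketched in a few lines of Python. This is an illustrative sketch only (using zlib as a stand-in compressor; the real product uses hardware-assisted compression), and the 8K grain / 256K destage sizes follow the numbers quoted above.

```python
import zlib

GRAIN = 8 * 1024      # fixed-size compression *input* (assumed 8K grains)
DESTAGE = 256 * 1024  # aggregated destage unit

def compress_grains(data):
    """Split data into fixed 8K inputs and compress each one independently.

    A fixed input size means each grain can later be located and decompressed
    on its own, so a random 8K read never has to inflate neighbouring data.
    """
    grains = [data[i:i + GRAIN] for i in range(0, len(data), GRAIN)]
    return [zlib.compress(g) for g in grains]

def pack_for_destage(compressed_grains):
    """Aggregate variable-size compressed grains into ~256K destage batches."""
    batches, current, size = [], [], 0
    for cg in compressed_grains:
        if size + len(cg) > DESTAGE and current:
            batches.append(current)   # batch is full: destage sequentially
            current, size = [], 0
        current.append(cg)
        size += len(cg)
    if current:
        batches.append(current)
    return batches
```

Contrast this with a fixed-*output* scheme like RACE, where the amount of input data behind each 32 KB compressed block varies, which makes random reads harder to service precisely.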

Deduplicated Writes

Deduplication operates on 8K input chunks. A hash is taken of each chunk of data that is written and saved in a metadata tree. When a new write comes in and its hash matches a chunk that was previously written, the new chunk simply points to the data already stored. For simplicity, here is a diagram shamelessly stolen again:
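The hash-and-point mechanism can be sketched as follows. This is a minimal, hypothetical sketch (a flat dict stands in for the metadata tree, and SHA-256 stands in for whatever hash the product actually uses):

```python
import hashlib

class DedupStore:
    """Sketch of 8K-chunk deduplication: identical chunks are stored once."""

    def __init__(self):
        self.by_hash = {}  # chunk hash -> stored data (metadata-tree stand-in)
        self.refs = {}     # logical address -> chunk hash (the "pointer")

    def write(self, lba, chunk):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in self.by_hash:
            self.by_hash[digest] = chunk  # first copy: actually store the data
        self.refs[lba] = digest           # duplicates just point at it

    def read(self, lba):
        return self.by_hash[self.refs[lba]]
```

Writing the same 8K chunk to many logical addresses consumes the physical space of one copy plus a pointer per reference, which is where the capacity savings come from.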


Reads

Reads are pretty simple. Data is indexed in 8K chunks. When a read request comes in, we read only the requested data blocks in increments of 8K - there is no reading of unneeded space like in RACE compression (which suffers from poor temporal locality). Depending on the read size, the cache-prefetch algorithm may additionally read some of the surrounding LBAs into upper cache.


Over-writes

When data blocks are over-written in a DRP, we don't simply replace the data where it was before. Instead we perform a read-modify-write operation (just like flash drives do). The original data is read, the changes are applied on top, and the result is written to a new grain in the pool. The original data is then marked as garbage.
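The read-modify-write sequence can be sketched like this. It is an illustrative sketch, not the product code; the `Pool` class, its allocator, and the `garbage` set (standing in for the Reverse Lookup Volume's bookkeeping) are all hypothetical names.

```python
# Sketch of a DRP over-write: read the original grain, lay the new bytes on
# top, write the result to a fresh location, and flag the old location as
# garbage - never update the data in place.
class Pool:
    def __init__(self):
        self.store = {}       # physical location -> grain data
        self.directory = {}   # logical address -> physical location
        self.garbage = set()  # locations awaiting garbage collection
        self.next_loc = 0

    def _allocate(self):
        loc, self.next_loc = self.next_loc, self.next_loc + 1
        return loc

    def write(self, lba, data, offset=0):
        old_loc = self.directory.get(lba)
        if old_loc is not None:
            # Read-modify-write: merge the new bytes over the original grain.
            original = self.store[old_loc]
            data = original[:offset] + data + original[offset + len(data):]
            self.garbage.add(old_loc)  # the old copy is now garbage
        loc = self._allocate()         # the merged grain lands in a new spot
        self.store[loc] = data
        self.directory[lba] = loc
```

Note that a partial over-write still produces one full new grain, which is why over-write-heavy workloads generate garbage for the collector to clean up.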


Unmap

DRP supports full end-to-end unmap, including the ability to reclaim real capacity in the pool - the key improvement over standard pools with regard to this feature. Hosts can unmap space, and that space is marked as garbage. Later on, garbage collection reclaims the space and unmaps the affected extent from the backing storage.

Garbage Collection

Garbage Collection takes care of reclaiming all the space that is wasted by over-writes and host unmaps, which create small holes in extents. Using information found in the Reverse Lookup Volume, the system finds the extents with the most garbage space. The used data in these extents is read and moved to an extent with free space. Once emptied, the extents whose data was migrated are unmapped at the mdisk layer and are free to be reused. The Garbage Collection process is lazy by design to minimize the extra IOPS processed by the system. Garbage Collection will generally allow the pool to reach 85% full before going into 'high gear' to keep up with incoming new writes and over-writes.
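The victim-selection and reclaim loop described above can be sketched as a small function. This is a toy illustration under stated assumptions: the `extents` dict stands in for the Reverse Lookup Volume, and the 85% threshold is the figure quoted in the paragraph above.

```python
# Sketch of lazy garbage collection: below the fullness threshold, do nothing;
# above it, pick the extent with the most garbage, move its live grains out,
# and unmap the emptied extent so it can be reused.
def collect(extents, pool_full_pct, threshold=85):
    """extents: dict of extent_id -> {'live': [grains], 'garbage': hole_count}.

    Returns (reclaimed_extent_id, moved_live_grains), or None when GC
    stays in low gear because the pool is not full enough.
    """
    if pool_full_pct < threshold:
        return None  # lazy by design: don't burn IOPS while space is plentiful
    # Reverse-lookup information: choose the extent with the most garbage,
    # so each collection pass reclaims the most space per grain moved.
    victim = max(extents, key=lambda e: extents[e]['garbage'])
    moved = extents[victim]['live']  # live data is read in and rewritten...
    del extents[victim]              # ...and the emptied extent is unmapped
    return victim, moved
```

Picking the "dirtiest" extent first minimizes the amount of live data that has to be rewritten per unit of space reclaimed, which is why the hole-tracking in the Reverse Lookup Volume matters so much.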

Hopefully you all found this informative and helpful in understanding how DRP works. If you have any questions please feel free to comment, follow me on Twitter @fincherjc, or on LinkedIn.

