Why every Spectrum Virtualize system should be using NPIV

The IBM Spectrum Virtualize software stack operates in clusters which are based on the Paxos protocol. Therefore, in order to service a host's I/O request, each node must have the current cluster state information. When a node comes back online from a code upgrade (or reboot), it must pull this information from the running cluster. To be able to pull this data, the node must connect to the network so that the data can be transferred. This requires the node to bring its network interfaces online and log into the Fibre Channel (FC) network. Naturally, the consequence of this is the propagation of RSCNs (Registered State Change Notifications), which notify all the hosts that the ports are back online (prompting login). However, the node will not be able to service SCSI I/O until the current cluster state is obtained from the other SV nodes in the cluster. Since the FC host logins are happening in parallel with the inter-node data transfer, there is no way that the Spectrum Virtu

Sizing the inter-site link for replication and multi-site environments

Hello friends. I've done a poor job this year of posting content regularly but hope this one helps make up for it. One of the questions I get most frequently from administrators, technical sellers, and other engineers is how much bandwidth is required to perform replication or to sustain a multi-site environment. For the purposes of this conversation I am going to assume the use of Fibre Channel for the network. The first part of this answer is very easy. Synchronous replication (and synchronous data patterns in the case of Global Mirror) requires enough bandwidth to move the peak write throughput of the production workloads. If your peak write rate is 3 gigabytes per second (GB/s), then the math works out as:

(3 GByte / 1 s) x (8 bits / 1 Byte) = 24 Gbps, i.e. 2 x 16 Gbps links or 3 x 8 Gbps links

For each Gbps on a standard port you can only achieve effectively 100MB/s as the standard interface is only capable of about 80% link efficiency on aver
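
The link-count arithmetic in this excerpt can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `links_needed` and its parameters are hypothetical, and the 0.8 efficiency value is the rough "about 80%" figure quoted in the excerpt, not a measured number for any particular port.

```python
import math

def links_needed(peak_write_gbytes_per_s, link_gbps, efficiency=1.0):
    """Estimate how many FC links carry a given peak write rate.

    peak_write_gbytes_per_s: peak write throughput in GB/s
    link_gbps: nominal speed of one link in Gbps (e.g. 8 or 16)
    efficiency: usable fraction of the nominal link speed (assumption)
    """
    required_gbps = peak_write_gbytes_per_s * 8   # 8 bits per byte
    usable_gbps_per_link = link_gbps * efficiency
    return math.ceil(required_gbps / usable_gbps_per_link)

# Raw arithmetic from the excerpt: 3 GB/s is 24 Gbps.
raw_16g = links_needed(3, 16)        # 2 links
raw_8g = links_needed(3, 8)          # 3 links

# Same workload, assuming only about 80% of each link is usable.
eff_8g = links_needed(3, 8, 0.8)     # 4 links
```

Rounding up with `math.ceil` reflects that links come in whole units, so any fractional requirement needs one more link.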

Discussion on queue depth

Hello friends. It has been quite a while and lots of things have changed since my last post. I am hoping to break out at least one post per month for the remainder of this year to deliver some helpful information on designing and tuning your Spectrum Virtualize storage systems. For the first post of the year I want to talk about queue depth settings. Historically IBM has put out a formula for calculating queue depth here. However, this calculation is not ideal for the newest systems on the newer code versions. In the current releases the queue depth on the system is a fixed number per physical port on each node. As a result, in order to avoid queue full on the system, you should be careful not to set the aggregate queue depth of all hosts accessing a single physical port on the system greater than 2048. Once the target (in our case Spectrum Virtualize) hits queue full, the system won't be able to accept any additional I/O requests on that port. This can result in dropped commands
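
The per-port rule described in this excerpt can be checked with a small Python sketch. The 2048 limit is the figure quoted above; the host names, their queue depths, and the helper `port_queue_headroom` are hypothetical and exist only to illustrate the sum.

```python
# Per-port limit quoted in the excerpt; confirm it for your own code level.
PORT_QUEUE_LIMIT = 2048

def port_queue_headroom(host_queue_depths, limit=PORT_QUEUE_LIMIT):
    """Return the queue slots left on one physical port.

    host_queue_depths maps a host name to the maximum number of commands
    that host can have outstanding against this port (hypothetical input).
    A negative result means the port is oversubscribed.
    """
    return limit - sum(host_queue_depths.values())

# Hypothetical hosts zoned to the same physical port.
hosts = {"hostA": 512, "hostB": 1024, "hostC": 256}
headroom = port_queue_headroom(hosts)   # 2048 - 1792 = 256
```

A negative headroom suggests either lowering the host queue depths or spreading those hosts across more physical ports.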

Why you should always use DRAID

Hello friends - I have been a bit focussed on remote copy lately so this week I would like to shake things up a bit and provide a simple explanation as to why you should always use Distributed RAID over Traditional RAID in Spectrum Virtualize products.
Traditional RAID
This is what we typically refer to as RAID in the storage industry as a whole. We aggregate drives and either stripe data or mirror it across multiple physical drives. In the Spectrum Virtualize platforms we generally support:
RAID 0 - Striping data across drives for maximum performance with no redundancy
  Since all drives hold unique data, we only write once, providing best write performance
  Does not allow for any drives in the array to fail without data loss
RAID 1 - Mirroring data across drives (normally 2) to provide redundancy
  Writes the same data set across a pair of drives, requiring data to be written twice
  Allows for one drive in the mirrored set to fail without losing data
RAID 5 - Striping data

Remote Copy Data Patterns and Partnership Tuning

Hello friends. I know it has been a while, but I would now like to put out part 2 of the 1920 post I put out a while back. That post was largely about tuning the partnership to protect against impacts to the master volume (or replication source). This week I would like to explain a bit about how data is replicated to help understand why 1920 events and backpressure can happen and how to hopefully avoid the situation in the first place.
Types of Remote Copy Relationships in SpecV
Metro Mirror (Synchronous/Synchronous)
Metro Mirror is what I call a Sync/Sync replication. By this I mean that the RPO is synchronous or, in layman's terms, the two sides of the mirror are always identical. Additionally, the data pattern for the replication is synchronous as well, meaning we forward new writes as we receive them, as shown here:
When the primary host sends a write to the master volume, we will forward that to the remote cluster. The remote cluster will cache the write and send an a