Neat CLI Tricks on SpecV

Hello friends. I apologize that it has been a while since I've shared anything new. It's been a long and strange trip getting through the last few weeks. To get things going again, I would like to share some interesting CLI tricks you can use to help troubleshoot and analyze your SpecV storage. These tricks should all be available in versions 7.7 onward, which I hope everyone reading is running given the supported code statements found here. With that said, let's start simple:

You can run multiple commands in succession from a single line of input by putting a ";" separator between them. For example, the following one-liner queries the system time and then lists the nodes visible to the node you are logged into.

IBM_Storwize:V7000B:superuser>svqueryclock; sainfo lsservicenodes
Mon Jun 24 11:33:32 EDT 2019
panel_name cluster_id       cluster_name node_id node_name relation node_status error_data
01-1       00000340A060C04E V7000B       4       node2     local    Active      724 2 3 1
01-2       00000340A060C04E V7000B       3       node1     partner  Active

Note: sainfo lsservicenodes is a command that can only be run as superuser.
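
One thing worth knowing about ";": in ordinary bash, commands separated this way run one after another whether or not the earlier ones succeeded. Here is a small sketch you can try in any bash shell on a workstation. I am assuming the SpecV restricted shell behaves the same way, and the commands below are placeholders, not SpecV commands.

```shell
# Placeholder commands only: `false` stands in for a command that fails.
# With ";" the second command runs regardless of the first one's exit status.
out=$(false; echo "second command still ran")
echo "$out"
```

So if the second command only makes sense when the first one worked, check the first command's output before chaining them.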

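For most commands we can specify a delimiter with -delim to help export output into spreadsheets or make it more readable in the CLI. By default the output is padded into columns, as in the lsnodecanister output below; with -delim the fields are separated by the character you choose instead.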

id name  UPS_serial_number WWNN             status IO_group_id IO_group_name config_node UPS_unique_id hardware iscsi_name                            iscsi_alias  panel_name enclosure_id canister_id enclosure_serial_number site_id site_name
3  node1                   5005076802000693 online 0           io_grp0       no                        100 AuxHostnode1 01-2       1            2           78G01RD                                  
4  node2                   5005076802000692 online 0           io_grp0       yes                       100 AuxHostnode2 01-1       1            1           78G01RD                                  
IBM_Storwize:V7000B:superuser>lsnodecanister -delim :
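
To show why a delimiter helps once the output leaves the system, here is a minimal sketch for a workstation bash shell. The two rows are hand-built from the first five columns of the table above (id, name, UPS_serial_number, WWNN, status) as I would expect -delim : to print them; that format is my assumption, not captured output.

```shell
# Hand-built rows (assumed format): id:name:UPS_serial_number:WWNN:status
# UPS_serial_number is empty in the table above, hence the "::".
rows='3:node1::5005076802000693:online
4:node2::5005076802000692:online'

# Split on ":" only. An empty field keeps its position, which would not
# survive whitespace splitting of the padded default output.
summary=$(printf '%s\n' "$rows" | while IFS=: read -r id name ups_serial wwnn status; do
    echo "node $name (id $id) WWNN $wwnn is $status"
done)

printf '%s\n' "$summary"
```

The empty UPS_serial_number column is the interesting part: with an explicit delimiter it stays in its slot, so WWNN and status still land in the right variables.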

We can also filter output by piping a command to grep.

IBM_Storwize:V7000B:superuser>lsvdisk -delim : | grep offline

We can even read text files using the more or less commands. For this example, note that after you specify a directory, hitting Tab twice lists the contents of that directory, and giving the more command a full file name opens the file for reading.

IBM_Storwize:V7000B:superuser>more /dumps/    <== Hit Tab twice instead of enter to get list
000000.trc                             configs/                               inst.180512.trc                        lost+found/                            snap.78G01RD-1.190610.191007.tgz
78G01RD-1.trc                          drive/                                 inst.180731.trc                        mdisk/                                 svc.config.cron.bak_78G01RD-1
78G01RD-1.trc.old                      easytier/                              inst.190213.trc                        nbrhood/                               svc.config.cron.bak_78G01RD-2
acpower.78G01RD-1.trc                  ec_makevpd.78G01RD-1.trc               inst.190214.trc                        reinst.78G01RD-1.trc                   svc.config.cron.log_78G01RD-1
acpower.78G01RD-1.trc.old              elogs/                                 iostats/                               rtc.racemqA_log.txt.78G01RD-1.trc      svc.config.cron.sh_78G01RD-1
audit/                                 enclosure/                             iotrace/                               rtc.racemqB_log.txt.78G01RD-1.trc      svc.config.cron.xml_78G01RD-1
boot.78G01RD-1.trc                     endd.trc                               keymgr.78G01RD-1.trc                   sel.78G01RD-1.trc                      svc.config.push.78G01RD-1.trc
cimom/                                 ethernet.78G01RD-1.stats               keymgr.kmip.78G01RD-1.trc              snap.78G01RD-1.190605.121652.tgz       syslogs/
cloud/                                 ethernet.78G01RD-1.trc                 livedump.78G01RD-1.190605.121439       snap.78G01RD-1.190605.123017.tgz       tejas/
cloudd.78G01RD-1.trc                   feature/                               livedump.ietd.78G01RD-1.190605.121439  snap.78G01RD-1.190605.133237.tgz      
IBM_Storwize:V7000B:superuser>more /dumps/78G01RD-1.trc  <== Specify path to file to read

!Continuing trace-------------------------------------------
SMI trace subsystem debugging report
Timestamp  : Fri Feb 22 07:55:39.505070 2019
Host       : newinstall
PID        : 7748
Parent PID : 5305

As you could guess, we can combine more and grep to get more direct output. For example, use the following command to see all of the satask commands run on the node or node canister on May 13, along with the return code from the CLI for those commands.

IBM_Storwize:V7000B:superuser>more /dumps/78G01RD-1.trc | grep "May 13" | grep SAT
(@ Mon May 13 15:49:14.882520 2019) SAT: satask cpfiles -prefix /dumps/snap.single.78G01RD-2.190513.154805.tgz -source 01-2
(@ Mon May 13 15:49:14.884468 2019) SAT: CMMVC8044E Command completed successfully.
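
If you want to try the chained filter away from the system, this is a small sketch for a workstation bash shell. The first two lines of the sample are copied from the output above; the last two are made up by me so that each grep has something to filter out.

```shell
# Lines 1-2 come from the trace excerpt above; lines 3-4 are hypothetical.
trace_sample='(@ Mon May 13 15:49:14.882520 2019) SAT: satask cpfiles -prefix /dumps/snap.single.78G01RD-2.190513.154805.tgz -source 01-2
(@ Mon May 13 15:49:14.884468 2019) SAT: CMMVC8044E Command completed successfully.
(@ Tue May 14 09:00:00.000000 2019) SAT: hypothetical entry from a different day
(@ Mon May 13 16:00:00.000000 2019) hypothetical entry without the tag'

# Each grep narrows the previous result: first by date, then by the SAT tag.
matches=$(printf '%s\n' "$trace_sample" | grep "May 13" | grep SAT)
match_count=$(printf '%s\n' "$matches" | grep -c SAT)

printf '%s\n' "$matches"
echo "matching lines: $match_count"
```

Only lines that pass both filters survive, so the order of the two greps does not change the result, just which filter does the bulk of the work.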

Now that the easy tricks are over, I will share my personal favorite: using while loops. There are two kinds I tend to use. The first loops through the output of an "ls" command to use it as input for another command. It combines the general bash commands while and read, as shown:

IBM_Storwize:V7000B:superuser>lsvdisk -nohdr -delim " " | grep offline | while read id trash; do echo $id; sleep 1; done

On this system the loop prints 5, so the only offline volume is volume id 5. Note that the read command splits each line on whitespace by default, so the first field (the volume id) is read into the variable id and the remainder of the line is stored in the variable trash. The echo command then prints the value of id so we can see the offline volumes. We can pass id to more useful commands instead, for example recovervdisk to try and bring the volume online, like so:

IBM_Storwize:V7000B:superuser>lsvdisk -nohdr -delim " " | grep offline | while read id trash; do recovervdisk $id; sleep 1; done
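
Before running a loop like that for real, it can help to rehearse the pattern somewhere harmless. The sketch below is for a workstation bash shell; the sample rows are a simplified, made-up stand-in for lsvdisk -nohdr -delim " " output (the real command prints many more columns), and the loop only echoes the command it would run.

```shell
# Hypothetical, simplified rows: id name IO_group_id IO_group_name status
sample_output='0 vol_app 0 io_grp0 online
5 vol_db 0 io_grp0 offline
7 vol_logs 0 io_grp0 online'

# Same shape as the one-liner above: keep offline rows, put the first field
# in id, and let the rest of the line fall into trash.
preview=$(printf '%s\n' "$sample_output" | grep offline | while read -r id trash; do
    echo "would run: recovervdisk $id"
done)

printf '%s\n' "$preview"
```

Swapping the real command for an echo like this is also a handy habit on the system itself, since it shows exactly which ids the loop will act on.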

The next kind of while loop I like using is an infinite loop. This will continue to run repeatedly until manually interrupted by closing your ssh session or entering a break character (usually Ctrl+C). The most common example I use is to try and keep an array rebuild running when all hope seems lost, which is usually the case after a fire suppression event in the data center. Johnson Controls has a pretty good study on this you can read up on here, but for now let's check out the script.

IBM_Storwize:V7000B:superuser>while true; do svqueryclock; lseventlog | grep drive | while read sequ trash; do cheventlog -fix $sequ; done; lsiogrp -nohdr | while read id trash; do chiogrp -maintenance yes $id; chiogrp -maintenance no $id; done; sleep 60; done

This script will automatically check for any drive events and fix them, then toggle maintenance mode on each iogrp (useful to clear internal error counters) every 60 seconds. This script doesn't guarantee getting a successful rebuild, but it is one way we can help nurse the rebuild along if it is struggling.
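
If you want to get a feel for the loop structure first, here is a bounded dry-run sketch for a workstation bash shell. Everything in it is a stand-in: check_once is a hypothetical function that only prints a message, and the pass cap and zero-second interval are my assumptions so the loop ends on its own. I have not tried shell functions in the SpecV restricted shell.

```shell
# Hypothetical stand-in for the real work done on each pass.
check_once() {
    echo "pass $1: would fix drive events and toggle maintenance mode"
}

max_passes=3   # assumption: cap for the dry run; the real loop has no cap
interval=0     # seconds between passes; the real loop sleeps 60
pass=0

while true; do
    pass=$((pass + 1))
    check_once "$pass"
    # Leave the otherwise infinite loop once the cap is reached.
    [ "$pass" -ge "$max_passes" ] && break
    sleep "$interval"
done

echo "stopped after $pass passes"
```

The real one-liner has no exit condition at all, which is why it keeps going until you interrupt it.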

Of course all of these tricks can be used in more than one way, so before I leave you to play with your own shell, let me share a friendly reminder:

    #1) Think before you type.
    #2) With great power comes great responsibility.
I hope you all found this helpful and informative. If you have any questions or concerns, please leave a comment, ask me on Twitter @fincherjc, or reach out to me on LinkedIn.

