Multipath

1. Multipath Topology (read from multipath -l output)

The four colon-separated numbers in each path line are the host (i.e. HBA) number, the channel (always 0 in our shop since we only use single-channel HBAs), the SCSI target, which represents the switch in our case, and the LUN.

[root@dcdrpcora9 ~]# multipath -l
asm_vol2 (36005076801870036a000000000000d57) dm-3 IBM,2145
size=250G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 3:0:0:0 sdc        8:32  active undef running
| `- 2:0:1:0 sde        8:64  active undef running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 2:0:0:0 sda        8:0   active undef running
  `- 3:0:1:0 sdg        8:96  active undef running
asm_vol1 (36005076801870036a000000000000d58) dm-2 IBM,2145
size=250G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 2:0:0:1 sdb        8:16  active undef running
| `- 3:0:1:1 sdh        8:112 active undef running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 3:0:0:1 sdd        8:48  active undef running
  `- 2:0:1:1 sdf        8:80  active undef running
The following diagram represents the topology given by the multipath -l output above. The top row shows the HBA cards (as in /sys/class/fc_host/host? or /sys/class/scsi_host/host?). Since we only use fibre channel HBAs to build device mapper multipaths, only HBAs 2 and 3 are shown; HBAs 0 and 1 are not fibre channel cards. Channel numbers are omitted because they are all 0. Below the HBAs are the two switches, each carrying 4 paths to the two storage LUNs at the bottom: two paths coming from HBA2 and two from HBA3.
               HBA2 HBA3       <-- FC host or HBA
               /  \ /  \
              /    V    \
             /    / \    \
           [ SW0 ]   [ SW1 ]   <-- SCSI target or our switch
           /\   /\   /\   /\
          0  1 0  1 0  1 0  1  <-- LUN
        sda  b c  d e  f g  h  <-- path
         /    V    V    V    \
        /    / \  / \  / \    \
       /    /   \/   \/   \    \
      /    /    /\   /\    \    \
     /    /    /  \ /  \    \    \
     ------------- V -------------
     |   LUN0    |/ \|   LUN1    |   <-- LUN
     |  asm_vol2 |   |  asm_vol1 |
     -------------   -------------
    sda,sdc,sde,sdg  sdb,sdd,sdf,sdh <-- path
Take the first path in the multipath -l output as an example: 3:0:0:0 sdc. It originates from HBA3, goes through channel 0 (not shown in the diagram) and switch 0, and ends at LUN0, which is asm_vol2. Now look at the first path for asm_vol1 in the output, 2:0:0:1 sdb. It starts at HBA2, goes through channel 0 (not shown) and switch 0, and ends at LUN1, i.e. asm_vol1.
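The reading of a path address described above can be tried out with a small shell function. This is only a sketch: decode_hctl is a made-up name, and the labels (HBA, switch) reflect the site-specific interpretation used on this page, not anything multipath itself reports.

```shell
#!/bin/sh
# decode_hctl (hypothetical helper): split a host:channel:target:lun string
# into labelled fields. "switch" is this page's reading of the SCSI target.
decode_hctl() {
    # IFS=: applies only to this read, so the address is split on colons
    IFS=: read -r host channel target lun <<EOF
$1
EOF
    printf 'HBA%s channel=%s switch=%s LUN%s\n' "$host" "$channel" "$target" "$lun"
}

decode_hctl 3:0:0:0   # prints: HBA3 channel=0 switch=0 LUN0
decode_hctl 2:0:0:1   # prints: HBA2 channel=0 switch=0 LUN1
```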

2. Script to check multipath failures

#!/usr/bin/perl -w
#ck_multipaths.pl: Check active multipaths, alert if less than 4 paths (Yong 2013,2014)
#assume mapper device named like ^asm; if not, adjust regexp pattern as needed

$RECIPIENT='you@example.com,yourbuddy@example.com';
$LOGFILE='/root/ck_multipaths.log';
$LOGFILEHIST='/root/ck_multipaths.hist'; #accumulated history

$HOSTNAME=qx(/bin/hostname -s); chomp $HOSTNAME; #strip newline so the mail subject stays on one line

@mps = split /\n/, qx(/sbin/multipath -l);

sub process_mp
{ print "$mp has $cnt active paths.\n"; #path count of last, not this, mp in the loop
  if ($mp=~/^asm/ and $cnt<4)
  { $TM=qx(/bin/date "+%Y%m%d %H:%M"); chomp $TM;
    open LOG, ">>$LOGFILE" or die "Can't open $LOGFILE for write: $!";
    print LOG "$TM: $mp has $cnt active paths!\n";
    close LOG;
  }
}

system "/bin/cat $LOGFILE >> $LOGFILEHIST" if -s $LOGFILE; #nothing to archive on first run or if last run logged nothing
truncate "$LOGFILE", 0;
foreach(@mps)
{ if (/dm-/) #mp (multipath) header line
  { &process_mp if (defined $mp and defined $cnt);
    $cnt = 0;
    $mp = $_; #to be used for next line read
  }
  else
  { $cnt++ if /\d+:\d+ +\[?active/; #line pattern: "...major:minor active..." or "... [active"
  }
}

#the "finally" block
&process_mp if (defined $mp and defined $cnt);

system "/bin/mail -s \"Alert from $HOSTNAME\" $RECIPIENT < $LOGFILE" if -s $LOGFILE;
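
The counting logic of the script can be tried against saved multipath -l output without touching a live system. The following is a rough awk sketch of the same idea, not a replacement for the Perl script; count_active_paths is a made-up name, and it assumes header lines contain " dm-N " and path lines look like "... major:minor active ..." as in the output shown in section 1.

```shell
# count_active_paths (hypothetical helper): read `multipath -l` output on
# stdin and print "<map name> <active path count>" for each multipath device.
count_active_paths() {
    awk '
        / dm-[0-9]+ / {                       # header line, e.g. "asm_vol2 (3600...) dm-3 IBM,2145"
            if (name != "") print name, cnt   # report the previous device
            name = $1; cnt = 0
            next
        }
        /[0-9]+:[0-9]+ +\[?active/ { cnt++ }  # path line: "... major:minor active ..."
        END { if (name != "") print name, cnt }   # report the last device
    '
}
```

To list only the asm devices with fewer than 4 active paths, one could run: multipath -l | count_active_paths | awk '$1 ~ /^asm/ && $2 < 4'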

Yong Huang 2013,2014


My comments on multipath.conf settings
path_grouping_policy: When it's set to multibus for active/active devices, all paths are in one group, much as a hard disk with only a C: partition is easier to manage.
getuid_callout: Manually run the script to make sure it fetches the wwid correctly.
features: Make very sure not to enable queue_if_no_path (features "1 queue_if_no_path") for Oracle RAC; either set features to "0" or don't set it at all.
path_checker: Setting it to tur is for active/passive only.
failback: Must be immediate for fast failover.
rr_min_io: A smaller value (than the default 1000) may be better for OLTP? Note that the number of requests done before switching paths is not rr_min_io itself but rr_min_io multiplied by the path priority value (when rr_weight is set to priorities).
no_path_retry: Must be set to fail for Oracle RAC, according to numerous Oracle and Red Hat articles. Make sure it's not overridden in a more specific section further down, such as devices{}.
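
The two RAC-related points above (features and no_path_retry) lend themselves to a quick automated check. Below is a minimal sketch; check_rac_conf is a made-up name, and it only flags uncommented lines that are present. It does not notice a missing no_path_retry, and it does not know which section a line belongs to.

```shell
# check_rac_conf (hypothetical helper): read a multipath.conf-style file on
# stdin and report settings the notes above advise against for Oracle RAC.
# Prints one line per finding; prints nothing if there is no finding.
check_rac_conf() {
    awk '
        /^[ \t]*#/ { next }                       # skip commented-out lines
        $1 == "features" && /queue_if_no_path/ {
            print "line " NR ": features enables queue_if_no_path"
        }
        $1 == "no_path_retry" {
            v = $2; gsub(/"/, "", v)              # tolerate a quoted value
            if (v != "fail")
                print "line " NR ": no_path_retry is " v ", expected fail"
        }
    '
}
```

Typical use would be: check_rac_conf < /etc/multipath.conf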


Our case
Sep 03 2014 at 04:51 PM -04:00
Our test shows that with no_path_retry set to fail, features commented out (no need to set it to "0 queue_if_no_path"), and a few other parameters that are probably not very relevant (polling_interval=10, path_selector="round-robin 0", path_checker=readsector0, rr_min_io=100), we no longer get the "multipathd blocked for xxx seconds" messages and the server stays up.


Another case

Server I/O wait is high (shown in %wa of top or %iowait of sar), Oracle frequently stalls, and /var/log/messages has lines like

May 21 13:35:27 myhost kernel: qla2xxx [0000:0b:00.0]-801c:2: Abort command issued nexus=2:0:5 --  1 2002.
May 21 13:35:27 myhost kernel: qla2xxx [0000:0b:00.0]-801c:2: Abort command issued nexus=2:1:2 --  1 2002.
The root cause was later found to be a bad Cisco core switch, but temporarily disabling the faulted paths in the multipath devices is a workaround. The key is to identify the faulted path devices. According to an HP article, the numbers after nexus are host:target:LUN, i.e. the path notation without the channel number. In our case, nexus=2:0:5 and nexus=2:1:2 correspond to path devices sdf (2:0:0:5) and sdi (2:0:1:2) in the multipath -l output below
# multipath -l
...
asm_vol5 (36005076801870036a0000000000010a8) dm-11 IBM,2145
size=250G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 2:0:0:5 sdf        8:80   active undef unknown
| `- 3:0:0:5 sdr        65:16  active undef unknown
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 2:0:1:5 sdl        8:176  active undef unknown
  `- 3:0:1:5 sdx        65:112 active undef unknown
...
asm_vol3 (36005076801870036a000000000000ef2) dm-6 IBM,2145
size=250G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 2:0:1:2 sdi        8:128  active undef unknown
| `- 3:0:1:2 sdu        65:64  active undef unknown
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 2:0:0:2 sdc        8:32   active undef unknown
  `- 3:0:0:2 sdo        8:224  active undef unknown
...
The second number 0 here (single channel) can be omitted when matching the nexus numbers. To stop the frequent I/O hangs, we can delete the sd devices that are the corresponding path devices of the multipath devices.
# echo 1 > /sys/block/sdf/device/delete
# echo 1 > /sys/block/sdi/device/delete
After a while, the faulted path devices are gone from the multipath devices, system I/O wait comes down from 40% to very low, and Oracle runs normally.
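
Matching a nexus value to its path device by eye gets tedious with many LUNs. Here is a small sketch of the lookup; nexus_to_dev is a made-up name, and it assumes the channel is always 0 as on our hosts. It only prints the device name and deletes nothing.

```shell
# nexus_to_dev (hypothetical helper): given a nexus string as logged by
# qla2xxx (host:target:lun) and `multipath -l` output on stdin, print the
# matching path device name. Assumes channel 0 for every path.
nexus_to_dev() {
    nexus=${1#nexus=}          # accept "nexus=2:0:5" or plain "2:0:5"
    host=${nexus%%:*}          # 2
    rest=${nexus#*:}           # 0:5 (target:lun)
    hctl="$host:0:$rest"       # 2:0:0:5
    awk -v hctl="$hctl" '{
        # the device name is the field right after the H:C:T:L address
        for (i = 1; i < NF; i++)
            if ($i == hctl) { print $(i + 1); exit }
    }'
}
```

For example: multipath -l | nexus_to_dev nexus=2:0:5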

It's important to find all faulted devices, with a command like

# grep nexus /var/log/messages* | awk '{print $11}' | sort | uniq -c | sort -n
   1601 nexus=2:0:1  <-- corresponds to 2:0:0:1 in `multipath -l' output
   1677 nexus=2:0:5
   1941 nexus=2:1:2
   2146 nexus=2:1:4
   2158 nexus=2:1:0
   2248 nexus=2:0:3  <-- the one with the most faults
Typically, each multipath device will have one path device failing.
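
The awk '{print $11}' in the command above depends on the syslog line layout. If the timestamp or host format differs, the field number shifts. A sketch that matches the nexus token itself is given below; count_nexus is a made-up name, and grep -o is assumed to be available (it is in GNU grep but is not required by POSIX).

```shell
# count_nexus (hypothetical helper): read syslog lines on stdin and print
# "<count> <nexus>" per distinct nexus, sorted by ascending count.
count_nexus() {
    grep -o 'nexus=[0-9:]*' |      # keep only the nexus=H:T:L token
        sort | uniq -c | sort -n |
        awk '{print $1, $2}'       # drop the padding uniq -c adds
}
```

Usage would be along the lines of: cat /var/log/messages* | count_nexus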

2018-05



After you make changes to multipath settings, reload the maps (multipath -r) and the multipathd service (service multipathd reload), then check the running configuration:

# multipathd -k
multipathd> show config
defaults {
        verbosity 2
        polling_interval 10
        udev_dir "/dev"
        multipath_dir "/lib64/multipath"
        path_selector "round-robin 0"
        path_grouping_policy multibus
        getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
        prio alua
        features "0"
...
You can also use this one-line command to do it: echo "show config" | multipathd -k
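
The show config output is long, so it can help to pull out just the keyword you changed. A minimal sketch follows; conf_value is a made-up name, and it assumes one "keyword value" pair per line as in the output above.

```shell
# conf_value (hypothetical helper): read `show config` output on stdin and
# print the value of one keyword. Every occurrence is printed, since the
# same keyword can appear in defaults{} and again in devices{}.
conf_value() {
    awk -v key="$1" '$1 == key {
        $1 = ""            # drop the keyword; $0 is rebuilt from the rest
        sub(/^ +/, "")     # remove the leading space left behind
        print
    }'
}
```

For example: echo "show config" | multipathd -k | conf_value no_path_retry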

Some very preliminary notes:

login as: oracle
myhost ~ $ cd /sys/class/fc_remote_ports
myhost fc_remote_ports $ sudo multipath -l > /tmp/multipath.out
[sudo] password for oracle:
myhost fc_remote_ports $ head /tmp/multipath.out #see what the output looks like
ASM_DATA37_CPB (36005076801870036a000000000000e69) dm-242 IBM,2145
size=250G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=0 status=active
| |- 0:0:7:16 sdoo 129:320 active undef running
| `- 1:0:7:16 sduc 66:576  active undef running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 0:0:5:16 sdmg 69:384  active undef running
  `- 1:0:6:16 sdmb 69:304  active undef running
ASM_DATA22_CPB (36005076801870036a000000000000e5a) dm-155 IBM,2145
size=250G features='0' hwhandler='0' wp=rw
myhost fc_remote_ports $ grep -- '- [0-9]:0' /tmp/multipath.out | cut -c6-10 | sort | uniq -c #assume single digit host
     77 0:0:1 <-- this host-target combination is used 77 times to form LUNs
     77 0:0:2
      1 0:0:4 <-- this combination is only used once
     74 0:0:5
      1 0:0:6
     74 0:0:7
     77 1:0:0
     77 1:0:1
      1 1:0:4
      1 1:0:5
     74 1:0:6
     74 1:0:7
myhost fc_remote_ports $ ls
rport-0:0-0  rport-0:0-10  rport-0:0-2  rport-0:0-4  rport-0:0-9  rport-1:0-1   rport-1:0-11  rport-1:0-3  rport-1:0-8
rport-0:0-1  rport-0:0-11  rport-0:0-3  rport-0:0-8  rport-1:0-0  rport-1:0-10  rport-1:0-2   rport-1:0-4  rport-1:0-9
myhost fc_remote_ports $ ls rport-0:0-0
device        fast_io_fail_tmo  node_name  port_name   power  scsi_target_id  supported_classes
dev_loss_tmo  maxframe_size     port_id    port_state  roles  subsystem       uevent
myhost fc_remote_ports $ for i in */scsi_target_id; do echo -n "$i: "; cat $i; done
rport-0:0-0/scsi_target_id: -1 <-- not a real fibre channel target
rport-0:0-10/scsi_target_id: 6
rport-0:0-11/scsi_target_id: 7
rport-0:0-1/scsi_target_id: 0
rport-0:0-2/scsi_target_id: 1
rport-0:0-3/scsi_target_id: 2
rport-0:0-4/scsi_target_id: 3
rport-0:0-8/scsi_target_id: 4
rport-0:0-9/scsi_target_id: 5
rport-1:0-0/scsi_target_id: -1 <-- same here
rport-1:0-10/scsi_target_id: 6
rport-1:0-11/scsi_target_id: 7
rport-1:0-1/scsi_target_id: 0
rport-1:0-2/scsi_target_id: 1
rport-1:0-3/scsi_target_id: 2
rport-1:0-4/scsi_target_id: 3
rport-1:0-8/scsi_target_id: 4
rport-1:0-9/scsi_target_id: 5
myhost fc_remote_ports $ for i in */roles; do echo -n "$i: "; cat $i; done
rport-0:0-0/roles: Directory Server
rport-0:0-10/roles: FCP Target, FCP Initiator
rport-0:0-11/roles: FCP Target, FCP Initiator
rport-0:0-1/roles: FCP Target, FCP Initiator
rport-0:0-2/roles: FCP Target, FCP Initiator
rport-0:0-3/roles: FCP Target, FCP Initiator
rport-0:0-4/roles: FCP Target, FCP Initiator
rport-0:0-8/roles: FCP Target, FCP Initiator
rport-0:0-9/roles: FCP Target, FCP Initiator
rport-1:0-0/roles: Directory Server
rport-1:0-10/roles: FCP Target, FCP Initiator
rport-1:0-11/roles: FCP Target, FCP Initiator
rport-1:0-1/roles: FCP Target, FCP Initiator
rport-1:0-2/roles: FCP Target, FCP Initiator
rport-1:0-3/roles: FCP Target, FCP Initiator
rport-1:0-4/roles: FCP Target, FCP Initiator
rport-1:0-8/roles: FCP Target, FCP Initiator
rport-1:0-9/roles: FCP Target, FCP Initiator
myhost fc_remote_ports $ for i in */supported_classes; do echo -n "$i: "; cat $i; done
rport-0:0-0/supported_classes: unspecified
rport-0:0-10/supported_classes: Class 3
rport-0:0-11/supported_classes: Class 3
rport-0:0-1/supported_classes: Class 3
rport-0:0-2/supported_classes: Class 3
rport-0:0-3/supported_classes: Class 3
rport-0:0-4/supported_classes: Class 3
rport-0:0-8/supported_classes: Class 3
rport-0:0-9/supported_classes: Class 3
rport-1:0-0/supported_classes: unspecified
rport-1:0-10/supported_classes: Class 3
rport-1:0-11/supported_classes: Class 3
rport-1:0-1/supported_classes: Class 3
rport-1:0-2/supported_classes: Class 3
rport-1:0-3/supported_classes: Class 3
rport-1:0-4/supported_classes: Class 3
rport-1:0-8/supported_classes: Class 3
rport-1:0-9/supported_classes: Class 3
myhost fc_remote_ports $ grep tmo /etc/multipath.conf
    #fast_io_fail_tmo     5
myhost fc_remote_ports $ for i in */dev_loss_tmo; do echo -n "$i: "; cat $i; done #default 30 seconds
rport-0:0-0/dev_loss_tmo: 30
rport-0:0-10/dev_loss_tmo: 30
rport-0:0-11/dev_loss_tmo: 30
rport-0:0-1/dev_loss_tmo: 30
rport-0:0-2/dev_loss_tmo: 30
rport-0:0-3/dev_loss_tmo: 30
rport-0:0-4/dev_loss_tmo: 30
rport-0:0-8/dev_loss_tmo: 30
rport-0:0-9/dev_loss_tmo: 30
rport-1:0-0/dev_loss_tmo: 30
rport-1:0-10/dev_loss_tmo: 30
rport-1:0-11/dev_loss_tmo: 30
rport-1:0-1/dev_loss_tmo: 30
rport-1:0-2/dev_loss_tmo: 30
rport-1:0-3/dev_loss_tmo: 30
rport-1:0-4/dev_loss_tmo: 30
rport-1:0-8/dev_loss_tmo: 30
rport-1:0-9/dev_loss_tmo: 30
myhost fc_remote_ports $ for i in */fast_io_fail_tmo; do echo -n "$i: "; cat $i; done #default?
rport-0:0-0/fast_io_fail_tmo: off
rport-0:0-10/fast_io_fail_tmo: 5
rport-0:0-11/fast_io_fail_tmo: 5
rport-0:0-1/fast_io_fail_tmo: off
rport-0:0-2/fast_io_fail_tmo: 5
rport-0:0-3/fast_io_fail_tmo: 5
rport-0:0-4/fast_io_fail_tmo: off
rport-0:0-8/fast_io_fail_tmo: 5
rport-0:0-9/fast_io_fail_tmo: 5
rport-1:0-0/fast_io_fail_tmo: off
rport-1:0-10/fast_io_fail_tmo: 5
rport-1:0-11/fast_io_fail_tmo: 5
rport-1:0-1/fast_io_fail_tmo: 5
rport-1:0-2/fast_io_fail_tmo: 5
rport-1:0-3/fast_io_fail_tmo: off
rport-1:0-4/fast_io_fail_tmo: off
rport-1:0-8/fast_io_fail_tmo: 5
rport-1:0-9/fast_io_fail_tmo: 5
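
The repeated for loops above can be folded into one function. This is only a convenience sketch; rport_attr is a made-up name, and the optional second argument exists so it can be pointed at a copy of the directory tree instead of the live /sys.

```shell
# rport_attr (hypothetical helper): print "<rport>: <value>" for one
# attribute of every remote port under a directory
# (default /sys/class/fc_remote_ports).
rport_attr() {
    attr=$1
    dir=${2:-/sys/class/fc_remote_ports}
    for f in "$dir"/rport-*/"$attr"; do
        [ -r "$f" ] || continue        # glob matched nothing, or unreadable
        rport=${f%/*}                  # strip the trailing "/<attr>"
        printf '%s: %s\n' "${rport##*/}" "$(cat "$f")"
    done
}
```

To list only the remote ports where fast_io_fail_tmo is off: rport_attr fast_io_fail_tmo | grep ': off$'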

