Expanding SB40c with hpacucli

One of the tasks we had to deal with during the last trip was to expand the storage of an SB40c storage blade by replacing 4x146GB SAS disks with 6x300GB SAS disks. Preferably, the operation had to be done online, with zero downtime for our client. Since the 4x146GB disks had been configured as RAID 10, the initial idea was quite simple and straightforward:

  1. Pull out any one of the disks from SB40c.
  2. Replace it with a new 300GB disk and wait till the LogicalDrive is reconstructed.
  3. Remove another disk from the array, but this time from the other mirror pair.
  4. Insert a new 300GB disk and wait till the reconstruction is over.
  5. Do exactly the same with the remaining two old 146GB disks.
  6. Expand the array by growing the logical drive (we had only one in the array’s configuration).
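
The "wait till the reconstruction is over" part of the steps above is easy to get wrong by eye, so here is a minimal sketch of how that check could be scripted. It assumes the output of `ctrl all show config` has been captured as text in the format shown later in this post. The function names `drive_states` and `safe_to_pull_next` are made up for illustration and are not part of hpacucli.

```python
import re

# Matches lines such as:
#   logicaldrive 1 (273.4 GB, RAID 1+0, OK)
#   physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SAS, 300 GB, OK)
DRIVE_LINE = re.compile(r"^\s*(logicaldrive|physicaldrive)\s+(\S+)\s+\((.*)\)\s*$")

# Number of leading comma-separated fields that precede the status
# (assumption based on the output format shown in this post).
FIELDS_BEFORE_STATUS = {"logicaldrive": 2, "physicaldrive": 3}


def drive_states(config_text):
    """Map each drive ("logicaldrive 1", ...) to its reported status string."""
    states = {}
    for line in config_text.splitlines():
        match = DRIVE_LINE.match(line)
        if not match:
            continue
        kind, ident, fields = match.groups()
        parts = [p.strip() for p in fields.split(",")]
        status = ", ".join(parts[FIELDS_BEFORE_STATUS[kind]:])
        states[f"{kind} {ident}"] = status
    return states


def safe_to_pull_next(config_text):
    """True only if at least one drive was found and every drive reports OK."""
    states = drive_states(config_text)
    return bool(states) and all(s == "OK" for s in states.values())


SAMPLE = """
   array A (SAS, Unused Space: 584359 MB)

      logicaldrive 1 (273.4 GB, RAID 1+0, OK)

      physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SAS, 300 GB, OK)
      physicaldrive 2I:1:1 (port 2I:box 1:bay 1, SAS, 300 GB, OK)
"""

if __name__ == "__main__":
    print(drive_states(SAMPLE))
    print(safe_to_pull_next(SAMPLE))
```

Anything other than a plain OK (a rebuild in progress, a failed disk, a transformation) makes the check fail, which is the conservative choice when the next action is pulling a disk out of a mirror.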

Replacing the first disk worked as planned: as soon as the 300GB replacement was swapped in, a green LED came on, indicating that the reconstruction had begun. So far so good. But when we replaced the second disk our joy diminished: the new disk showed no signs of life (all its LEDs were off) and the system stalled to the point where it was impossible to shut it down gracefully. So we pressed reset, and thankfully, once the system rebooted, the reconstruction continued, so the data were safe. The other two 146GB disks were replaced online without a single hiccup. The rest of the expansion plan was easy: all we had to do was insert another 2x300GB disks into the box to bring the total number of disks to six. After that we just expanded the array as shown below:

=> ctrl all show config

Smart Array P400 in Slot 3                (sn: PAFGL0N9SWK2OA)

   array A (SAS, Unused Space: 584359 MB)

      logicaldrive 1 (273.4 GB, RAID 1+0, OK)

      physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SAS, 300 GB, OK)
      physicaldrive 2I:1:1 (port 2I:box 1:bay 1, SAS, 300 GB, OK)
      physicaldrive 2I:1:2 (port 2I:box 1:bay 2, SAS, 300 GB, OK)
      physicaldrive 2I:1:3 (port 2I:box 1:bay 3, SAS, 300 GB, OK)

   unassigned

      physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SAS, 300 GB, OK)
      physicaldrive 2I:1:4 (port 2I:box 1:bay 4, SAS, 300 GB, OK)

=> ctrl slot=3 ld 1 add drives=allunassigned 
=> ctrl all show config

Smart Array P400 in Slot 3                (sn: PAFGL0N9SWK2OA)

   array A (SAS, Unused Space: 1156500 MB)

      logicaldrive 1 (273.4 GB, RAID 1+0, Transforming, 0% complete)

      physicaldrive 1I:1:5 (port 1I:box 1:bay 5, SAS, 300 GB, OK)
      physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SAS, 300 GB, OK)
      physicaldrive 2I:1:1 (port 2I:box 1:bay 1, SAS, 300 GB, OK)
      physicaldrive 2I:1:2 (port 2I:box 1:bay 2, SAS, 300 GB, OK)
      physicaldrive 2I:1:3 (port 2I:box 1:bay 3, SAS, 300 GB, OK)
      physicaldrive 2I:1:4 (port 2I:box 1:bay 4, SAS, 300 GB, OK)
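
A quick sanity check of the numbers above, as a rough sketch rather than anything authoritative. Note that the logical drive is still 273.4 GB after the drives were added; only the array's unused space grew. The arithmetic below assumes that "Unused Space" is reported as raw (pre-mirroring) capacity, that RAID 1+0 halves it, and that 1 GB = 1024 MB in this output. All variable names are ours. Growing the logical drive into that space would be a separate step (in hpacucli, as far as we know, something along the lines of `ctrl slot=3 ld 1 modify size=max`), which is not part of the session shown here.

```python
# Values copied from the two "show config" outputs above.
UNUSED_BEFORE_MB = 584359    # 4 disks in the array
UNUSED_AFTER_MB = 1156500    # 6 disks in the array
LOGICAL_DRIVE_GB = 273.4     # current size of logicaldrive 1
DISKS_ADDED = 2

# Raw space gained by adding the two disks, and the share per disk.
added_raw_mb = UNUSED_AFTER_MB - UNUSED_BEFORE_MB
raw_per_disk_mb = added_raw_mb / DISKS_ADDED

# Assumption: RAID 1+0 keeps two copies of everything, so only half
# of the raw unused space can become usable capacity.
extra_usable_gb = (UNUSED_AFTER_MB / 2) / 1024

# Estimated size of the logical drive if it were grown to the maximum.
estimated_total_gb = LOGICAL_DRIVE_GB + extra_usable_gb

if __name__ == "__main__":
    print(f"raw space added:       {added_raw_mb} MB")
    print(f"raw space per disk:    {raw_per_disk_mb:.1f} MB")
    print(f"extra usable (approx): {extra_usable_gb:.1f} GB")
    print(f"estimated max LD size: {estimated_total_gb:.1f} GB")
```

The per-disk figure comes out close to what a 300GB disk holds when counted in binary megabytes, which is why we believe the unused space is reported raw. Under those assumptions the logical drive could end up at roughly three times its current size, as one would expect when going from two mirror pairs to three.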

Our initial explanation of the root cause was that one shouldn’t leave the hpacucli tool running whilst replacing a disk. It sounds plausible, since it resembles the problem of one process deleting a file whilst another is writing to it or, more accurately, of a process reading or writing over NFS when the server becomes unavailable.
Since we had another SB40c and the task was identical, we had a second chance. This time we double-checked that no one was running hpacucli and began replacing the disks. Two disks were replaced flawlessly, but the third one hit us with exactly the same problem and we had to do a hard reset once again.
It’s still unclear what the real culprit was. Who knows, maybe it was a firmware issue, but we didn’t have a third SB40c to test that theory. In any case, I think such bad behavior is unacceptable, even if these arrays had the oldest firmware possible.
So if anyone knows how to avoid this in the future, or can point to a possible mistake on our side, speak up. Your comments are truly welcome.

Posted on December 18, 2010 at 12:39 pm by sergeyt
In: HP-UX
