Things to keep in mind when replacing a PS on SF6800/6900

What could be easier than swapping a power supply?! True, very true, but not when you’re dealing with an SF6800/6900. The problem arises from the fact that the fit is tight, so it’s quite tough to insert a PSU into its bay. If you pause during the insertion, not all pins may be connected to the power centerplane, which could cause a momentary drop in the 56V power line inside the server. As a result, say goodbye to all of your domains and be prepared to power cycle.
To avoid that, an engineer should try to insert the power supply in one smooth, uninterrupted motion. Thankfully, this issue has been resolved with power supply 300-1595-03 and with the release of power supply 300-1930-01. So if you’re dealing with one of them, you should be safe.

Refer to the following link for more details.

Oracle’s option that neglects DR on SUN

I’m in no way an Oracle expert and my opinion carries no weight, but I’d still like to point out a ridiculously stupid behavior caused by a hidden Oracle parameter called _enable_NUMA_optimization. As I’ve been told by our DBAs, in Oracle 9 this option was set to FALSE, but in a later release Oracle decided to change its default value to TRUE. Not a big deal, you’d think. A Metalink note claims that setting this option to TRUE improves performance by splitting the SGA into multiple segments (ipcs -a confirms that) and, as a result, reduces the total number of remote cache misses on a large system. But in real life it’s not as rosy as it sounds. On the contrary, with this parameter set to TRUE, don’t even think about adding or removing SBs/CMUs in your big and beefy Sun box.

Per Oracle development, adding/removing processor groups are not supported.  
Oracle does not support dynamic reconfiguration with instance up while running 
on Solaris 5.9 or 10 when NUMA is enabled.

Isn’t that sweet?!

P.S. If you have a Metalink account then check Doc ID: 761156.1 for more details.
P.P.S. Update. Oracle has published a new alert, Doc ID 759565.1, that discusses their current guidance on NUMA optimization. They now discourage setting hidden Oracle parameters in favor of applying a patch that disables Oracle NUMA optimizations by default on 10.2.04 and This update was taken from here.

Disable mpxio on a per-vendor basis

If you’ve ever wondered about disabling mpxio not only per port but also per vendor, you’d be surprised how easy it really is. Just edit the /kernel/drv/scsi_vhci.conf file appropriately:

device-type-scsi-options-list =
"VendorID1ProductID1", "disable-option",
"VendorID2ProductID2", "disable-option",
"VendorIDnProductIDn", "disable-option";
disable-option = 0x7000000;

So if you, just like me, want to disable mpxio for Hitachi 9570 just add the following few lines to /kernel/drv/scsi_vhci.conf:

device-type-scsi-options-list =
"HITACHI DF600F", "disable-option";

disable-option = 0x7000000;

Note. VendorID must be exactly 8 characters long, so if it’s shorter (HITACHI is 7 characters long), just pad it with trailing spaces.
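
The padding is easy to get wrong by hand, so here is a minimal sketch of how the device ID string could be built in the shell. The function name vhci_device_id is my own invention for illustration; it only pads short vendor IDs and does not truncate ones longer than 8 characters.

```shell
#!/bin/sh
# Build a scsi_vhci.conf device ID: the vendor ID left-justified and
# space-padded to 8 characters, immediately followed by the product ID.
# vhci_device_id is a hypothetical helper name, not a Solaris tool.
vhci_device_id() {
    vendor=$1
    product=$2
    # %-8s pads a short vendor ID with trailing spaces up to 8 characters
    printf '%-8s%s' "$vendor" "$product"
}

# Print the entry for the array used in this post.
printf '"%s", "disable-option";\n' "$(vhci_device_id HITACHI DF600F)"
```

The printed line can be compared against what you typed into scsi_vhci.conf before rebooting.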

vxdmpadm path activate weirdness

Just stumbled upon a strange behavior of vxdmpadm which requires further investigation.
I ran into the problem while trying to set a certain path “active” to load-balance the data flow on the HDS between its controllers. The built-in help clearly states that:

# vxdmpadm setattr help

vxdmpadm setattr path

pathtype can be either:
        preferred [priority=]

So I duly expected it to work as declared, but instead I got the following error:

# vxdmpadm getsubpaths  ctlr=c2
c2t50060E800042A5F0d81s2 ENABLED(A) PRIMARY      hds9500-alua0_0051 HDS9500-ALUA hds9500-alua0   -
c2t50060E800042A5F3d81s2 ENABLED    SECONDARY    hds9500-alua0_0051 HDS9500-ALUA hds9500-alua0   -
c2t50060E800042A5F0d82s2 ENABLED(A) SECONDARY    hds9500-alua0_0052 HDS9500-ALUA hds9500-alua0   -
c2t50060E800042A5F3d82s2 ENABLED    PRIMARY      hds9500-alua0_0052 HDS9500-ALUA hds9500-alua0   -

# vxdmpadm getsubpaths  ctlr=c3
c3t50060E800042A5F1d81s2 ENABLED(A) PRIMARY      hds9500-alua0_0051 HDS9500-ALUA hds9500-alua0   -
c3t50060E800042A5F2d81s2 ENABLED    SECONDARY    hds9500-alua0_0051 HDS9500-ALUA hds9500-alua0   -
c3t50060E800042A5F1d82s2 ENABLED(A) SECONDARY    hds9500-alua0_0052 HDS9500-ALUA hds9500-alua0   -
c3t50060E800042A5F2d82s2 ENABLED    PRIMARY      hds9500-alua0_0052 HDS9500-ALUA hds9500-alua0   -

# vxdmpadm setattr path c2t50060E800042A5F3d82s2 pathtype=active
VxVM vxdmpadm ERROR V-5-1-10357  Invalid argument or attribute specified.

Looks like I’ll need to dig deeper to find the culprit, but as a workaround I just disabled the second path to force a failover to the one I was trying to make active.

As always, RTFM rules, and I must admit that my assumption, that with this option one could change the state listed in the second column to active, was completely wrong. The man page states clearly that pathtype=active is used to change a standby path to active.

# vxdmpadm setattr path c2t50060E800042A5F3d81s2 pathtype=standby
# vxdmpadm getsubpaths 

c2t50060E800042A5F3d81s2 ENABLED    SECONDARY    hds9500-alua0_0051 hds9500-alua0 c2     STANDBY

# vxdmpadm setattr path c2t50060E800042A5F3d81s2 pathtype=active
# vxdmpadm getsubpaths 

c2t50060E800042A5F3d81s2 ENABLED    SECONDARY    hds9500-alua0_0051 hds9500-alua0 c2       -

Since in my case the path was already active, it made no sense to make it active a second time, and so I got the error. So actually there were two options:

  • Use vxdmpadm disable
  • Use vxdmpadm setattr path pathtype=standby/active

So, folks, never underestimate the documentation. ;-)

Moving to my own domain

Just a quick update on what’s happening here ;-)
I’ve finally overcome my laziness and purchased a VPS (that’s a subject for another post), quickly configured a LAMP environment and installed WordPress. So during the next few days, or maybe even weeks because I’m currently short on free time, the look of this site will be constantly changing and getting polished.
Have a nice day.

Data restoration from tape

Recently I had to restore some data from a tape written by Netbackup, so purely for reference purposes I decided to write this short post.
First we need to mount the tape (I did this using the robtest utility) and perform a robot inventory to make Netbackup aware of the new tape. Keep in mind that sometimes the barcode visible on the tape itself might not match what was written on the tape during the backup. This discrepancy could be a result of different Netbackup barcode rules. To double-check what was actually written, use more, e.g. more /dev/rmt/13cbn
After that I ran the following set of commands:

bpimport -create_db_info -id F006L1 -L /tmp/bpimport.log
bplist -C client_name -l -t 4 -R /
bprestore -B -S source -D destination -C client_name -t 0 \
-L /tmp/bprestore.log -R /tmp/rename_file /what/to/restore
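
The /tmp/rename_file passed with -R tells bprestore where to put the restored files. As far as I know each entry has the form "change original_path to new_path"; here is a small sketch that writes such a file. The helper name make_rename_file and the target directory /tmp/restored are placeholders of mine, so adjust them and check the bprestore man page for your Netbackup version.

```shell
#!/bin/sh
# Emit one bprestore rename-file entry that maps the backed-up path
# to an alternate restore location.
# make_rename_file is a hypothetical helper; both paths used below
# are placeholders, not values from a real restore.
make_rename_file() {
    printf 'change %s to %s\n' "$1" "$2"
}

make_rename_file /what/to/restore /tmp/restored > /tmp/rename_file
cat /tmp/rename_file
```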

Identifying a broken disk in HP DL360

Since I don’t have much experience with the HP DL series, I was puzzled for a bit when I found out that one of the disks was broken. How did I find out? Easy, by gazing at the fault LED that had gone steadily on. Not bad, but I wanted to be able to grab more detailed information from the console. Since the DL360 has a built-in “HP Smart Array”, you won’t be able to squeeze much information out of the system with ordinary tools such as fdisk, because the system sees only the logical drive presented by the array controller.
The solution was on the surface: I went ahead, downloaded and installed the hpacucli RPM, and that was it. So now I could do everything I wanted:

# hpacucli ctrl all show

Smart Array 6i in Slot 0 (Embedded)     

# hpacucli ctrl all show detail

Smart Array 6i in Slot 0 (Embedded)
   Bus Interface: PCI
   Slot: 0
   RAID 6 (ADG) Status: Disabled
   Controller Status: OK
   Chassis Slot: 
   Hardware Revision: Rev B
   Firmware Version: 2.36
   Rebuild Priority: Low
   Expand Priority: Low
   Surface Scan Delay: 15 secs
   Post Prompt Timeout: 0 secs
   Cache Board Present: True
   Cache Status: OK
   Accelerator Ratio: 100% Read / 0% Write
   Total Cache Size: 64 MB
   No-Battery Write Cache: Disabled
   Battery/Capacitor Count: 0
   SATA NCQ Supported: False

# hpacucli ctrl slot=0 logicaldrive all show 

Smart Array 6i in Slot 0 (Embedded)

   array A (Failed)

      logicaldrive 1 (136.7 GB, RAID 1, Interim Recovery Mode)

# hpacucli ctrl slot=0 physicaldrive all show 

Smart Array 6i in Slot 0 (Embedded)

   array A (Failed)

      physicaldrive 1:0   (port 1:id 0 , Parallel SCSI, ??? GB, Failed)
      physicaldrive 1:1   (port 1:id 1 , Parallel SCSI, 146.8 GB, Predictive Failure)
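
If you want to check this from a script rather than by eye, something along these lines could pull out the unhealthy drives. It is only a sketch: it matches the two statuses seen in the output above (Failed and Predictive Failure) and nothing else, and bad_drives is a name I made up.

```shell
#!/bin/sh
# Print the port:id of every physical drive reported as Failed or
# Predictive Failure in "hpacucli ... physicaldrive all show" output
# supplied on stdin. bad_drives is a hypothetical helper name.
bad_drives() {
    awk '/physicaldrive/ && /(Failed|Predictive Failure)\)/ { print $2 }'
}

# Sample lines copied from the output above.
bad_drives <<'EOF'
   array A (Failed)

      physicaldrive 1:0   (port 1:id 0 , Parallel SCSI, ??? GB, Failed)
      physicaldrive 1:1   (port 1:id 1 , Parallel SCSI, 146.8 GB, Predictive Failure)
EOF
```

On a real system the input would come from hpacucli ctrl slot=0 physicaldrive all show piped into bad_drives.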


Missing volboot file

If you receive the following error when executing “vxdctl enable”:

VxVM vxdctl ERROR V-5-1-1589 enable failed: Volboot file not loaded

then this sequence could help you to resolve the problem:

vxiod set 10
vxconfigd -d
vxdctl init
vxdctl enable

Good luck.