When no documentation is better than a bad one.

I’ve just returned from Vladivostok, where I spent a day replacing a battery in a Sun SE 6120 disk array. What could be easier than that? Nothing, unless you’ve been misled by broken documentation. Here is a quote from the official Sun/Oracle document (Sun StorEdge™ 6020 and 6120 Arrays System Manual):

Once a battery has been physically replaced in a given PCU and that PCU has been reinstalled in the tray, no further action is required. The system updates the battery FRU information as needed without operator intervention.

Piece of cake: just swap the faulty battery and you’re good to go. Not really. After the battery was replaced, “refresh -s” still reported it as failed. “refresh -c” was no help either, because the test will not start if there is even a single faulty battery in a unit.

Just to be on the safe side, I tried a second battery (all of them were original and new) and even a new PCU, but the result was identical. Since I knew the batteries were good, I had to use special dot commands to fix the issue:

# sun
# password:
# .bat -c u1pcu2

That cleared the battery’s status, so “refresh -s” now reported it as “normal” and the battery started charging. As soon as it was completely charged,

# .bat -i u1pcu2

was run to initialize the battery warranty date, and then it was time for “refresh -c” to put the battery under test.

The bottom line: don’t blindly trust any documentation until you’ve verified it through your own experience.

P.S. The client told me that the last time they observed exactly the same behavior, they simply turned off the array and all the dependent services.

Password and group caching

If you have a NIS client on HP-UX and you’ve already spent a few hours trying to understand why on earth the “id” command keeps telling you something like this:

bash-3.2# id sergeyt
Can't find user sergeyt

And you’ve already checked all the places where NIS could break (nsswitch.conf, ypwhich -m, ypmatch user_name passwd, nsquery), then try ‘/sbin/init.d/pwgr stop’ and see if that makes a difference. Linux has a similar service called nscd (name service cache daemon), which has caused me a lot of similar trouble, because both nscd and pwgrd cache not only positive but also negative responses.
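The symptom described above (NIS has the user, but “id” does not) can be checked with a small sketch like the one below. The function name cache_suspect is made up for illustration, and it assumes that ypmatch and id both return a non-zero exit status when the user is not found.

```shell
# Sketch: flag the case where the NIS passwd map has the user
# but the local lookup (id) still fails, which points at a stale
# negative answer cached by pwgrd (HP-UX) or nscd (Linux).
# cache_suspect is a hypothetical helper name, not a system command.
cache_suspect() {
    user="$1"
    if ypmatch "$user" passwd >/dev/null 2>&1 && ! id "$user" >/dev/null 2>&1; then
        echo "NIS knows $user but id does not: suspect a cached negative answer"
        return 0
    fi
    return 1
}
```

Running “cache_suspect sergeyt” before and after ‘/sbin/init.d/pwgr stop’ should show whether the cache was the culprit.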
Edit the /etc/rc.config.d/pwgr file on the HP-UX server if you prefer this service to stay disabled even after the server is rebooted.
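A minimal sketch of that edit follows. It assumes the file controls the daemon through a PWGR=1 / PWGR=0 line; check your own copy of the file before relying on this. The function name set_pwgr_disabled is hypothetical.

```shell
# Sketch: make the pwgrd shutdown persistent by setting PWGR=0.
# Assumption: the rc config file contains a line of the form PWGR=<0|1>.
# set_pwgr_disabled is a hypothetical helper; the optional argument
# lets you try it on a copy of the file first.
set_pwgr_disabled() {
    conf="${1:-/etc/rc.config.d/pwgr}"
    tmp="${conf}.tmp.$$"
    # Write to a temp file and copy it back instead of relying on
    # in-place editing in sed; cp keeps the original file's
    # ownership and permissions.
    sed 's/^PWGR=.*/PWGR=0/' "$conf" > "$tmp" && cp "$tmp" "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}
```

On the server itself this would be followed by ‘/sbin/init.d/pwgr stop’ to stop the daemon that is already running.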