Bryan Cantrill on the BSD Now podcast

The latest episode (103) of the BSD Now podcast featured a fantastic and hilarious interview with Bryan Cantrill, who is well known for his wit and his right-on-the-bullseye rants. It’s been a while since I cried laughing, so this video is highly recommended. Not to mention that his talk was very educational from both the technical (epoll, kqueue) and historical points of view. Bookmarked and added to the favorites.

When no documentation is better than bad documentation

I’ve just returned from Vladivostok, where I spent a day replacing a battery in Sun’s SE 6120 disk array. What could be easier than that? Nothing, unless you’ve been misled by broken documentation. Here is a quote from Sun/Oracle’s official document (Sun StorEdge™ 6020 and 6120 Arrays System Manual):

Once a battery has been physically replaced in a given PCU and that PCU has been reinstalled in the tray, no further action is required. The system updates the battery FRU information as needed without operator intervention.

Piece of cake – just swap the faulty battery and you’re good to go. Not really. After the battery was replaced, “refresh -s” still complained that it had failed. “refresh -c” wasn’t a friend in that situation either, since the test will not start if there is even a single faulty battery in a unit.

Just to be on the safe side I tried a second battery (all of them were original and new) and even a new PCU, but the end result was identical. Since I knew that the batteries were good, I had to use special dot commands to fix the issue:

# sun
# password:
# .bat -c u1pcu2

That cleared the battery’s status, so “refresh -s” now reported it as “normal” and the battery started charging. As soon as it was completely charged

# .bat -i u1pcu2

was run to initialize the battery warranty date, and then it was time for “refresh -c” to put the battery under test.

The end result: don’t blindly trust any documentation until you’ve verified it through your own experience.

P.S. I was told by the client that the last time they observed exactly the same behavior they simply turned off the array and all the dependent services.

My lovely SL500

My old friend the SL500 gave me another gift just a few hours before my flight to Moscow from Irkutsk, where I had been replacing the robot module, or Z-Drive assembly. So I rushed back to the customer’s site to find the following:

2010-11-12T22:38:21.483,      0.0.0.0.0, 510, robot, /usr/local/bin/Ifm, error, 0000, 5069, "Director - putResponse() servo mech reach event 5069 at 2009 tachs, 1849 mils"
2010-11-12T22:39:10.234,      0.0.0.0.0, 510,            robot, /usr/local/bin/Ifm,  error, 0000, 5069, "Director - putResponse() servo mech reach event 5069 at 2008 tachs, 1848 mils"
2010-11-12T22:39:10.291,      0.0.0.0.0, 3202,              ifm,                 ,  error, 3000, 3313, "(request id = HOST/0x101d5a48) IfmMove::doPut(): move back to source from (LMRC) 0,2,1,2 failed, going inop"
2010-11-12T22:39:10.644,      0.0.0.0.0, 3202,              ifm,                 ,  error, 3000, 3322, "(request id = HOST/0x101d5a48) IfmMove::doPut() to (LMRC) 0,2,1,2 : cartridge in hand, going inop"
2010-11-12T22:39:10.680,      0.0.0.0.0, 3202,              ifm,                 ,  error, 3000, 3322, "(request id = HOST/0x101d5a48) IfmMove::commonMoveCommand(): PUT request of tape AB0017L3 from (LMRC) 0,2,3,9 to (LMRC) 0,2,1,2 failed:"

The robot’s hand was frozen, standing still just opposite the slot it had tried to load the tape back into. The result code 3313, which means “Put failed”, only confirms that. Since time was running out, the only solution I was able to come up with was to manually remove the tape from the robot’s claws and reboot the library. Once I’m back in Moscow I will have to call the customer to find out the current state of the library. So this story will definitely have a sequel. Stay tuned…
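For longer logs of this kind, the entries can be tallied by result code before reading the messages one by one. Below is a minimal Python sketch, assuming the log is plain comma-separated text with the message in double quotes, as in the excerpt above. The column names in FIELDS are my own guesses (the log has no header row) and are not taken from any SL500 documentation.

```python
import csv
from collections import Counter

# Three of the log lines quoted above, copied verbatim.
SAMPLE_LOG = """\
2010-11-12T22:39:10.234,      0.0.0.0.0, 510,            robot, /usr/local/bin/Ifm,  error, 0000, 5069, "Director - putResponse() servo mech reach event 5069 at 2008 tachs, 1848 mils"
2010-11-12T22:39:10.291,      0.0.0.0.0, 3202,              ifm,                 ,  error, 3000, 3313, "(request id = HOST/0x101d5a48) IfmMove::doPut(): move back to source from (LMRC) 0,2,1,2 failed, going inop"
2010-11-12T22:39:10.644,      0.0.0.0.0, 3202,              ifm,                 ,  error, 3000, 3322, "(request id = HOST/0x101d5a48) IfmMove::doPut() to (LMRC) 0,2,1,2 : cartridge in hand, going inop"
"""

# Hypothetical column names: assumed for illustration only.
FIELDS = (
    "timestamp", "address", "source_id", "component", "path",
    "severity", "code", "result_code", "message",
)


def parse_log(text):
    """Return one dict per log line that has the expected number of fields."""
    # skipinitialspace lets the csv module see the opening quote of the
    # message even though each comma is followed by padding spaces.
    reader = csv.reader(text.splitlines(), skipinitialspace=True)
    return [
        dict(zip(FIELDS, (field.strip() for field in row)))
        for row in reader
        if len(row) == len(FIELDS)
    ]


entries = parse_log(SAMPLE_LOG)
by_code = Counter(entry["result_code"] for entry in entries)

for code, count in sorted(by_code.items()):
    print(code, count)
```

The quoted message has to be parsed as a single field because it contains commas of its own (the LMRC coordinates), which is why a plain split on "," would not be enough here.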

Orphaned DTrace, Fishworks and ZFS

First it was Bryan Cantrill, and then Adam Leventhal followed. After that the exodus continued with Jeff Bonwick and Mike Shapiro both leaving Oracle. But today another big name from Sun Microsystems is closing Oracle’s door behind him: Brendan Gregg is leaving, and all we have been left with is a new DTrace book from Brendan and Jim Mauro.

Enjoy the videos.

Update
Below is a list, taken from the OpenSolaris mailing list, of all the big names that have left Sun/Oracle so far:

  • Ian Murdock (Emerging systems, i.e. new distro architecture)
  • Tim Bray (SGML/XML) (1 March 2010)
  • Simon Phipps (Open Source) (March 2010)
  • James Gosling (Java) (2 April 2010)
  • Sunay Tripathi (CrossBow) (2 April 2010)
  • Garrett D’Amore (networking, audio, device drivers – formerly with General Dynamics, which had bought Tadpole)
  • Bryan Cantrill (DTrace) (July 2010)
  • Adam Leventhal (DTrace)
  • Jeff Bonwick (ZFS)
  • Michael W. Shapiro (DTrace, storage) (October 2010)
  • Brendan Gregg (DTrace, storage) (October 2010)

Who cares about TCO and ROI?!

As expected, people care less about business acronyms and lofty words, e.g. TCO, ROI, integrated stack, when real money is involved. I saw confirmation of this during the Oracle+Sun welcome event, where all of Oracle’s/Sun’s consultants were touting their integrated stack but fell short once asked about the price the customer will have to pay for the new support contract or the license fee for using Oracle on Sun hardware. The innovations and green technology are cool, but everything grows dim in the face of a bill. To sweeten the pill, it was said that once the integration process is completed the price list for Sun hardware will probably be revised downwards.

What was really useful about visiting this event was the talk in the corridors, which gave some hope:

  • Oracle will actually use its privilege of owning the whole stack (from software to disks) to make the Sun+Oracle platform more attractive than any competing solution from a performance perspective.
  • The next SPARC64 processors and M-series platform, which are planned to hit the market in 2012, will be a 50/50 Sun/Fujitsu partnership, against today’s 20/80. It’s well known that contemporary SPARC64 CPUs are more Fujitsu’s brainchild than Sun’s.
  • We were offered a Sun T5220 equipped with SSDs and a Sun Flash F20 PCIe card for testing. Very sweet.

Tender thanks for the invitation

As I mentioned in my last post, there is going to be an Oracle+Sun welcome event on the 20th of May at the Moscow Marriott Hotel. Although I was a bit skeptical about my chances of being allowed to attend, I received the confirmation today. Frankly speaking, I don’t expect to hear any breathtaking revelations or confessions (they are all well known from the similar events that have already taken place in other countries), but I still expect it to be a cheerful moment and a chance to have a personal touch with a historic event.

See you there…

D240 Disk’s story

Here is a story I’d like to share with you, my dear reader. A couple of days ago we finally found a remedy for a support case opened with Sun, which even the support engineer christened “strange”. The problem was initially discovered whilst root disk mirroring was in progress. I noticed that it took an inordinate amount of time to complete, and once it was over I used the dd and iostat tools to recreate the problem. Here is what I found:

# dd if=/dev/rdsk/c10t0d0s2 of=/dev/null bs=128k &
# iostat -xnzM 5

                    extended device statistics              
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
   20.4    0.0    2.6    0.0  0.0  1.0    0.0   48.9   0 100 c10t0d0
                    extended device statistics              
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
   20.4    0.0    2.6    0.0  0.0  1.0    0.0   48.9   0 100 c10t0d0

I got exactly the same output when writing to the disk, so the issue was obvious: 2.6 MB/s for reads or writes is abysmally low, even for a D240 JBOD. Btw, our configuration was the following: an SF6900 with a Qlogic Ultra3 SCSI ISP10160 connected to the D240. Whilst working on the problem we tried many things:

  • Fiddled with system parameters, e.g. sd_max_throttle, and tried to switch from full to split bus configuration. Didn’t help.
  • Took another Qlogic Ultra3 SCSI ISP10160 – the problem was still there.
  • Connected the D240 to a different HBA and couldn’t reproduce the original issue. So it looked like the D240 wasn’t the culprit.
  • Plugged the D240 back into the original HBA and tried to swap the slots in the IO board – same issue.
  • Upgraded the disks’ firmware to the latest version (116370-15) – the problem persisted.

In the end the support engineer ordered a slightly different disk model from the one we originally had. Our Seagate disks (ST373307L) were replaced with ST373207L, and the problem finally disappeared.

                    extended device statistics              
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
  291.4    0.0   36.4    0.0  0.0  1.0    0.0    3.4   0  98 c10t0d0
                    extended device statistics              
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
  291.2    0.0   36.4    0.0  0.0  1.0    0.0    3.4   0  98 c10t0d0
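The two sets of iostat figures hang together with simple arithmetic, which is a quick way to check that nothing else is hiding in the numbers. The Python sketch below assumes that each read reported by iostat corresponds to one 128 KB dd block and that Mr/s is counted in units of 2^20 bytes; both are my assumptions, though they match the output above. It also applies Little’s law (average requests in flight = request rate × average service time) to the r/s and asvc_t columns.

```python
BLOCK_KIB = 128  # from "dd ... bs=128k"


def throughput_mib_s(reads_per_s, block_kib=BLOCK_KIB):
    """Throughput in MiB/s if every read transfers one dd block."""
    return reads_per_s * block_kib / 1024


def avg_in_flight(reads_per_s, asvc_ms):
    """Little's law: average number of requests being serviced at once."""
    return reads_per_s * asvc_ms / 1000


# r/s and asvc_t (milliseconds) copied from the iostat output above.
before = {"r/s": 20.4, "asvc_t": 48.9}   # ST373307L
after = {"r/s": 291.4, "asvc_t": 3.4}    # ST373207L

for label, sample in (("before", before), ("after", after)):
    print(
        label,
        round(throughput_mib_s(sample["r/s"]), 2), "MiB/s,",
        round(avg_in_flight(sample["r/s"], sample["asvc_t"]), 2), "in flight",
    )

speedup = after["r/s"] / before["r/s"]
print("speedup:", round(speedup, 1))
```

In both cases about one request is in flight at any moment, which agrees with the actv column, so dd was issuing its reads one at a time and the whole difference comes from the per-request service time: roughly 49 ms on the old disks against 3.4 ms on the new ones.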

I hate hazy miracles. We still haven’t received any technical details, but we were promised the information as soon as it’s available from the PTS engineer at the Sun European Competency Centre.

OpenSolaris, licensing, rumors and more

Right on the heels of the MOSUG meeting we had tonight, I have something interesting to share with you. Mostly it relates to the hottest topics regarding the future of OpenSolaris, Solaris licensing and support. And so…

  1. As you’ve already heard or read in the mailing lists, the latest OpenSolaris build has been frozen and will eventually become the much-anticipated next OpenSolaris release, presumably 2010.05.
  2. As a developer, an ordinary user or whoever you are, you are welcome to download and use Solaris OS for free as long as it’s not used in production or for any commercial purposes. There is a drawback: no patches are available, not even security ones.
  3. There is no way to purchase a Solaris license only. Oracle doesn’t sell them. As of today, because tomorrow could bring something new, you can obtain a Solaris license in two ways:
    • By purchasing the system as a whole from Oracle. In that case you’re going to receive technical support from Oracle directly.
    • By buying the hardware and Solaris from a third-party vendor, e.g. HP, IBM or DELL, but in that case the technical support will be provided by the same third-party vendor. That means that if you buy a system with Solaris from HP, then pester them in case of any issues.
  4. I tried to ask about the recently spread Oracle-LSI rumors but didn’t receive a definite answer. I think that’s due to Oracle’s policy, so we will have to play a “wait and see” game.
  5. As far as I understood, Oracle is not going to make Oracle on Sun hardware cheaper than Oracle on HP iron, and the bill will be mostly equivalent. But TCO and ROI would be more attractive in the case of the Sun/Oracle symbiosis.

Brooding about upcoming Oracle hardware service changes

I read this on the opensolaris mailing list yesterday, and if you don’t follow it this information could be of great interest. From now on, forget about the different types of support contract we have all got used to: the Platinum, Gold, Silver and Bronze options have been left behind. Get prepared to fork out 12% of your net system price if you are still thinking about getting support from Oracle/Sun.

Oracle Hardware Service Changes
Support Options
I. Systems
Premier Support for Systems
§ Covers system hardware, OS and virtualization software
§ One level of Service
§ 7/24 with 2 hour onsite response
§ Available within 25 miles of designated metro center
§ 12% of customer’s net system price
**Upon renewal, all current Sun Spectrum hardware and system support
customer’s will be migrated to the new offering receiving upgraded service
levels
II. OS and Systems Software
Premier Support for Operating System
§ Covers Oracle Solaris, Enterprise Linux and Oracle VM (OVM) running on Sun
Hardware
§ 8% of customer’s net system price
Premier Support for Software (Non OS)
§ 22% of customer’s net software license value
§ There will only be one price list – Hardware Price List
§ Service pricing is based on hardware price
III. Advanced Customer Services
Packaged Services
§ Installation
§ Professional Services
§ Premier Support Qualification (Recertification)
§ Data and Device Retention (Secure Disk)
Expert Services
§ On Site Resources
§ Custom PS
Operations Management
§ Managed Services
IV. Warranty Information Effective March 16th 2010
1 year from ship date
a. Phone coverage 5×9 Monday-Friday
b. Web coverage 24×7
**Users are required to register their warranty in order to log service requests.
c. Phone Response time
i. P1 – 4 hours
ii. P2 – 8 hours
iii. P3 – next business day
d. Parts Replacement:
i. Customer Replaceable unit: parts exchange only (CRU fee no
longer applicable)
ii. Field Replaceable unit: delivered by Oracle or authorized
partner
iii. Response SLA: 2 days
e. Firmware fixes provided
V. Renewal Guidelines
i. Upon renewal, all contracts will be migrated to a one year (12
month) Premier Support contract
VI. Service Portfolio Details:
http://www.oracle.com/us/support/systems/operating-systems/index.html
http://www.oracle.com/us/support/systems/premier/index.html
http://www.oracle.com/us/support/systems/advanced-customer-services/index.html
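To get a feel for what the percentages in the notice above mean for a budget, here is a rough Python sketch. The three rates (12%, 8% and 22%) come from the notice; the prices are purely hypothetical figures I made up for illustration, and the notice does not say explicitly whether the percentage is charged per year, so treat the result as the fee per contract term.

```python
from decimal import Decimal

# Rates quoted in the notice above.
RATES = {
    "premier_systems": Decimal("0.12"),   # of net system price
    "premier_os": Decimal("0.08"),        # of net system price
    "premier_software": Decimal("0.22"),  # of net software license value
}


def support_fee(option, base_price):
    """Support fee for one contract term, rounded to two decimal places."""
    return (Decimal(base_price) * RATES[option]).quantize(Decimal("0.01"))


# Hypothetical figures, not real Oracle prices.
net_system_price = Decimal("100000")
net_license_value = Decimal("50000")

print("Premier Support for Systems: ", support_fee("premier_systems", net_system_price))
print("Premier Support for OS:      ", support_fee("premier_os", net_system_price))
print("Premier Support for Software:", support_fee("premier_software", net_license_value))
```

Decimal is used instead of floats so that the money values stay exact.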