Happy new 2015 year

Howdy!
By now we’re all in the new year 2015, and I want to wish everyone a peaceful, prosperous and fruitful year. May all your dreams come true.
HO HO HO

Posted on January 1, 2015 at 1:27 pm by sergeyt · Permalink · Leave a comment
In: Life

How Cisco Ironport picks an outgoing IP address if there are many on the same subnet

If your IronPort is configured with multiple IP addresses on the same subnet and you are wondering which one is used for outgoing mail, Cisco has an answer for you:

Q: Which is the default used IP address (AUTO) if there are multiple IP addresses on the same subnet?
A: If there are multiple IP addresses configured within the same subnet as the default gateway, the IP address with the lowest number based on a c-string search will be used.

More details, along with an example, are available in the Cisco Email Security Appliance section of the www.cisco.com web site.

So before adding a new address, make sure that the outgoing interface has been defined explicitly, either by a content/message filter using the alt-src-host action or via deliveryconfig. I assume the altsrchost CLI command should do the job too.
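
For illustration, a message filter that pins outgoing delivery to a specific interface might look roughly like the sketch below; the filter name, the sender pattern and the interface name are made up, so adjust them to your own configuration:

PinSourceInterface:
if (mail-from == "@example\\.com$")
{
    alt-src-host('OutboundInterface2');
}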

P.S. I tagged this post as FreeBSD only because Cisco IronPort is based on FreeBSD.

Posted on December 18, 2014 at 9:33 pm by sergeyt · Permalink · Leave a comment
In: FreeBSD

ruBSD 2014 is coming

Just like in 2013, Yandex will be hosting the ruBSD 2014 event (content is in Russian) on the 13th of December. Funnily enough, I learnt about it from the BSD Now podcast, which, by the way, I highly recommend. In the latest episode, apart from the already mentioned ruBSD 2014 conference, Allan Jude and Kris Moore also mentioned that videos from MeetBSD California 2014 and from the OpenZFS Developer Summit 2014 have recently been published.
But I digress. Returning to ruBSD, the agenda looks very promising.

Registration is free but the number of seats is limited.

Posted on November 29, 2014 at 12:33 am by sergeyt · Permalink · Leave a comment
In: FreeBSD, Life

Jumping into another cloud

After almost 4 years of being a Rackspace user I’ve moved to a new home – Amazon AWS. The main nudge was the issue I was hit by after upgrading to 10.1-RELEASE. The temporary solution did work, but it was too costly to consider permanent, so I started to look around. The decision was quick, as I knew where to look – thanks to the FreeBSD Journal July/August 2014 issue, in which Colin Percival, a well-known FreeBSD committer, describes how to provision FreeBSD on EC2. Besides that, he also publishes FreeBSD AMIs, which gives hope that we won’t be left in the dark.
I can’t tell you anything specific about AWS yet – it definitely fits my unassuming needs – but what I really like is their aws cli tool. And, of course, I’m waiting for my very first billing invoice.
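
For those who haven’t tried the aws cli yet, launching an instance from one of Colin’s AMIs boils down to something like the line below; the AMI ID, instance type and key name are placeholders, not the values I actually used:

$ aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type t2.micro --key-name my-key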

Cheers.

Posted on November 27, 2014 at 1:19 am by sergeyt · Permalink · Leave a comment
In: FreeBSD

FreeBSD 10.1-Release as domU (guest) VM

I’ve been running FreeBSD 10.0-RELEASE for quite a while in Rackspace’s environment and yesterday decided to jump on the 10.1-RELEASE bandwagon using exactly the same steps that I described in one of my earlier posts. However, it wasn’t as successful as expected, since I’m seeing constant freezes with the following errors displayed on the console:

network_alloc_rx_buffers: m_cljget failed
network_alloc_rx_buffers: m_cljget failed
network_alloc_rx_buffers: m_cljget failed
network_alloc_rx_buffers: m_cljget failed

Increasing kern.ipc.nmbclusters and kern.ipc.nmbjumbop, a possible solution mentioned on the FreeBSD Xen mailing list, made no difference.
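
For the record, increasing those sysctls looks roughly like this; the values are examples only, not recommendations, and can be made persistent via /etc/sysctl.conf:

# sysctl kern.ipc.nmbclusters                # check the current limits first
# sysctl kern.ipc.nmbjumbop
# sysctl kern.ipc.nmbclusters=131072         # example value
# sysctl kern.ipc.nmbjumbop=65536            # example value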

Judging by the FreeBSD repository on GitHub, the error is generated by the following code:

/* The new mbuf did not get an external cluster attached (the cluster
 * allocation failed), so it is freed and the error is logged. */
if ((m_new->m_flags & M_EXT) == 0) {
        printf("%s: m_cljget failed\n", __func__);
        m_freem(m_new);

Not being a kernel developer in any way, that’s all I can tell so far. Time to dive deeper into the code.

Update
The easiest/dumbest solution was to upgrade my Rackspace instance by adding extra CPU/RAM power. Not the solution I would have liked, but at least it’s now possible to update and compile packages from the ports collection.

Posted on November 16, 2014 at 2:28 pm by sergeyt · Permalink · 3 Comments
In: FreeBSD

ZFS RAIDZ stripe width explained

Frankly speaking, I never clearly understood the dynamic striping part of ZFS until I came across the following article by Matt Ahrens – http://blog.delphix.com/matt/2014/06/06/zfs-stripe-width/. Really worth reading.

Posted on November 3, 2014 at 4:02 pm by sergeyt · Permalink · Leave a comment
In: ZFS

What the heck are all those terms – PV, HVM, HVM with PV drivers, PVHVM, PVH?

If you only have to deal with Xen from time to time rather than on a permanent basis and want to refresh your memory on the differences between all those Xen modes, I strongly encourage you to read the following two outstanding articles:

The Paravirtualization Spectrum, part 1: The Ends of the Spectrum
The Paravirtualization Spectrum, Part 2: From poles to a spectrum

Posted on August 22, 2014 at 12:37 pm by sergeyt · Permalink · One Comment
In: FreeBSD, Linux

Default Linux I/O multipathd configuration, SCSI timeout and Oracle RAC caveat

I’ve recently been involved in a project to migrate from old and rusty Cisco MDS 9222i switches to new MDS 9506 SAN switches, and during the first phase of the migration the primary node in a two-node Oracle RAC cluster lost access to its voting disks and went down. And that happened when only half of the paths to the SAN storage were unreachable, while the other half were absolutely fine and active.

Oracle support pointed to the following errors:

WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 1.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.
WARNING: Waited 15 secs for write IO to PST disk 0 in group 2.

Metalink document 1581684.1 at support.oracle.com gives a more thorough explanation:

Generally this kind messages comes in ASM alertlog file on below situations:

  • Too many delayed ASM PST heart beats on ASM disks in normal or high redundancy diskgroup,
    thus the ASM instance dismount the diskgroup. By default, it is 15 seconds.
  • the heart beat delays are sort of ignored for external redundancy diskgroup.
    ASM instance stop issuing more PST heart beat until it succeeds PST revalidation,
    but the heart beat delays do not dismount external redundancy diskgroup directly.

The ASM disk could go into unresponsiveness, normally in the following scenarios:

+ Some of the paths of the physical paths of the multipath device are offline or lost
+ During path ‘failover’ in a multipath set up
+ Server load, or any sort of storage/multipath/OS maintenance

One way to solve that is to set _asm_hbeatiowait on all the nodes of the Oracle RAC cluster to a higher value (in seconds), but not higher than 200.
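
For reference, a hidden ASM parameter like this is typically changed on the ASM instances with an ALTER SYSTEM statement along the lines of the sketch below; the value 120 is only an example, and since it is written to the spfile it takes effect after the ASM instances are restarted:

$ sqlplus / as sysasm
SQL> alter system set "_asm_hbeatiowait"=120 scope=spfile sid='*';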

But before doing that, it would be a good idea to take a look at multipathd’s configuration:

# multipathd -k"show conf"

Since the Oracle RAC in our case was backed by an EMC VMAX array, the following device section is of most interest:

device {
                vendor "EMC"
                product "SYMMETRIX"
                path_grouping_policy multibus
                getuid_callout "/sbin/scsi_id -g -u -ppre-spc3-83 -s /block/%n"
                path_selector "round-robin 0"
                path_checker tur
                features "0"
                hardware_handler "0"
                rr_weight uniform
                no_path_retry 6
                rr_min_io 1000
        }

And it might seem that no_path_retry was one part of the problem:

A numeric value for this attribute specifies the number of times the system should attempt to use a failed path before disabling queueing.

In essence, instead of failing over to the active paths, I/O was queued. The negative effect of this option was multiplied by the presence of another option, this time in the defaults section, called polling_interval, which is set to 5 seconds by default. So I/O was queued for polling_interval * no_path_retry = 5 * 6 = 30 seconds in total.

One obvious solution was, as expected, to disable queueing on the Oracle voting disks by setting no_path_retry to fail (a minimal sketch is shown below). That was certainly low-hanging fruit, but there was more to it, since there are several layers at which I/O commands issued to a device can time out.
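
A per-LUN override for the voting disks in multipath.conf might look roughly like this; the WWID and the alias are placeholders, not the actual values from that cluster:

multipaths {
        multipath {
                # WWID of one of the voting-disk LUNs (placeholder)
                wwid            360000970000192601234533030334536
                alias           ocr_vote01
                # fail I/O immediately when all paths are down instead of queueing it
                no_path_retry   fail
        }
}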

The following quote from a Red Hat engineer adds a more detailed explanation:

Also, please note that the timeout set in “/sys/class/scsi_device/h:c:t:l/device/timeout” is the minimum amount of time that it will take for the scsi error handler to start when a device is not responding, and *NOT* the amount of time it will take for the device to return a SCSI error. For example if the I/O timeout set to 60s, that means there’s a worst case of 120s before the error handler would ever be able to run.

Since IO commands can be submitted to the device up until the first submitted command is timed out, and that may take 60s for first command to get timed out, we could summarize the worst case scenario for longest time required to return IO errors on a device as follows:

[1] Command submitted to the sub path of device, inherits 60s timeout from /sys.

[2] just before 60s is up, another command is submitted, also inheriting a 60s timeout.

[3] first command times out at 60s, error handler starts but must sleep until all other commands have completed or timed out. Since we had a command submitted just before this, we wait another 60s for it to timeout.

[4] Now we attempt to abort all timed out commands. Note that each abort also sends a Test Unit Ready (TUR SCSI command) to the device, which have a 10 second timeout, adding extra time to the total.

[5] depending on the result of the abort, we may also have to reset the device/bus/host. This would add an indeterminate amount of time to the process, including more Test Unit Ready (TUR SCSI command) at 10 seconds each.

[6] Now that we’ve aborted all commands and possibly reset the device/bus/host, we requeue the cancelled commands. This is where we wait ((number of allowed attempts + 1) * timeout_per_command) = ((5+1) * 60s) = 360s. (**Note: in the above formula the number of allowed attempts defaults to 5 for any IO commands issued through the VFS layer, and “timeout_per_command” is the timeout value set in the “/sys/class/scsi_device/h:c:t:l/device/timeout” file).

[7] As commands reach their “((number of allowed attempts + 1) * timeout_per_command)” timeout, they will be failed back up to the DM-Multipath or application layer with an error code. This is where you finally see SCSI errors and, if multipath software is involved, a path failure.
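
For reference, the per-command SCSI timeout the quote keeps referring to lives in sysfs and can be inspected, and lowered, per device; the device address below is only an example:

# cat /sys/class/scsi_device/1:0:0:1/device/timeout          # current timeout in seconds
# echo 30 > /sys/class/scsi_device/1:0:0:1/device/timeout    # example: lower it to 30s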

So the basic idea is that it’s very hard to predict the exact time a failover would take, and it’s worth fiddling with the different timeout settings, i.e. the ones already mentioned plus fast_io_fail_tmo and dev_loss_tmo from multipath.conf, as well as looking at the problem from the application’s side and updating _asm_hbeatiowait accordingly. The question remains: why did Oracle decide to set this parameter to 15 seconds by default?
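
For completeness, fast_io_fail_tmo and dev_loss_tmo can be set in the defaults (or devices) section of multipath.conf; the values below are illustrative assumptions, not recommendations:

defaults {
        polling_interval        5
        # fail I/O on a broken path quickly instead of waiting for
        # the full SCSI error handling to run its course
        fast_io_fail_tmo        5
        # how long to wait before removing a lost remote port
        dev_loss_tmo            30
}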

Posted on June 7, 2014 at 4:46 pm by sergeyt · Permalink · One Comment
In: Linux, Oracle

Solaris 11.2 beta is available

Yesterday Oracle announced the availability of the Solaris 11.2 beta with a bunch of sweet enhancements, e.g. OpenStack, Solaris Kernel Zones, Unified Archives, compliance checking and reporting, automation with Puppet, and more.
Find out more by reading Solaris 11.2 Beta – What’s new.
For those who are interested in hands-on experience, the Solaris 11.2 beta is also available for download in different formats, including a VirtualBox VM template.
Now I know what I’ll be doing during the upcoming four-day state holiday.

Posted on April 30, 2014 at 10:29 am by sergeyt · Permalink · Leave a comment
In: Oracle, Solaris

Like to fiddle with VMAX FAST VP options?

Don’t do that, and here is why: VMAX FASTVP Best Practice Essentials

Posted on March 24, 2014 at 11:12 pm by sergeyt · Permalink · Leave a comment
In: EMC