TIL How to stop being notified about macOS public beta updates

That had been a nagging question for quite a while, ever since I left the public beta program and installed the final release of macOS Sierra. I clicked the right buttons and chose the right options in “System Preferences” -> “App Store” to opt out of the beta updates, but that obviously didn’t help: when Apple seeded 10.12.1 to its public beta testers, I was offered the update too.
Thankfully, there is an easy way to fix this with a one-liner:

sudo softwareupdate --clear-catalog

No more betas till 10.13.

Posted on September 28, 2016 at 11:08 pm by sergeyt
In: Apple, TIL

Turned the page

After 5 exciting and tumultuous years in enterprise IT as a Unix and SAN engineer, it’s time to switch gears and taste something new. Of course, the alley I’m stepping into is not universally novel, but for me personally it’s uncharted territory and something I could only dream about. The idea of a change had been ripening for quite a while, so it was an effortless decision to say goodbye and move forward without turning back and with no regrets.
I don’t want to paint with a broad brush, but enterprise IT is notorious for its conservatism, red tape and reluctance to change. That’s ok for its business goals but, based solely on my personal experience, it can turn an IT engineer into a bench sitter. Hopefully, in my new position I will be more exposed to bleeding-edge technologies, systems internals and programming, and can become a better practitioner. Time will certainly tell, but for now I’m eagerly waiting to face new challenges…

P.S.
Speaking of practitioners, if you haven’t seen or heard Bryan Cantrill’s latest talk, I highly encourage you to do so:
A Wardrobe for the Emperor: Stitching Practical Bias into Systems Software Research

Posted on August 13, 2016 at 11:28 am by sergeyt
In: Life

TIL Common Name (CN) is legacy and subjectAltName must always be used.

Seems I’ve been living under a rock for far too long. From RFC 2818:

Although the use of the Common Name is existing practice, it is deprecated and Certification Authorities are encouraged to use the dNSName instead.

So in today’s world CN is only evaluated when subjectAltName is absent. If subjectAltName is set, all host names, IPs, emails, etc. must be listed in it, including the name that already sits in CN.
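To make the rule concrete, here is a minimal Python sketch of the name selection described above. The function names (names_to_verify, hostname_matches) are made up for illustration, matching is exact and case-insensitive, and wildcard handling is deliberately left out, so treat it as an illustration of the rule rather than a substitute for a real TLS library.

```python
def names_to_verify(common_name, san_dns_names):
    """Return the certificate names a client should compare the host against.

    Per RFC 2818, dNSName entries in subjectAltName win whenever they exist;
    the Common Name is only a fallback for certificates without them.
    """
    if san_dns_names:
        return list(san_dns_names)
    return [common_name] if common_name else []


def hostname_matches(hostname, common_name, san_dns_names):
    """Exact, case-insensitive match (no wildcard support in this sketch)."""
    wanted = hostname.lower()
    return any(wanted == name.lower()
               for name in names_to_verify(common_name, san_dns_names))


if __name__ == "__main__":
    # CN is ignored here because subjectAltName is present and lacks the name.
    print(hostname_matches("example.com", "example.com", ["www.example.com"]))
    # No subjectAltName at all, so the CN is used as the fallback.
    print(hostname_matches("example.com", "example.com", []))
```

The first call is the trap worth remembering: a certificate whose CN is example.com but whose subjectAltName only lists www.example.com does not cover example.com.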

As a bonus, below is a one-liner to generate a CSR with subjectAltName:

openssl req -new -newkey rsa:2048 -keyout example.com.key -sha256 -nodes -days 36500 -out example.com.csr -subj "/C=US/ST=IL/L=Chicago/O=Fortune500/OU=IT/CN=example.com" -reqexts v3_req -config <(cat /etc/pki/tls/openssl.cnf <(printf "[ v3_req ]\nsubjectAltName = DNS:example.com,DNS:www.example.com"))

Posted on July 6, 2016 at 2:52 pm by sergeyt
In: TIL

Calculating percentile in Python

If you need to find a percentile in Python, do it correctly, which means following one of these recipes:
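As one illustration, here is a minimal sketch of a percentile with linear interpolation between the two closest ranks. That this is what “correctly” should mean is an assumption on my part; it is the same method numpy.percentile uses by default. The function name and the sample data are made up for the example.

```python
import math


def percentile(values, percent):
    """Percentile with linear interpolation between the closest ranks."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    if not 0 <= percent <= 100:
        raise ValueError("percent must be within [0, 100]")
    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted data.
    rank = (len(ordered) - 1) * percent / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    if lower == upper:
        return float(ordered[lower])
    # Interpolate between the two neighbouring values.
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


if __name__ == "__main__":
    response_times_ms = [15, 20, 35, 40, 50]  # hypothetical sample
    print(percentile(response_times_ms, 50))
    print(percentile(response_times_ms, 95))
```

The common mistake this avoids is indexing the sorted list with a truncated rank, which silently returns a neighbouring value instead of the interpolated one.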

Posted on June 29, 2016 at 11:13 pm by sergeyt
In: TIL

Lecture on OpenZFS read and write code paths

If you are interested in ZFS, this is an absolute must-see video (actually a lecture) from Matt Ahrens (one of the two original ZFS creators):
Matt Ahrens – Lecture on OpenZFS read and write code paths

Posted on June 19, 2016 at 6:02 pm by sergeyt
In: ZFS

Pair “Listen queue overflow” FreeBSD errors with pcb

Just yesterday, after an upgrade to MySQL 5.7.12, I saw plenty of errors being logged on the system:

sonewconn: pcb 0xfffff8006311c870: Listen queue overflow: 151 already in queue awaiting acceptance (1 occurrences)
sonewconn: pcb 0xfffff8006311c870: Listen queue overflow: 151 already in queue awaiting acceptance (1 occurrences)
sonewconn: pcb 0xfffff8006311c870: Listen queue overflow: 151 already in queue awaiting acceptance (1 occurrences)
sonewconn: pcb 0xfffff8006311c870: Listen queue overflow: 151 already in queue awaiting acceptance (1 occurrences)

There is a great post that explains how to find the culprit. In a nutshell, there are two quick options:

  1. Use “lsof -itcp -stcp:listen -P” and grep for pcb.
  2. Or since “the overflow happens when the queue is at about 150% capacity” (as mentioned in the original post), it’s possible to match the number from the error (151 in my case, which points at a queue limit of about 100) against the maxqlen value in the output of “netstat -an -p tcp -L”.
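The second option can be sketched in a few lines of Python. This is only an illustration: the sample text is made up (it merely mimics the qlen/incqlen/maxqlen layout that “netstat -L” prints), the function name is hypothetical, and the 150% rule is taken from the quote above rather than checked against the kernel source.

```python
import re

# Illustrative sample only: shaped like "netstat -an -p tcp -L" on FreeBSD,
# where the Listen column is qlen/incqlen/maxqlen. Ports and values are made up.
SAMPLE_NETSTAT = """\
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen                           Local Address
tcp4  0/0/128                          *.22
tcp4  151/0/100                        *.8080
tcp6  0/0/50                           *.8443
"""

# Protocol, the three queue counters, then the local address.
LISTEN_ROW = re.compile(r"^(\S+)\s+(\d+)/(\d+)/(\d+)\s+(\S+)")


def suspects(netstat_output, overflow_count):
    """Return (address, maxqlen) pairs whose limit fits the overflow number.

    Assumes the overflow is reported once the queue holds roughly 150% of
    maxqlen, so a reported count of 151 points at a limit of about 100.
    """
    found = []
    for line in netstat_output.splitlines():
        row = LISTEN_ROW.match(line)
        if not row:
            continue  # header lines do not match the pattern
        maxqlen = int(row.group(4))
        if abs(overflow_count - 1.5 * maxqlen) <= 1:
            found.append((row.group(5), maxqlen))
    return found


if __name__ == "__main__":
    print(suspects(SAMPLE_NETSTAT, 151))
```

On a real box the text would come from running netstat rather than from a constant; several sockets can share the same limit, so the result is a shortlist, not a verdict.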

In my case that was trivial, as both Postfix and Dovecot complained about a missing libmysqlclient.so.18 shared library, which had been replaced with libmysqlclient.so.20 by the upgrade. Rebuilding both from ports and restarting them fixed the issue, and no fiddling with kern.ipc.somaxconn was needed.

Posted on May 19, 2016 at 10:40 am by sergeyt
In: FreeBSD

Have stalled snmpd in recvfrom()? Check Recv-Q

Not so long ago I had an issue with a monitoring system that paged about SNMP checks failing on a number of servers. A quick check here and there (logs, strace, tcpdump, etc.) revealed that snmpd had stalled in recvfrom() without sending a single packet out in response to the constant queries from our monitoring system. Everything seemed to be ok except “netstat -s”, which showed a steady increase in the “Udp: packet receive errors” counter. Summon ss to the rescue:

# ss -ianump \( sport = *:161 \)
State      Recv-Q Send-Q                                                                                       Local Address:Port                                                                                         Peer Address:Port
UNCONN     262680 0                                                                                                        *:161                                                                                                     *:*      users:(("snmpd",52984,7))

Matching 262680 with “sysctl net.core.rmem_default” suggested that the receive buffer (Recv-Q) was full, but why? A closer look at the logs turned up the following segfault:

cmanicd[55673]: segfault at 0 ip 00007f041e721081 sp 00007f040e16c700 error 4 in libnetsnmp.so.20.0.0[7f041e6a1000+a0000]

It turned out to be a well-known issue with the NIC Agent (CMANICD):
http://h20564.www2.hpe.com/hpsc/doc/public/display?docId=emr_na-c04912220&sp4ts.oid=316583

So it looked to be our guy. Starting cmanicd back up immediately solved the problem:

[root@slon02db12 ~]# ss -ianump \( sport = *:161 \)
State      Recv-Q Send-Q                                                                                       Local Address:Port                                                                                         Peer Address:Port
UNCONN     0      0                                                                                                        *:161                                                                                                     *:*      users:(("snmpd",52984,7))

Recv-Q dropped to zero and the server turned green in the monitoring dashboard. Bingo. Problem solved, so now it’s time for the upgrade.

Btw, if you don’t know how to read a Linux segfault message (I didn’t before this issue), the following note should fix that:

Nov 27 15:26:19 machine kernel: fmg[6335]: segfault at 00000000ffffd2dc rip 00000000ffffd2dc rsp 00000000ffffd1bc error 15

What does the kernel message mean, in detail?

  • The rip value is the instruction pointer register value, the rsp is the stack pointer register value.
  • The error value is a bit mask of page fault error code bits (from arch/x86/mm/fault.c):
     *   bit 0 ==    0: no page found       1: protection fault
     *   bit 1 ==    0: read access         1: write access
     *   bit 2 ==    0: kernel-mode access  1: user-mode access
     *   bit 3 ==                           1: use of reserved bit detected
     *   bit 4 ==                           1: fault was an instruction fetch
  • Here’s error bit definition:
    enum x86_pf_error_code {
      PF_PROT   =       1 << 0,
      PF_WRITE  =       1 << 1,
      PF_USER   =       1 << 2,
      PF_RSVD   =       1 << 3,
      PF_INSTR  =       1 << 4,
    };

In my case the error code was 4: only the user-mode bit is set, so it was a read from user space that found no page. In other words, cmanicd tried to read address zero, which reeks of a NULL pointer dereference.
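The bit table above is easy to turn into a tiny decoder. A minimal Python sketch follows; the function name is made up, and it assumes the kernel prints the error value in hexadecimal (which, as far as I can tell, is how the segfault line is formatted), so the “error 15” from the quoted note is read as 0x15.

```python
# Bit meanings follow the page fault error code table quoted above:
# (mask, meaning when the bit is clear, meaning when the bit is set).
PF_BITS = [
    (1 << 0, "no page found", "protection fault"),
    (1 << 1, "read access", "write access"),
    (1 << 2, "kernel-mode access", "user-mode access"),
    (1 << 3, None, "use of reserved bit detected"),
    (1 << 4, None, "fault was an instruction fetch"),
]


def decode_segfault_error(printed_code):
    """Translate the 'error N' field of a segfault line into plain words.

    Assumption: the kernel prints the value in hex, so it is parsed base 16.
    """
    code = int(printed_code, 16)
    meanings = []
    for mask, when_clear, when_set in PF_BITS:
        text = when_set if code & mask else when_clear
        if text is not None:
            meanings.append(text)
    return meanings


if __name__ == "__main__":
    for printed in ("4", "15"):
        print(printed, decode_segfault_error(printed))
```

For “4” this yields no page found, read access, user-mode access, which is exactly the NULL dereference picture from the cmanicd log.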

Posted on May 14, 2016 at 9:17 pm by sergeyt
In: Linux

Doing morning FreeBSD update

Applying FreeBSD patches is freaking easy. 

Posted on May 6, 2016 at 9:33 am by sergeyt
In: FreeBSD

Do initrd dance before turning Linux physical server into VM

If one day you decide to convert your physical server to a VM, which is easy to achieve if all its disks are presented from a SAN, then don’t forget to rebuild the initrd beforehand. Otherwise you will see something similar to this:

No device found
Scanning and configuring dmraid supported devices
Scanning logical volumes
  Reading all physical volumes. This may take a while...
  No volume groups found
Activating logical volumes
  Volume group "VolGroup00" not found
Trying to resume from /dev/VolGroup00/LogVol01
Unable to access resume device (/dev/VolGroup00/LogVol01)
Creating root device.
Mounting root filesystem.
mount: could not find filesystem '/dev/root'
Setting up other filesystems.
Setting up new root fs
setuproot: moving /dev failed: No such file or directory
no fstab.sys, mounting internal defaults
setuproot: error mounting /proc: No such file or directory
setuproot: error mounting /sys: No such file or directory
Switching to new root and running init
unmount old /dev
unmount old /proc
unmount old /sys
switchroot: mount failed: No such file or directory 
Kernel panic - not syncing: Attempted to kill init! 

Also, if your SAN disks are multipathed, which is the obvious and only correct choice, then you must (according to a Red Hat note) disable multipath by editing /etc/sysconfig/mkinitrd/multipath, otherwise the system won’t boot:

# vi /etc/sysconfig/mkinitrd/multipath
MULTIPATH=no

Root Cause
The multipath option should only be set to YES if your root volume (/) is on a multipathed device
If multipath is enabled with root (/) on a local device, multipathing will enable at boot time and lock down the device
If the device is locked down, fsck will be unable to open it for checking

There are two options to rebuild initrd:

  1. Use mkinitrd or dracut, depending on the OS version you’re currently on, and pre-build a new initrd before detaching the disks from the old system.
  2. If the system has already been converted to a VM, i.e. all disks from the old system have been detached and presented as RDMs to a new VM, then boot from a rescue disk, chroot to /mnt/sysimage (if you are running RedHat or CentOS) and run mkinitrd or dracut from there. Keep in mind that the /boot partition as well as /sys must be mounted in the chrooted environment or, again, your system will not fly. Bind-mount the pseudo-filesystems before entering the chroot:
    mount --bind /proc /mnt/sysimage/proc
    mount --bind /dev /mnt/sysimage/dev
    mount --bind /sys /mnt/sysimage/sys

Good luck.

Posted on April 17, 2016 at 8:19 pm by sergeyt
In: Linux

Workaround for Tomcat7 on Linux, JDBC and javax.naming.NamingException

A few days ago I was dabbling with JDBC and Tomcat7 and the configuration that seemingly had no issues resulted in the following error in the log file:

org.apache.catalina.core.NamingContextListener addResource
WARNING: Failed to register in JMX: javax.naming.NamingException: Could not create resource factory instance
[Root exception is java.lang.ClassNotFoundException: org.apache.tomcat.dbcp.dbcp.BasicDataSourceFactory]

Thankfully, Google pointed me to this post at stackoverflow.com which had both the solution and the link to the details behind this behaviour.

In a nutshell, the workaround looks like the following:

  1. Grab tomcat-dbcp-version.jar from Maven that matches the version of Tomcat you are running and place it in $CATALINA_HOME/lib. Copying it somewhere else and creating a link also works.
  2. Update the <Resource/> section in the context.xml file by adding the following attribute:
    factory="org.apache.commons.dbcp.BasicDataSourceFactory"
  3. Restart Tomcat

Peace.

P.S. I did a quick test and it looks like FreeBSD distributes tomcat-dbcp.jar as part of its tomcat package:

# pkg query %Fp tomcat7 | grep dbcp
/usr/local/apache-tomcat-7.0/lib/tomcat-dbcp.jar

Posted on February 19, 2016 at 2:53 pm by sergeyt
In: Linux