No video during the flight

I don’t know which version of Linux they were running, but it looks like one of the following code paths triggered the issue:

static int pca953x_read_regs(struct pca953x_chip *chip, int reg, u8 *val)
{
	int ret;

	ret = chip->read_regs(chip, reg, val);
	if (ret < 0) {
		dev_err(&chip->client->dev, "failed reading register\n");
		return ret;
	}

	return 0;
}

static int pca953x_read_single(struct pca953x_chip *chip, int reg, u32 *val,
				int off)
{
	int ret;
	int bank_shift = fls((chip->gpio_chip.ngpio - 1) / BANK_SZ);
	int offset = off / BANK_SZ;

	ret = i2c_smbus_read_byte_data(chip->client,
				(reg << bank_shift) + offset);
	*val = ret;

	if (ret < 0) {
		dev_err(&chip->client->dev, "failed reading register\n");
		return ret;
	}

	return 0;
}
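
To make the address arithmetic in pca953x_read_single concrete, here is a small Python sketch of the same computation. It assumes BANK_SZ is 8 (its definition is not part of the excerpt above) and uses int.bit_length() as a stand-in for the kernel's fls(). The function name and the chip sizes in the usage lines are illustrative only.

```python
BANK_SZ = 8  # assumption: 8 GPIO lines per register bank (not shown in the excerpt)


def fls(x: int) -> int:
    """Position of the most significant set bit, 1-based; 0 when x == 0."""
    return x.bit_length()


def register_address(ngpio: int, reg: int, off: int) -> int:
    """Mirror the address computed in pca953x_read_single for GPIO line `off`."""
    bank_shift = fls((ngpio - 1) // BANK_SZ)
    offset = off // BANK_SZ
    return (reg << bank_shift) + offset


if __name__ == "__main__":
    # 16-line expander, register 0: line 10 sits in the second bank.
    print(register_address(16, 0, 10))
    # 40-line expander, register 1: line 35 sits in the fifth bank.
    print(register_address(40, 1, 35))
```

Each failed read of such an address goes through the dev_err branch shown above.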

MongoDB 3.4 or stay on 3.2?

If you’re herding multiple shards, this one should be convincing enough to jump on the 3.4 bandwagon:

mongos> sh.getBalancerHost()
getBalancerHost is deprecated starting version 3.4. The balancer is running on the config server primary host.

Moving to OmniOS Community Edition

I hit a small snag when I tried to upgrade my old (r151018) OmniOS installation to OmniOS CE as described in the announcement “OmniOS Community Edition – OmniOSce r151022h”.

During the “pkg update” stage I got something similar to the following:

pkg update: The certificate which issued this certificate:/C=US/ST=Maryland/O=OmniTI/OU=OmniOS/CN=OmniOS r151018 Release Signing Certificate/emailAddress=omnios-supp…@omniti.com could not be found.

Thankfully, the solution was a straightforward sequence of upgrades: first to r151020, then to r151021 and finally to r151022.
From there I was able to upgrade to OmniOS CE successfully. By then even the “-r” option in “pkg update -rv” worked like a charm, which it could not have done on r151018, where this option doesn’t exist. I could probably have skipped r151021 altogether, but it’s always better to be safe than sorry.

How to reuse dropped sharded collection’s name

Sometimes you want to drop a sharded collection and reuse its name. However, it might not be as straightforward as you’d expect:

mongos> sh.shardCollection("your_database.your_collection", { "sharded_key": 1})

"code" : 13449,
"ok" : 0,
"errmsg" : "exception: collection your_database.your_collection already sharded"

The error message might be different, but you get the idea: you can’t shard a collection if its name matches one that has recently been dropped. Thankfully, there is a workaround described in SERVER-17397:

When dropping a collection:
use config
db.collections.remove( { _id: "DATABASE.COLLECTION" } )
db.chunks.remove( { ns: "DATABASE.COLLECTION" } )
db.locks.remove( { _id: "DATABASE.COLLECTION" } )
Connect to each mongos and run flushRouterConfig

Followed the steps in prod yesterday and it worked like a charm.
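
The workaround can also be scripted. Below is a minimal Python sketch, assuming clients that behave like pymongo's MongoClient (item access for databases and collections, delete_many, admin.command). The function name and its arguments are hypothetical, and the block itself never opens a connection.

```python
def purge_dropped_collection(mongos_clients, namespace):
    """Remove leftover sharding metadata for `namespace` ("db.collection").

    `mongos_clients` is a list of MongoClient-like objects, one per mongos.
    Metadata is removed through the first client only; every mongos is then
    asked to reload its routing table.
    """
    if not mongos_clients:
        raise ValueError("need at least one mongos client")

    config = mongos_clients[0]["config"]
    # The same three removals as in the SERVER-17397 workaround.
    config["collections"].delete_many({"_id": namespace})
    config["chunks"].delete_many({"ns": namespace})
    config["locks"].delete_many({"_id": namespace})

    # Each mongos caches routing metadata, so flush all of them.
    for client in mongos_clients:
        client.admin.command("flushRouterConfig")
```

With pymongo installed, each client would be created with pymongo.MongoClient(...) pointing at one mongos, and the call would look like purge_dropped_collection(clients, "your_database.your_collection").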

TIL Remove a Znode from Zookeeper

Yep, you can easily do that (and much more) with zkCli.sh (Zookeeper client):

$ /usr/share/zookeeper/bin/zkCli.sh 
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is enabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null

[zk: localhost:2181(CONNECTED) 0] help
ZooKeeper -server host:port cmd args
	connect host:port
	get path [watch]
	ls path [watch]
	set path data [version]
	rmr path
	delquota [-n|-b] path
	quit 
	printwatches on|off
	create [-s] [-e] path data acl
	stat path [watch]
	close 
	ls2 path [watch]
	history 
	listquota path
	setAcl path acl
	getAcl path
	sync path
	redo cmdno
	addauth scheme auth
	delete path [version]
	setquota -n|-b val path

Issue “rmr” (to remove recursively) or “delete” to remove a znode.
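
The difference between the two commands matters when a znode has children: “delete” refuses to remove a znode that still has children, while “rmr” removes the whole subtree. The Python sketch below models that with an in-memory set of paths. It is a toy illustration (the class and method names are made up) and does not talk to a real Zookeeper server.

```python
class ToyZnodeTree:
    """In-memory stand-in for a znode namespace (not a Zookeeper client)."""

    def __init__(self):
        self.paths = {"/"}

    def create(self, path):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.paths:
            raise KeyError(f"parent does not exist: {parent}")
        self.paths.add(path)

    def descendants(self, path):
        prefix = path.rstrip("/") + "/"
        return {p for p in self.paths if p.startswith(prefix) and p != path}

    def delete(self, path):
        """Like zkCli's delete: only works on a znode without children."""
        if path not in self.paths:
            raise KeyError(f"no such znode: {path}")
        if self.descendants(path):
            raise ValueError(f"znode not empty: {path}")
        self.paths.remove(path)

    def rmr(self, path):
        """Like zkCli's rmr: remove the znode and everything below it."""
        if path not in self.paths:
            raise KeyError(f"no such znode: {path}")
        self.paths -= self.descendants(path) | {path}
```

In this model, delete on a parent raises an error until its children are gone, whereas rmr on the same path clears it in one step.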

TIL HSTS requires a secure transport

Otherwise (quoting RFC 6797):

If an HTTP response is received over insecure transport, the UA MUST ignore any present STS header field(s).

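The same RFC only lets the UA note an HSTS host when there are no underlying secure transport errors or warnings. That means the SSL/TLS certificate on your server must be valid, i.e. no errors or warnings when you open a page in a browser over HTTPS.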
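
The rule can be sketched as a small decision function. This is a simplified Python illustration of the UA-side check, not a full RFC 6797 implementation: the function name and parameters are hypothetical, and the parsing only handles the max-age and includeSubDomains directives.

```python
def hsts_policy(scheme, transport_ok, headers):
    """Return the HSTS policy a user agent would note, or None if ignored.

    scheme       -- "http" or "https" of the response's URL
    transport_ok -- True when the TLS connection had no errors or warnings
    headers      -- dict mapping response header names to values
    """
    if scheme.lower() != "https" or not transport_ok:
        return None  # insecure or broken transport: ignore the header

    value = None
    for name, header_value in headers.items():
        if name.lower() == "strict-transport-security":
            value = header_value
            break
    if value is None:
        return None

    max_age = None
    include_subdomains = False
    for directive in value.split(";"):
        key, _, arg = directive.strip().partition("=")
        key = key.strip().lower()
        if key == "max-age":
            arg = arg.strip().strip('"')
            if not (arg.isascii() and arg.isdigit()):
                return None  # malformed max-age: treat the header as invalid
            max_age = int(arg)
        elif key == "includesubdomains":
            include_subdomains = True

    if max_age is None:
        return None  # max-age is a required directive
    return {"max_age": max_age, "include_subdomains": include_subdomains}
```

A response fetched over plain http, or over https with a certificate warning, yields None here no matter what the header says.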

Restart your Mongos after maxConsecutiveFailedChecks

Take it literally.
If your MongoDB config servers are configured as a replica set and, for some reason (say, a network outage), the Mongos server loses its connection to all of them and cannot reconnect within maxConsecutiveFailedChecks attempts, then, surprise, it becomes useless. Even when the network is up and running again, Mongos will not reconnect to the config servers, and you won’t be able to authenticate to your sharded cluster until Mongos is restarted.

From https://api.mongodb.com/cplusplus/current/classmongo_1_1_replica_set_monitor.html

static int maxConsecutiveFailedChecks = 30
    If a ReplicaSetMonitor has been refreshed more than this many times in a row without finding any live nodes claiming to be in the set, the ReplicaSetMonitorWatcher will stop periodic background refreshes of this set.

And if you check the source code of the 3.2.x branch (3.2.12 as of this writing), you will see the following (./src/mongo/client/replica_set_monitor.cpp):

if (_scan->foundAnyUpNodes) {
    _set->consecutiveFailedScans = 0;
} else {
    _set->consecutiveFailedScans++;
    if (timeOutMonitoringReplicaSets) {
        warning() << "All nodes for set " << _set->name << " are down. "
                  << "This has happened for " << _set->consecutiveFailedScans
                  << " checks in a row. Polling will stop after "
                  << maxConsecutiveFailedChecks - _set->consecutiveFailedScans
                  << " more failed checks";
    }
}

So once the counter reaches maxConsecutiveFailedChecks, the replica set becomes unusable:

bool SetState::isUsable() const {
    return consecutiveFailedScans < maxConsecutiveFailedChecks;
}
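
Put together, the two snippets describe a one-way latch. The Python sketch below is a simplified model of that behaviour, not the MongoDB implementation: the class and method names are invented, and the assumption that no further scans happen once the set is unusable is taken from the API description quoted earlier.

```python
MAX_CONSECUTIVE_FAILED_CHECKS = 30  # default quoted from the API description


class ToyReplicaSetMonitor:
    """Toy model of the failed-scan counter; not MongoDB code."""

    def __init__(self, limit=MAX_CONSECUTIVE_FAILED_CHECKS):
        self.limit = limit
        self.consecutive_failed_scans = 0

    def is_usable(self):
        return self.consecutive_failed_scans < self.limit

    def background_refresh(self, found_any_up_nodes):
        """One periodic scan; returns False when the scan was skipped."""
        if not self.is_usable():
            return False  # assumption: polling has stopped for this set
        if found_any_up_nodes:
            self.consecutive_failed_scans = 0
        else:
            self.consecutive_failed_scans += 1
        return True


if __name__ == "__main__":
    monitor = ToyReplicaSetMonitor()
    for _ in range(30):  # outage: 30 scans in a row find no live nodes
        monitor.background_refresh(found_any_up_nodes=False)
    monitor.background_refresh(found_any_up_nodes=True)  # network is back
    print(monitor.is_usable())
```

In this model nothing resets the counter after the last allowed failed scan, so only creating a new monitor (in real life, restarting Mongos) makes the set usable again.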

As far as I can tell, 3.4.x doesn't have maxConsecutiveFailedChecks, so hopefully one will not have to intervene and restart Mongos manually.