“WT_CONNECTION.open_session: only configured to support 20020”

Frankly speaking, the explanation provided in SERVER-30421 and SERVER-17364 is a bit vague and “hand-wavy” to me, but at least there are steps that can help mitigate it:

  1. Decrease idle cursor timeout (default value is 10 minutes):
    In mongodb.conf:

    setParameter:
      cursorTimeoutMillis: 30000
    

    Using mongo cli:

    use admin
    db.runCommand({setParameter:1, cursorTimeoutMillis: 30000})
    
  2. Increase session_max:
    storage:
      wiredTiger:
        engineConfig:
          configString: "session_max=40000"
    
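To see whether you are actually approaching the limit, the number of currently open WiredTiger sessions can be checked from the mongo shell. This is just a quick sketch — the exact counter name may vary between versions:

```javascript
// run against a live mongod; WiredTiger reports its session counters
// under serverStatus().wiredTiger.session
db.serverStatus().wiredTiger.session["open session count"]
```

Watching this value over time should tell you whether lowering the cursor timeout is keeping it comfortably below session_max.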

Changing Oplog size or when root role is not enough

Managing MongoDB sometimes involves increasing the Oplog size, since the default setting (5% of free disk space when running WiredTiger on a 64-bit platform) is not always enough. And if you’re running a MongoDB version older than 3.6, that requires the manual intervention described in the documentation. It’s pretty straightforward, even though it requires node downtime as part of a rolling maintenance operation. More importantly, the documentation glosses over the fact that to be able to create a new oplog, just having the “root” role is not enough.

> db.runCommand({ create: "oplog.rs", capped: true, size: (32 * 1024 * 1024 * 1024) })
{
	"ok" : 0,
	"errmsg" : "not authorized on local to execute command { create: \"oplog.rs\", capped: true, size: (32 * 1024 * 1024 * 1024) }",
	"code" : 13
}

Granting an additional “readWrite” role on “local” db fixes the problem:

db.grantRolesToUser("admin", [{role: "readWrite", db: "local"}])
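Once the role is granted, re-running the create command should succeed. As a sanity check (an assumption on my part — stats() reports maxSize for capped collections), you can verify the new oplog size:

```javascript
use local
db.oplog.rs.stats().maxSize   // should match the size you requested, in bytes
```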

As stated in SERVER-28449 that has been done intentionally:

This is intentional and is due to a separation of privileges. The root role is a super-set of permissions affecting user data specifically, not system data, therefore the permissions must be explicitly granted to perform operations on local.

So, please, keep that in mind and don’t flip out =)

No video during the flight

Don’t know what version of Linux they were running, but it looks like one of the following code paths triggered the issue:

static int pca953x_read_regs(struct pca953x_chip *chip, int reg, u8 *val)
{
	int ret;

	ret = chip->read_regs(chip, reg, val);
	if (ret < 0) {
		dev_err(&chip->client->dev, "failed reading register\n");
		return ret;
	}

	return 0;
}

static int pca953x_read_single(struct pca953x_chip *chip, int reg, u32 *val,
				int off)
{
	int ret;
	int bank_shift = fls((chip->gpio_chip.ngpio - 1) / BANK_SZ);
	int offset = off / BANK_SZ;

	ret = i2c_smbus_read_byte_data(chip->client,
				(reg << bank_shift) + offset);
	*val = ret;

	if (ret < 0) {
		dev_err(&chip->client->dev, "failed reading register\n");
		return ret;
	}

	return 0;
}

MongoDB 3.4 or stay on 3.2?

If you’re herding multiple shards, this one should be convincing enough to jump on the 3.4 bandwagon:

mongos> sh.getBalancerHost()
getBalancerHost is deprecated starting version 3.4. The balancer is running on the config server primary host.

Moving to OmniOS Community Edition

I hit a small snag when I tried to upgrade my old (r151018) OmniOS installation to OmniOS CE as described in the ANNOUNCEMENT OmniOS Community Edition – OmniOSce r151022h.

During “pkg update” stage I got something similar to the following:

pkg update: The certificate which issued this certificate:/C=US/ST=Maryland/O=OmniTI/OU=OmniOS/CN=OmniOS r151018 Release
Signing Certificate/emailAddress=omnios-supp…@omniti.com could not be found.

Thankfully, the solution was a straightforward sequence of upgrades: first to r151020, then to r151021, and finally to r151022.
From there I was able to successfully upgrade to OmniOS CE. Even the “-r” option in “pkg update -rv” worked like a charm, even though that option doesn’t exist in r151018. I could probably have skipped r151021 altogether, but it’s always better to be safe than sorry.

How to reuse dropped sharded collection’s name

It happens that sometimes you want to drop your sharded collection and be able to reuse its name again. However, it might not be as straightforward as one expects it to be:

mongos> sh.shardCollection("your_database.your_collection", { "sharded_key": 1 })

"code" : 13449,
"ok" : 0,
"errmsg" : "exception: collection your_database.your_collection already sharded"

The error message might be different but you get the idea – you can’t shard a collection if its name matches one that has recently been dropped. Thankfully, there is a workaround described in SERVER-17397:

When dropping a sharded collection:

  1. Remove the collection’s metadata from the config database:

    use config
    db.collections.remove( { _id: "DATABASE.COLLECTION" } )
    db.chunks.remove( { ns: "DATABASE.COLLECTION" } )
    db.locks.remove( { _id: "DATABASE.COLLECTION" } )

  2. Connect to each mongos and run flushRouterConfig
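The removal steps can be batched into one mongo shell snippet (DATABASE.COLLECTION is a placeholder for your namespace; run the flush on every mongos):

```javascript
// clean up stale sharding metadata for a dropped collection
var ns = "DATABASE.COLLECTION";   // placeholder namespace
var cfg = db.getSiblingDB("config");
cfg.collections.remove({ _id: ns });
cfg.chunks.remove({ ns: ns });
cfg.locks.remove({ _id: ns });

// then, connected to each mongos:
db.adminCommand({ flushRouterConfig: 1 });
```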

Followed the steps in prod yesterday and it worked like a charm.

TIL Remove a Znode from Zookeeper

Yep, you could easily achieve that (and much more) using zkCli.sh (Zookeeper client):

$ /usr/share/zookeeper/bin/zkCli.sh 
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is enabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null

[zk: localhost:2181(CONNECTED) 0] help
ZooKeeper -server host:port cmd args
	connect host:port
	get path [watch]
	ls path [watch]
	set path data [version]
	rmr path
	delquota [-n|-b] path
	quit 
	printwatches on|off
	create [-s] [-e] path data acl
	stat path [watch]
	close 
	ls2 path [watch]
	history 
	listquota path
	setAcl path acl
	getAcl path
	sync path
	redo cmdno
	addauth scheme auth
	delete path [version]
	setquota -n|-b val path

Issue “rmr” (to remove recursively) or “delete” to remove a znode.
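For scripting, zkCli.sh can also take the command directly as arguments instead of running interactively. The paths below are placeholders, and a ZooKeeper instance must be reachable at localhost:2181:

```shell
# delete a single znode (fails if it has children)
/usr/share/zookeeper/bin/zkCli.sh -server localhost:2181 delete /my/znode

# remove a znode and everything below it
/usr/share/zookeeper/bin/zkCli.sh -server localhost:2181 rmr /my/tree
```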