Regression in pam_listfile module in RHEL5.9

This post follows a support ticket we opened with Red Hat, which has finally been resolved. For those who have RHN access, here is the direct link: https://access.redhat.com/knowledge/solutions/328433

In a nutshell: after upgrading to RHEL 5.9 we lost the ability to ssh into any server where pam_listfile was configured, and the following error messages appeared in the logs instead:

Feb 27 13:51:41 host1 sshd[2649]: pam_listfile(sshd:account): Refused user abc for service sshd
Feb 27 13:51:41 host1 sshd[2649]: fatal: Access denied for user abc by PAM account configuration

pam_listfile was configured as follows:

account    required     pam_listfile.so onerr=fail item=group sense=allow file=/etc/security/groups.allow

Historically we had a duplicate local group in /etc/group with the same ID and name as the LDAP group and, as we initially suspected, that was the root cause:

In pam-0.99.6.2-6.el5_5.2, pam_listfile used the getgrent glibc function to fetch group information. This function call is non-selective, it fetches information about all groups from all NSS sources.
With the update to pam-0.99.6.2.12, pam_listfile uses the getgrnam_r glibc function call to fetch group information. This function call is selective, and it fetches information about only one group. Since it is a requirement that group names and group IDs should be unique across all identity sources, it stops once a single instance of the group name is found.

If you’re facing similar behavior, double-check for duplicate entries across your identity sources (local files and LDAP).
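
One way to run that check is sketched below. It is only an illustration: the function name is made up for this post, and it assumes that `getent group` enumerates every NSS source on your host. If enumeration is disabled for LDAP, you would have to dump the LDAP groups some other way and feed them in alongside /etc/group.

```shell
# Hypothetical helper: read group(5)-formatted lines on stdin and print
# every group name and every GID that occurs more than once.
find_duplicate_groups() {
    awk -F: '
        { names[$1]++; gids[$3]++ }
        END {
            for (n in names) if (names[n] > 1) print "name " n
            for (g in gids)  if (gids[g]  > 1) print "gid " g
        }' | sort
}

# Usage (not run here): getent group | find_duplicate_groups
```

An empty result means no name or GID appeared twice in the input.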

OpenLDAP with TLS, ppolicy and master-master replication on RHEL6.3

This post has been gathering dust on the drafts shelf for too long. No reason to keep it there any longer.
Below is a list of instructions that, once followed, should leave you with an OpenLDAP server on RHEL 6.1 (it should work just as well on CentOS). There is nothing special about it, but if you’re looking for tips on how to configure OpenLDAP, encrypt its database, set up PPolicy, use TLS and two-way (master-master) replication, then hopefully you can take away something useful from this post.
That was a quick and short introduction.

The instructions below assume that we have two LDAP servers:

LDAP1 – ldap1.example.com
LDAP2 – ldap2.example.com

And our test DN would be, what a surprise, dc=example,dc=com

Let’s start with ldap1.example.com

  1. Install packages:
    yum -y install openldap-servers.x86_64 openldap-clients.x86_64
    
  2. Configure OpenLDAP:
    2.1 Specify olcRootDN and olcRootPW for the olcDatabase: {0}config and olcDatabase: {2}bdb databases:

    Use slappasswd to generate {SSHA} hashed passwords and add them into:

    vi /etc/openldap/slapd.d/cn\=config/olcDatabase\=\{0\}config.ldif:
    
    olcRootDN: cn=config
    olcRootPW: ##HASHED_PASSWORD##
    
    /etc/openldap/slapd.d/cn\=config/olcDatabase\=\{2\}bdb.ldif respectively:
    
    olcSuffix: dc=example,dc=com
    olcRootDN: cn=manager,dc=example,dc=com
    olcRootPW: ##HASHED_PASSWORD##
    
    vi /etc/openldap/slapd.d/cn\=config/olcDatabase\=\{1\}monitor.ldif
    olcAccess: {0}to *  by dn.base="gidNumber=0+uidNumber=0,cn=peercred,cn=externa
     l,cn=auth" read  by dn.base="cn=manager,dc=example,dc=com" read  by * none
     

    2.2 Specify Database configuration options and enable encryption:

    cat bdb.ldif
    
    dn: olcDatabase={2}bdb,cn=config
    changetype: modify
    replace: olcDbConfig
    olcDbConfig: set_cachesize 0 268435456 1
    olcDbConfig: set_lg_regionmax 262144
    olcDbConfig: set_lg_bsize 2097152
    olcDbConfig: set_flags DB_LOG_AUTOREMOVE
    -
    replace: olcDbCryptKey
    olcDbCryptKey: ##YOUR_CRYPT_KEY##
    
    ldapmodify -h 127.0.0.1 -x -W -D "cn=config" -f ./bdb.ldif
    Enter LDAP Password:
    modifying entry "olcDatabase={2}bdb,cn=config"
    

    After that stop LDAP server, remove DB files and start LDAP server again:

    /etc/init.d/slapd stop
    rm -f /var/lib/ldap/*
    /etc/init.d/slapd start
    

    Once the LDAP server is restarted, the DB files will be recreated and encrypted.

  3. Global configuration options
    3.1 Generate a CSR, get it signed and copy the resulting certificate to a directory of your choice. Keep in mind that if you’re going to configure replication, the paths and file names must be identical on both LDAP nodes. Otherwise, once replication comes up it will overwrite your settings and TLS will be broken on one of the servers:

    cat ./config.ldif
    
    dn: cn=config
    changetype: modify
    delete: olcTLSCACertificatePath
    -
    replace: olcTLSCACertificateFile
    olcTLSCACertificateFile: ##PATH_TO_CA_FILE##
    -
    replace: olcTLSCertificateFile
    olcTLSCertificateFile: ##PATH_TO_CERTIFICATE_FILE##
    -
    replace: olcTLSCertificateKeyFile
    olcTLSCertificateKeyFile: ##PATH_TO_KEY_FILE##
    -
    replace: olcTLSCipherSuite
    olcTLSCipherSuite: HIGH:MEDIUM:!ADH:-SSLv2:+SSLv3
    -
    replace: olcSaslSecProps
    olcSaslSecProps: noanonymous,noplain,minssf=112
    -
    replace: olcDisallows
    olcDisallows: bind_anon
    -
    replace: olcIdleTimeout
    olcIdleTimeout: 120
    
    
    ldapmodify -h 127.0.0.1 -x -W -D "cn=config" -f ./config.ldif
    Enter LDAP Password:
    modifying entry "cn=config"
    

    Disable all LDAP access schemes in /etc/sysconfig/ldap, leaving only LDAPS enabled, and set the following option:

    SLAPD_OPTIONS="-g ldap"
    

    Time to restart LDAP server to make sure everything is still fine:

    /etc/init.d/slapd restart
    
  4. Global database options
    cat ./frontend.ldif
    
    dn: olcDatabase={-1}frontend,cn=config
    changetype: modify
    add: olcPasswordHash
    olcPasswordHash: {SSHA}
    -
    add: olcRequires
    olcRequires: LDAPv3 authc
    
    ldapmodify -H ldaps://ldap1.example.com -x -W -D "cn=config" -f ./frontend.ldif
    Enter LDAP Password:
    modifying entry "olcDatabase={-1}frontend,cn=config"
    
  5. Add your domain object (example.com in our case):
    cat ./example_com.ldif
    
    dn: dc=example,dc=com
    objectClass: dcObject
    objectClass: organization
    dc: example
    o: Example Com
    description: Example Com
    
    ldapadd -H ldaps://ldap1.example.com -x -W -D "cn=manager,dc=example,dc=com" -f ./example_com.ldif
    Enter LDAP Password:
    adding new entry "dc=example,dc=com"
    
  6. Enable PPolicy
    6.1 Load the policy module:

    cat ./module.ldif
    dn: cn=module,cn=config
    objectClass: olcModuleList
    cn: module
    olcModuleLoad: ppolicy.la
    olcModulePath: /usr/lib64/openldap
    
    ldapadd -H ldaps://ldap1.example.com -x -W -D "cn=config" -f ./module.ldif
    Enter LDAP Password:
    adding new entry "cn=module,cn=config"
    

    Restart LDAP with /etc/init.d/slapd restart

    6.2 Add PPolicy Overlay:

    cat ./ppolicy-overlay.ldif
    dn: olcOverlay=ppolicy,olcDatabase={2}bdb,cn=config
    objectClass: olcPPolicyConfig
    olcOverlay: ppolicy
    olcPPolicyDefault: cn=ppolicy,ou=policies,dc=example,dc=com
    olcPPolicyUseLockout: TRUE
    olcPPolicyHashCleartext: TRUE
    
    ldapadd -H ldaps://ldap1.example.com -x -W -D "cn=config" -f ./ppolicy-overlay.ldif
    Enter LDAP Password:
    adding new entry "olcOverlay=ppolicy,olcDatabase={2}bdb,cn=config"
    

    6.3 Create default PPolicy and its rules:

    cat ./default_ppolicy.ldif
    dn: ou=policies,dc=example,dc=com
    objectClass: top
    objectClass: organizationalUnit
    ou: policies
    
    dn: cn=ppolicy,ou=policies,dc=example,dc=com
    objectClass: top
    objectClass: device
    objectClass: pwdPolicyChecker
    objectClass: pwdPolicy
    cn: ppolicy
    pwdAttribute: userPassword
    pwdInHistory: 8
    pwdMinLength: 8
    pwdMaxFailure: 3
    pwdFailureCountInterval: 1800
    pwdCheckQuality: 0
    pwdMustChange: TRUE
    pwdGraceAuthNLimit: 0
    pwdMaxAge: 7776000
    pwdExpireWarning: 1209600
    pwdLockoutDuration: 900
    pwdLockout: TRUE
    
    ldapadd -H ldaps://ldap1.example.com -x -W -D "cn=manager,dc=example,dc=com" -f ./default_ppolicy.ldif
    Enter LDAP Password:
    adding new entry "ou=policies,dc=example,dc=com"
    
    adding new entry "cn=ppolicy,ou=policies,dc=example,dc=com"
    
  7. Enable the audit overlay
    mkdir /var/log/slapd/
    chown ldap:ldap /var/log/slapd/
    
    echo "local4.*        /var/log/slapd/slapd.log" >> /etc/rsyslog.conf && /etc/init.d/rsyslog restart
    
    cat ./audit.ldif
    
    dn: cn=module{0},cn=config
    changetype: modify
    add: olcModuleLoad
    olcModuleLoad: {1}auditlog
    
    dn: olcOverlay=auditlog,olcDatabase={2}bdb,cn=config
    changetype: add
    objectClass: olcOverlayConfig
    objectClass: olcAuditLogConfig
    olcOverlay: auditlog
    olcAuditlogFile: /var/log/slapd/auditlog.log
    
    ldapadd -H ldaps://ldap1.example.com -x -W -D "cn=config" -f ./audit.ldif
    
  8. Add group and people OUs:
    cat ./groups.ldif
    
    dn: ou=group,dc=example,dc=com
    objectClass: top
    objectclass: organizationalunit
    ou: group
    
    dn: cn=users,ou=group,dc=example,dc=com
    cn: users
    objectClass: posixGroup
    gidNumber: 10000
    
    dn: ou=people,dc=example,dc=com
    objectClass: top
    objectclass: organizationalunit
    ou: people
    
    
    ldapadd -H ldaps://ldap1.example.com -x -W -D "cn=manager,dc=example,dc=com" -f ./groups.ldif
    Enter LDAP Password:
    adding new entry "ou=group,dc=example,dc=com"
    
    adding new entry "cn=users,ou=group,dc=example,dc=com"
    
    adding new entry "ou=people,dc=example,dc=com"
    
  9. Add ACLs, proxy LDAP account, SSSD and PAM configuration:
    9.1 Proxy LDAP account

    cat ./svc_ldp_proxy.ldif
    
    dn: cn=svc_ldp_proxy,dc=example,dc=com
    objectClass: simpleSecurityObject
    objectClass: organizationalRole
    cn: svc_ldp_proxy
    userPassword:
    
    ldapadd -H ldaps://ldap1.example.com -x -W -D "cn=manager,dc=example,dc=com" -f ./svc_ldp_proxy.ldif
    

    9.2 ACLs

    If you’re not interested in configuring replication you can skip the replicator.ldif part; otherwise you should create a separate DN that will be used to bind to the remote LDAP server and replicate the data. Strictly speaking that’s not required, but it’s a good idea to have a dedicated account just for replication.

    cat ./replicator.ldif
    dn: cn=replicator,dc=example,dc=com
    objectClass: simpleSecurityObject
    objectClass: organizationalRole
    cn: replicator
    userPassword: ##REPLICATOR_PASSWORD##
    
    ldapadd -H ldaps://ldap1.example.com -x -W -D "cn=manager,dc=example,dc=com" -f ./replicator.ldif
    Enter LDAP Password:
    adding new entry "cn=replicator,dc=example,dc=com"
    
    cat ./acl.ldif
    
    dn: olcDatabase={2}bdb,cn=config
    changetype: modify
    replace: olcAccess
    olcAccess: to attrs=userPassword by self write by dn="cn=replicator,dc=example,dc=com" write by anonymous auth
    olcAccess: to * by dn="cn=svc_ldp_proxy,dc=example,dc=com" read by dn="cn=replicator,dc=example,dc=com" write by self read by users read by anonymous auth by * none
    
    ldapmodify -H ldaps://ldap1.example.com -x -W -D "cn=config" -f ./acl.ldif
    Enter LDAP Password:
    modifying entry "olcDatabase={2}bdb,cn=config"
    

    9.3 SSSD

    cat /etc/sssd/sssd.conf
    [sssd]
    config_file_version = 2
    services = nss, pam
    domains = LDAP
    
    [nss]
    filter_groups = root
    filter_users = root
    
    [pam]
    pam_verbosity = 2
    
    [domain/LDAP]
    access_provider = simple
    simple_allow_groups = users
    tls_reqcert = never
    id_provider = ldap
    auth_provider = ldap
    chpass_provider = ldap
    use_fully_qualified_names = false
    ldap_uri = ldaps://ldap1.example.com:636/,ldaps://ldap2.example.com:636/
    ldap_search_base = dc=example,dc=com
    ldap_default_bind_dn = cn=svc_ldp_proxy,dc=example,dc=com
    ldap_default_authtok_type = password
    ldap_default_authtok = ##PASSWORD##
    ldap_tls_cacert = ##PATH_TO_CA_FILE##
    ldap_id_use_start_tls = true
    ldap_pwd_policy = none
    enumerate = false
    cache_credentials = false
    

    It’s a good idea to configure ldap.conf as well:

    cat /etc/openldap/ldap.conf
    
    URI ldaps://ldap1.example.com:636/ ldaps://ldap2.example.com:636/
    BASE dc=example,dc=com
    TLS_CACERT ##PATH_TO_CA_FILE##
    

    9.4 NSS and PAM (only for RHEL6 or similar distros). For RHEL5 the settings are slightly different; I provide them at the very end.

    To be able to authenticate to the system and change a password with the passwd command, /etc/pam.d/system-auth and /etc/nsswitch.conf must be updated accordingly:

    Verify that /etc/pam.d/system-auth has the following line:

    password sufficient pam_sss.so use_authtok

    /etc/nsswitch.conf must have the following settings:

    passwd: files sss
    shadow: files sss
    group: files sss

  10. At this point your first LDAP server should be configured. You must repeat all of the above steps to set up the second LDAP server, and only then proceed further.
  11. LDAP replication
    ldap1.example.com

    cat ./repl-module.ldif
    dn: cn=module,cn=config
    objectClass: olcModuleList
    cn: module
    olcModulePath: /usr/lib64/openldap
    olcModuleLoad: syncprov.la
    
    cat ./repl-config.ldif
    
    dn: cn=config
    changetype: modify
    replace: olcServerID
    olcServerID: 1 ldaps://ldap1.example.com
    olcServerID: 2 ldaps://ldap2.example.com
    
    dn: olcOverlay=syncprov,olcDatabase={0}config,cn=config
    changetype: add
    objectClass: olcOverlayConfig
    objectClass: olcSyncProvConfig
    olcOverlay: syncprov
    
    dn: olcDatabase={0}config,cn=config
    changetype: modify
    add: olcSyncRepl
    olcSyncRepl: rid=001 provider=ldaps://ldap1.example.com binddn="cn=config" bindmethod=simple credentials=##PASSWORD## searchbase="cn=config" type=refreshAndPersist retry="5 5 300 5" timeout=1
    olcSyncRepl: rid=002 provider=ldaps://ldap2.example.com binddn="cn=config" bindmethod=simple credentials=##PASSWORD## searchbase="cn=config" type=refreshAndPersist retry="5 5 300 5" timeout=1
    -
    add: olcMirrorMode
    olcMirrorMode: TRUE
    
    ldapmodify -H ldaps://ldap1.example.com/ -x -W -D "cn=config" -f ./repl-config.ldif
    Enter LDAP Password:
    modifying entry "cn=config"
    
    adding new entry "olcOverlay=syncprov,olcDatabase={0}config,cn=config"
    
    modifying entry "olcDatabase={0}config,cn=config"
    

    Add the above to ldap2.example.com too.

    ldapmodify -H ldaps://ldap2.example.com/ -x -W -D "cn=config" -f ./repl-config.ldif
    

    If everything has been done correctly the replication of config database should be up and running.

  12. Finally, add the actual data replication. This needs to be done on ldap1.example.com only, since all the changes will be replicated to ldap2.example.com:
    cat ./bdb-repl.ldif
    
    
    dn: olcDatabase={2}bdb,cn=config
    changetype: modify
    replace: olcLimits
    olcLimits: dn.exact="cn=manager,dc=example,dc=com" time.soft=unlimited time.hard=unlimited size.soft=unlimited size.hard=unlimited
    -
    add: olcSyncRepl
    olcSyncRepl: rid=004 provider=ldaps://ldap1.example.com binddn="cn=replicator,dc=example,dc=com" bindmethod=simple credentials=##PASSWORD## searchbase="dc=example,dc=com" type=refreshAndPersist retry="5 5 5 +" timeout=3
    olcSyncRepl: rid=005 provider=ldaps://ldap2.example.com binddn="cn=replicator,dc=example,dc=com" bindmethod=simple credentials=##PASSWORD## searchbase="dc=example,dc=com" type=refreshAndPersist retry="5 5 5 +" timeout=3
    -
    add: olcMirrorMode
    olcMirrorMode: TRUE
    
    dn: olcOverlay=syncprov,olcDatabase={2}bdb,cn=config
    changetype: add
    objectClass: olcOverlayConfig
    objectClass: olcSyncProvConfig
    olcOverlay: syncprov
    
    ldapmodify -H ldaps://ldap1.example.com -x -W -D "cn=config" -f ./bdb-repl.ldif
    Enter LDAP Password:
    modifying entry "olcDatabase={2}bdb,cn=config"
    
    adding new entry "olcOverlay=syncprov,olcDatabase={2}bdb,cn=config"
    

    As the last step, update /etc/sysconfig/ldap: set all schemes to "no" and leave only SLAPD_URLS, which should look like this:

    SLAPD_URLS="ldaps://ldap1.example.com ldaps://ldap2.example.com"

  13. Restart slapd on both servers and check the logs to make sure that binding was successful and replication has been established.
    To be on the safe side, try adding a new DN under ou=people,dc=example,dc=com on ldap2.example.com, e.g. uid=testuser,ou=people,dc=example,dc=com, and make sure the new record is propagated to ldap1.example.com.
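
The replication smoke test described in the last step could look roughly like the sketch below. The helper name is made up, and the entry assumes the inetOrgPerson schema is loaded on your servers, so adjust the object class if yours differs. The LDAP commands themselves are left as comments because they need live servers and the manager password.

```shell
# Hypothetical helper: print a minimal throwaway entry for a test user.
# Assumes the inetOrgPerson schema is available on the server.
make_test_user_ldif() {
    uid="$1"; base="$2"
    cat <<EOF
dn: uid=${uid},ou=people,${base}
objectClass: inetOrgPerson
uid: ${uid}
cn: ${uid}
sn: ${uid}
EOF
}

# Usage (not run here):
#   make_test_user_ldif testuser dc=example,dc=com > ./testuser.ldif
#   # add on ldap2 ...
#   ldapadd -H ldaps://ldap2.example.com -x -W -D "cn=manager,dc=example,dc=com" -f ./testuser.ldif
#   # ... and look for it on ldap1
#   ldapsearch -H ldaps://ldap1.example.com -x -W -D "cn=manager,dc=example,dc=com" \
#       -b "ou=people,dc=example,dc=com" "(uid=testuser)" dn
#   # clean up afterwards; the delete should replicate back as well
#   ldapdelete -H ldaps://ldap1.example.com -x -W -D "cn=manager,dc=example,dc=com" \
#       "uid=testuser,ou=people,dc=example,dc=com"
```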


RHEL5 client configuration section

  1. LDAP:
    cat /etc/ldap.conf
    
    base dc=example,dc=com
    uri ldaps://ldap1.example.com:636/ ldaps://ldap2.example.com:636/
    binddn cn=svc_ldp_proxy,dc=example,dc=com
    bindpw ##PASSWORD##
    scope sub
    timelimit 120
    bind_timelimit 120
    idle_timelimit 3600
    ldap_version 3
    
    nss_base_group          dc=example,dc=com
    nss_base_netgroup       dc=example,dc=com
    
    # Just assume that there are no supplemental groups for these named users
    nss_initgroups_ignoreusers root,ldap,named,avahi,haldaemon,dbus,radvd,tomcat,radiusd,news,mailman
    
    pam_password exop
    pam_lookup_policy yes
    
    tls_cacertfile ##PATH_TO_CA_FILE##
    
  2. PAM:
    cat /etc/pam.d/system-auth-ac
    #%PAM-1.0
    auth        required      pam_env.so
    auth        sufficient    pam_unix.so nullok try_first_pass
    auth        sufficient    pam_ldap.so use_first_pass
    auth        requisite     pam_succeed_if.so uid >= 500 quiet
    auth        required      pam_deny.so
    
    account     required      pam_unix.so
    account     required      pam_access.so
    account     sufficient    pam_succeed_if.so uid < 500 quiet
    account [default=bad success=ok user_unknown=ignore]    pam_ldap.so
    account     required      pam_permit.so
    
    password    required      pam_passwdqc.so min=disabled,disabled,disabled,8,8 retry=3 enforce=everyone
    password    sufficient    pam_unix.so sha512 shadow nullok try_first_pass use_authtok
    password    sufficient    pam_ldap.so use_authtok
    password    required      pam_deny.so
    
    session     optional      pam_keyinit.so revoke
    session     required      pam_limits.so
    session     [success=1 default=ignore] pam_succeed_if.so service in crond quiet use_uid
    session     sufficient    pam_ldap.so
    session     required      pam_unix.so
    session     required      pam_mkhomedir.so umask=0077
    
  3. NSS:
    Do not forget to configure /etc/nsswitch.conf appropriately depending on whether you are going to use ldap or compat.

Please don't hesitate to let me know if you notice any mistakes or unfortunate typos that have crept in.

OpenGrok, Ubuntu and Linux source tree

Thanks to our very long New Year holidays, I finally had a chance to install the OpenGrok search and cross-reference engine. The main reason was to have a tool for efficiently browsing the Linux kernel source code. I use OpenGrok quite frequently, especially when there is a need to dive into Solaris/OpenSolaris internals. Frankly speaking, I haven’t seen anything else that comes close to it. And you know what? It’s trivial to install and run OpenGrok on your favorite Linux distro, BSD, Mac OS X or Solaris. But since I use Ubuntu on my cloud server, all further details are given with it in mind. I will stick to the default installation since in 99.9% of cases that would be the choice (I guess).

  1. First things first. Let’s download OpenGrok and install additional packages if they’re missing.
    wget http://hub.opensolaris.org/bin/download/Project+opengrok/files/opengrok-0.11.1.tar.gz
    sudo apt-get install tomcat6 git exuberant-ctags
    
  2. By default, OpenGrok uses /var/opengrok, so I chose to clone the Linux source tree into /var/opengrok/src/linux-stable/, but you’re absolutely free to pick whatever suits you best:
    sudo mkdir -p /var/opengrok/bin
    sudo mkdir /var/opengrok/lib
    sudo mkdir /var/opengrok/src && cd /var/opengrok/src
    sudo git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-stable
    
  3. Now it’s time to untar the archive and deploy OpenGrok.
    tar -zxf ./opengrok-0.11.1.tar.gz
    sudo cp ./opengrok-0.11.1/lib/opengrok.jar /var/opengrok/lib/
    sudo cp ./opengrok-0.11.1/bin/OpenGrok /var/opengrok/bin/    
    sudo /var/opengrok/bin/OpenGrok deploy
    
  4. Now you should have source.war copied to your webapps directory (in my case that was /var/lib/tomcat6/webapps/source.war).
  5. Start Tomcat if it’s not already running and run the indexer.
    sudo /etc/init.d/tomcat6 start
    sudo /var/opengrok/bin/OpenGrok index /var/opengrok/src
    Loading the default instance configuration ...
    WARNING: OpenGrok generated data path /var/opengrok/data doesn't exist
      Attempting to create generated data directory ... 
    WARNING: OpenGrok generated etc path /var/opengrok/etc  doesn't exist
      Attempting to create generated etc directory ... 
      Creating default /var/opengrok/logging.properties ...
    

There is, however, a small catch. The indexer is started with the -Xmx option set to 2048m, and this is done on purpose. Since I only had 512MB of RAM, I tried reducing Xmx to 128, 256 and 512 megabytes, but every time I got a “java.lang.OutOfMemoryError: Java heap space” error. So in the end I temporarily resized my cloud server to 4GB, and only then did the indexer finish successfully.
I’m happy with the overall result because even with 512MB OpenGrok works blazingly fast – come in and check. I’ll do my best to update it whenever there is a new Linux release. Since OpenGrok supports incremental updates, that shouldn’t be a problem:

cd /var/opengrok/src/linux-stable && git pull
sudo /var/opengrok/bin/OpenGrok update

Spacewalk, osad and jabberd miscommunication

Spacewalk is a really handy tool if you want to keep your Linux infrastructure up to date, especially if you run a Red Hat based distro, since it is essentially the community-supported version of Red Hat’s product called Satellite.
But it has a nasty issue, at least in our case: the clients stop responding to commands from the server. There are two mechanisms for delivering an action from the server to a client: through rhnsd, which by default connects to the server every 4 hours and checks whether there is any action it should execute, or via the Jabber protocol. In the latter case a client receives an action request (e.g. install the latest packages or execute a command) from the server almost instantaneously, but as I mentioned, this cool feature stops working for no obvious reason. Everything seems to be working just fine: jabberd and osa-dispatcher are up and running and all clients connect to the server flawlessly, yet an action request never reaches its target, as if it had never been sent or got lost in between. Anyway, it seems that the only way out of this annoying situation is the following:

/etc/init.d/jabberd stop
/etc/init.d/osa-dispatcher stop
rm -f /var/lib/jabberd/db/*
su - postgres -c psql
delete from rhnPushDispatcher;
delete from rhnpushclient;

This is how we have to fix it from time to time. And don’t forget to restart the osad daemon on all of your clients to re-establish the connection to your Spacewalk server. If you have more than 10 servers this part can be a huge PITA. Hopefully the 1.8 release is free of this problem.
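
To take some of the pain out of restarting osad everywhere, a loop along these lines might help. It is a sketch under assumptions: the function name is made up, it expects root SSH access to every client, and it assumes the osad init script lives at /etc/init.d/osad there. By default it only prints what it would run.

```shell
# Hypothetical helper: restart osad on each host listed on stdin (one per line).
# With DRY_RUN=1 (the default here) it only prints the commands.
restart_osad_everywhere() {
    while IFS= read -r host; do
        # skip blank lines and comments in the host list
        case "$host" in ''|'#'*) continue ;; esac
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh root@${host} /etc/init.d/osad restart"
        else
            # -n keeps ssh from swallowing the rest of the host list on stdin
            ssh -n "root@${host}" /etc/init.d/osad restart
        fi
    done
}

# Usage (not run here): DRY_RUN=0 restart_osad_everywhere < ./clients.txt
```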

DLM lock levels in OCFS2

Yesterday I had to dig a bit deeper into OCFS2 details and came across this very helpful blog post about the different DLM lock levels used in OCFS2. It gave me a chance to get all my ducks in a row.

How to quickly fix osad failure to connect to the SpaceWalk server

If one day you notice that your Spacewalk client has started spitting the following errors at you, just know that it’s very easy to fix.

2012/03/15 17:17:43 +04:00 29835 0.0.0.0: osad/jabber_lib.setup_connection('Connected to jabber server', 'spacewalk-server-name')
2012/03/15 17:17:43 +04:00 29835 0.0.0.0: osad/jabber_lib.register('ERROR', 'Invalid password')

  1. Stop Spacewalk
    rhn-satellite stop
    rm -f /var/lib/jabberd/db/*
    
  2. Connect to PostgreSQL and run this trivial SQL
    delete from rhnPushDispatcher;
    
  3. Go back to the CLI and start the Spacewalk processes again
    rhn-satellite start
    

NFSD panics on RHEL 5.8

If you are as unlucky as I am and your RHEL 5.8 server has just spat out the same call trace as in the picture I attached, then I’m here to make your problem less painful. If you have an RHN account you can find a thorough explanation and the root cause here and here.

If you don’t have one, you will find the answer below:

Root Cause

The rq_pages array has 1MB/PAGE_SIZE+2 elements. The loop in svc_recv attempts to allocate sv_bufsz/PAGE_SIZE+2 pages. But the NFS server is setting sv_bufsz to over a megabyte, with the result that svc_recv may attempt to allocate sv_bufsz/PAGE_SIZE+3 pages and run past the end of the array, overwriting rq_respages.

Resolution

echo 524288 >/proc/fs/nfsd/max_block_size

Note this has to be done after mounting /proc/fs/nfsd, but before starting nfsd. It is recommended this change be made via modprobe.conf.dist as follows:

# grep max_block_size /etc/modprobe.d/modprobe.conf.dist
install nfsd /sbin/modprobe --first-time --ignore-install nfsd && { /bin/mount -t nfsd nfsd /proc/fs/nfsd > /dev/null 2>&1 || :; echo 524288 > /proc/fs/nfsd/max_block_size; }
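
To see why capping max_block_size helps, here is the arithmetic from the root cause worked through in shell. This is a simplified model, not kernel code: the 4096-byte page size and the one-page overhead added on top of the block size are assumptions chosen only to make the numbers concrete, and both function names are made up.

```shell
# Number of elements in rq_pages for a given page size: 1MB/PAGE_SIZE + 2
rq_pages_slots() { echo $(( 1048576 / $1 + 2 )); }

# Pages the svc_recv loop tries to allocate: sv_bufsz/PAGE_SIZE + 2
pages_needed() { echo $(( $1 / $2 + 2 )); }

page=4096       # assumed page size
overhead=4096   # assumed extra buffer space beyond the block size

rq_pages_slots "$page"                            # array capacity
pages_needed $(( 1048576 + overhead )) "$page"    # 1MB block size: one page too many
pages_needed $(( 524288 + overhead )) "$page"     # 512KB block size: fits comfortably
```

Under these assumptions the array holds 258 entries, a 1MB block size asks for 259 pages, and the 524288 cap asks for only 131.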

How to remove the last dead path in ESX

Feb  9 10:56:27 esx vmkernel: 47:18:55:03.308 cpu8:4225)WARNING: NMP: nmp_DeviceRetryCommand: Device "naa.60060e80056e030000006e030000004b": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
Feb  9 10:56:27 esx vmkernel: 47:18:55:03.308 cpu8:4225)WARNING: NMP: nmp_DeviceStartLoop: NMP Device "naa.60060e80056e030000006e030000004b" is blocked. Not starting I/O from device.
Feb  9 10:56:28 esx vmkernel: 47:18:55:04.323 cpu8:4264)WARNING: NMP: nmpDeviceAttemptFailover: Retry world failover device "naa.60060e80056e030000006e030000004b" - issuing command 0x41027f458540
Feb  9 10:56:28 esx vmkernel: 47:18:55:04.323 cpu8:4264)WARNING: NMP: nmpDeviceAttemptFailover: Retry world failover device "naa.60060e80056e030000006e030000004b" - failed to issue command due to Not found (APD), try again...

If these log lines look familiar and you’ve been desperately trying to remove the last FC path to your storage array, which by an unlucky coincidence is also dead, then the following commands may help you deal with the issue:

# esxcli corestorage claiming unclaim -A vmhba1 -C 0 -T 3 -L 33 -d naa.60060e80056e030000006e030000004b -t location
# esxcfg-scsidevs -o naa.60060e80056e030000006e030000004b

Just don’t copy and paste it blindly; substitute your own C:T:L and naa.* values.

Worked for me like a charm.
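
To avoid mistyping the C:T:L values, the two commands above can be generated from a path's runtime name. The sketch below assumes the runtime name has the form vmhba1:C0:T3:L33; the function name is hypothetical, and it only prints the commands so you can review them before running anything on the host.

```shell
# Hypothetical helper: build the unclaim commands from a runtime name
# (adapter:Cchannel:Ttarget:Llun) and a naa.* device ID. Prints only.
print_unclaim_commands() {
    runtime="$1"; device="$2"
    adapter="${runtime%%:*}"                  # vmhba1
    rest="${runtime#*:}"                      # C0:T3:L33
    channel="${rest%%:*}"; channel="${channel#C}"
    rest="${rest#*:}"                         # T3:L33
    target="${rest%%:*}"; target="${target#T}"
    lun="${rest#*:}"; lun="${lun#L}"
    echo "esxcli corestorage claiming unclaim -A ${adapter} -C ${channel} -T ${target} -L ${lun} -d ${device} -t location"
    echo "esxcfg-scsidevs -o ${device}"
}

print_unclaim_commands vmhba1:C0:T3:L33 naa.60060e80056e030000006e030000004b
```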

Linux solution for ARP flux

If for some reason you have to configure multiple network interfaces in the same subnet on a Linux box, e.g. eth0 – 192.168.1.1/24, eth1 – 192.168.1.2/24, eth2 – 192.168.1.3/24, etc., you will almost certainly run into the so-called ARP flux issue.
To fix it (I assume you have kernel > 2.6.2), set the following sysctl options:

net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.eth0.arp_announce = 2

In my particular case that wasn’t a complete remedy as I had to amend an additional sysctl parameter:

net.ipv4.conf.ethX.rp_filter = 0

where ethX is your network adapter, e.g. eth0, eth1, eth3, etc.
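
With several interfaces it is easy to miss one, so the settings can be generated per interface. This is a small sketch with a made-up function name. It emits all three options mentioned above for every interface passed in, so drop the rp_filter line if you don't need it.

```shell
# Hypothetical helper: print the ARP flux related sysctl settings
# for every interface name given as an argument.
arp_flux_sysctl() {
    for ifname in "$@"; do
        echo "net.ipv4.conf.${ifname}.arp_ignore = 1"
        echo "net.ipv4.conf.${ifname}.arp_announce = 2"
        echo "net.ipv4.conf.${ifname}.rp_filter = 0"
    done
}

# Usage (not run here): arp_flux_sysctl eth0 eth1 eth2 >> /etc/sysctl.conf && sysctl -p
```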

Read more about arp_announce and arp_ignore options.

Hope that helps. Good luck.