LDAP Disaster

On Monday afternoon I updated various packages on our Fedora Core 5 server using yum. This has in the past caused one or two little tragedies. Really I should know better and do such updates over the weekend but of course I went ahead all gung-ho.

The vital mission critical thing that died this time was the OpenLDAP server which runs authentication across all the CETIS sites. No-one could get in to edit the wikis or blogs or a whole bunch of other services which is pretty disastrous really.

I scratched my brain for all of Tuesday and even a few hours on Monday night – trying to figure out what had happened. Basically it seemed that all the data in the openldap database had disappeared. I could connect to the server but it was unable to list the nodes of the directory. I tried a few command-line diagnostic tools. slapcat produced absolutely no output slapd_db_recover happily recovered something but made no difference whatsoever. Doing an ldapsearch (which should dump the whole dataset) did the following:

[root@arwen ldap]# ldapsearch -x
# extended LDIF
#
# LDAPv3
# base with scope subtree
# filter: (objectclass=*)
# requesting: ALL
#

# search result
search: 2
result: 32 No such object

I started off thinking that my config files were knackered – so I pawed over ldap.conf and slapd.conf for hours – and nothing changed. I did notice that there was an /etc/ldap.conf as well as an /etc/openldap/ldap.conf. I compared the two and removed the one loose in /etc as it seemed wrong. Didn’t help.

Next I got drawn down a big red-herring as I noticed messages in the logs when starting slapd:

Jun 13 12:01:32 arwen slapd[18004]: sql_select option missing
Jun 13 12:01:32 arwen slapd[18004]: auxpropfunc error no mechanism available
Jun 13 12:01:32 arwen slapd[18004]: auxpropfunc error invalid parameter supplied

Several sources claimed that this was to do with permission problems and SASL – but it turned out that it was completely unrelated to my actual problem and could be safely ignored. Again I wasted loads of time reading about SASL and chmodding files everywhere. I suppose it might become important were I ever to decide to actually use SASL with the directory.

So where had my data gone? This morning while on a conference call I was idly noodling through the database files in /var/lib/ldap and noticed a directory called rpmorig which I hadn’t really been through. I looked and I saw and I suddenly realised that there were a lot more .bdb files in there than there were in the parent directory and that they were full of data. The penny dropped. yum had kindly backed up all my data into this directory and replaced the working files with fresh empty ones. I moved the rpmorig directory into the place of /var/lib/ldap, restarted slapd and behold EVERYTHING WORKS AGAIN.

I curse whoever put together that yum package.

One thought on “LDAP Disaster

  1. Thanks for the excellent entry. I had nearly the exact thing happen last night, I went down all the same routes you did and found your entry as I was getting ready to copy the rpmorig directory back over the files in /var/lib/ldap. I was looking for someone that had done what I was about to do and found your entry, a near identical example of what I was going through. Once I finished reading your post, I was confident to copy the files over…I did so, and now everything works. Thanks for nothing to the morons that put that activity in the yum update, not my favorite method of update. Thanks again!