On Monday afternoon I updated various packages on our Fedora Core 5 server using yum. This has in the past caused one or two little tragedies. Really I should know better and do such updates over the weekend but of course I went ahead all gung-ho.
The vital, mission-critical thing that died this time was the OpenLDAP server, which runs authentication across all the CETIS sites. No one could get in to edit the wikis or blogs or a whole bunch of other services, which is pretty disastrous really.
I scratched my brain for all of Tuesday, and even a few hours on Monday night, trying to figure out what had happened. Basically it seemed that all the data in the OpenLDAP database had disappeared. I could connect to the server but it was unable to list the nodes of the directory. I tried a few command-line diagnostic tools. slapcat produced absolutely no output. slapd_db_recover happily recovered something but made no difference whatsoever. Doing an ldapsearch (which should dump the whole dataset) did the following:
[root@arwen ldap]# ldapsearch -x
# extended LDIF
#
# LDAPv3
# base with scope subtree
# filter: (objectclass=*)
# requesting: ALL
#
# search result
search: 2
result: 32 No such object
I started off thinking that my config files were knackered, so I pored over ldap.conf and slapd.conf for hours, and nothing changed. I did notice that there was an /etc/ldap.conf as well as an /etc/openldap/ldap.conf. I compared the two and removed the one loose in /etc as it seemed wrong. Didn’t help.
Next I got led off after a big red herring, as I noticed these messages in the logs when starting slapd:
Jun 13 12:01:32 arwen slapd[18004]: sql_select option missing
Jun 13 12:01:32 arwen slapd[18004]: auxpropfunc error no mechanism available
Jun 13 12:01:32 arwen slapd[18004]: auxpropfunc error invalid parameter supplied
Several sources claimed that this was to do with permission problems and SASL – but it turned out that it was completely unrelated to my actual problem and could be safely ignored. Again I wasted loads of time reading about SASL and chmodding files everywhere. I suppose it might become important were I ever to decide to actually use SASL with the directory.
So where had my data gone? This morning while on a conference call I was idly noodling through the database files in /var/lib/ldap and noticed a directory called rpmorig which I hadn’t really been through. I looked and I saw and I suddenly realised that there were a lot more .bdb files in there than there were in the parent directory, and that they were full of data. The penny dropped. yum had kindly backed up all my data into this directory and replaced the working files with fresh empty ones. I moved the rpmorig directory into place as /var/lib/ldap, restarted slapd and, behold, EVERYTHING WORKS AGAIN.
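For anyone who lands in the same hole, here is a rough sketch of the check and the swap as a couple of shell functions. The function names and the name of the set-aside directory are made up for illustration; only the paths come from the story above. It assumes slapd has been stopped first (however your system does that) and that the backup directory sits inside the live one, as rpmorig did here.

```shell
#!/bin/sh
# Sketch only: count_bdb and restore_from_backup are hypothetical helpers,
# not part of OpenLDAP or yum.

# Print how many *.bdb files sit directly inside the given directory.
count_bdb() {
    n=0
    for f in "$1"/*.bdb; do
        # The glob stays unexpanded when nothing matches, so test each entry.
        [ -f "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

# Swap a backup directory into place, keeping the current files alongside.
# $1 = live data directory, $2 = backup directory inside it
restore_from_backup() {
    live=$1
    backup=$2
    aside="$live.empty.$$"      # hypothetical name for the set-aside files
    staged="$live.restore.$$"   # temporary sibling while the swap happens
    mv "$backup" "$staged" &&
    mv "$live" "$aside" &&
    mv "$staged" "$live"
}

# Usage, as root and with slapd stopped (commented out on purpose):
#   count_bdb /var/lib/ldap            # the fresh, nearly empty set
#   count_bdb /var/lib/ldap/rpmorig    # the backed-up set with the real data
#   restore_from_backup /var/lib/ldap /var/lib/ldap/rpmorig
```

Nothing gets deleted: the fresh empty files end up in a sibling directory, so the swap can be undone if the backup turns out not to be what you hoped.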
I curse whoever put together that yum package.