Today I had a nasty looking problem with my mysql installation. At first I thought I might have to drop one or more databases and re-install. Fortunately, I didn’t actually have to do that in the end.
I first noticed a problem at around 15.45 today when I couldn’t collect my mail. My mail system on this VM uses postfix as the smtp component and dovecot for IMAPS/POP3S delivery. A quick check showed that the server was up and postfix and dovecot were both running. Nothing immediately obviously wrong. However, a check on the mail log showed multiple entries of the form:
dovecot: auth-worker(3078): Error: mysql(localhost): Connect failed to database (mail): Can’t connect to local MySQL server through socket ‘/var/run/mysqld/mysqld.sock’
An attempt to connect to trivia also resulted in a “database error” message. So on checking again I noticed that of course mysql wasn’t running. Unfortunately, all my attempts to restart it failed. Further investigation of my logs (in particular, an ominously large /var/log/error) showed that I had a corrupt InnoDB page. As soon as mysql reached this it barfed and fell over. Not good. I don’t like database errors because I’m no database expert. But at least that explained the mail error. Both components of my mail system rely on a running mysql server because I store users, passwords, aliases, domain details etc. in mysql databases.
The largest database on this VM is, of course, the blog. I keep regular backups of that so in extremis I could dump the old database and reload from a backup and loose only my daily stats, But I was reluctant to do that without first checking to see if a repair was possible. I then spent some long time reading man pages, my O’Reily MySQL book, and the on-line MySQL reference pages. The best clue I received was the message in the error log:
Nov 16 15:57:05 pipe mysqld: InnoDB: If the corrupt page is an index page
Nov 16 15:57:05 pipe mysqld: InnoDB: you can also try to fix the corruption
Nov 16 15:57:05 pipe mysqld: InnoDB: by dumping, dropping, and reimporting
Nov 16 15:57:05 pipe mysqld: InnoDB: the corrupt table. You can use CHECK
Nov 16 15:57:05 pipe mysqld: InnoDB: TABLE to scan your table for corruption.
Nov 16 15:57:05 pipe mysqld: InnoDB: See also http://dev.mysql.com/doc/refman/5.5/en/forcing-innodb-recovery.html
Nov 16 15:57:05 pipe mysqld: InnoDB: about forcing recovery.
Nov 16 15:57:05 pipe mysqld: InnoDB: Ending processing because of a corrupt database page.
That message appeared just before a load of stacktrace information which looked horribly meaningless to me but spoke of deep, unfathomable database wrongness. A read of the referenced mysql manual page didn’t initially reassure me over much either. It starts:
If there is database page corruption, you may want to dump your tables from the database with SELECT … INTO OUTFILE. Usually, most of the data obtained in this way is intact.
“Usually” eh? Hmmm.
As suggested in the documentation I added the line “innodb_force_recovery = 1” to the [mysqld] section of my /etc/mysql/mysql,cnf config file and restarted the server. I started with the lowest non-zero option in the hope that this would cause the least corruption if I had to dump the tables. This time, mysql came up when I started it, but the error log contained the following:
Nov 16 18:09:26 pipe mysqld: InnoDB: A new raw disk partition was initialized or
Nov 16 18:09:26 pipe mysqld: InnoDB: innodb_force_recovery is on: we do not allow
Nov 16 18:09:26 pipe mysqld: InnoDB: database modifications by the user. Shut down
Nov 16 18:09:26 pipe mysqld: InnoDB: mysqld and edit my.cnf so that newraw is replaced
Nov 16 18:09:26 pipe mysqld: InnoDB: with raw, and innodb_force_… is removed.
Now there is nothing in my my.cnf about “raw” or “newraw”, so I simply removed the “innodb_force_recovery” line and shutdown and restarted mysql. Mysql started up without error, and without my having to dump any database. And no, I have no idea why. I can only assume that the force_recovery option forced some database repair as well as the documented forcing of the InnoDB storage engine to start.
And I don’t yet know what caused the problem in the first place, but since my logs show that the VM went down and restarted at around 14.36 I conclude that a failure somewhere in the VM system occurred during a database write and that screwed things up. I’m waiting for a response to my support call.
So. If you too ever face a database corruption similar to mine, do not panic. And do not attempt unnecessarily drastic database drops and reloads until you have at least tried the “innodb_force_recovery” option in your my.cnf configuration.
It may of course just be magic. Most database incantations are as far as I am concerned.