Esta mañana se produjo un fallo hardware en uno de los servidores de laboratorio con OVM Server 3.4. En este entorno, OVM Manager está desplegado dentro del propio clúster OVM, y la incidencia provocó su caída abrupta. Al arrancar la máquina de manera manual desde el servidor OVM con el comando xm create, el SO arrancó correctamente, pero no los componentes OVM. ¿El motivo? Corrupción de datos en la BD MySQL tras la caída.
180123 17:53:24 mysqld_safe Starting mysqld daemon with databases from /u01/app/oracle/mysql/data 2018-01-23 17:53:24 0 [Warning] 'THREAD_CONCURRENCY' is deprecated and will be removed in a future release. 2018-01-23 17:53:24 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details). 2018-01-23 17:53:24 0 [Note] /usr/sbin/mysqld (mysqld 5.6.32-enterprise-commercial-advanced) starting as process 3816 ... 2018-01-23 17:53:24 3816 [Note] Plugin 'FEDERATED' is disabled. 2018-01-23 17:53:24 3816 [Note] InnoDB: Using atomics to ref count buffer pool pages 2018-01-23 17:53:24 3816 [Note] InnoDB: The InnoDB memory heap is disabled 2018-01-23 17:53:24 3816 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2018-01-23 17:53:24 3816 [Note] InnoDB: Memory barrier is not used 2018-01-23 17:53:24 3816 [Note] InnoDB: Compressed tables use zlib 1.2.3 2018-01-23 17:53:24 3816 [Note] InnoDB: Using Linux native AIO 2018-01-23 17:53:24 3816 [Note] InnoDB: Using CPU crc32 instructions 2018-01-23 17:53:24 3816 [Note] InnoDB: Initializing buffer pool, size = 1.0G 2018-01-23 17:53:24 3816 [Note] InnoDB: Completed initialization of buffer pool 2018-01-23 17:53:24 3816 [Note] InnoDB: Highest supported file format is Barracuda. 2018-01-23 17:53:24 3816 [Note] InnoDB: The log sequence numbers 1484386085656 and 1484386085656 in ibdata files do not match the log sequence number 1511364890742 in the ib_logfiles! 2018-01-23 17:53:24 3816 [Note] InnoDB: Database was not shutdown normally! 2018-01-23 17:53:24 3816 [Note] InnoDB: Starting crash recovery. 2018-01-23 17:53:24 3816 [Note] InnoDB: Reading tablespace information from the .ibd files... 2018-01-23 17:53:26 3816 [Note] InnoDB: Restoring possible half-written data pages 2018-01-23 17:53:26 3816 [Note] InnoDB: from the doublewrite buffer... 2018-01-23 17:53:26 3816 [ERROR] InnoDB: We scanned the log up to 1511364890624. A checkpoint was at 1511364890742 and the maximum LSN on a database page was 0. It is possible that the database is now corrupt! 2018-01-23 17:53:26 7f451f3e8720 InnoDB: Error: page 5 log sequence number 1511364907354 InnoDB: is in the future! Current system log sequence number 1511364890742. InnoDB: Your database may be corrupt or you may have copied the InnoDB InnoDB: tablespace but not the InnoDB log files. See InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html InnoDB: for more information. (...) InnoDB: 3 transaction(s) which must be rolled back or cleaned up InnoDB: in total 0 row operations to undo InnoDB: Trx id counter is 1018590976 InnoDB: Cleaning up trx with id 1018590873 2018-01-23 17:53:27 7f451f3e8720 InnoDB: Assertion failure in thread 139934853662496 in file trx0trx.cc line 292 InnoDB: Failing assertion: trx->update_undo == NULL InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to http://bugs.mysql.com. InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html InnoDB: about forcing recovery. 16:53:27 UTC - mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. key_buffer_size=33554432 read_buffer_size=2097152 max_used_connections=0 max_threads=151 thread_count=0 connection_count=0 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 653196 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0 thread_stack 0x40000 /usr/sbin/mysqld(my_print_stacktrace+0x35)[0x918445] /usr/sbin/mysqld(handle_fatal_signal+0x494)[0x69bdd4] /lib64/libpthread.so.0(+0xf7e0)[0x7f451efc97e0] /lib64/libc.so.6(gsignal+0x35)[0x7f451da695e5] /lib64/libc.so.6(abort+0x175)[0x7f451da6adc5] /usr/sbin/mysqld[0xa48492] /usr/sbin/mysqld[0xa41758] /usr/sbin/mysqld[0x9b4488] /usr/sbin/mysqld[0xa2f975] /usr/sbin/mysqld[0x973e0d] /usr/sbin/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int+0x48)[0x5df848] /usr/sbin/mysqld[0x725cb1] /usr/sbin/mysqld(_Z11plugin_initPiPPci+0xbb6)[0x729b16] /usr/sbin/mysqld[0x5d21c8] /usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x3e5)[0x5d6de5] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f451da55d1d] /usr/sbin/mysqld[0x5c83a5] The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. 180123 17:53:27 mysqld_safe mysqld from pid file /u01/app/oracle/mysql/data/mysqld.pid ended
Tiene mal aspecto, pero aunque la primera opción en la que pensamos es la de atacar a nivel MySQL, dado que el entorno es muy estático y no ha habido cambios de configuración en las últimas 24h, optamos por la vía rápida e integrada en OVM, la herramienta de restauración de MySQL de OVMM. Para ello, basta seguir la documentación bastante clara y explícita de Oracle que podemos ver aqui. En nuestro caso, solo queremos recuperar a nivel de BD, por lo que no necesitamos ejecutar el paso número 1.
OVMM genera diariamente y de manera automática, un backup de la BD MySQL en la ruta /u01/app/oracle/mysql/backup (OVM 3.4). Vemos el contenido de este directorio en nuestro caso:
[oracle@ovmmanager ~]$ ls -latr /u01/app/oracle/mysql/dbbackup/ total 2592 drwx------ 4 oracle oinstall 4096 Apr 1 2015 3.3.1_preUpgradeBackup-20150401_114910 drwx------ 4 oracle oinstall 4096 Mar 10 2016 3.3.2_preUpgradeBackup-20160310_145455 drwx------ 4 oracle oinstall 4096 Mar 28 2016 3.3.3_preUpgradeBackup-20160328_133313 drwx------ 4 oracle oinstall 4096 Dec 29 2016 3.4.1_preUpgradeBackup-20161229_195435 drwx------ 4 oracle oinstall 4096 Sep 10 19:24 AutoFullBackup-20170910_192222 drwx------ 4 oracle oinstall 4096 Sep 11 19:25 AutoFullBackup-20170911_192432 drwx------ 4 oracle oinstall 4096 Sep 12 19:27 AutoFullBackup-20170912_192620 drwx------ 4 oracle oinstall 4096 Sep 13 19:28 AutoFullBackup-20170913_192733 drwxr-xr-x 5 oracle dba 4096 Jan 5 12:18 .. drwx------ 4 oracle oinstall 4096 Jan 6 12:38 AutoFullBackup-20180106_123729 drwx------ 4 oracle oinstall 4096 Jan 7 12:39 AutoFullBackup-20180107_123822 drwx------ 4 oracle oinstall 4096 Jan 8 12:40 AutoFullBackup-20180108_123931 drwx------ 4 oracle oinstall 4096 Jan 9 12:41 AutoFullBackup-20180109_124037 drwx------ 4 oracle oinstall 4096 Jan 10 12:42 AutoFullBackup-20180110_124144 drwx------ 4 oracle oinstall 4096 Jan 11 16:52 AutoFullBackup-20180111_165134 drwx------ 4 oracle oinstall 4096 Jan 12 16:53 AutoFullBackup-20180112_165229 drwx------ 4 oracle oinstall 4096 Jan 13 16:54 AutoFullBackup-20180113_165327 -rw-r----- 1 oracle oinstall 251602 Jan 13 17:21 OVMModelExport-20180113_172134.xml drwx------ 4 oracle oinstall 4096 Jan 14 16:55 AutoFullBackup-20180114_165436 -rw-r----- 1 oracle oinstall 254527 Jan 14 17:21 OVMModelExport-20180114_172136.xml drwx------ 4 oracle oinstall 4096 Jan 15 16:56 AutoFullBackup-20180115_165545 -rw-r----- 1 oracle oinstall 257452 Jan 15 17:21 OVMModelExport-20180115_172138.xml drwx------ 4 oracle oinstall 4096 Jan 16 16:57 AutoFullBackup-20180116_165655 -rw-r----- 1 oracle oinstall 260363 Jan 16 17:21 OVMModelExport-20180116_172141.xml drwx------ 4 oracle oinstall 4096 Jan 17 16:59 AutoFullBackup-20180117_165804 -rw-r----- 1 oracle oinstall 263280 Jan 17 17:21 OVMModelExport-20180117_172144.xml drwx------ 4 oracle oinstall 4096 Jan 18 17:00 AutoFullBackup-20180118_165918 -rw-r----- 1 oracle oinstall 241287 Jan 18 17:21 OVMModelExport-20180118_172147.xml drwx------ 4 oracle oinstall 4096 Jan 19 17:01 AutoFullBackup-20180119_170019 -rw-r----- 1 oracle oinstall 244200 Jan 19 17:21 OVMModelExport-20180119_172149.xml drwx------ 4 oracle oinstall 4096 Jan 20 17:02 AutoFullBackup-20180120_170140 -rw-r----- 1 oracle oinstall 247125 Jan 20 17:21 OVMModelExport-20180120_172151.xml drwx------ 4 oracle oinstall 4096 Jan 21 17:03 AutoFullBackup-20180121_170254 -rw-r----- 1 oracle oinstall 250050 Jan 21 17:21 OVMModelExport-20180121_172154.xml drwx------ 4 oracle oinstall 4096 Jan 22 17:05 AutoFullBackup-20180122_170402 -rw-r----- 1 oracle oinstall 250050 Jan 22 17:21 OVMModelExport-20180122_172156.xml drwxr-xr-x 27 oracle dba 4096 Jan 22 17:21 .
La última copia, de ayer, es AutoFullBackup-20180122_170402. Con esta información, en primer lugar paramos todo OVMM (Weblogic OVM Server, Oracle VM Command Line Interface y MySQL database):
[root@ovmmanager ~]# service ovmm stop Stopping Oracle VM Manager [ OK ] [root@ovmmanager ~]# service ovmcli stop Stopping Oracle VM Manager CLI [ OK ] [root@ovmmanager ~]# service ovmm_mysql stop Shutting down OVMM MySQL.. SUCCESS!
Ahora, como usuario oracle, lanzamos el shell script RestoreDatabase.sh especificando el nombre del backup desde el que queremos hacer la recuperación.
[oracle@ovmmanager ~]$ /u01/app/oracle/ovm-manager-3/ovm_tools/bin/RestoreDatabase.sh AutoFullBackup-20180122_170402 Before the database can be restored, the following database directories/files must be deleted: appfw ibdata1 ib_logfile0 ib_logfile1 mysql ovs performance_schema Are you sure it is safe to delete these directories/files now? [y,n] y Deleting /u01/app/oracle/mysql/data/appfw Deleting /u01/app/oracle/mysql/data/ibdata1 Deleting /u01/app/oracle/mysql/data/ib_logfile0 Deleting /u01/app/oracle/mysql/data/ib_logfile1 Deleting /u01/app/oracle/mysql/data/mysql Deleting /u01/app/oracle/mysql/data/ovs Deleting /u01/app/oracle/mysql/data/performance_schema INFO: Expanding the backup image... INFO: Applying logs to the backup snapshot... INFO: Restoring the backup... INFO: Restoring OVM keystores and certificates INFO: Success - Done! INFO: Log of operations performed is available at: /u01/app/oracle/mysql/dbbackup/AutoFullBackup-20180122_170402/Restore.log IMPORTANT: As 'root', please start the OVM Manager database and application using: service ovmm_mysql start; service ovmm start; service ovmcli start [oracle@ovmmanager ~]$
Arrancamos de nuevo los servicios:
[root@ovmmanager ~]# service ovmm_mysql start; Starting OVMM MySQL...... SUCCESS! [root@ovmmanager ~]# service ovmm start Starting Oracle VM Manager [ OK ] [root@ovmmanager ~]# service ovmcli start Starting Oracle VM Manager CLI [ OK ]
Todo arranca OK, incluída la BD. ¡Hemos recuperado la normalidad!
180123 18:03:50 mysqld_safe Starting mysqld daemon with databases from /u01/app/oracle/mysql/data 2018-01-23 18:03:50 0 [Warning] 'THREAD_CONCURRENCY' is deprecated and will be removed in a future release. 2018-01-23 18:03:50 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details). 2018-01-23 18:03:50 0 [Note] /usr/sbin/mysqld (mysqld 5.6.32-enterprise-commercial-advanced) starting as process 4526 ... 2018-01-23 18:03:50 4526 [Note] Plugin 'FEDERATED' is disabled. 2018-01-23 18:03:51 4526 [Note] InnoDB: Using atomics to ref count buffer pool pages 2018-01-23 18:03:51 4526 [Note] InnoDB: The InnoDB memory heap is disabled 2018-01-23 18:03:51 4526 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2018-01-23 18:03:51 4526 [Note] InnoDB: Memory barrier is not used 2018-01-23 18:03:51 4526 [Note] InnoDB: Compressed tables use zlib 1.2.3 2018-01-23 18:03:51 4526 [Note] InnoDB: Using Linux native AIO 2018-01-23 18:03:51 4526 [Note] InnoDB: Using CPU crc32 instructions 2018-01-23 18:03:51 4526 [Note] InnoDB: Initializing buffer pool, size = 1.0G 2018-01-23 18:03:51 4526 [Note] InnoDB: Completed initialization of buffer pool 2018-01-23 18:03:51 4526 [Note] InnoDB: Highest supported file format is Barracuda. 2018-01-23 18:03:51 4526 [Note] InnoDB: The log file was created by mysqlbackup --apply-log at ibbackup 180123 17:59:17. The following crash recovery is part of a normal restore. 2018-01-23 18:03:51 4526 [Note] InnoDB: The log sequence numbers 1484386085656 and 1484386085656 in ibdata files do not match the log sequence number 1511045530636 in the ib_logfiles! 2018-01-23 18:03:51 4526 [Note] InnoDB: Database was not shutdown normally! 2018-01-23 18:03:51 4526 [Note] InnoDB: Starting crash recovery. 2018-01-23 18:03:51 4526 [Note] InnoDB: Reading tablespace information from the .ibd files... 2018-01-23 18:03:54 4526 [Note] InnoDB: Restoring possible half-written data pages 2018-01-23 18:03:54 4526 [Note] InnoDB: from the doublewrite buffer... 2018-01-23 18:03:55 4526 [Note] InnoDB: 128 rollback segment(s) are active. 2018-01-23 18:03:55 4526 [Note] InnoDB: Waiting for purge to start 2018-01-23 18:03:55 4526 [Note] InnoDB: 5.6.32 started; log sequence number 1511045530636 2018-01-23 18:03:55 4526 [Note] RSA private key file not found: /u01/app/oracle/mysql/data//private_key.pem. Some authentication plugins will not work. 2018-01-23 18:03:55 4526 [Note] RSA public key file not found: /u01/app/oracle/mysql/data//public_key.pem. Some authentication plugins will not work. 2018-01-23 18:03:55 4526 [Note] Server hostname (bind-address): '*'; port: 49500 2018-01-23 18:03:55 4526 [Note] IPv6 is available. 2018-01-23 18:03:55 4526 [Note] - '::' resolves to '::'; 2018-01-23 18:03:55 4526 [Note] Server socket created on IP: '::'. 2018-01-23 18:03:55 4526 [Note] Event Scheduler: Loaded 0 events 2018-01-23 18:03:55 4526 [Note] /usr/sbin/mysqld: ready for connections. Version: '5.6.32-enterprise-commercial-advanced' socket: '/u01/app/oracle/mysql/data/mysqld.sock' port: 49500 MySQL Enterprise Server - Advanced Edition (Commercial)