Esta mañana se produjo un fallo hardware en uno de los servidores de laboratorio con OVM Server 3.4. En este entorno, OVM Manager está desplegado dentro del propio clúster OVM, y la incidencia provocó su caída abrupta. Al arrancar la máquina de manera manual desde el servidor OVM con el comando xm create, el SO arrancó correctamente, pero no los componentes OVM. ¿El motivo? Corrupción de datos en la BD MySQL tras la caída.

180123 17:53:24 mysqld_safe Starting mysqld daemon with databases from /u01/app/oracle/mysql/data
2018-01-23 17:53:24 0 [Warning] 'THREAD_CONCURRENCY' is deprecated and will be removed in a future release.
2018-01-23 17:53:24 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2018-01-23 17:53:24 0 [Note] /usr/sbin/mysqld (mysqld 5.6.32-enterprise-commercial-advanced) starting as process 3816 ...
2018-01-23 17:53:24 3816 [Note] Plugin 'FEDERATED' is disabled.
2018-01-23 17:53:24 3816 [Note] InnoDB: Using atomics to ref count buffer pool pages
2018-01-23 17:53:24 3816 [Note] InnoDB: The InnoDB memory heap is disabled
2018-01-23 17:53:24 3816 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2018-01-23 17:53:24 3816 [Note] InnoDB: Memory barrier is not used
2018-01-23 17:53:24 3816 [Note] InnoDB: Compressed tables use zlib 1.2.3
2018-01-23 17:53:24 3816 [Note] InnoDB: Using Linux native AIO
2018-01-23 17:53:24 3816 [Note] InnoDB: Using CPU crc32 instructions
2018-01-23 17:53:24 3816 [Note] InnoDB: Initializing buffer pool, size = 1.0G
2018-01-23 17:53:24 3816 [Note] InnoDB: Completed initialization of buffer pool
2018-01-23 17:53:24 3816 [Note] InnoDB: Highest supported file format is Barracuda.
2018-01-23 17:53:24 3816 [Note] InnoDB: The log sequence numbers 1484386085656 and 1484386085656 in ibdata files do not match the log sequence number 1511364890742 in the ib_logfiles!
2018-01-23 17:53:24 3816 [Note] InnoDB: Database was not shutdown normally!
2018-01-23 17:53:24 3816 [Note] InnoDB: Starting crash recovery.
2018-01-23 17:53:24 3816 [Note] InnoDB: Reading tablespace information from the .ibd files...
2018-01-23 17:53:26 3816 [Note] InnoDB: Restoring possible half-written data pages 
2018-01-23 17:53:26 3816 [Note] InnoDB: from the doublewrite buffer...
2018-01-23 17:53:26 3816 [ERROR] InnoDB: We scanned the log up to 1511364890624. A checkpoint was at 1511364890742 and the maximum LSN on a database page was 0. It is possible that the database is now corrupt!
2018-01-23 17:53:26 7f451f3e8720 InnoDB: Error: page 5 log sequence number 1511364907354
InnoDB: is in the future! Current system log sequence number 1511364890742.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: for more information.
(...)
InnoDB: 3 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 0 row operations to undo
InnoDB: Trx id counter is 1018590976
InnoDB: Cleaning up trx with id 1018590873
2018-01-23 17:53:27 7f451f3e8720 InnoDB: Assertion failure in thread 139934853662496 in file trx0trx.cc line 292
InnoDB: Failing assertion: trx->update_undo == NULL
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
16:53:27 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

key_buffer_size=33554432
read_buffer_size=2097152
max_used_connections=0
max_threads=151
thread_count=0
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 653196 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x35)[0x918445]
/usr/sbin/mysqld(handle_fatal_signal+0x494)[0x69bdd4]
/lib64/libpthread.so.0(+0xf7e0)[0x7f451efc97e0]
/lib64/libc.so.6(gsignal+0x35)[0x7f451da695e5]
/lib64/libc.so.6(abort+0x175)[0x7f451da6adc5]
/usr/sbin/mysqld[0xa48492]
/usr/sbin/mysqld[0xa41758]
/usr/sbin/mysqld[0x9b4488]
/usr/sbin/mysqld[0xa2f975]
/usr/sbin/mysqld[0x973e0d]
/usr/sbin/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int+0x48)[0x5df848]
/usr/sbin/mysqld[0x725cb1]
/usr/sbin/mysqld(_Z11plugin_initPiPPci+0xbb6)[0x729b16]
/usr/sbin/mysqld[0x5d21c8]
/usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x3e5)[0x5d6de5]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7f451da55d1d]
/usr/sbin/mysqld[0x5c83a5]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
180123 17:53:27 mysqld_safe mysqld from pid file /u01/app/oracle/mysql/data/mysqld.pid ended

Tiene mal aspecto, pero aunque la primera opción en la que pensamos es la de atacar a nivel MySQL, dado que el entorno es muy estático y no ha habido cambios de configuración en las últimas 24h, optamos por la vía rápida e integrada en OVM, la herramienta de restauración de MySQL de OVMM. Para ello, basta seguir la documentación bastante clara y explícita de Oracle que podemos ver aquí. En nuestro caso, solo queremos recuperar a nivel de BD, por lo que no necesitamos ejecutar el paso número 1.

OVMM genera diariamente y de manera automática, un backup de la BD MySQL en la ruta /u01/app/oracle/mysql/backup (OVM 3.4). Vemos el contenido de este directorio en nuestro caso:

[oracle@ovmmanager ~]$ ls -latr /u01/app/oracle/mysql/dbbackup/
total 2592
drwx------  4 oracle oinstall   4096 Apr  1  2015 3.3.1_preUpgradeBackup-20150401_114910
drwx------  4 oracle oinstall   4096 Mar 10  2016 3.3.2_preUpgradeBackup-20160310_145455
drwx------  4 oracle oinstall   4096 Mar 28  2016 3.3.3_preUpgradeBackup-20160328_133313
drwx------  4 oracle oinstall   4096 Dec 29  2016 3.4.1_preUpgradeBackup-20161229_195435
drwx------  4 oracle oinstall   4096 Sep 10 19:24 AutoFullBackup-20170910_192222
drwx------  4 oracle oinstall   4096 Sep 11 19:25 AutoFullBackup-20170911_192432
drwx------  4 oracle oinstall   4096 Sep 12 19:27 AutoFullBackup-20170912_192620
drwx------  4 oracle oinstall   4096 Sep 13 19:28 AutoFullBackup-20170913_192733
drwxr-xr-x  5 oracle dba        4096 Jan  5 12:18 ..
drwx------  4 oracle oinstall   4096 Jan  6 12:38 AutoFullBackup-20180106_123729
drwx------  4 oracle oinstall   4096 Jan  7 12:39 AutoFullBackup-20180107_123822
drwx------  4 oracle oinstall   4096 Jan  8 12:40 AutoFullBackup-20180108_123931
drwx------  4 oracle oinstall   4096 Jan  9 12:41 AutoFullBackup-20180109_124037
drwx------  4 oracle oinstall   4096 Jan 10 12:42 AutoFullBackup-20180110_124144
drwx------  4 oracle oinstall   4096 Jan 11 16:52 AutoFullBackup-20180111_165134
drwx------  4 oracle oinstall   4096 Jan 12 16:53 AutoFullBackup-20180112_165229
drwx------  4 oracle oinstall   4096 Jan 13 16:54 AutoFullBackup-20180113_165327
-rw-r-----  1 oracle oinstall 251602 Jan 13 17:21 OVMModelExport-20180113_172134.xml
drwx------  4 oracle oinstall   4096 Jan 14 16:55 AutoFullBackup-20180114_165436
-rw-r-----  1 oracle oinstall 254527 Jan 14 17:21 OVMModelExport-20180114_172136.xml
drwx------  4 oracle oinstall   4096 Jan 15 16:56 AutoFullBackup-20180115_165545
-rw-r-----  1 oracle oinstall 257452 Jan 15 17:21 OVMModelExport-20180115_172138.xml
drwx------  4 oracle oinstall   4096 Jan 16 16:57 AutoFullBackup-20180116_165655
-rw-r-----  1 oracle oinstall 260363 Jan 16 17:21 OVMModelExport-20180116_172141.xml
drwx------  4 oracle oinstall   4096 Jan 17 16:59 AutoFullBackup-20180117_165804
-rw-r-----  1 oracle oinstall 263280 Jan 17 17:21 OVMModelExport-20180117_172144.xml
drwx------  4 oracle oinstall   4096 Jan 18 17:00 AutoFullBackup-20180118_165918
-rw-r-----  1 oracle oinstall 241287 Jan 18 17:21 OVMModelExport-20180118_172147.xml
drwx------  4 oracle oinstall   4096 Jan 19 17:01 AutoFullBackup-20180119_170019
-rw-r-----  1 oracle oinstall 244200 Jan 19 17:21 OVMModelExport-20180119_172149.xml
drwx------  4 oracle oinstall   4096 Jan 20 17:02 AutoFullBackup-20180120_170140
-rw-r-----  1 oracle oinstall 247125 Jan 20 17:21 OVMModelExport-20180120_172151.xml
drwx------  4 oracle oinstall   4096 Jan 21 17:03 AutoFullBackup-20180121_170254
-rw-r-----  1 oracle oinstall 250050 Jan 21 17:21 OVMModelExport-20180121_172154.xml
drwx------  4 oracle oinstall   4096 Jan 22 17:05 AutoFullBackup-20180122_170402
-rw-r-----  1 oracle oinstall 250050 Jan 22 17:21 OVMModelExport-20180122_172156.xml
drwxr-xr-x 27 oracle dba        4096 Jan 22 17:21 .

La última copia, de ayer, es AutoFullBackup-20180122_170402. Con esta información, en primer lugar paramos todo OVMM (Weblogic OVM Server, Oracle VM Command Line Interface y MySQL database):

[root@ovmmanager ~]# service ovmm stop
Stopping Oracle VM Manager                                 [  OK  ]

[root@ovmmanager ~]# service ovmcli stop
Stopping Oracle VM Manager CLI                             [  OK  ]

[root@ovmmanager ~]# service ovmm_mysql stop
Shutting down OVMM MySQL.. SUCCESS!

Ahora, como usuario oracle, lanzamos el shell script RestoreDatabase.sh especificando el nombre del backup desde el que queremos hacer la recuperación.

[oracle@ovmmanager ~]$ /u01/app/oracle/ovm-manager-3/ovm_tools/bin/RestoreDatabase.sh AutoFullBackup-20180122_170402

Before the database can be restored, the following database directories/files must be deleted:
appfw ibdata1 ib_logfile0 ib_logfile1 mysql ovs performance_schema

Are you sure it is safe to delete these directories/files now? [y,n] y
Deleting /u01/app/oracle/mysql/data/appfw
Deleting /u01/app/oracle/mysql/data/ibdata1
Deleting /u01/app/oracle/mysql/data/ib_logfile0
Deleting /u01/app/oracle/mysql/data/ib_logfile1
Deleting /u01/app/oracle/mysql/data/mysql
Deleting /u01/app/oracle/mysql/data/ovs
Deleting /u01/app/oracle/mysql/data/performance_schema
INFO: Expanding the backup image...
INFO: Applying logs to the backup snapshot...
INFO: Restoring the backup...
INFO: Restoring OVM keystores and certificates
INFO: Success - Done!
INFO: Log of operations performed is available at: /u01/app/oracle/mysql/dbbackup/AutoFullBackup-20180122_170402/Restore.log


IMPORTANT:

      As 'root', please start the OVM Manager database and application using:
            service ovmm_mysql start; service ovmm start; service ovmcli start


[oracle@ovmmanager ~]$

Arrancamos de nuevo los servicios:

[root@ovmmanager ~]# service ovmm_mysql start;
Starting OVMM MySQL...... SUCCESS! 

[root@ovmmanager ~]# service ovmm start
Starting Oracle VM Manager                                 [  OK  ]

[root@ovmmanager ~]# service ovmcli start
Starting Oracle VM Manager CLI                             [  OK  ]

Todo arranca OK, incluída la BD. ¡Hemos recuperado la normalidad!

180123 18:03:50 mysqld_safe Starting mysqld daemon with databases from /u01/app/oracle/mysql/data
2018-01-23 18:03:50 0 [Warning] 'THREAD_CONCURRENCY' is deprecated and will be removed in a future release.
2018-01-23 18:03:50 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2018-01-23 18:03:50 0 [Note] /usr/sbin/mysqld (mysqld 5.6.32-enterprise-commercial-advanced) starting as process 4526 ...
2018-01-23 18:03:50 4526 [Note] Plugin 'FEDERATED' is disabled.
2018-01-23 18:03:51 4526 [Note] InnoDB: Using atomics to ref count buffer pool pages
2018-01-23 18:03:51 4526 [Note] InnoDB: The InnoDB memory heap is disabled
2018-01-23 18:03:51 4526 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2018-01-23 18:03:51 4526 [Note] InnoDB: Memory barrier is not used
2018-01-23 18:03:51 4526 [Note] InnoDB: Compressed tables use zlib 1.2.3
2018-01-23 18:03:51 4526 [Note] InnoDB: Using Linux native AIO
2018-01-23 18:03:51 4526 [Note] InnoDB: Using CPU crc32 instructions
2018-01-23 18:03:51 4526 [Note] InnoDB: Initializing buffer pool, size = 1.0G
2018-01-23 18:03:51 4526 [Note] InnoDB: Completed initialization of buffer pool
2018-01-23 18:03:51 4526 [Note] InnoDB: Highest supported file format is Barracuda.
2018-01-23 18:03:51 4526 [Note] InnoDB: The log file was created by mysqlbackup --apply-log at ibbackup 180123 17:59:17. The following crash recovery is part of a normal restore.
2018-01-23 18:03:51 4526 [Note] InnoDB: The log sequence numbers 1484386085656 and 1484386085656 in ibdata files do not match the log sequence number 1511045530636 in the ib_logfiles!
2018-01-23 18:03:51 4526 [Note] InnoDB: Database was not shutdown normally!
2018-01-23 18:03:51 4526 [Note] InnoDB: Starting crash recovery.
2018-01-23 18:03:51 4526 [Note] InnoDB: Reading tablespace information from the .ibd files...
2018-01-23 18:03:54 4526 [Note] InnoDB: Restoring possible half-written data pages 
2018-01-23 18:03:54 4526 [Note] InnoDB: from the doublewrite buffer...
2018-01-23 18:03:55 4526 [Note] InnoDB: 128 rollback segment(s) are active.
2018-01-23 18:03:55 4526 [Note] InnoDB: Waiting for purge to start
2018-01-23 18:03:55 4526 [Note] InnoDB: 5.6.32 started; log sequence number 1511045530636
2018-01-23 18:03:55 4526 [Note] RSA private key file not found: /u01/app/oracle/mysql/data//private_key.pem. Some authentication plugins will not work.
2018-01-23 18:03:55 4526 [Note] RSA public key file not found: /u01/app/oracle/mysql/data//public_key.pem. Some authentication plugins will not work.
2018-01-23 18:03:55 4526 [Note] Server hostname (bind-address): '*'; port: 49500
2018-01-23 18:03:55 4526 [Note] IPv6 is available.
2018-01-23 18:03:55 4526 [Note]   - '::' resolves to '::';
2018-01-23 18:03:55 4526 [Note] Server socket created on IP: '::'.
2018-01-23 18:03:55 4526 [Note] Event Scheduler: Loaded 0 events
2018-01-23 18:03:55 4526 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.6.32-enterprise-commercial-advanced'  socket: '/u01/app/oracle/mysql/data/mysqld.sock'  port: 49500  MySQL Enterprise Server - Advanced Edition (Commercial)