Why does Galera crash on a simple ALTER statement?

Posted: 2019-01-15 14:16:15

Tags: mariadb galera

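I have a 5-node Galera cluster running MariaDB 10.2.14. The database is simple and straightforward, almost 20 GB. There are no triggers, but there are many indexes and foreign keys. When I try to alter an empty or small table (adding a field) from the command-line MySQL client on one of the multi-master nodes, the whole cluster crashes. Why? I have never run into this problem on other Galera systems. The operating system is RedHat 6.10.

Can anyone help? Here is the error log from one of the servers:

When a simple table is changed with a simple ALTER statement, the 5-node multi-master Galera cluster stops working and the table is corrupted. This has happened several times, with different tables and simple ALTER statements (no triggers).

The MySQL error log shows the following: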

2019-01-15 10:47:19 140487941920512 [Note] WSREP: Member 1.0 (server.company.local) synced with group.
2019-01-15 11:07:45 140487941920512 [Note] WSREP: Member 1.0 (server.company.local) desyncs itself from group
2019-01-15 11:07:46 140487941920512 [Note] WSREP: Member 1.0 (server.company.local) resyncs itself to group
2019-01-15 11:07:46 140487941920512 [Note] WSREP: Member 1.0 (server.company.local) synced with group.
2019-01-15 11:27:40 140487941920512 [Note] WSREP: Member 1.0 (server.company.local) desyncs itself from group
2019-01-15 11:27:41 140487941920512 [Note] WSREP: Member 1.0 (server.company.local) resyncs itself to group
2019-01-15 11:27:41 140487941920512 [Note] WSREP: Member 1.0 (server.company.local) synced with group.
2019-01-15 11:47:23 140487941920512 [Note] WSREP: Member 1.0 (server.company.local) desyncs itself from group
2019-01-15 11:47:24 140487941920512 [Note] WSREP: Member 1.0 (server.company.local) resyncs itself to group
2019-01-15 11:47:24 140487941920512 [Note] WSREP: Member 1.0 (server.company.local) synced with group.
2019-01-15 12:24:39 140452405958400 [Note] WSREP: MDL BF-BF conflict

schema:  databasename
request: (8227134       seqno 46874664  wsrep (2, 1, 0) cmd 3 3         ALTER TABLE `aagenda` ADD `id_subject_cat` int(11) NULL DEFAULT '0' AFTER `id_subject`, ADD INDEX `id_s$
granted: (15    seqno 46874665  wsrep (1, 0, 0) cmd 0 147       (null))
2019-01-15 12:24:40 140452405958400 [Note] WSREP: MDL BF-BF conflict
schema:  databasename
request: (8227134       seqno 46874664  wsrep (2, 1, 0) cmd 3 3         ALTER TABLE `aagenda` ADD `id_subject_cat` int(11) NULL DEFAULT '0' AFTER `id_subject`, ADD INDEX `id_s$
granted: (15    seqno 46874665  wsrep (1, 0, 0) cmd 0 147       (null))
2019-01-15 12:24:40 140452405958400 [Note] WSREP: MDL BF-BF conflict
schema:  databasename
request: (8227134       seqno 46874664  wsrep (2, 1, 0) cmd 3 3         ALTER TABLE `aagenda` ADD `id_subject_cat` int(11) NULL DEFAULT '0' AFTER `id_subject`, ADD INDEX `id_s$
granted: (11    seqno 46874666  wsrep (1, 0, 0) cmd 0 147       (null))
2019-01-15 12:24:40 0x7fbd9fc3d700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.2.14/storage/innobase/row/row0merge.cc l$

InnoDB: Failing assertion: table->get_ref_count() == 0
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/xtradbinnodb-recovery-modes/
InnoDB: about forcing recovery.

190115 12:24:40 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.2.14-MariaDB-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=837
max_threads=1502
thread_count=280

It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3431472 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fbe2d906c18
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went terribly wrong...
stack_bottom = 0x7fbd9fc3cd80 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x2b)[0x55f4e00d8fab]
/usr/sbin/mysqld(handle_fatal_signal+0x535)[0x55f4dfbad005]
/lib64/libpthread.so.0(+0xf7e0)[0x7fc5f97f67e0]
/lib64/libc.so.6(gsignal+0x35)[0x7fc5f7e50495]
/lib64/libc.so.6(abort+0x175)[0x7fc5f7e51c75]
/usr/sbin/mysqld(+0x47c4eb)[0x55f4df97a4eb]
/usr/sbin/mysqld(+0x90edcc)[0x55f4dfe0cdcc]
/usr/sbin/mysqld(+0x873236)[0x55f4dfd71236]
/usr/sbin/mysqld(_Z17mysql_alter_tableP3THDPcS1_P14HA_CREATE_INFOP10TABLE_LISTP10Alter_infojP8st_orderb+0x29ed)[0x55f4dfab181d]
/usr/sbin/mysqld(_ZN19Sql_cmd_alter_table7executeEP3THD+0x3ae)[0x55f4dfaf62fe]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0xf81)[0x55f4dfa2b251]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x29a)[0x55f4dfa327ca]
/usr/sbin/mysqld(+0x5348c0)[0x55f4dfa328c0]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x18cd)[0x55f4dfa346fd]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x16e)[0x55f4dfa350ee]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x16f)[0x55f4dfaf335f]
/usr/sbin/mysqld(handle_one_connection+0x44)[0x55f4dfaf3484]
/lib64/libpthread.so.0(+0x7aa1)[0x7fc5f97eeaa1]
/lib64/libc.so.6(clone+0x6d)[0x7fc5f7f06bdd]

Trying to get some variables.

Some pointers may be invalid and cause the dump to abort.

Query (0x7fbe2d9141f0): is an invalid pointer

Connection ID (thread ID): 8227134
Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_push$

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
We think the query pointer is invalid, but we will try to print it anyway.

Query: ALTER TABLE `aagenda` ADD `id_subject_cat` int(11) NULL DEFAULT '0' AFTER `id_subject`, ADD INDEX `id_subject_cat` (`id_subject_cat`)

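1 Answer:

Answer 0: (score: 0)

The advice I got from MariaDB:

If you want to alter a table in a Galera production environment without downtime, do the following node by node: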

SET GLOBAL wsrep_desync = TRUE; (or SET GLOBAL wsrep_desync = ON;)
SET SESSION wsrep_on = FALSE; (or SET SESSION wsrep_on = OFF;)

--- ALTER STATEMENT ---

SET SESSION wsrep_on = TRUE; (or SET SESSION wsrep_on = ON;)
SET GLOBAL wsrep_desync = FALSE; (or SET GLOBAL wsrep_desync = OFF;)

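However, the new table structure must be backward compatible and still usable by the application. Otherwise, you have to stop the cluster, alter the table on one node, and then restart the cluster.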
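To make the per-node order explicit, here is a minimal Python sketch that only prints the statements to run; it does not connect to any server. The function names (`rolling_alter_statements`, `rolling_plan`) and the node names are hypothetical and used purely for illustration. It assumes all five statements are run in the same client session on a node, since `wsrep_on` is changed at session scope, and that you finish one node before moving to the next.

```python
from typing import List, Tuple


def rolling_alter_statements(alter_sql: str) -> List[str]:
    """Return the statements to run, in order, in ONE session on ONE node."""
    # Normalise the input so the emitted statement ends with a single ';'.
    alter_sql = alter_sql.strip().rstrip(";").strip()
    if not alter_sql.upper().startswith("ALTER "):
        raise ValueError("expected an ALTER statement")
    return [
        "SET GLOBAL wsrep_desync = ON;",   # take the node out of flow control
        "SET SESSION wsrep_on = OFF;",     # do not replicate this session
        alter_sql + ";",                   # the schema change, local only
        "SET SESSION wsrep_on = ON;",      # undo in reverse order
        "SET GLOBAL wsrep_desync = OFF;",
    ]


def rolling_plan(nodes: List[str], alter_sql: str) -> List[Tuple[str, List[str]]]:
    """Pair every node with the same statement list, one node after another."""
    return [(node, rolling_alter_statements(alter_sql)) for node in nodes]


if __name__ == "__main__":
    # Hypothetical node names; replace with the real cluster members.
    nodes = ["node1", "node2", "node3", "node4", "node5"]
    alter = (
        "ALTER TABLE `aagenda` ADD `id_subject_cat` int(11) NULL DEFAULT '0' "
        "AFTER `id_subject`, ADD INDEX `id_subject_cat` (`id_subject_cat`)"
    )
    for node, statements in rolling_plan(nodes, alter):
        print(f"-- run on {node}, in a single session")
        for statement in statements:
            print(statement)
        print()
```

Because the ALTER is not replicated while `wsrep_on` is off, every node has to receive it separately, which is why the schema must stay compatible with writes coming from nodes that have not been altered yet.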