使用sst时节点加入集群的问题:xtrabackup(galera)

时间:2013-12-03 23:06:36

标签: load-balancing mariadb galera

看起来节点正在加入群集,然后它失败了......我尝试过使用rsync和xtrabackup,它在状态转移期间失败了。在我看来,我错过了一些非常简单的东西,我无法指责它......任何帮助都将不胜感激。

有关节点的更多信息

Master - 10.XXX.XXX.161 node1 - 10.XXX.XXX.160

已安装的软件包: MariaDB-compat MariaDB-常见MariaDB-devel MariaDB共享MariaDB客户端MariaDB测试MariaDB-Galera服务器(v5.5.29-1) 加莱拉(v23.2.4-1.rhel6) percona-xtrabackup(v2.1.6-702.rhel6)

配置节点1

[mysqld]
wsrep_cluster_address = gcomm://10.XXX.XXX.161
wsrep_provider = /usr/lib64/galera/libgalera_smm.so
wsrep_provider_options = gcache.size=4G; gcache.page_size=1G
wsrep_cluster_name = galera_cluster
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2
innodb_locks_unsafe_for_binlog = 1
wsrep_sst_method = xtrabackup
wsrep_sst_auth = root:rootpassword
wsrep_node_name=1

配置主

[mysqld]

wsrep_cluster_address = gcomm://

wsrep_provider = /usr/lib64/galera/libgalera_smm.so

wsrep_provider_options = gcache.size=4G; gcache.page_size=1G

wsrep_cluster_name = galera_cluster

default_storage_engine = InnoDB

innodb_autoinc_lock_mode = 2

innodb_locks_unsafe_for_binlog = 1

wsrep_sst_method = rsync

wsrep_slave_threads = 4

wsrep_sst_auth = root:rootpassword

wsrep_node_name = 2

node1日志文件

131203 16:09:03 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql



131203 16:09:03 mysqld_safe WSREP: Running position recovery with --log_error=/tmp/tmp.f2EedjRjbQ



131203 16:09:08 mysqld_safe WSREP: Recovered position 359350ee-5c63-11e3-0800-6673d15135cd:2188



131203 16:09:08 [Note] WSREP: wsrep_start_position var submitted: '359350ee-5c63-11e3-0800-6673d15135cd:2188'



131203 16:09:08 [Note] WSREP: Read nil XID from storage engines, skipping position init



131203 16:09:08 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'



131203 16:09:08 [Note] WSREP: wsrep_load(): Galera 23.2.4(r147) by Codership Oy <info@codership.com]]> loaded succesfully.



131203 16:09:08 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1



131203 16:09:08 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.



131203 16:09:08 [Note] WSREP: Passing config to GCS: base_host = 10.XXX.XXX.160; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 1G; gcache.size = 4G; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3



131203 16:09:08 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1



131203 16:09:08 [Note] WSREP: wsrep_sst_grab()



131203 16:09:08 [Note] WSREP: Start replication



131203 16:09:08 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1



131203 16:09:08 [Note] WSREP: protonet asio version 0



131203 16:09:08 [Note] WSREP: backend: asio



131203 16:09:08 [Note] WSREP: GMCast version 0



131203 16:09:08 [Note] WSREP: (8814b4ba-5c67-11e3-0800-91035d554a96, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567



131203 16:09:08 [Note] WSREP: (8814b4ba-5c67-11e3-0800-91035d554a96, 'tcp://0.0.0.0:4567') multicast: , ttl: 1



131203 16:09:08 [Note] WSREP: EVS version 0



131203 16:09:08 [Note] WSREP: PC version 0



131203 16:09:08 [Note] WSREP: gcomm: connecting to group 'galera_cluster', peer '10.XXX.XXX.161:'



131203 16:09:09 [Note] WSREP: declaring 7a9a87e8-5c67-11e3-0800-8cb6cba8f65a stable



131203 16:09:09 [Note] WSREP: Node 7a9a87e8-5c67-11e3-0800-8cb6cba8f65a state prim



131203 16:09:09 [Note] WSREP: view(view_id(PRIM,7a9a87e8-5c67-11e3-0800-8cb6cba8f65a,2) memb {



     7a9a87e8-5c67-11e3-0800-8cb6cba8f65a,



     8814b4ba-5c67-11e3-0800-91035d554a96,



} joined {



} left {



} partitioned {



})



131203 16:09:09 [Note] WSREP: gcomm: connected



131203 16:09:09 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636



131203 16:09:09 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)



131203 16:09:09 [Note] WSREP: Opened channel 'galera_cluster'



131203 16:09:09 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2



131203 16:09:09 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.



131203 16:09:09 [Note] WSREP: Waiting for SST to complete.



131203 16:09:09 [Note] WSREP: STATE EXCHANGE: sent state msg: 8861cdd5-5c67-11e3-0800-cc70fcc5f515



131203 16:09:09 [Note] WSREP: STATE EXCHANGE: got state msg: 8861cdd5-5c67-11e3-0800-cc70fcc5f515 from 0 (2)



131203 16:09:09 [Note] WSREP: STATE EXCHANGE: got state msg: 8861cdd5-5c67-11e3-0800-cc70fcc5f515 from 1 (1)



131203 16:09:09 [Note] WSREP: Quorum results:



     version    = 2,



     component  = PRIMARY,



     conf_id    = 1,



     members    = 1/2 (joined/total),



     act_id     = 2521,



     last_appl. = -1,



     protocols  = 0/4/2 (gcs/repl/appl),



     group UUID = 359350ee-5c63-11e3-0800-6673d15135cd



131203 16:09:09 [Note] WSREP: Flow-control interval: [23, 23]



131203 16:09:09 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 2521)



131203 16:09:09 [Note] WSREP: State transfer required:



     Group state: 359350ee-5c63-11e3-0800-6673d15135cd:2521



     Local state: 00000000-0000-0000-0000-000000000000:-1



131203 16:09:09 [Note] WSREP: New cluster view: global state: 359350ee-5c63-11e3-0800-6673d15135cd:2521, view# 2: Primary, number of nodes: 2, my index: 1, protocol version 2



131203 16:09:09 [Warning] WSREP: Gap in state sequence. Need state transfer.



131203 16:09:11 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'joiner' --address '10.XXX.XXX.160' --auth 'root:rootpassword' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '13175''



131203 16:09:11 [Note] WSREP: Prepared SST request: xtrabackup|10.162.143.160:4444/xtrabackup_sst



131203 16:09:11 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.



131203 16:09:11 [Note] WSREP: Assign initial position for certification: 2521, protocol version: 2



131203 16:09:11 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (359350ee-5c63-11e3-0800-6673d15135cd): 1 (Operation not permitted)



      at galera/src/replicator_str.cpp:prepare_for_IST():442. IST will be unavailable.



131203 16:09:11 [Note] WSREP: Node 1 (1) requested state transfer from '*any*'. Selected 0 (2)(SYNCED) as donor.



131203 16:09:11 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 2525)



131203 16:09:11 [Note] WSREP: Requesting state transfer: success, donor: 0



tar: dbexport/db.opt: Cannot open: Permission denied



tar: Exiting with failure status due to previous errors



131203 16:10:22 [Note] WSREP: 0 (2): State transfer to 1 (1) complete.



131203 16:10:22 [Note] WSREP: Member 0 (2) synced with group.



WSREP_SST: [ERROR] Error while getting st data from donor node:  0, 2 (20131203 16:10:22.379)



131203 16:10:22 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup --role 'joiner' --address '10.XXX.XXX.160' --auth 'root:rootpassword' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --parent '13175': 32 (Broken pipe)



131203 16:10:22 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.



131203 16:10:22 [ERROR] WSREP: SST failed: 32 (Broken pipe)



131203 16:10:22 [ERROR] Aborting







131203 16:10:24 [Note] WSREP: Closing send monitor...



131203 16:10:24 [Note] WSREP: Closed send monitor.



131203 16:10:24 [Note] WSREP: gcomm: terminating thread



131203 16:10:24 [Note] WSREP: gcomm: joining thread



131203 16:10:24 [Note] WSREP: gcomm: closing backend



131203 16:10:25 [Note] WSREP: view(view_id(NON_PRIM,7a9a87e8-5c67-11e3-0800-8cb6cba8f65a,2) memb {



     8814b4ba-5c67-11e3-0800-91035d554a96,



} joined {



} left {



} partitioned {



     7a9a87e8-5c67-11e3-0800-8cb6cba8f65a,



})



131203 16:10:25 [Note] WSREP: view((empty))



131203 16:10:25 [Note] WSREP: gcomm: closed



131203 16:10:25 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1



131203 16:10:25 [Note] WSREP: Flow-control interval: [16, 16]



131203 16:10:25 [Note] WSREP: Received NON-PRIMARY.



131203 16:10:25 [Note] WSREP: Shifting JOINER -> OPEN (TO: 2607)



131203 16:10:25 [Note] WSREP: Received self-leave message.



131203 16:10:25 [Note] WSREP: Flow-control interval: [0, 0]



131203 16:10:25 [Note] WSREP: Received SELF-LEAVE. Closing connection.



131203 16:10:25 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 2607)



131203 16:10:25 [Note] WSREP: RECV thread exiting 0: Success



131203 16:10:25 [Note] WSREP: recv_thread() joined.



131203 16:10:25 [Note] WSREP: Closing slave action queue.



131203 16:10:25 [Note] WSREP: Service disconnected.



131203 16:10:25 [Note] WSREP: rollbacker thread exiting



131203 16:10:26 [Note] WSREP: Some threads may fail to exit.



131203 16:10:26 [Note] /usr/sbin/mysqld: Shutdown complete







Error in my_thread_global_end(): 2 threads didn't exit



131203 16:10:31 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended

主日志文件

131203 16:08:47 [Warning] Recovery from master pos 103358630 and file mysql-bin.000131.



131203 16:08:47 [Note] Event Scheduler: Loaded 0 events



131203 16:08:47 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.



131203 16:08:47 [Note] WSREP: Assign initial position for certification: 2497, protocol version: 2



131203 16:08:47 [Note] /usr/sbin/mysqld: ready for connections.



Version: '5.5.29-MariaDB-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server, wsrep_23.7.3.rXXXX



131203 16:08:47 [Note] WSREP: Synchronized with group, ready for connections



131203 16:08:47 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.



131203 16:09:09 [Note] WSREP: declaring 8814b4ba-5c67-11e3-0800-91035d554a96 stable



131203 16:09:09 [Note] WSREP: Node 7a9a87e8-5c67-11e3-0800-8cb6cba8f65a state prim



131203 16:09:09 [Note] WSREP: view(view_id(PRIM,7a9a87e8-5c67-11e3-0800-8cb6cba8f65a,2) memb {



     7a9a87e8-5c67-11e3-0800-8cb6cba8f65a,



     8814b4ba-5c67-11e3-0800-91035d554a96,



} joined {



} left {



} partitioned {



})



131203 16:09:09 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2



131203 16:09:09 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 8861cdd5-5c67-11e3-0800-cc70fcc5f515



131203 16:09:09 [Note] WSREP: STATE EXCHANGE: sent state msg: 8861cdd5-5c67-11e3-0800-cc70fcc5f515



131203 16:09:09 [Note] WSREP: STATE EXCHANGE: got state msg: 8861cdd5-5c67-11e3-0800-cc70fcc5f515 from 0 (2)



131203 16:09:09 [Note] WSREP: STATE EXCHANGE: got state msg: 8861cdd5-5c67-11e3-0800-cc70fcc5f515 from 1 (1)



131203 16:09:09 [Note] WSREP: Quorum results:



     version    = 2,



     component  = PRIMARY,



     conf_id    = 1,



     members    = 1/2 (joined/total),



     act_id     = 2521,



     last_appl. = 2517,



     protocols  = 0/4/2 (gcs/repl/appl),



     group UUID = 359350ee-5c63-11e3-0800-6673d15135cd



131203 16:09:09 [Note] WSREP: Flow-control interval: [23, 23]



131203 16:09:09 [Note] WSREP: New cluster view: global state: 359350ee-5c63-11e3-0800-6673d15135cd:2521, view# 2: Primary, number of nodes: 2, my index: 0, protocol version 2



131203 16:09:09 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.



131203 16:09:09 [Note] WSREP: Assign initial position for certification: 2521, protocol version: 2



131203 16:09:11 [Note] WSREP: Node 1 (1) requested state transfer from '*any*'. Selected 0 (2)(SYNCED) as donor.



131203 16:09:11 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 2525)



131203 16:09:11 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.



131203 16:09:11 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'donor' --address '10.XXX.XXX.160:4444/xtrabackup_sst' --auth 'root:rootpassword' --socket '/var/lib/mysql/mysql.sock' --datadir '/var/lib/mysql/' --defaults-file '/etc/my.cnf' --gtid '359350ee-5c63-11e3-0800-6673d15135cd:2525''



131203 16:09:11 [Note] WSREP: sst_donor_thread signaled with 0



131203 16:10:20 [Note] WSREP: Provider paused at 359350ee-5c63-11e3-0800-6673d15135cd:2604



131203 16:10:22 [Note] WSREP: Provider resumed.



131203 16:10:22 [Note] WSREP: 0 (2): State transfer to 1 (1) complete.



131203 16:10:22 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 2606)



131203 16:10:22 [Note] WSREP: Member 0 (2) synced with group.



131203 16:10:22 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 2606)



131203 16:10:22 [Note] WSREP: Synchronized with group, ready for connections



131203 16:10:22 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.



131203 16:10:25 [Note] WSREP: Node 7a9a87e8-5c67-11e3-0800-8cb6cba8f65a state prim



131203 16:10:25 [Note] WSREP: view(view_id(PRIM,7a9a87e8-5c67-11e3-0800-8cb6cba8f65a,3) memb {



     7a9a87e8-5c67-11e3-0800-8cb6cba8f65a,



} joined {



} left {



} partitioned {



     8814b4ba-5c67-11e3-0800-91035d554a96,



})



131203 16:10:25 [Note] WSREP: forgetting 8814b4ba-5c67-11e3-0800-91035d554a96 (tcp://10.XXX.XXX.160:4567)



131203 16:10:25 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1



131203 16:10:25 [Note] WSREP: STATE_EXCHANGE: sent state UUID: b5dda52e-5c67-11e3-0800-4b2360dd84f9



131203 16:10:25 [Note] WSREP: STATE EXCHANGE: sent state msg: b5dda52e-5c67-11e3-0800-4b2360dd84f9



131203 16:10:25 [Note] WSREP: STATE EXCHANGE: got state msg: b5dda52e-5c67-11e3-0800-4b2360dd84f9 from 0 (2)



131203 16:10:25 [Note] WSREP: Quorum results:



     version    = 2,



     component  = PRIMARY,



     conf_id    = 2,



     members    = 1/1 (joined/total),



     act_id     = 2607,



     last_appl. = 2597,



     protocols  = 0/4/2 (gcs/repl/appl),



     group UUID = 359350ee-5c63-11e3-0800-6673d15135cd



131203 16:10:25 [Note] WSREP: Flow-control interval: [16, 16]



131203 16:10:25 [Note] WSREP: New cluster view: global state: 359350ee-5c63-11e3-0800-6673d15135cd:2607, view# 3: Primary, number of nodes: 1, my index: 0, protocol version 2



131203 16:10:25 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.



131203 16:10:25 [Note] WSREP: Assign initial position for certification: 2607, protocol version: 2



131203 16:10:30 [Note] WSREP:  cleaning up 8814b4ba-5c67-11e3-0800-91035d554a96 (tcp://10..XXX.XXX.160:4567)

2 个答案:

答案 0 :(得分:10)

问题是MariaDB的数据目录中有一个数据库备份目录(dbexport)(可能是/var/lib/mysql/)。执行SST时,提供程序扫描数据目录以查找要发送的文件。它看到了目录并假设它是用于数据库的,因为这是数据目录中的目录。删除备份目录修复了问题。作为最佳做法,请勿在{{1​​}}中更改任何内容;程序通常会将数据文件保存在那里,弄乱它们会导致这样的问题。

主要问题解决后,日志中发现了一条新消息:

/var/lib/

此消息是正常的。当节点加入galera集群时,它将尝试执行IST(增量状态传输)而不是完整的SST(状态快照传输)。如果节点以前是群集的一部分,并且它离开时的状态与群集的当前状态之间的差异足够小,则IST可用,它只传输节点当前状态和群集状态之间的差异。这比传输所有数据快得多。如果节点以前是群集的一部分但很久以前就离开了,那么它将需要进行SST。在这种情况下,加入节点的状态UUID是[Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (359350ee-5c63-11e3-0800-6673d15135cd): 1 (Operation not permitted) at galera/src/replicator_str.cpp:prepare_for_IST():442. IST will be unavailable. ,这基本上意味着它对集群来说是新的。我运行一个MariaDB / galera集群,只要IST不可用,这条消息就会让我烦恼。如果它不是一个警告并且被重写,那将是很好的。我不确定为什么00000000-0000-0000-0000-000000000000在那里,但没什么值得担心的。

此外,建议您运行奇数个节点以防止裂脑情况。如果可能,您应该将另一个MariaDB服务器添加到群集中,或者如果不能,则运行garbdOperation not permitted充当集群中的节点,而不是数据库服务器。它允许您拥有奇数个节点,而无需另外的数据库服务器。

答案 1 :(得分:1)

在我的情况下,替换主群集到辅助群集可以解决问题。 在db1之前

[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
query_cache_size=0
query_cache_type=0
bind-address=0.0.0.0

# Galera Provider Configuration
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="name"
wsrep_cluster_address="gcomm://37.x.x.104,37.x.x.117"
wsrep_sst_method=rsync
wsrep_node_address="37.x.x.104"
wsrep_node_name="db1"

pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
log_error = /var/log/mysql/error.log
expire_logs_days = 10
max_binlog_size = 100M

在db2上

[mysqld]
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
query_cache_size=0
query_cache_type=0
bind-address=0.0.0.0

# Galera Provider Configuration
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name="name"
wsrep_cluster_address="gcomm://37.x.x.104,37.x.x.117"
wsrep_sst_method=rsync
wsrep_node_address="37.x.x.104"
wsrep_node_name="db1"

pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
log_error = /var/log/mysql/error.log
expire_logs_days = 10
max_binlog_size = 100M

我改变了

wsrep_cluster_address="gcomm://37.x.x.104,37.x.x.117"

wsrep_cluster_address="gcomm://37.x.x.117,37.x.x.104"

wsrep_node_address="37.x.x.**104**" to **117**

并且群集已经开始了!