我正在努力让mariadb群集启动并运行,但它对我来说不起作用。现在我在64位红帽ES6机器上使用MariaDB Galera 5.5.36。我在这里通过这个仓库安装了mariadb:
[mariadb]
name = MariaDB
baseurl = http://yum.mariadb.org/5.5-galera/rhel6-amd64/
gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB
gpgcheck=1
在server.conf中,我在服务器1中有以下内容:
[mariadb]
log_error=/var/log/mariadb.log
query_cache_size=0
query_cache_type=0
binlog_format=ROW
default_storage_engine=innodb
innodb_autoinc_lock_mode=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address=gcomm://192.168.211.133
wsrep_cluster_name='cluster'
wsrep_node_address='192.168.211.132'
wsrep_node_name='cluster1'
wsrep_sst_method=rsync
在服务器2上我有
[mariadb]
log_error=/var/log/mariadb.log
query_cache_size=0
query_cache_type=0
binlog_format=ROW
default_storage_engine=innodb
innodb_autoinc_lock_mode=2
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address=gcomm://192.168.211.132
wsrep_cluster_name='cluster'
wsrep_node_address='192.168.211.133'
wsrep_node_name='cluster2'
wsrep_sst_method=rsync
当我使用以下命令启动服务器1时:sudo service mysql start --wsrep-new-cluster它启动就好了,如果我打开mysql并检查wsrep的状态它表示一切正常并且正在运行,这是好,但是当我尝试在第二台服务器上执行sudo服务mysql启动时,我在错误日志中得到以下内容:
140609 14:47:55 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
140609 14:47:56 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/wsrep_recovery.i5qfm2' --pid-file='/var/lib/mysql/localhost.localdomain-recover.pid'
140609 14:47:57 mysqld_safe WSREP: Recovered position 85448d73-ebe8-11e3-9c20-fbc1995fee11:0
140609 14:47:57 [Note] WSREP: wsrep_start_position var submitted: '85448d73-ebe8-11e3-9c20-fbc1995fee11:0'
140609 14:47:57 [Note] WSREP: Read nil XID from storage engines, skipping position init
140609 14:47:57 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
140609 14:47:57 [Note] WSREP: wsrep_load(): Galera 25.3.2(r170) by Codership Oy <info@codership.com> loaded successfully.
140609 14:47:57 [Note] WSREP: CRC-32C: using hardware acceleration.
140609 14:47:57 [Note] WSREP: Found saved state: 85448d73-ebe8-11e3-9c20-fbc1995fee11:-1
140609 14:47:57 [Note] WSREP: Passing config to GCS: base_host = 192.168.211.133; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.proto_max = 5
140609 14:47:57 [Note] WSREP: Assign initial position for certification: 0, protocol version: -1
140609 14:47:57 [Note] WSREP: wsrep_sst_grab()
140609 14:47:57 [Note] WSREP: Start replication
140609 14:47:57 [Note] WSREP: Setting initial position to 85448d73-ebe8-11e3-9c20-fbc1995fee11:0
140609 14:47:57 [Note] WSREP: protonet asio version 0
140609 14:47:57 [Note] WSREP: Using CRC-32C (optimized) for message checksums.
140609 14:47:57 [Note] WSREP: backend: asio
140609 14:47:57 [Note] WSREP: GMCast version 0
140609 14:47:57 [Note] WSREP: (0c085f34-efe5-11e3-9f6b-8bfd1706e2a4, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
140609 14:47:57 [Note] WSREP: (0c085f34-efe5-11e3-9f6b-8bfd1706e2a4, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
140609 14:47:57 [Note] WSREP: EVS version 0
140609 14:47:57 [Note] WSREP: PC version 0
140609 14:47:57 [Note] WSREP: gcomm: connecting to group 'cluster', peer '192.168.211.132:,192.168.211.134:'
140609 14:48:00 [Warning] WSREP: no nodes coming from prim view, prim not possible
140609 14:48:00 [Note] WSREP: view(view_id(NON_PRIM,0c085f34-efe5-11e3-9f6b-8bfd1706e2a4,1) memb {
0c085f34-efe5-11e3-9f6b-8bfd1706e2a4,0
} joined {
} left {
} partitioned {
})
140609 14:48:01 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50775S), skipping check
140609 14:48:31 [Note] WSREP: view((empty))
140609 14:48:31 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/pc.cpp:connect():141
140609 14:48:31 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():196: Failed to open backend connection: -110 (Connection timed out)
140609 14:48:31 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1291: Failed to open channel 'cluster' at 'gcomm://192.168.211.132,192.168.211.134': -110 (Connection timed out)
140609 14:48:31 [ERROR] WSREP: gcs connect failed: Connection timed out
140609 14:48:31 [ERROR] WSREP: wsrep::connect() failed: 7
140609 14:48:31 [ERROR] Aborting
140609 14:48:31 [Note] WSREP: Service disconnected.
140609 14:48:32 [Note] WSREP: Some threads may fail to exit.
140609 14:48:32 [Note] /usr/sbin/mysqld: Shutdown complete
140609 14:48:32 mysqld_safe mysqld from pid file /var/lib/mysql/localhost.localdomain.pid ended
我不知道为什么第二台服务器无法检测到群集已启动并正在运行。这些机器可以很好地相互通信,我可以从一个SSH到另一个,它们可以相互ping通。我尝试删除galera缓存,尝试降级我的mariadb galera版本,尝试禁用SELinux,尝试以不同的用户身份运行mysql服务,验证正确的端口是打开的,尝试在具有不同IP地址的不同计算机上的2个VM上运行它们有没有人知道这里发生了什么,因为我一直在寻找3天试图解决这个问题,但似乎没有任何解决方案可以解决这个问题。
答案 0 :(得分:1)
我认为您需要列出wsrep_cluster_address参数中的所有IP。
wsrep_cluster_address = gcomm会议://192.168.211.132,192.168.211.133
这应该在两台主机上完成。顺便说一句,你可能想要三个节点而不是两个节点,以避免分裂脑情景。
答案 1 :(得分:1)
以下是我修复类似问题的方法。
CentOS 7 w / MariaDB Galera 10.1。
Node2我看到了这个:
one = [1, "taco", 3, 2, :piece, 4, 5, 5, 5, 5]
two = [:piece, 2, 5, 4, 1, "taco", 3]
def same_elements?(array_one, array_two)
return true if ( (array_one - array_two).empty? && (array_two - array_one).empty? )
return false
end
same_elements?(one, two)
在做了一些阅读之后,我尝试在node1上运行它。
016-12-27 15:40:38 140703512762624 [Warning] WSREP: no nodes coming from prim view, prim not possible
但这失败了,在日志中,我发现了......
service mysql start --wsrep-new-cluster
所以我编辑了文件2016-12-27 15:44:08 140438853814528 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .
,将/var/lib/mysql/grastate.dat
更改为safe_to_bootstrap
。
然后我可以使用:
启动主节点1
然后在其他人身上,我只是用了
service mysql start --wsrep-new-cluster
注意:这是在演示预生产环境中。我通过同时重新启动所有服务器来完成所有工作后立即打破它:P,但我知道没有写入,并且数据库是同步的。如果您处于生产状态并且发生这种情况,您可以使用以下内容来确定运行“new-cluster”的节点,这类似于说,让我成为主要的。
service mysql start
如果这是一个生产问题,我非常建议您阅读本文并使用CloneZilla进行备份,然后再向破碎的客户端发送命令!
https://www.percona.com/blog/2014/09/01/galera-replication-how-to-recover-a-pxc-cluster/
答案 2 :(得分:0)
群集必须在主节点上以此命令启动:
galera_new_cluster
启动第一个节点后,您可以成功启动集群中的其他节点。