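We are running a Percona MySQL 5.5 XtraDB Cluster (2 nodes and an arbitrator) with galera 2-2.12. I run haproxy as a transparent proxy (via iptables TPROXY) that sends traffic exclusively to one node unless that node is unavailable.

Every 10-20 days we run into a problem that looks like this. The first sign of trouble in the log is: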
[Warning] Too many connections
This goes on for a few minutes, and then we get:
TRANSACTION 2B37091B, ACTIVE 1506 sec, thread declared inside InnoDB 499
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1248, 2 row lock(s), undo log entries 1
MySQL thread id 1498250, OS thread handle 0x7efccc658700, query id 14839064 <db02> <db02 ip> <db> wsrep in pre-commit stage
<update query>
*** WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 773 page no 1707 n bits 304 index `date` of table <table> trx id 2B37091B lock_mode X locks rec but not gap
170427 14:03:47 [Note] WSREP: cluster conflict due to high priority abort for threads:
170427 14:03:47 [Note] WSREP: Winning thread:
THD: 4, mode: applier, state: executing, conflict: no conflict, seqno: 38463147
SQL: (null)
170427 14:03:47 [Note] WSREP: Victim thread:
THD: 1498250, mode: local, state: committing, conflict: no conflict, seqno: 38463644
SQL: <update query>
Then we get a bunch of these:
170427 14:03:49 [Note] WSREP: cluster conflict due to certification failure for threads:
170427 14:03:49 [Note] WSREP: Victim thread:
THD: 1498309, mode: local, state: executing, conflict: cert failure, seqno: 38463678
SQL: <insert query>
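To get a feel for how many of each conflict type occur during an incident, the log can be tallied with a short script. This is only a rough sketch that assumes the `WSREP: cluster conflict due to ... for threads:` line format shown in the excerpts above; the function names `tally_conflicts` and `tally_log_file` are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   170427 14:03:49 [Note] WSREP: cluster conflict due to certification failure for threads:
# and captures the reason text ("certification failure", "high priority abort", ...).
CONFLICT_RE = re.compile(r"WSREP: cluster conflict due to (.+?) for threads:")


def tally_conflicts(lines):
    """Count WSREP conflict reasons in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = CONFLICT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def tally_log_file(path):
    """Convenience wrapper: tally conflicts in a log file on disk."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, errors="replace") as log:
        return tally_conflicts(log)
```

With the `log-error` path from the config further down, something like `tally_log_file("/var/log/mysqld.log").most_common()` would list the conflict reasons by frequency.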
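Once these are done, the cluster returns to normal. While this is happening the cluster is hosed and end users report database outages. It used to not resolve itself at all, but since I added the following to my config it recovers from the event within 1-5 minutes: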
wsrep_provider_options="gcs.fc_limit=500; gcs.fc_master_slave=YES; gcs.fc_factor=1.0"
My database config:
[client]
socket=/var/lib/mysql/mysql.sock
[mysqld]
server-id=<id>
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
log-bin
log_slave_updates
expire_logs_days=7
symbolic-links=0
wsrep_provider=/usr/lib64/galera2/libgalera_smm.so
wsrep_cluster_address=gcomm://<gcom string>
binlog_format=ROW
default_storage_engine=InnoDB
wsrep_slave_threads= 8
wsrep_log_conflicts
wsrep_cluster_name=<cluster name>
wsrep_node_name=<node name>
wsrep_node_address=<node ip>
wsrep_provider_options="gcs.fc_limit=500; gcs.fc_master_slave=YES; gcs.fc_factor=1.0"
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=<redact>
max_connections=300
innodb_buffer_pool_size=20G
innodb_additional_mem_pool_size = 20M
innodb_autoinc_lock_mode = 2
innodb_buffer_pool_instances = 20
innodb_lock_wait_timeout = 120
innodb_log_buffer_size = 8M
innodb_log_file_size = 48M
innodb_log_files_in_group = 3
innodb_max_dirty_pages_pct = 90
innodb_read_io_threads = 8
innodb_thread_concurrency = 16
innodb_write_io_threads = 8
innodb_file_per_table = 1
The tables that the queries always seem to reference look like this:
+-----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+-------+
| date | date | NO | MUL | NULL | |
| page_name | varchar(100) | YES | MUL | NULL | |
| page_hits | float | NO | | 1 | |
+-----------+--------------+------+-----+---------+-------+
+-----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| ip_hash | varchar(32) | NO | MUL | NULL | |
| timestamp | timestamp | YES | MUL | NULL | |
+-----------+-------------+------+-----+---------+-------+
I am about at my wits' end with galera and ready to go back to standalone mysql, but any advice would be much appreciated.