Question

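I have a cluster of 3 Percona XtraDB 5.5.34-55 servers, and because all of them are writable, I get deadlock errors under any substantial load. Increasing the wsrep_retry_autocommit variable helped to some extent, but ER_LOCK_DEADLOCK did not go away completely. So I tried setting wsrep_retry_autocommit to 10000 (which seems to be the maximum), expecting that some queries would become very slow but that none of them would fail with ER_LOCK_DEADLOCK: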

mysql-shm -ss -e 'show global variables like "%wsrep_retry_auto%"'
wsrep_retry_autocommit  10000

------------------------
LATEST DETECTED DEADLOCK
------------------------
140414 10:29:23
*** (1) TRANSACTION:
TRANSACTION 72D8, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 376, 1 row lock(s), undo log entries 1
MySQL thread id 34, OS thread handle 0x7f11840d4700, query id 982 localhost shm update
REPLACE INTO metric(host, name, userid, sampleid, type, priority) VALUES
('localhost','cpu-3/cpu-nice',8,0,0,0),('localhost','cpu-3/cpu-system',8,0,0,0),
('localhost','cpu-3/cpu-idle',8,0,0,0),('localhost','cpu-3/cpu-wait',8,0,0,0),
('localhost','cpu-3/cpu-interrupt',8,0,0,0),('localhost','cpu-3/cpu-softirq',8,0,0,0),
('localhost','cpu-3/cpu-steal',8,0,0,0),('localhost','cpu-4/cpu-user',8,0,0,0),
('localhost','cpu-4/cpu-nice',8,0,0,0),('localhost','cpu-4/cpu-system',8,0,0,0),
('localhost','cpu-4/cpu-idle',8,0,0,0),('localhost','cpu-4/cpu-wait',8,0,0,0),
('localhost','cpu-4/cpu-interrupt',8,0,0,0),('localhost','cpu-4/cpu-softirq',8,0,0,0),
('localhost','cpu-4/cpu-steal',8,0,0,0)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 344 n bits 488 index `unique-metric` of
table `shm`.`metric` trx id 72D8 lock_mode X waiting
*** (2) TRANSACTION:
TRANSACTION 72D7, ACTIVE 0 sec updating or deleting
mysql tables in use 1, locked 1
7 lock struct(s), heap size 3112, 141 row lock(s), undo log entries 40
MySQL thread id 50, OS thread handle 0x7f1184115700, query id 980 localhost shm update
REPLACE INTO metric(host, name, userid, sampleid, type, priority) VALUES
('localhost','cpu-3/cpu-nice',8,0,0,0),('localhost','cpu-3/cpu-system',8,0,0,0),
('localhost','cpu-3/cpu-idle',8,0,0,0),('localhost','cpu-3/cpu-wait',8,0,0,0),
('localhost','cpu-3/cpu-interrupt',8,0,0,0),('localhost','cpu-3/cpu-softirq',8,0,0,0),
('localhost','cpu-3/cpu-steal',8,0,0,0),('localhost','cpu-4/cpu-user',8,0,0,0),
('localhost','cpu-4/cpu-nice',8,0,0,0),('localhost','cpu-4/cpu-system',8,0,0,0),
('localhost','cpu-4/cpu-idle',8,0,0,0),('localhost','cpu-4/cpu-wait',8,0,0,0),
('localhost','cpu-4/cpu-interrupt',8,0,0,0),('localhost','cpu-4/cpu-softirq',8,0,0,0),
('localhost','cpu-4/cpu-steal',8,0,0,0),('localhost','cpu-3/cpu-nice',8,0,0,0),
('localhost','cpu-3/cpu-system',8,0,0,0),('localhost','cpu-3/cpu-idle',8,0,0,0),
('localhost','cpu-3/cpu-wait',8,0,0,0),('localhost','cpu-3/cpu-interrupt',8,0,0,0),
('localhost','cpu-3/cpu-softirq',8,0,0,0),('localhost'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 344 n bits 488 index `unique-metric` of table 
`shm`.`metric` trx id 72D7 lock_mode X
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 344 n bits 504 index `unique-metric` of table 
`shm`.`metric` trx id 72D7 lock_mode X locks gap before rec insert intention waiting
*** WE ROLL BACK TRANSACTION (1)

Shouldn't the query have been retried? Is there a way to verify that Percona actually retried the query 10000 times?

Answer 1

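I don't have an exact answer to your question, but with any write-intensive load (for example, if you insert data the way that damn Drupal does), deadlocks will happen. The only solution that has worked for me (I am still waiting to confirm that it is a 100% OK solution) is to put haproxy in front of the Galera nodes and, in the haproxy backend definition, designate the first node as the one to use, with the other 2 nodes as backups.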

This way all MySQL traffic flows from the clients through haproxy to a single Galera node, and if that node fails, one of the other nodes is used instead.
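The selection rule described above (always use the first node, fall back to a backup only when it is down) can be sketched in a few lines of Python. This is only an illustration of the routing logic, not haproxy configuration: the node names and the `is_up` health-check callback are hypothetical and would be replaced by whatever health check your proxy performs.

```python
from typing import Callable, Optional, Sequence

# Hypothetical node names: the first entry is the primary writer,
# the remaining entries are used only as backups.
NODES = ["node1", "node2", "node3"]


def pick_writer(nodes: Sequence[str], is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy node in priority order, or None if all are down."""
    for node in nodes:
        if is_up(node):
            return node
    return None


if __name__ == "__main__":
    down = {"node1"}  # simulate a failure of the primary
    print(pick_writer(NODES, lambda n: n not in down))  # prints "node2"
```

Because every client is sent to the same node for as long as it stays healthy, concurrent writes to the same rows are serialized by ordinary InnoDB locking on that node instead of conflicting across nodes.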

Hope that helps... Andrija

Answer 2

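Scalability is a problem with your answer: we are running a cluster, yet using only one node is a poor use of resources. As an alternative, you can use any load balancer that can create 2 listeners on two ports (e.g. 3306 and 3305). The listener bound to 3306 receives all write requests from the application; its backend has node1, with node2 and node3 as backups. The listener bound to 3305 receives all read requests from the application; its backend has all nodes specified normally. That way reads scale across the whole cluster while writes still go to a single node, and in this scenario deadlocks can be reduced to a very large extent.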
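The two-listener layout described above can be sketched as a small Python router. This is a sketch of the routing decision only, under stated assumptions: the `Router` class, the node names, and the round-robin policy for reads are hypothetical (the answer only says the read backend lists all nodes "normally"); the ports 3306 and 3305 are the ones suggested in the answer.

```python
from typing import Iterable, List, Set

WRITE_PORT = 3306  # listener for write traffic (from the answer above)
READ_PORT = 3305   # listener for read traffic (from the answer above)


class Router:
    """Route writes to the first healthy node and spread reads over all healthy nodes."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.nodes: List[str] = list(nodes)
        self._next_read = 0  # round-robin counter for the read listener

    def route(self, port: int, healthy: Set[str]) -> str:
        candidates = [n for n in self.nodes if n in healthy]
        if not candidates:
            raise RuntimeError("no healthy node available")
        if port == WRITE_PORT:
            # Primary first; later nodes act only as backups.
            return candidates[0]
        if port == READ_PORT:
            node = candidates[self._next_read % len(candidates)]
            self._next_read += 1
            return node
        raise ValueError(f"no listener on port {port}")


if __name__ == "__main__":
    router = Router(["node1", "node2", "node3"])  # hypothetical node names
    all_up = {"node1", "node2", "node3"}
    print([router.route(WRITE_PORT, all_up) for _ in range(3)])
    print([router.route(READ_PORT, all_up) for _ in range(4)])
```

The application has to send its writes and reads to the matching port itself; the router cannot tell them apart from the traffic alone.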

Why do I still get deadlocks even with wsrep_retry_autocommit set very high?
