Long-running MySQL "cleaning up" transaction

Date: 2015-08-19 14:03:16

Tags: mysql amazon-web-services transactions locking amazon-rds

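I have been trying to debug a "Lock wait timeout exceeded" error in MySQL (AWS RDS) v5.6.19a. It is thrown occasionally when I select a row by its primary ID for update, i.e.: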

SELECT primary_id FROM tbl_widgets WHERE primary_id = 5 FOR UPDATE

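After several hours of debugging, I have ruled out another part of my application "directly" locking the same row (which was the obvious culprit). So I started digging down the rabbit hole of MySQL locking and noticed that the "Lock wait timeout exceeded" errors correlate with the following information reported by: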

SHOW ENGINE INNODB STATUS;

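There appears to be a long-running TRANSACTION in the "cleaning up" state that locks a slowly increasing number of rows for up to about 10 minutes. Here are the relevant lines for this transaction from 10 manual INNODB STATUS queries: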

2015-08-19 13:29:04
---TRANSACTION 25861246681, ACTIVE 158 sec
10 lock struct(s), heap size 1184, 21 row lock(s), undo log entries 20
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7146839061 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:29:42
---TRANSACTION 25861246681, ACTIVE 196 sec
13 lock struct(s), heap size 2936, 28 row lock(s), undo log entries 27
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147149416 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:30:10
---TRANSACTION 25861246681, ACTIVE 224 sec
13 lock struct(s), heap size 2936, 31 row lock(s), undo log entries 30
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147321023 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:30:41
---TRANSACTION 25861246681, ACTIVE 255 sec
13 lock struct(s), heap size 2936, 35 row lock(s), undo log entries 34
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147511090 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:31:12
---TRANSACTION 25861246681, ACTIVE 286 sec
15 lock struct(s), heap size 2936, 38 row lock(s), undo log entries 37
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147604774 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:31:30
---TRANSACTION 25861246681, ACTIVE 304 sec
21 lock struct(s), heap size 2936, 42 row lock(s), undo log entries 39
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147789789 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:31:57
---TRANSACTION 25861246681, ACTIVE 331 sec
21 lock struct(s), heap size 2936, 46 row lock(s), undo log entries 43
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147837536 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:32:28
---TRANSACTION 25861246681, ACTIVE 362 sec
22 lock struct(s), heap size 2936, 51 row lock(s), undo log entries 48
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7147905807 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:33:16
---TRANSACTION 25861246681, ACTIVE 410 sec
23 lock struct(s), heap size 2936, 58 row lock(s), undo log entries 55
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7148317478 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682

2015-08-19 13:33:49
---TRANSACTION 25861246681, ACTIVE 443 sec
24 lock struct(s), heap size 2936, 64 row lock(s), undo log entries 61
MySQL thread id 5110120, OS thread handle 0x2ba082506700, query id 7148471519 10.0.1.154 mfuser cleaning up
Trx read view will not see trx with id >= 25861246682, sees < 25861246682
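
The same counters can also be sampled with a query instead of being read out of the status output by hand. This is a minimal sketch, assuming MySQL 5.6 and an account that is allowed to read information_schema.INNODB_TRX (this needs the PROCESS privilege); the alias active_sec is only illustrative.

```sql
-- Sketch: list open InnoDB transactions, oldest first.
-- Assumes MySQL 5.6 and the PROCESS privilege.
SELECT trx_id,
       trx_mysql_thread_id,   -- "MySQL thread id" in the status output
       trx_state,
       trx_started,
       TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS active_sec,  -- compare with "ACTIVE n sec"
       trx_lock_structs,      -- "lock struct(s)"
       trx_rows_locked,       -- "row lock(s)", an approximate count
       trx_rows_modified,     -- roughly corresponds to "undo log entries"
       trx_query              -- NULL while the session is idle between statements
FROM information_schema.INNODB_TRX
ORDER BY trx_started;
```

Running it every few seconds should show whether trx_rows_locked and trx_rows_modified keep growing for the same trx_id, as they do in the output above.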

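I came across the following blog post (http://databaseblog.myname.nl/2014/10/when-your-query-is-blocked-but-there-is_26.html), which offers a potential way to find out what is going on inside this long-running transaction, specifically by setting: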

set GLOBAL innodb_status_output_locks=ON;

Unfortunately, this cannot be done on RDS because of restricted permissions.
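
One possible alternative that does not require changing a global variable is to query the InnoDB tables in information_schema while a statement is stuck waiting. The following is a sketch only, assuming MySQL 5.6 (where INNODB_LOCK_WAITS and INNODB_LOCKS exist) and an account with the PROCESS privilege; the column aliases are illustrative.

```sql
-- Sketch: run this while a SELECT ... FOR UPDATE is waiting for its lock.
-- Assumes MySQL 5.6 and the PROCESS privilege.
SELECT w.requesting_trx_id,
       r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query           AS waiting_query,
       w.blocking_trx_id,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_started         AS blocking_started,
       l.lock_mode,
       l.lock_table,
       l.lock_index,
       l.lock_data            -- key value of the locked record, NULL for table locks
FROM information_schema.INNODB_LOCK_WAITS AS w
JOIN information_schema.INNODB_TRX   AS r ON r.trx_id  = w.requesting_trx_id
JOIN information_schema.INNODB_TRX   AS b ON b.trx_id  = w.blocking_trx_id
JOIN information_schema.INNODB_LOCKS AS l ON l.lock_id = w.blocking_lock_id;
```

The query returns rows only while a lock wait is in progress, so it has to be run within the timeout window of the blocked statement. If blocking_thread matches the thread id of the "cleaning up" transaction, that transaction is the one holding the row.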

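I would appreciate some debugging help on how to work out what is happening in the "cleaning up" transaction and, ideally, how to avoid it altogether.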

Edited to add: the average CPU usage of the MySQL instance is 20%.

1 answer:

Answer 0: (score: 1)

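In my case, my "cleaning up" locks went away after I killed the JVM in which I was running my debugger. Apparently they were remnants of earlier debugging runs that I had interrupted before their transactions were cleaned up.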
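
If restarting the client is not practical, the connection can probably also be closed from the server side. This is a sketch under assumptions: the thread id 5110120 is taken from the status output in the question, and mysql.rds_kill is the stored procedure RDS provides for ending a connection.

```sql
-- Sketch: identify the connection that owns the stuck transaction.
-- 5110120 is the "MySQL thread id" from the question's status output.
SELECT id, user, host, db, command, time, state
FROM information_schema.PROCESSLIST
WHERE id = 5110120;

-- End that connection on RDS. Its open transaction is rolled back,
-- which releases the row locks it was holding.
CALL mysql.rds_kill(5110120);
```

The host column shows the client address and port, which may help to trace the connection back to a specific process before killing it.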

That probably doesn't help you, but in that case, here are some debugging suggestions.

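  1. You do have one piece of information: the number of locks. Using breakpoints, you can pause the application at various places to try to pinpoint exactly when the count goes up. (Or perhaps it only goes up after certain errors appear in the logs, or only after certain user actions.)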

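  2. If you can't use breakpoints, you have another tool: a SELECT ... FOR UPDATE statement, which will block once the lock has been taken. You can sprinkle it around your code, perhaps with additional logging, to determine where the blocking starts.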

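  3. Consider temporarily debugging the application against a locally installed MySQL database. It can be installed on a local server or on your development machine. Setting it up may be a hassle, but it can have many other benefits (e.g. a test bed for db scripts; the ability to work on your laptop while offline).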

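  4. All of this assumes that the locks are caused by your own code and not by some other job (in your log, the user of the "cleaning up" transaction is "mfuser"), so that you can reproduce the problem on demand.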
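
The probe from suggestion 2 could look roughly like the sketch below. It assumes the probe runs on a separate connection from the code under test (a transaction never blocks on its own locks) and uses the table and row from the question.

```sql
-- Sketch: probe whether row 5 is currently locked by another transaction.
-- Give up after about 1 second instead of the default 50.
SET SESSION innodb_lock_wait_timeout = 1;

START TRANSACTION;
-- Returns immediately if the row is free; fails with error 1205
-- (Lock wait timeout exceeded) if another transaction holds the lock.
SELECT primary_id FROM tbl_widgets WHERE primary_id = 5 FOR UPDATE;
-- Release the probe's own lock straight away.
ROLLBACK;
```

Logging whether the probe succeeds at each point in the application narrows down where the lock is first taken. The short timeout only applies to the probe's session, so other connections keep their configured value.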