Percona TokuDB recovery has been running for more than 24 hours

Date: 2015-08-07 16:22:00

Tags: database percona tokudb

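Background
Yesterday we ran into a situation that (at least to me) looked like a deadlock. A TRUNCATE TABLE had been issued (against a table with no contention); its state showed the query as finished, but it never cleared from the process list. In addition, user connections to unrelated tables in the same schema were stuck with "Waiting for metadata lock" (or something similar; I did not grab a screenshot).

I first tried to kill the truncate query and then the hung queries. Their state changed to "killed", but they never disappeared from the process list.

At that point even the super user could not log in (a previously opened monitor showed its connection as "Waiting for metadata lock").

I attempted an orderly shutdown, but after a very long wait it looked as if the operating system had done a kill -9 on mysql. Since a lot of memory was unexpectedly pinned, a reboot seemed in order and was done.

Now, for the past 24 hours, Percona appears to be either replaying (very slowly) some transaction of a size I cannot imagine, or stuck in some kind of replay loop. To give you an idea of the log messages: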

2015-08-07 09:01:42 25894 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2015-08-07 09:01:42 25894 [Note] InnoDB: Memory barrier is not used
2015-08-07 09:01:42 25894 [Note] InnoDB: Compressed tables use zlib 1.2.3
2015-08-07 09:01:42 25894 [Note] InnoDB: Using Linux native AIO
2015-08-07 09:01:42 25894 [Note] InnoDB: Using CPU crc32 instructions
2015-08-07 09:01:42 25894 [Note] InnoDB: Initializing buffer pool, size = 20.0G
2015-08-07 09:01:43 25894 [Note] InnoDB: Completed initialization of buffer pool
2015-08-07 09:01:44 25894 [Note] InnoDB: Highest supported file format is Barracuda.
2015-08-07 09:01:44 25894 [Note] InnoDB: The log sequence numbers 551884808204 and 551884808204 in ibdata files do not match the log sequence number 551884808684 in the ib_logfiles!
2015-08-07 09:01:44 25894 [Note] InnoDB: Database was not shutdown normally!
2015-08-07 09:01:44 25894 [Note] InnoDB: Starting crash recovery.
2015-08-07 09:01:44 25894 [Note] InnoDB: Reading tablespace information from the .ibd files...
2015-08-07 09:01:44 25894 [Note] InnoDB: Restoring possible half-written data pages
2015-08-07 09:01:44 25894 [Note] InnoDB: from the doublewrite buffer...
2015-08-07 09:01:44 25894 [Note] InnoDB: starting tracking changed pages from LSN 551884808684
InnoDB: Transaction 122307881 was in the XA prepared state.
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 0 row operations to undo
InnoDB: Trx id counter is 122333952
InnoDB: Last MySQL binlog file position 0 134745064, file name mysqld-bin.000281
2015-08-07 09:01:44 25894 [Note] InnoDB: 128 rollback segment(s) are active.
InnoDB: Starting in background the rollback of uncommitted transactions
2015-08-07 09:01:44 7f7b96bff700  InnoDB: Rollback of non-prepared transactions completed
2015-08-07 09:01:44 25894 [Note] InnoDB: Waiting for purge to start
2015-08-07 09:01:44 25894 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.25-73.1 started; log sequence number 551884808684
Fri Aug  7 09:01:44 2015 TokuFT recovery starting in env /var/lib/mysql/
Fri Aug  7 09:01:45 2015 TokuFT recovery scanning backward from 957963270
Fri Aug  7 09:01:45 2015 TokuFT recovery bw_end_checkpoint at 957917890 timestamp 1438892691965548 xid 957841798 (bw_newer)
Fri Aug  7 09:01:45 2015 TokuFT recovery bw_begin_checkpoint at 957841798 timestamp 1438892662604852 (bw_between)
Fri Aug  7 09:01:45 2015 TokuFT recovery turning around at begin checkpoint 957841798 time 29360696
Fri Aug  7 09:01:45 2015 TokuFT recovery starts scanning forward to 957963270 from 957841798 left 121472 (fw_between)
Fri Aug  7 09:02:09 2015 TokuFT lsn 957930340 commit xid 405185:0 1909760/293672232 1%
Fri Aug  7 09:02:24 2015 TokuFT lsn 957930340 commit xid 405185:0 3742720/293672232 1%
Fri Aug  7 09:02:39 2015 TokuFT lsn 957930340 commit xid 405185:0 5556224/293672232 2%
Fri Aug  7 09:02:54 2015 TokuFT lsn 957930340 commit xid 405185:0 7327744/293672232 2%
Fri Aug  7 09:03:09 2015 TokuFT lsn 957930340 commit xid 405185:0 8884224/293672232 3%
-- snip --
Fri Aug  7 09:21:09 2015 TokuFT lsn 957930340 commit xid 405185:0 150720512/293672232 51%
Fri Aug  7 09:21:24 2015 TokuFT lsn 957930340 commit xid 405185:0 152825856/293672232 52%
Fri Aug  7 09:21:39 2015 TokuFT lsn 957930340 commit xid 405185:0 154894336/293672232 53%
150807 09:21:42 mysqld_safe Transparent huge pages are already set to: never.
150807 09:21:42 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
2015-08-07 09:21:42 0 [Note] /usr/sbin/mysqld (mysqld 5.6.25-73.1-log) starting as process 29206 ...
2015-08-07 09:21:42 29206 [Note] InnoDB: Using atomics to ref count buffer pool pages
2015-08-07 09:21:42 29206 [Note] InnoDB: The InnoDB memory heap is disabled
2015-08-07 09:21:42 29206 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2015-08-07 09:21:42 29206 [Note] InnoDB: Memory barrier is not used
2015-08-07 09:21:42 29206 [Note] InnoDB: Compressed tables use zlib 1.2.3
2015-08-07 09:21:42 29206 [Note] InnoDB: Using Linux native AIO
2015-08-07 09:21:42 29206 [Note] InnoDB: Using CPU crc32 instructions
2015-08-07 09:21:42 29206 [Note] InnoDB: Initializing buffer pool, size = 20.0G
2015-08-07 09:21:43 29206 [Note] InnoDB: Completed initialization of buffer pool
2015-08-07 09:21:44 29206 [Note] InnoDB: Highest supported file format is Barracuda.
2015-08-07 09:21:44 29206 [Note] InnoDB: The log sequence numbers 551884808204 and 551884808204 in ibdata files do not match the log sequence number 551884808694 in the ib_logfiles!
2015-08-07 09:21:44 29206 [Note] InnoDB: Database was not shutdown normally!
2015-08-07 09:21:44 29206 [Note] InnoDB: Starting crash recovery.
2015-08-07 09:21:44 29206 [Note] InnoDB: Reading tablespace information from the .ibd files...
2015-08-07 09:21:44 29206 [Note] InnoDB: Restoring possible half-written data pages
2015-08-07 09:21:44 29206 [Note] InnoDB: from the doublewrite buffer...
2015-08-07 09:21:44 29206 [Note] InnoDB: starting tracking changed pages from LSN 551884808694
InnoDB: Transaction 122307881 was in the XA prepared state.
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 0 row operations to undo
InnoDB: Trx id counter is 122334464
InnoDB: Last MySQL binlog file position 0 134745064, file name mysqld-bin.000281
2015-08-07 09:21:45 29206 [Note] InnoDB: 128 rollback segment(s) are active.
InnoDB: Starting in background the rollback of uncommitted transactions
2015-08-07 09:21:45 7feb457ff700  InnoDB: Rollback of non-prepared transactions completed
2015-08-07 09:21:45 29206 [Note] InnoDB: Waiting for purge to start
2015-08-07 09:21:45 29206 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.25-73.1 started; log sequence number 551884808694
Fri Aug  7 09:21:45 2015 TokuFT recovery starting in env /var/lib/mysql/
Fri Aug  7 09:21:45 2015 TokuFT recovery scanning backward from 957963270
Fri Aug  7 09:21:45 2015 TokuFT recovery bw_end_checkpoint at 957917890 timestamp 1438892691965548 xid 957841798 (bw_newer)
Fri Aug  7 09:21:46 2015 TokuFT recovery bw_begin_checkpoint at 957841798 timestamp 1438892662604852 (bw_between)
Fri Aug  7 09:21:46 2015 TokuFT recovery turning around at begin checkpoint 957841798 time 29360696
Fri Aug  7 09:21:46 2015 TokuFT recovery starts scanning forward to 957963270 from 957841798 left 121472 (fw_between)
Fri Aug  7 09:22:10 2015 TokuFT lsn 957930340 commit xid 405185:0 2105344/293672232 1%
Fri Aug  7 09:22:25 2015 TokuFT lsn 957930340 commit xid 405185:0 3942400/293672232 1%
Fri Aug  7 09:22:40 2015 TokuFT lsn 957930340 commit xid 405185:0 5748736/293672232 2%
Fri Aug  7 09:22:55 2015 TokuFT lsn 957930340 commit xid 405185:0 7554048/293672232 3%
Fri Aug  7 09:23:10 2015 TokuFT lsn 957930340 commit xid 405185:0 9150464/293672232 3%
-- snip --
Fri Aug  7 09:25:40 2015 TokuFT lsn 957930340 commit xid 405185:0 28329984/293672232 10%
Fri Aug  7 09:25:55 2015 TokuFT lsn 957930340 commit xid 405185:0 30284800/293672232 10%
Fri Aug  7 09:26:10 2015 TokuFT lsn 957930340 commit xid 405185:0 32245760/293672232 11%
Fri Aug  7 09:26:25 2015 TokuFT lsn 957930340 commit xid 405185:0 34166784/293672232 12%
Fri Aug  7 09:26:40 2015 TokuFT lsn 957930340 commit xid 405185:0 36102144/293672232 12%
Fri Aug  7 09:26:55 2015 TokuFT lsn 957930340 commit xid 405185:0 38104064/293672232 13%
Fri Aug  7 09:27:10 2015 TokuFT lsn 957930340 commit xid 405185:0 40033280/293672232 14%
Fri Aug  7 09:27:25 2015 TokuFT lsn 957930340 commit xid 405185:0 41992192/293672232 14%
-- snip --
Fri Aug  7 09:31:10 2015 TokuFT lsn 957930340 commit xid 405185:0 70963200/293672232 24%
Fri Aug  7 09:31:25 2015 TokuFT lsn 957930340 commit xid 405185:0 72942592/293672232 25%
Fri Aug  7 09:31:40 2015 TokuFT lsn 957930340 commit xid 405185:0 74896384/293672232 26%
Fri Aug  7 09:31:55 2015 TokuFT lsn 957930340 commit xid 405185:0 76867584/293672232 26%
Fri Aug  7 09:32:10 2015 TokuFT lsn 957930340 commit xid 405185:0 78902272/293672232 27%
Fri Aug  7 09:32:25 2015 TokuFT lsn 957930340 commit xid 405185:0 80914432/293672232 28%
Fri Aug  7 09:32:40 2015 TokuFT lsn 957930340 commit xid 405185:0 82846720/293672232 28%
Fri Aug  7 09:32:55 2015 TokuFT lsn 957930340 commit xid 405185:0 84836352/293672232 29%
Fri Aug  7 09:33:10 2015 TokuFT lsn 957930340 commit xid 405185:0 86836224/293672232 30%
--- snip --
Fri Aug  7 09:36:25 2015 TokuFT lsn 957930340 commit xid 405185:0 113692672/293672232 39%
Fri Aug  7 09:36:40 2015 TokuFT lsn 957930340 commit xid 405185:0 115834880/293672232 39%
Fri Aug  7 09:36:55 2015 TokuFT lsn 957930340 commit xid 405185:0 118014976/293672232 40%
Fri Aug  7 09:37:10 2015 TokuFT lsn 957930340 commit xid 405185:0 120132608/293672232 41%
Fri Aug  7 09:37:25 2015 TokuFT lsn 957930340 commit xid 405185:0 122251264/293672232 42%
Fri Aug  7 09:37:40 2015 TokuFT lsn 957930340 commit xid 405185:0 124372992/293672232 42%
-- snip --
Fri Aug  7 09:38:40 2015 TokuFT lsn 957930340 commit xid 405185:0 133026816/293672232 45%
Fri Aug  7 09:38:55 2015 TokuFT lsn 957930340 commit xid 405185:0 135154688/293672232 46%
Fri Aug  7 09:39:10 2015 TokuFT lsn 957930340 commit xid 405185:0 137283584/293672232 47%
Fri Aug  7 09:39:25 2015 TokuFT lsn 957930340 commit xid 405185:0 139439104/293672232 47%
-- snip --
Fri Aug  7 09:41:10 2015 TokuFT lsn 957930340 commit xid 405185:0 154284032/293672232 53%
Fri Aug  7 09:41:25 2015 TokuFT lsn 957930340 commit xid 405185:0 156420096/293672232 53%
Fri Aug  7 09:41:40 2015 TokuFT lsn 957930340 commit xid 405185:0 158516224/293672232 54%
150807 09:41:42 mysqld_safe Transparent huge pages are already set to: never.
150807 09:41:42 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql

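So while the log sequence number and the TokuFT commit xid appear to stay static, the checkpoint scanned and the percent complete reached (always ~49-55%) change from run to run.

It is unclear whether this is somehow unwinding some huge transaction (users were doing large-scale table updates, DDL changes and so on, and I am sure with autocommit = 0), or whether this is a 'loop' that will never end.

Any tips on how to troubleshoot or diagnose this further?

Thanks.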

1 Answer:

Answer 0: (score: 1)

A case of looking in all the wrong places. SystemD was very "helpfully" timing out the server start and triggering a full restart of MySQL.
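The log excerpts in the question are consistent with this. As a rough check, the sketch below parses a handful of lines copied from that log, measures the gap between the two `mysqld_safe` restarts, and projects how long the TokuFT commit would need at the observed rate. The function name `analyze` and the term "progress units" are my own (the log does not say what the counter measures), and the projection assumes the rate stays constant.

```python
import re
from datetime import datetime

# Lines copied from the log in the question: the first and last progress line
# of two recovery attempts, plus the mysqld_safe restart that ended each one.
LOG = """\
Fri Aug  7 09:02:09 2015 TokuFT lsn 957930340 commit xid 405185:0 1909760/293672232 1%
Fri Aug  7 09:21:39 2015 TokuFT lsn 957930340 commit xid 405185:0 154894336/293672232 53%
150807 09:21:42 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
Fri Aug  7 09:22:10 2015 TokuFT lsn 957930340 commit xid 405185:0 2105344/293672232 1%
Fri Aug  7 09:41:40 2015 TokuFT lsn 957930340 commit xid 405185:0 158516224/293672232 54%
150807 09:41:42 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
"""

PROGRESS = re.compile(
    r"^(\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) TokuFT lsn \d+ "
    r"commit xid \S+ (\d+)/(\d+) \d+%$"
)
RESTART = re.compile(r"^(\d{6} \d\d:\d\d:\d\d) mysqld_safe Starting mysqld daemon")


def analyze(log):
    """Return (seconds between restarts, projected seconds per full commit)."""
    restarts, attempts, current = [], [], []
    for line in log.splitlines():
        m = PROGRESS.match(line)
        if m:
            stamp = " ".join(m.group(1).split())  # collapse "Aug  7" to "Aug 7"
            when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
            current.append((when, int(m.group(2)), int(m.group(3))))
            continue
        m = RESTART.match(line)
        if m:
            restarts.append(datetime.strptime(m.group(1), "%y%m%d %H:%M:%S"))
            if current:  # a restart closes the current recovery attempt
                attempts.append(current)
                current = []
    if current:
        attempts.append(current)

    intervals = [(b - a).total_seconds() for a, b in zip(restarts, restarts[1:])]
    projected = []
    for samples in attempts:
        if len(samples) < 2:
            continue  # a rate needs at least two samples
        (t0, done0, total), (t1, done1, _) = samples[0], samples[-1]
        elapsed = (t1 - t0).total_seconds()
        if elapsed <= 0:
            continue
        rate = (done1 - done0) / elapsed  # progress units per second
        projected.append(total / rate)  # seconds for the whole commit
    return intervals, projected


if __name__ == "__main__":
    intervals, projected = analyze(LOG)
    print("seconds between restarts:", intervals)
    print("projected minutes per attempt:", [round(p / 60, 1) for p in projected])
```

On these lines the restarts are exactly 20 minutes apart, while each attempt would need well over half an hour to finish at the rate it was going. That explains why the percentage never gets much past the mid-50s: the server is restarted before recovery can complete, and every attempt starts again from the same checkpoint.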

If you have a very long xlog to replay, one good thing to remember: temporarily disable systemd's start timeout (or raise it from the default of 10 minutes).
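One way to raise the timeout is a systemd drop-in that overrides `TimeoutStartSec` for the MySQL unit. The sketch below only renders and writes such a file. The unit name `mysqld.service`, the file name `recovery-timeout.conf` and the one-hour value are assumptions for illustration, so check what your installation actually uses (for example with `systemctl status`) before applying anything like it.

```python
from pathlib import Path


def render_dropin(timeout_start_sec):
    """Return drop-in text that overrides the unit's start timeout."""
    return "[Service]\nTimeoutStartSec={}\n".format(timeout_start_sec)


def write_dropin(unit, timeout_start_sec, root="/etc/systemd/system"):
    """Write the drop-in under <root>/<unit>.d/ and return its path.

    The unit name and the file name are assumptions; adjust both to your system.
    """
    dropin_dir = Path(root) / (unit + ".d")
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / "recovery-timeout.conf"
    path.write_text(render_dropin(timeout_start_sec))
    return path


if __name__ == "__main__":
    # Only prints the content; writing under /etc/systemd/system needs root.
    print(render_dropin(3600), end="")
```

After writing the file for real, for example with `write_dropin("mysqld.service", 3600)` run as root, systemd has to reload its configuration (`systemctl daemon-reload`) before the new timeout applies. Removing the drop-in once recovery has finished restores the previous behaviour.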