
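We have repeatedly seen this problem on MySQL 5.7.23 and 5.7.24. Replication freezes because of an error, and I cannot restart it manually with "stop slave; start slave;".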

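MySQL runs on Debian 9 in a VM on Google Compute Engine, and all packages are up to date. The VM has 4 CPUs / 26 GB RAM. On the MySQL replica we use parallel replication, the ROW binlog format, and LOGICAL_CLOCK as slave-parallel-type.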

The scenario of our problem:

Replication on the read-only replica stops with error 1205.
Error text: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 7 failed executing transaction 'ANONYMOUS' at master log mysql-bin.00xxxx, end_log_pos xxxxxxxxx. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
In the binlog I see only ordinary UPDATE statements; we have many of them during the day.
Checking performance_schema.replication_applier_status_by_worker shows the following error: "Worker 1 failed executing transaction 'ANONYMOUS' at master log mysql-bin.00xxxx, end_log_pos xxxxxxxxx; Lock wait timeout exceeded; try restarting transaction"
I issue the "stop slave;" command from the mysql command-line tool, but the command hangs. The process list shows the process | 56327 | root | localhost | NULL | Query | 61716 | Killing slave | stop slave | running indefinitely.
Manually restarting the instance does not work. The instance is frozen, I cannot get into it, and I have to force a restart from the Google GCE web GUI.
In error.log I can see a sequence of error messages: Worker 7 failed executing transaction 'ANONYMOUS' at master log mysql-bin.00xxxx, end_log_pos xxxxxxxx; Could not execute Update_rows event on table xxxx.xxxx; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql-bin.00xxxx, end_log_pos xxxxxxxxx, Error_code: 1205
The sequence ends with the error message: worker thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable. Error_code: 1205
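
To make it easier to correlate these failures across many log lines, the worker error messages quoted above can be parsed into structured fields. The following is only a minimal sketch: the function name `parse_worker_failure` and the sample values (`mysql-bin.000123`, position `4567`, table `db.t`) are hypothetical and chosen for illustration, since the real file names and positions are redacted in this post. Redacted lines such as `end_log_pos xxxxxxxxx` will not match, because the pattern expects a numeric position.

```python
import re
from typing import NamedTuple, Optional


class WorkerFailure(NamedTuple):
    worker_id: int
    transaction: str
    log_file: str
    end_log_pos: int
    error_code: Optional[int]


# Matches the "Worker N failed executing transaction ..." prefix seen in the log.
_FAILURE_RE = re.compile(
    r"Worker (?P<worker>\d+) failed executing transaction '(?P<trx>[^']*)' "
    r"at master log (?P<file>[^,\s]+), end_log_pos (?P<pos>\d+)"
)
# The error code is optional: the performance_schema message omits it.
_ERROR_CODE_RE = re.compile(r"Error_code: (?P<code>\d+)")


def parse_worker_failure(line: str) -> Optional[WorkerFailure]:
    """Return the parsed failure, or None if the line is not a worker failure."""
    match = _FAILURE_RE.search(line)
    if match is None:
        return None
    code = _ERROR_CODE_RE.search(line)
    return WorkerFailure(
        worker_id=int(match.group("worker")),
        transaction=match.group("trx"),
        log_file=match.group("file"),
        end_log_pos=int(match.group("pos")),
        error_code=int(code.group("code")) if code else None,
    )


# Hypothetical sample line; the real values are redacted in the post.
SAMPLE = (
    "Worker 7 failed executing transaction 'ANONYMOUS' at master log "
    "mysql-bin.000123, end_log_pos 4567; Could not execute Update_rows event "
    "on table db.t; Lock wait timeout exceeded; try restarting transaction, "
    "Error_code: 1205"
)

if __name__ == "__main__":
    print(parse_worker_failure(SAMPLE))
```

Grouping the parsed results by `log_file` and `end_log_pos` would show whether the ten retries all hit the same event or different ones.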

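I tried raising the slave_transaction_retries variable (to 30), which reduced the number of "freeze cases", but the problem persists. Once replication has stopped, it cannot be restarted manually from the mysql command-line tool.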

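We did not have these frozen-replication problems on 5.7.22 or earlier. We sometimes got 1205 errors in replication because of the large number of updates during the day, but manually restarting replication from the mysql command-line tool always worked.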

On 5.7.24, which includes many replication fixes, the situation seems somewhat better. On .24 we see this problem far less often, but it still occurs.

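Can I influence this behavior through some parameters?
How would you suggest investigating this problem if it happens again?
Is it possible to force a restart of the frozen replication without restarting MySQL?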
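
For noticing a recurrence early, one option is to compare two successive samples of `SHOW SLAVE STATUS` output. The sketch below assumes each sample has already been fetched as a dictionary keyed by the standard column names (`Slave_SQL_Running`, `Last_SQL_Errno`, `Master_Log_File`, `Read_Master_Log_Pos`, `Relay_Master_Log_File`, `Exec_Master_Log_Pos`). The function name `classify_replica` and the returned labels are hypothetical. It only detects the condition; it does not unfreeze a hung `stop slave`, and a single long-running transaction can also look "stalled" between two samples.

```python
from typing import Any, Mapping

LOCK_WAIT_TIMEOUT = 1205  # ER_LOCK_WAIT_TIMEOUT, the error reported in this post


def _s(row: Mapping[str, Any], key: str) -> str:
    """Read a column as a string so int and str driver results compare equal."""
    value = row.get(key)
    return "" if value is None else str(value)


def classify_replica(prev: Mapping[str, Any], curr: Mapping[str, Any]) -> str:
    """Classify replica state from two successive SHOW SLAVE STATUS rows."""
    if _s(curr, "Slave_SQL_Running") != "Yes":
        errno = int(_s(curr, "Last_SQL_Errno") or 0)
        if errno == LOCK_WAIT_TIMEOUT:
            return "stopped_lock_wait_timeout"
        return "stopped"

    # The applier claims to be running: check whether it made any progress.
    same_position = (
        _s(prev, "Relay_Master_Log_File") == _s(curr, "Relay_Master_Log_File")
        and _s(prev, "Exec_Master_Log_Pos") == _s(curr, "Exec_Master_Log_Pos")
    )
    # There is a backlog if the applied position trails the received position.
    has_backlog = (
        _s(curr, "Master_Log_File") != _s(curr, "Relay_Master_Log_File")
        or _s(curr, "Read_Master_Log_Pos") != _s(curr, "Exec_Master_Log_Pos")
    )
    if same_position and has_backlog:
        return "stalled"
    return "ok"
```

Sampling at an interval comfortably longer than the longest expected transaction would reduce false "stalled" results.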

Thank you very much for any ideas or help.

Frozen replication on MySQL 5.7.23 and 5.7.24

0 answers: