Question

我需要运行一个MySQL脚本，根据我的基准测试，应该花费超过14个小时来运行。该脚本正在更新332715行表中的每一行：

UPDATE gene_set SET attribute_fk = (
    SELECT id FROM attribute WHERE
        gene_set.name_from_dataset <=> attribute.name_from_dataset AND
        gene_set.id_from_dataset <=> attribute.id_from_dataset AND
        gene_set.description_from_dataset <=> attribute.description_from_dataset AND
        gene_set.url_from_dataset <=> attribute.url_from_dataset AND
        gene_set.name_from_naming_authority <=> attribute.name_from_naming_authority AND
        gene_set.id_from_naming_authority <=> attribute.id_from_naming_authority AND
        gene_set.description_from_naming_authority <=> attribute.description_from_naming_authority AND
        gene_set.url_from_naming_authority <=> attribute.url_from_naming_authority AND
        gene_set.attribute_type_fk <=> attribute.attribute_type_fk AND
        gene_set.naming_authority_fk <=> attribute.naming_authority_fk
    );

（该脚本很不幸;我需要将所有数据从gene_set传输到attribute，但首先我必须正确设置外键以指向attribute）。

我无法使用此命令成功运行它：

nohup mysql -h [host] -u [user] -p [database] < my_script.sql

例如，昨晚，它运行了10个多小时，但ssh连接断了：

Write failed: Broken pipe

有没有办法以更好地确保它完成的方式运行此脚本？我真的不在乎需要多长时间（1天vs 2天并不重要）只要我知道它会完成。

Answer 1

最快捷的方法可能是在screen或tmux会话中运行。

Answer 2

扩展我的评论，你的350k记录更新声明表现不佳。这是因为您基于子查询的结果进行设置，而不是作为一组进行更新。因此，您为每一行运行一次语句。更新如下：

UPDATE gene_set g JOIN attribute_fk a ON < all where clauses > SET g.attribute_fk = a.id.

这本身并没有回答你的问题，但我很想知道它的运行速度有多快。

Answer 3

如果你有对服务器的ssh访问权限，你可以将其复制并使用以下行在那里运行：

#copy over to tmp dir
scp my_script.sql user@remoteHost:/tmp/
#execute script on remote host
ssh -t user@remoteHost "nohup mysql \
    -h localhost -u [user] -p [database] < /tmp/my_script.sql &"

Answer 4

也许您可以尝试使用频繁提交进行300k更新，而不是一次大量更新。做任何失败的事情都将保持已经应用的更改。

使用一些dimacic sql你可以一次性获取所有行，然后将文件复制到你的服务器......

Answer 5

以下是我过去在远程服务器中运行整体更改查询的过程，这种查询需要花费很长时间：

mysql -h [host] -u [user] -p [database] < my_script.sql > result.log  2>&1 &

这样您就不需要等待，因为它会在自己的时间内完成，您可以自定义并在my_script.sql的开头和结尾添加select now（），以了解它花了多长时间如果你感兴趣的话。

如果适用，还需要考虑的事项

为什么这个查询需要这么长时间，我们可以改进吗（例如：禁用密钥检查..，离线准备数据并使用临时表进行更新..
我们可以打破查询以批量运行
对DB的其余部分有什么影响等等。

MySQL - 如何通过SSH连接运行长（> 14小时）作业？

5 个答案: