Question

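When I try to run the update query below, it takes about 40 hours to complete. So I added a time limit (the update query with a time limit). But it still takes almost the same time to complete. Is there any way to speed up this update?

Edit: What I really want to do is get only the logs between certain dates and run this update query on those records.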

Tables and sample data

create table user
(userid varchar(30));

create table logs
( log_time timestamp,
log_detail varchar(100),   
userid varchar(30));

insert into user values('user1');
insert into user values('user2');
insert into user values('user3');
insert into user values('');

insert into logs (log_detail, userid) values('no user mentioned','user3');
insert into logs (log_detail, userid) values('inserted by user2','user2');
insert into logs (log_detail, userid) values('inserted by user3',null);

Table before the update

log_time |        log_detail | userid |
 ..      |-------------------|--------|
 ..      |   no user mention |  user3 |
 ..      | inserted by user2 |  user2 |
 ..      | inserted by user3 | (null) |

Update query

update logs join user
set logs.userid=user.userid
where logs.log_detail LIKE concat("%",user.userID,"%") and user.userID != "";

Update query with a time limit

update logs join user
set logs.userid = IF (logs.log_time between '2015-08-11 00:39:41' AND '2015-08-01 17:39:44', user.userID, null)
where logs.log_detail LIKE concat("%",user.userID,"%") and user.userID != "";

Edit: the original question is "Sql update statement with variable".

Answer 1

First, the right place for the time limit is in the where clause, not in an if:

update logs l left join
       user u
       on l.log_detail LIKE concat("%", u.userID)
    set l.userid = u.userID
where l.log_time between '2015-08-01 17:39:44' AND '2015-08-11 00:39:41';

If you want to set the others to NULL, do the following:

update logs l
     set l.userid = NULL
     where l.log_time not between '2015-08-01 17:39:44' AND '2015-08-11 00:39:41';

However, if you really want this to be fast, you need to use an index for the join. This could possibly use an index on user(userid):

update logs l left join
       user u
       -- userid is varchar(30) in the question's schema, so cast to a character type
       on cast(substring_index(l.log_detail, ' ', -1) as char(30)) = u.userID
    set l.userid = u.userID
where l.log_time between '2015-08-01 17:39:44' AND '2015-08-11 00:39:41';

Look at the explain on the equivalent select. It is important that the cast() be to the same type as userid.
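As a rough sketch of that check, the update can be rewritten as a read-only select with the same join and where clause and then passed to explain. The index name `idx_user_userid` is made up for illustration, the cast targets char(30) on the assumption that userid stays varchar(30) as in the question's schema, and the plan MySQL actually chooses will depend on the server version and the data.

```sql
-- Hypothetical index name; the user table comes from the question.
create index idx_user_userid on user (userid);

-- Select equivalent of the update: same join condition and date filter,
-- but nothing is modified, so it is safe to run repeatedly.
explain
select l.log_time, l.log_detail, u.userid
from logs l left join
     user u
     on cast(substring_index(l.log_detail, ' ', -1) as char(30)) = u.userid
where l.log_time between '2015-08-01 17:39:44' and '2015-08-11 00:39:41';
```

In the output, the row for `u` is the one to watch: if its `key` column names the index on userid, the join is doing a lookup per log row rather than comparing every log row against every user.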

Answer 2

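A log table can easily fill up with huge numbers of rows every month, and even the best indexing won't help, especially with the LIKE operator. Your log_detail column is 100 characters long and your search pattern is CONCAT("%",user.userID,"%"). Using a function in a SQL command can slow things down because the function does extra computation. If your user ID is John, then %John% is what you are searching for. Because of the leading %, the index is close to useless and your query scans every row in the table. Without the first %, the query could make efficient use of its index. In effect, your query does an INDEX SCAN rather than an INDEX SEEK.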

For more on these concepts, see:

Index Seek VS Index Scan

Query tuning a LIKE operator
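One way to see the difference described above in MySQL is to compare explain output for a leading-wildcard pattern and a prefix pattern. This is only a sketch: the index `idx_logs_log_detail` is hypothetical and not part of the question's schema, and on a very small table the optimizer may pick a full scan in both cases.

```sql
-- Hypothetical index on the searched column.
create index idx_logs_log_detail on logs (log_detail);

-- Leading wildcard: a match can start anywhere in the string, so the sorted
-- order of the index cannot be used to narrow the search.
explain select * from logs where log_detail like '%user2%';

-- Constant prefix: the optimizer can restrict the search to the index entries
-- that start with the prefix (typically shown as a range access).
explain select * from logs where log_detail like 'inserted by%';
```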

OK, so what can you do about it? Two strategies.

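Option 1 is to limit the number of rows you are searching through. You had the right idea in using a time limit to reduce the rows to search. What I would suggest is making the time limit the first expression in the WHERE clause. Most databases evaluate the first expression first, so by the time the second expression runs, it only scans the rows returned by the first.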
```
update logs join user 
set logs.userid=user.userid 
where logs.log_time between '2015-08-01' and '2015-08-11' 
and logs.log_detail LIKE concat('%',user.userID,'%')
```
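The date restriction only saves work if the database can find the rows in that range cheaply. A possible supporting step, assuming you are free to add an index to logs (the name `idx_logs_log_time` is hypothetical), is sketched here together with a quick count of how many rows the update would still have to compare against user.

```sql
-- Hypothetical index name; lets MySQL locate the rows in the date range
-- without reading the whole logs table.
create index idx_logs_log_time on logs (log_time);

-- Preview the number of log rows that fall inside the range.
select count(*)
from logs
where log_time between '2015-08-01' and '2015-08-11';
```

If the count is still in the millions, the LIKE comparison on those rows remains the expensive part.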
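Option 2 depends on how much control you have over the database. If you have total control (and you have the time and money), MySQL has a feature called Auto-Sharding. It is available in MySQL Cluster and MySQL Fabric. I won't go into much detail about these products, since the links below explain them far better than I could summarize, but the idea behind sharding is to split the rows into horizontal tables, so to speak. Instead of searching through one long database table, you search across several sister tables at the same time. Searching through 10 tables of 10 million rows is faster than searching through 1 table of 100 million rows.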

Database Sharding - Wikipedia

MySQL Cluster

MySQL Fabric

Answer 3

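You could add a new column called log_detail_reverse, with a trigger set up so that when a new row is inserted, the log_detail value is also inserted in reverse character order using the MySQL function reverse(). When you run your update query, you reverse the user ID being searched for as well. The net result is that you turn an INDEX SCAN into an INDEX SEEK, which will be faster.

MySQL Trigger

The update query might look something like:

update logs join user
set logs.userid=user.userid
where logs.log_time between '2015-08-01' and '2015-08-11'
and logs.log_detail_reverse LIKE concat(reverse(user.userID), '%')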
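The column and trigger themselves are not spelled out in this answer, so the following is only a sketch of one way to set them up. The index name `idx_logs_log_detail_reverse` and the trigger name `logs_reverse_before_insert` are invented for illustration, and the column length is assumed to match log_detail.

```sql
-- Assumed schema change: a column holding log_detail reversed, plus an index on it.
alter table logs add column log_detail_reverse varchar(100);
create index idx_logs_log_detail_reverse on logs (log_detail_reverse);

-- One-time backfill for the rows that already exist.
update logs set log_detail_reverse = reverse(log_detail);

-- Keep the column in sync for new rows. A single-statement trigger body
-- needs no begin/end block and no delimiter change.
create trigger logs_reverse_before_insert
before insert on logs
for each row
set new.log_detail_reverse = reverse(new.log_detail);
```

Note that matching the reversed column against concat(reverse(userID), '%') is equivalent to matching log_detail against concat('%', userID), so this approach only finds user IDs that appear at the very end of the log text. If log_detail can be modified after insertion, a matching before update trigger would be needed as well.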

Answer 4

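One aspect of speeding up an update is not updating records that don't need it. You only want to update records within a specific time range where the user does not match the user mentioned in the log text. So limit the records to update in the where clause.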

update logs 
set userid = substring_index(log_detail, ' ', -1)
where log_time between '2015-08-01 17:39:44' AND '2015-08-11 00:39:41'
and not userid <=> substring_index(log_detail, ' ', -1);
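The two building blocks in that query can be tried on their own. The literals below are illustrative values taken from the question's sample rows, not output from a real log table.

```sql
-- substring_index with a count of -1 returns everything after the last space.
select substring_index('inserted by user2', ' ', -1);   -- 'user2'

-- <=> is the NULL-safe equality operator: it returns 1 or 0, never NULL.
select null <=> null,      -- 1
       null = null,        -- NULL
       'user2' <=> null;   -- 0
```

This is why the where clause uses `not userid <=> ...` rather than `userid <> ...`: with a plain comparison, rows whose userid is NULL would evaluate to NULL and be skipped, even though they are exactly the rows that still need a value.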

How to increase query speed with the like command
