I have an INSERT whose condition uses a NOT IN check. The NOT IN subquery returns roughly 230k rows.
INSERT INTO Validate.ItemError (ItemId, ErrorId, DateCreated)
(
SELECT ItemId, 10, GetUTCDate()
FROM Validate.Item
INNER JOIN Refresh.Company
ON Validate.Item.IMCompanyId = Refresh.Company.IMCompanyId
WHERE Refresh.Company.CompanyId = 14
AND
(
IMAccountId IS NULL OR NOT IMAccountId IN
(
SELECT RA.IMAccountId
FROM Refresh.Account RA
INNER JOIN Refresh.BalancePool BP
ON RA.BalancePoolId = BP.BalancePoolId
WHERE BP.CompanyId = 14
)
)
)
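Run as is, it takes 30+ minutes (ouch!). The number of rows in the Validate.Item table can be anywhere from 150 to over 200k, so you can see how this can be a pain.
All the relevant fields in the tables are indexed, and none of the tables is over-indexed.
My first thought was to split it into chunks and put it in a WHILE loop: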
DECLARE @StartId int, @EndId int, @MaxId int
SELECT @MaxId = MAX(AccountId) FROM Refresh.Account
SET @StartId = 1
SET @EndId = 1000
WHILE (@StartId < @MaxId)
BEGIN
INSERT INTO Validate.ItemError (ItemId, ErrorId, DateCreated)
(
SELECT ItemId, 10, GetUTCDate()
FROM Validate.Item
INNER JOIN Refresh.Company
ON Validate.Item.IMCompanyId = Refresh.Company.IMCompanyId
WHERE Refresh.Company.CompanyId = 14
AND
(
IMAccountId IS NULL
OR NOT IMAccountId IN
(
SELECT RA.IMAccountId
FROM Refresh.Account RA
INNER JOIN Refresh.BalancePool BP
ON RA.BalancePoolId = BP.BalancePoolId
WHERE BP.CompanyId = 14
AND RA.AccountId BETWEEN @StartId AND @EndId
)
)
)
SET @StartId = @StartId + 1000
SET @EndId = @EndId + 1000
END
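Doing it this way gets each loop iteration down to about a minute; multiply that by 230 and we end up with an even more ridiculous number.
Please tell me you have a better idea for optimizing this. Without this one query, the whole process takes only 8 seconds; it is the sheer size of the Refresh.Account table that throws everything out of whack.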
TIA!
武神
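Answer 0 (score: 2)
Get rid of the OR condition.
It adds a full scan and prevents the optimizer from using the ANTI JOIN it would otherwise use.
This query returns the same rows. A row whose IMAccountId is NULL never matches the correlated equality, so NOT EXISTS keeps it without a separate IS NULL check: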
SELECT ItemId, 10, GetUTCDate()
FROM Validate.Item
INNER JOIN
Refresh.Company
ON Validate.Item.IMCompanyId = Refresh.Company.IMCompanyId
WHERE Refresh.Company.CompanyId = 14
AND NOT EXISTS
(
SELECT RA.IMAccountId
FROM Refresh.Account RA
INNER JOIN
Refresh.BalancePool BP
ON RA.BalancePoolId = BP.BalancePoolId
WHERE BP.CompanyId = 14
AND RA.IMAccountId = Validate.Item.IMAccountId
)
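To make the "returns the same rows" claim concrete, here is a minimal sketch that compares the two forms on toy data. It uses Python's sqlite3 module rather than SQL Server, and the tables `item` and `account` and their columns are hypothetical stand-ins for Validate.Item and Refresh.Account, so treat it as an illustration of the NULL semantics only, not of the real schema or of performance. It suggests the two forms agree as long as the subquery column contains no NULL, and diverge once it does.

```python
import sqlite3

# Hypothetical toy schema standing in for Validate.Item / Refresh.Account.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (item_id INTEGER, account_id INTEGER);
    CREATE TABLE account (account_id INTEGER);
    INSERT INTO item VALUES (1, 100), (2, 200), (3, NULL);
    INSERT INTO account VALUES (100);
""")

# Shape of the original query: IS NULL OR NOT IN (...)
NOT_IN = """
    SELECT item_id FROM item
    WHERE account_id IS NULL
       OR account_id NOT IN (SELECT account_id FROM account)
    ORDER BY item_id
"""

# Shape of the rewrite: correlated NOT EXISTS, no OR
NOT_EXISTS = """
    SELECT item_id FROM item i
    WHERE NOT EXISTS (SELECT 1 FROM account a
                      WHERE a.account_id = i.account_id)
    ORDER BY item_id
"""

def ids(sql):
    """Run a query and return the item_id values as a list."""
    return [row[0] for row in conn.execute(sql)]

# No NULL in account.account_id: both forms return the same items.
without_null = (ids(NOT_IN), ids(NOT_EXISTS))

# Add a NULL to the subquery column: NOT IN now evaluates to UNKNOWN
# for non-matching values, so item 2 drops out of the NOT IN form only.
conn.execute("INSERT INTO account VALUES (NULL)")
with_null = (ids(NOT_IN), ids(NOT_EXISTS))

print(without_null)
print(with_null)
```

So if Refresh.Account.IMAccountId can be NULL, the NOT EXISTS version is not only a performance change; it may also return rows that the NOT IN version silently drops.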
Answer 1 (score: 1)
Use NOT EXISTS instead:
...OR NOT EXISTS (SELECT 1 FROM
Refresh.Account RA INNER JOIN Refresh.BalancePool BP
ON RA.BalancePoolId = BP.BalancePoolId WHERE BP.CompanyId = 14 AND RA.IMAccountId = xxx.IMAccountId)))
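With EXISTS, the subquery can stop as soon as it finds the first row that satisfies the condition. (Remember to replace xxx with the alias of the correct table.)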
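Answer 2 (score: 1)
Could you just left join to the related tables and check for a null key, instead of using NOT IN? I'm not sure the query is 100% correct: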
INSERT INTO Validate.ItemError (ItemId, ErrorId, DateCreated)
SELECT ItemId, 10, GetUTCDate()
FROM Validate.Item
INNER JOIN Refresh.Company ON Validate.Item.IMCompanyId = Refresh.Company.IMCompanyId
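-- Nested join: Account is joined to BalancePool first (company 14 only),
-- and that result is then left joined to Validate.Item.
LEFT JOIN Refresh.Account RA
INNER JOIN Refresh.BalancePool BP ON BP.BalancePoolId = RA.BalancePoolId AND BP.CompanyId = 14
ON RA.IMAccountId = Validate.Item.IMAccountId
WHERE Refresh.Company.CompanyId = 14
-- Parentheses are required because AND binds tighter than OR.
AND (Validate.Item.IMAccountId IS NULL OR RA.IMAccountId IS NULL)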
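One detail worth checking in a LEFT JOIN ... IS NULL anti-join is operator precedence: AND binds tighter than OR, so a mixed WHERE clause without parentheses may let rows from other companies through. The following is a minimal sketch of that effect using Python's sqlite3 module on a hypothetical toy schema (`item`, `account`, and their columns are made up for illustration and are not the real tables).

```python
import sqlite3

# Hypothetical toy data: items 1-3 belong to company 14, item 4 to company 99.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (item_id INTEGER, company_id INTEGER, account_id INTEGER);
    CREATE TABLE account (account_id INTEGER);
    INSERT INTO item VALUES (1, 14, 100), (2, 14, 200), (3, 14, NULL), (4, 99, 300);
    INSERT INTO account VALUES (100);
""")

# Anti-join skeleton: unmatched items have a NULL a.account_id after the LEFT JOIN.
BASE = """
    SELECT i.item_id
    FROM item i
    LEFT JOIN account a ON a.account_id = i.account_id
    WHERE {predicate}
    ORDER BY i.item_id
"""

def ids(predicate):
    """Run the anti-join with the given WHERE predicate and return item ids."""
    return [row[0] for row in conn.execute(BASE.format(predicate=predicate))]

# Parsed as (company = 14 AND i.account_id IS NULL) OR a.account_id IS NULL,
# so the company filter does not apply to the second branch.
unparenthesized = ids(
    "i.company_id = 14 AND i.account_id IS NULL OR a.account_id IS NULL"
)

# The company filter applies to both branches.
parenthesized = ids(
    "i.company_id = 14 AND (i.account_id IS NULL OR a.account_id IS NULL)"
)

print(unparenthesized)
print(parenthesized)
```

In this toy data, item 4 belongs to another company and only shows up in the unparenthesized variant, which is the kind of row the company filter is meant to exclude.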
Answer 3 (score: 0)
Would using NOT EXISTS help here?
(SELECT ItemId, 10, GetUTCDate()
FROM Validate.Item INNER JOIN Refresh.Company ON
Validate.Item.IMCompanyId = Refresh.Company.IMCompanyId
WHERE Refresh.Company.CompanyId = 14
AND (IMAccountId IS NULL OR NOT EXISTS (SELECT TOP 1 RA.IMAccountId FROM
Refresh.Account RA INNER JOIN Refresh.BalancePool BP
ON RA.BalancePoolId = BP.BalancePoolId WHERE BP.CompanyId = 14 AND
RA.IMAccountId = Validate.Item.IMAccountId)))
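I'm not sure whether the query is correct.
However, I used NOT EXISTS with TOP 1 in the subquery.
In addition, the subquery restricts the records through the extra condition AND RA.IMAccountId = Validate.Item.IMAccountId, which is not present in the NOT IN (...) version.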