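I have always assumed that NOT EXISTS was the way to go rather than a NOT IN condition. However, while comparing two versions of a query I have been using, I noticed that the NOT IN version actually appears to execute faster. Any insight into why this could be the case, or whether I have simply been making a terrible assumption up to this point, would be greatly appreciated!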
QUERY 1:
SELECT DISTINCT
a.SFAccountID, a.SLXID, a.Name FROM [dbo].[Salesforce_Accounts] a WITH(NOLOCK)
JOIN _SLX_AccountChannel b WITH(NOLOCK)
ON a.SLXID = b.ACCOUNTID
JOIN [dbo].[Salesforce_Contacts] c WITH(NOLOCK)
ON a.SFAccountID = c.SFAccountID
WHERE b.STATUS IN ('Active','Customer', 'Current')
AND c.Primary__C = 0
AND NOT EXISTS
(
SELECT 1 FROM [dbo].[Salesforce_Contacts] c2 WITH(NOLOCK)
WHERE a.SFAccountID = c2.SFAccountID
AND c2.Primary__c = 1
);
QUERY 2:
SELECT
DISTINCT
a.SFAccountID FROM [dbo].[Salesforce_Accounts] a WITH(NOLOCK)
JOIN _SLX_AccountChannel b WITH(NOLOCK)
ON a.SLXID = b.ACCOUNTID
JOIN [dbo].[Salesforce_Contacts] c WITH(NOLOCK)
ON a.SFAccountID = c.SFAccountID
WHERE b.STATUS IN ('Active','Customer', 'Current')
AND c.Primary__C = 0
AND a.SFAccountID NOT IN (SELECT SFAccountID FROM [dbo].[Salesforce_Contacts] WHERE Primary__c = 1 AND SFAccountID IS NOT NULL);
Actual execution plan for Query 1:
Actual execution plan for Query 2:
Time / IO statistics:
Query #1 (using NOT EXISTS):
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 532 ms, elapsed time = 533 ms.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Salesforce_Contacts'. Scan count 2, logical reads 3078, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'INFORMATION'. Scan count 1, logical reads 691, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ACCOUNT'. Scan count 4, logical reads 567, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Salesforce_Accounts'. Scan count 1, logical reads 680, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 250 ms, elapsed time = 271 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Query #2 (using NOT IN):
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 500 ms, elapsed time = 500 ms.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Salesforce_Contacts'. Scan count 2, logical reads 3079, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'INFORMATION'. Scan count 1, logical reads 691, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ACCOUNT'. Scan count 4, logical reads 567, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Salesforce_Accounts'. Scan count 1, logical reads 680, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 157 ms, elapsed time = 166 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Answer 0: (score: 1)
Try
SELECT DISTINCT a.SFAccountID, a.SLXID, a.Name
FROM [dbo].[Salesforce_Accounts] a WITH(NOLOCK)
JOIN _SLX_AccountChannel b WITH(NOLOCK)
ON a.SLXID = b.ACCOUNTID
AND b.STATUS IN ('Active','Customer', 'Current')
JOIN [dbo].[Salesforce_Contacts] c WITH(NOLOCK)
ON a.SFAccountID = c.SFAccountID
AND c.Primary__C = 0
LEFT JOIN [dbo].[Salesforce_Contacts] c2 WITH(NOLOCK)
on c2.SFAccountID = a.SFAccountID
AND c2.Primary__c = 1
WHERE c2.SFAccountID is null
Answer 1: (score: 1)
As I understand it, a NOT IN works the same way as two nested for loops.
So, suppose you have two tables: table (1000 records) and tabla (2000 records),
select * from table where table.field not in (select field from tabla)
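is like doing
for (int i = 0; i < 1000; i++) {      // each row of table
    for (int j = 0; j < 2000; j++) {  // each row of tabla
        // compare table[i].field with tabla[j].field
    }
}
that is, 1000 * 2000 = 2 million operations.
The LEFT JOIN with the tabla.field IS NULL trick, as far as I understand, does only 2000 operations.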
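For reference, the LEFT JOIN ... IS NULL form of the example above might look like the sketch below. It reuses the table and column names from this answer; the aliases t and x are only illustrative, and [table] is bracketed because TABLE is a reserved word in T-SQL.

```sql
-- Anti-join form of the NOT IN example, using this answer's table names.
-- The aliases t and x are illustrative only.
SELECT t.*
FROM [table] AS t
LEFT JOIN tabla AS x
    ON x.field = t.field
WHERE x.field IS NULL;  -- keeps only rows of [table] with no match in tabla
```

Note that this returns the same rows as the NOT IN version only when field is never NULL in either table.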
Answer 2: (score: 1)
I think a missing index is what causes the difference between the EXISTS() and IN operations.
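As a rough sketch of the kind of index meant here: the index name and the column choice below are assumptions, not something taken from the question or its execution plans, so check the missing-index hints in the actual plans before creating anything.

```sql
-- Hypothetical index: name and key columns are assumptions for illustration.
-- It would let the Primary__c = 1 lookup per SFAccountID use a seek
-- instead of scanning Salesforce_Contacts.
CREATE NONCLUSTERED INDEX IX_Salesforce_Contacts_SFAccountID_Primary
    ON [dbo].[Salesforce_Contacts] (SFAccountID, Primary__c);
```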
Although the question does not ask for a better query, I would personally try to avoid the DISTINCT, like this:
SELECT
a.SFAccountID, a.SLXID, a.Name
FROM
[dbo].[Salesforce_Accounts] a WITH(NOLOCK)
CROSS APPLY
(
SELECT SFAccountID
FROM [dbo].[Salesforce_Contacts] WITH(NOLOCK)
WHERE SFAccountID = a.SFAccountID
GROUP BY SFAccountID
HAVING MAX(Primary__C + 0) = 0 -- Assume Primary__C is a bit value
) b
WHERE
-- Actually it is the filtering condition for account channel
EXISTS
(
SELECT * FROM _SLX_AccountChannel WITH(NOLOCK)
WHERE ACCOUNTID = a.SLXID AND STATUS IN ('Active','Customer', 'Current')
)
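Answer 3: (score: 1)
The question is: "Why does NOT IN seem faster than NOT EXISTS?"
My answer is: it only seems faster, but it is the same (in this case).
Did you actually measure the time of both queries and confirm that there is a difference?
Or did you just look at the execution plans?
As I understand it, the query cost you see in the screenshots (53% vs 47%) is an estimated cost, not a measured one.
In this particular case the query optimizer appears to have generated almost identical plans for the two queries. Most likely the estimated row counts for some operators in the plans differ (slightly), but the actual performance is the same because the plan shape is the same. If the estimated row counts differ, that would produce the difference in estimated query cost that you see.
To see the differences between the plans (if any), I would use a tool like SQL Sentry Plan Explorer. It shows more detail and makes it easier to compare all aspects of the queries.
Rewriting the query to be faster is a different question, and I will not try to answer it here.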
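To compare measured numbers rather than estimated cost percentages, one option is to wrap both queries in the statistics settings that produced the output shown in the question. This is only a sketch; the two placeholders stand for the queries from the question.

```sql
-- Report measured I/O and timings in the Messages tab for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Placeholder: paste Query 1 (NOT EXISTS) here and run it several times.
-- Placeholder: paste Query 2 (NOT IN) here and run it several times.

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Running each query several times and comparing the "SQL Server Execution Times" lines (not the parse and compile times) gives a fairer picture than a single run.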
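Answer 4: (score: 0)
This assumes you are trying to find accounts that have no primary contact, and that an account can have only one primary contact: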
SELECT a.SFAccountID, a.SLXID, a.Name
FROM [dbo].[Salesforce_Accounts] a
LEFT JOIN [dbo].[Salesforce_Contacts] c ON a.SFAccountID = c.SFAccountID AND c.Primary__C = 1
WHERE
EXISTS (SELECT *
FROM _SLX_AccountChannel b
WHERE b.ACCOUNTID = a.SLXID
AND b.STATUS IN ( 'Active', 'Customer', 'Current' ))
AND c.SFContactID IS NULL
If you want accounts that have contacts but no primary contact, you can use:
SELECT
a.SFAccountID ,
a.SLXID ,
a.Name
FROM
[dbo].[Salesforce_Accounts] a
WHERE
a.SFAccountID IN (SELECT SFAccountID
FROM [Salesforce_Contacts]
GROUP BY SFAccountID
HAVING SUM(CAST(Primary__c AS INT)) = 0)
AND a.SLXID IN (SELECT ACCOUNTID
FROM _SLX_AccountChannel
WHERE [STATUS] IN ( 'Active', 'Customer', 'Current' ))
Answer 5: (score: 0)
You can avoid hitting/joining Salesforce_Contacts more than once. This is more compact and faster:
SELECT a.SFAccountID, a.SLXID, a.Name
FROM [dbo].[Salesforce_Accounts] a WITH(NOLOCK)
JOIN _SLX_AccountChannel b WITH(NOLOCK)
ON a.SLXID = b.ACCOUNTID
JOIN [dbo].[Salesforce_Contacts] c WITH(NOLOCK)
ON a.SFAccountID = c.SFAccountID
WHERE b.STATUS IN ('Active','Customer', 'Current')
GROUP BY a.SFAccountID, a.SLXID, a.Name
HAVING MAX(CAST(c.Primary__C AS INT)) = 0 -- CAST assumes Primary__C is a bit column, which MAX does not accept directly
The difference between IN and EXISTS is negligible.
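Performance aside, the negated forms are not interchangeable when the subquery can return NULL, which is presumably why Query 2 in the question filters with SFAccountID IS NOT NULL. A minimal sketch of the difference, using made-up table variables (@accounts and @contacts are hypothetical, not the tables from the question):

```sql
-- Hypothetical sample data, for illustration only.
DECLARE @accounts TABLE (id INT NULL);
DECLARE @contacts TABLE (account_id INT NULL);

INSERT INTO @accounts VALUES (1), (2);
INSERT INTO @contacts VALUES (1), (NULL);

-- NOT IN: returns no rows. For id = 2 the comparison with NULL is UNKNOWN,
-- so the NOT IN predicate never evaluates to TRUE.
SELECT a.id
FROM @accounts AS a
WHERE a.id NOT IN (SELECT c.account_id FROM @contacts AS c);

-- NOT EXISTS: returns id = 2, because no contact row matches it.
SELECT a.id
FROM @accounts AS a
WHERE NOT EXISTS (SELECT 1 FROM @contacts AS c WHERE c.account_id = a.id);
```

Adding a WHERE account_id IS NOT NULL filter to the NOT IN subquery makes the two statements return the same rows for this data.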