不存在与不存在:效率

时间:2015-07-09 23:01:47

标签: sql sql-server tsql exists sql-execution-plan

我一直认为,不存在是不是存在的方式,而不是使用不处于条件状态。但是,我对我一直在使用的查询进行比较,我注意到Not In条件的执行实际上似乎更快。任何有关为什么会出现这种情况的见解,或者如果我在此之前做出一个可怕的假设,我将不胜感激!

QUERY 1:

SELECT DISTINCT 
a.SFAccountID, a.SLXID, a.Name FROM [dbo].[Salesforce_Accounts] a WITH(NOLOCK)
JOIN  _SLX_AccountChannel b WITH(NOLOCK)
ON a.SLXID = b.ACCOUNTID
JOIN [dbo].[Salesforce_Contacts] c WITH(NOLOCK)
ON a.SFAccountID = c.SFAccountID
WHERE b.STATUS IN ('Active','Customer', 'Current')
AND c.Primary__C = 0
AND NOT EXISTS
(
SELECT 1 FROM [dbo].[Salesforce_Contacts] c2 WITH(NOLOCK)
WHERE a.SFAccountID = c2.SFAccountID
AND c2.Primary__c = 1
);

QUERY 2:

SELECT   
DISTINCT
a.SFAccountID FROM [dbo].[Salesforce_Accounts] a WITH(NOLOCK)
JOIN  _SLX_AccountChannel b WITH(NOLOCK)
ON a.SLXID = b.ACCOUNTID
JOIN [dbo].[Salesforce_Contacts] c WITH(NOLOCK) 
ON a.SFAccountID = c.SFAccountID
WHERE b.STATUS IN ('Active','Customer', 'Current')
AND c.Primary__C = 0
AND a.SFAccountID NOT IN (SELECT SFAccountID FROM [dbo].[Salesforce_Contacts] WHERE Primary__c = 1 AND SFAccountID IS NOT NULL);

查询1的实际执行计划: Execution plan 1

查询2的实际执行计划:Execution plan 2

时间/ IO统计:

查询#1(使用不存在):

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 532 ms, elapsed time = 533 ms.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Salesforce_Contacts'. Scan count 2, logical reads 3078, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'INFORMATION'. Scan count 1, logical reads 691, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ACCOUNT'. Scan count 4, logical reads 567, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Salesforce_Accounts'. Scan count 1, logical reads 680, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 250 ms,  elapsed time = 271 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

查询#2(使用Not In):

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
SQL Server parse and compile time: 
   CPU time = 500 ms, elapsed time = 500 ms.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Salesforce_Contacts'. Scan count 2, logical reads 3079, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'INFORMATION'. Scan count 1, logical reads 691, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ACCOUNT'. Scan count 4, logical reads 567, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Salesforce_Accounts'. Scan count 1, logical reads 680, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 157 ms,  elapsed time = 166 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

6 个答案:

答案 0 :(得分:1)

尝试

SELECT DISTINCT a.SFAccountID, a.SLXID, a.Name 
  FROM [dbo].[Salesforce_Accounts] a WITH(NOLOCK)
  JOIN _SLX_AccountChannel b WITH(NOLOCK)
    ON a.SLXID = b.ACCOUNTID
   AND b.STATUS IN ('Active','Customer', 'Current')
  JOIN [dbo].[Salesforce_Contacts] c WITH(NOLOCK)
    ON a.SFAccountID = c.SFAccountID 
   AND c.Primary__C = 0
  LEFT JOIN [dbo].[Salesforce_Contacts] c2 WITH(NOLOCK) 
    on c2.SFAccountID = a.SFAccountID
   AND c2.Primary__c = 1
 WHERE c2.SFAccountID is null 

答案 1 :(得分:1)

据我了解,一个不能以与两个嵌套的指令相同的方式工作。

所以,假设你有两个表:table(1000条记录)和tabla(2000条记录),

select * from table where table.field not in (select field from tabla)

就像在做

for (int i = 0;  i < 1000; i++) {
   for (int j = 0;  j < 2000; j++) {
   }
}

即1000 * 2000 = 200万次操作。

与tabla.field的左连接是空技巧,据我所知,只做2000次操作

使用左连接。

答案 2 :(得分:1)

我认为缺少的索引会导致EXISTS()IN操作的差异。

虽然问题没有要求更好的查询,但对我来说,我会尽量避免像这样的区别

SELECT
    a.SFAccountID, a.SLXID, a.Name 
FROM 
    [dbo].[Salesforce_Accounts] a WITH(NOLOCK)
    CROSS APPLY 
    (
        SELECT SFAccountID 
        FROM [dbo].[Salesforce_Contacts] WITH(NOLOCK) 
        WHERE SFAccountID  = a.SFAccountID 
        GROUP BY SFAccountID
        HAVING MAX(Primary__C + 0) = 0 -- Assume Primary__C is a bit value
    ) b
WHERE
    -- Actually it is the filtering condition for account channel
    EXISTS
    (
        SELECT * FROM _SLX_AccountChannel WITH(NOLOCK) 
        WHERE ACCOUNTID = a.SLXID AND STATUS IN ('Active','Customer', 'Current')
    )

答案 3 :(得分:1)

问题是:&#34;为什么NOT IN似乎比NOT EXISTS&#34;更快。

我的回答是:它似乎只是更快,但它是一样的。 (在这种情况下)

您是否实际测量了两个查询的时间并确认存在差异?

或者您刚看了执行计划?

据我了解,您在屏幕截图中看到的查询成本(53%vs 47%)是:

  • 估算查询费用,即使计划是实际的;
  • 是查询费用,而不是时间,它是从CPU和IO&#34;费用&#34;。
  • 组合而来的。

在这种特殊情况下,查询优化器似乎为两个查询生成了几乎相同的计划。对于计划中的某些操作员,计划很可能(略微)计算行数,但实际性能是相同的,因为计划形状是相同的。如果估计的行数不同,则会导致您看到的估算查询成本不同。

要查看计划的差异(如果有的话),我会使用像SQL Sentry Plan Explorer这样的工具。它显示了更多详细信息,您可以更轻松地比较查询的所有方面。

将查询重写得更快是一个不同的问题,我不会尝试在此处回答。

答案 4 :(得分:0)

这假设您正在尝试查找没有主要联系人的帐户,并且只能有一个主要联系人

SELECT  a.SFAccountID, a.SLXID, a.Name 
FROM    [dbo].[Salesforce_Accounts] a
        LEFT JOIN [dbo].[Salesforce_Contacts] c ON a.SFAccountID = c.SFAccountID AND c.Primary__C = 1
WHERE
        EXISTS (SELECT  * 
                FROM SLX_AccountChannel b 
                WHERE b.ACCOUNTID = a.SLXID 
                    AND b.STATUS IN ( 'Active', 'Customer', 'Current' ))
        AND c.SFContactID IS NULL

如果您想要拥有联系人但没有主要联系人的帐户,您可以使用

SELECT 
    a.SFAccountID ,
    a.SLXID ,
    a.Name
FROM 
    [dbo].[Salesforce_Accounts] a
WHERE
    a.SFAccountID IN (SELECT SFAccountID 
                    FROM [Salesforce_Contacts] 
                    GROUP BY SFAccountID 
                    HAVING SUM(CAST(Primary__c AS INT) = 0))

    AND a.SLXID IN (SELECT ACCOUNTID 
                    FROM _SLX_AccountChannel 
                    WHERE [STATUS] IN ( 'Active', 'Customer', 'Current' ))

答案 5 :(得分:0)

您可以不多次点击/加入Salesforce_Contacts。这更紧凑,更快:

SELECT a.SFAccountID, a.SLXID, a.Name
FROM [dbo].[Salesforce_Accounts] a WITH(NOLOCK)
JOIN  _SLX_AccountChannel b WITH(NOLOCK)
    ON a.SLXID = b.ACCOUNTID
JOIN [dbo].[Salesforce_Contacts] c WITH(NOLOCK)
    ON a.SFAccountID = c.SFAccountID
WHERE b.STATUS IN ('Active','Customer', 'Current')
GROUP BY a.SFAccountID, a.SLXID, a.Name
HAVING MAX(c.Primary__C) = 0

INEXISTS之间的差异可以忽略不计。