COUNT(1) + COUNT(DISTINCT()) is much slower than executing the 2 queries individually

Time: 2016-10-19 10:37:54

Tags: sql sql-server sqlperformance

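Query description:

  • A Person (identified by PersonID) may or may not have a corresponding Job (identified by JobID).
  • If there is a corresponding Job, the association is stored in the table PersonJob (PersonID <=> JobID).
  • A Person without a Job is ignored.
  • A Job also has a CityID.
  • For each Job.CityID, the query wants the total count of Person rows and the count of unique Person.HouseID values.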
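
The question does not include table definitions. A minimal sketch of a schema consistent with the description above could look like the following; the column types, nullability, and keys are assumptions made for illustration, not the asker's actual DDL.

```sql
-- Hypothetical schema: types, NULL-ability and keys are assumptions.
CREATE TABLE Person (
  PersonID INT NOT NULL PRIMARY KEY,
  HouseID  INT NULL
);

CREATE TABLE Job (
  JobID  INT NOT NULL PRIMARY KEY,
  CityID INT NOT NULL
);

-- Link table: a row exists only when a Person has a Job.
CREATE TABLE PersonJob (
  PersonID INT NOT NULL REFERENCES Person (PersonID),
  JobID    INT NOT NULL REFERENCES Job (JobID),
  PRIMARY KEY (PersonID, JobID)
);
```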

Query:

SELECT
  Job.CityID, COUNT(1) NumTotal, COUNT(DISTINCT(Person.HouseID)) NumDistinct
FROM
  Job
  INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
  INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
GROUP BY
  Job.CityID

Statistics:

  • SELECT COUNT(1) FROM PersonJob ~600,000
  • SELECT COUNT(1) FROM Person ~800,000
  • SELECT COUNT(DISTINCT(Person.HouseID)) FROM Person ~10,000
  • SELECT COUNT(1) FROM Job ~500
  • MS SQL Server 10.50 (SQL Server 2008 R2)

Problem:

  • The COUNT(1) part of the query, when run on its own, runs in 0.25 seconds.

    SELECT
      Job.CityID, COUNT(1) NumTotal
    FROM
      Job
      INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
      INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
    GROUP BY
      Job.CityID
    
  • The COUNT(DISTINCT(Person.HouseID)) part of the query, when run on its own, runs in 0.80 seconds.

    SELECT
      Job.CityID, COUNT(DISTINCT(Person.HouseID)) NumDistinct
    FROM
      Job
      INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
      INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
    GROUP BY
      Job.CityID
    
  • The whole query runs in 3.10 seconds, about 3 times slower than the two partial queries combined (1.05 seconds). Why?
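
The question does not say how these timings were taken. One common way to reproduce them in a query window is to turn on SQL Server's timing and I/O statistics around the query, as in this sketch (shown here with the full query; the partial queries can be substituted the same way).

```sql
-- Report parse/compile and execution times, plus logical reads per table,
-- in the Messages output for every statement that follows.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT
  Job.CityID, COUNT(1) NumTotal, COUNT(DISTINCT(Person.HouseID)) NumDistinct
FROM
  Job
  INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
  INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
GROUP BY
  Job.CityID;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```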

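Execution plans:

  • I am not an expert at reading these, sorry.
  • As far as I can tell, the problem lies within the COUNT(DISTINCT).
  • In the partial query:
    • 25% Hash Match (Aggregate) (output: Job.CityID)
    • 15% Hash Match (Inner Join) (output: Job.CityID, Person.HouseID)
      • 30% Index Scan (output: Person.PersonID, Person.HouseID)
      • 14% Index Seek (output: PersonJob.PersonID)
  • In the full query:
    • 3% Hash Match (Partial Aggregate) (output: Job.CityID, COUNT(*))
    • 31% Hash Match (Aggregate) (output: Job.CityID)
    • 29% Table Spool (output: Job.CityID, Person.HouseID)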

2 answers:

Answer 0 (score: 4)

This is a known issue in versions of SQL Server prior to 2012.

You can try a rewrite based on the code here.

WITH T1
     AS (SELECT Job.CityID,
                Person.HouseID
         FROM   Job
                INNER JOIN PersonJob
                        ON ( PersonJob.JobID = Job.JobID )
                INNER JOIN Person
                        ON ( Person.PersonID = PersonJob.PersonID )),
     PartialSums
     AS (SELECT COUNT(*) AS CountStarPartialCount,
                HouseID,
                CityID
         FROM   T1
         GROUP  BY CityID,
                   HouseID)
SELECT CityID,
       SUM(CountStarPartialCount) AS NumTotal,
       COUNT(HouseID)             AS NumDistinct
FROM   PartialSums
GROUP  BY CityID 

SQL Server 2012 introduced some improvements in this area. See "Is Distinct Aggregation Still Considered Harmful?"
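
The rewrite in this answer works because grouping by (CityID, HouseID) first leaves one row per distinct house within each city. Summing the partial counts then restores the total, and counting the remaining non-NULL HouseID values gives the distinct count. The sketch below runs the same two-step aggregation on a few made-up rows (the VALUES data is purely illustrative) so the arithmetic can be checked by hand.

```sql
WITH T1 AS (
    -- Made-up sample rows standing in for the joined Job/PersonJob/Person data.
    SELECT v.CityID, v.HouseID
    FROM (VALUES (1, 10), (1, 10), (1, 11), (2, 20), (2, NULL)) AS v (CityID, HouseID)
),
PartialSums AS (
    -- Step 1: one row per (CityID, HouseID) with the number of persons in it.
    SELECT CityID, HouseID, COUNT(*) AS CountStarPartialCount
    FROM T1
    GROUP BY CityID, HouseID
)
-- Step 2: roll the partial rows up to one row per city.
SELECT CityID,
       SUM(CountStarPartialCount) AS NumTotal,
       COUNT(HouseID)             AS NumDistinct  -- skips the NULL group, like COUNT(DISTINCT HouseID)
FROM PartialSums
GROUP BY CityID
ORDER BY CityID;
-- CityID 1: NumTotal = 3, NumDistinct = 2
-- CityID 2: NumTotal = 2, NumDistinct = 1
```

COUNT(HouseID) is used rather than COUNT(*) in the final step so that a NULL HouseID group is not counted as a distinct house, which matches how COUNT(DISTINCT ...) treats NULLs.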

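Answer 1 (score: 1)

After reading the workaround provided by Martin Smith, I concluded that it is hard to read and understand, and that it would become a complete mess if additional DISTINCT columns were needed. I decided to LEFT JOIN the partial queries instead, as follows: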

SELECT
  Job.CityID, NumTotal.Value AS NumTotal, NumDistinct.Value AS NumDistinct
FROM
  Job
  LEFT JOIN
  (
    SELECT
      Job.CityID, COUNT(1) AS Value
    FROM
      Job
      INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
      INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
    GROUP BY
      Job.CityID
  ) NumTotal ON (NumTotal.CityID = Job.CityID)
  LEFT JOIN
  (
    SELECT
      Job.CityID, COUNT(DISTINCT Person.HouseID) AS Value
    FROM
      Job
      INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
      INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
    GROUP BY
      Job.CityID
  ) NumDistinct ON (NumDistinct.CityID = Job.CityID)
GROUP BY
  -- Job has many rows per CityID; grouping collapses them to one row per city.
  -- The joined values are listed too, because SQL Server rejects selected
  -- columns that are neither aggregated nor part of the GROUP BY.
  Job.CityID, NumTotal.Value, NumDistinct.Value

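This runs in 0.70 seconds, while the "workaround" SQL runs in 0.60 seconds. That makes the LEFT JOIN version more than 4 times faster than the "original full query" (3.10 seconds) and less than 20% slower than the "workaround", while being much easier to read and extend.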
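
Since both derived tables are built from the same joins, they produce the same set of CityID values, so a possible simplification is to join them to each other directly and drop the outer Job table. This is a sketch rather than the answer author's query, and it has not been timed against the figures above.

```sql
SELECT
  NumTotal.CityID, NumTotal.Value AS NumTotal, NumDistinct.Value AS NumDistinct
FROM
  (
    -- Total persons per city.
    SELECT
      Job.CityID, COUNT(1) AS Value
    FROM
      Job
      INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
      INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
    GROUP BY
      Job.CityID
  ) NumTotal
  INNER JOIN
  (
    -- Distinct houses per city.
    SELECT
      Job.CityID, COUNT(DISTINCT Person.HouseID) AS Value
    FROM
      Job
      INNER JOIN PersonJob ON (PersonJob.JobID = Job.JobID)
      INNER JOIN Person ON (Person.PersonID = PersonJob.PersonID)
    GROUP BY
      Job.CityID
  ) NumDistinct ON (NumDistinct.CityID = NumTotal.CityID)
```

Two differences are worth keeping in mind: unlike the LEFT JOIN form, this returns only cities that have at least one person with a job (as the original single query does), and a group with a NULL CityID would be dropped because NULL never compares equal in the join condition.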