使用SQL Server 2008。
我有多个位置,每个位置包含多个部门,每个部门包含多个可以具有零到多个扫描的项目。每次扫描都与特定操作有关,该操作可能有也可能没有截止时间。每个Item也属于属于特定Job的特定Package。每个作业包含一个或多个包含一个或多个项目的包。
+=============+ +=============+
| Locations | | Jobs |
+=============+ +=============+
^ ^
| |
+=============+ +=============+ +=============+
| Departments | <-- | Items | --> | Packages |
+=============+ +=============+ +=============+
^
|
+=============+ +=============+
| Scans | --> | Operations |
+=============+ +=============+
我要做的是获取按位置和扫描日期分组的作业扫描计数。棘手的部分是我只想计算每个项目的第一次扫描日期/时间,其中操作的截止时间不为空。 (注意:扫描肯定不会在表中的日期/时间顺序。)
我的查询给我的结果是正确的,但是当作业的项目数量达到75,000左右时,它会非常缓慢。我正在推动一个新服务器 - 我知道我们的硬件缺乏 - 但是我想知道我在查询中是否有什么东西在阻碍它。
从我可以从执行计划中收集到的一点点来看,查询的大部分成本似乎都在子查询中,以找到每个项目的第一个扫描。它对Operations表索引(ID,Cutoff)执行索引扫描(0%),然后执行惰性假脱机(19%)。它对Scans表索引(ItemID,DateTime,OperationID,ID)执行索引搜索(61%)。后续嵌套循环(内部联接)仅为2%,Top运算符为0%。 (不是我真的很了解我刚输入的内容,但我想提供尽可能多的信息......)
以下是查询:
SELECT
Departments.LocationID
, DATEADD(dd, 0, DATEDIFF(dd, 0, Scans.DateTime))
, COUNT(Scans.ItemID) AS [COUNT]
FROM
Items
INNER JOIN Scans
ON Scans.ID =
(
SELECT TOP 1
Scans.ID
FROM
Scans
INNER JOIN Operations
ON Scans.OperationID = Operations.ID
WHERE
Operations.Cutoff IS NOT NULL
AND Scans.ItemID = Items.ID
ORDER BY
Scans.DateTime
)
INNER JOIN Operations
ON Scans.OperationID = Operations.ID
INNER JOIN Packages
ON Items.PackageID = Packages.ID
INNER JOIN Departments
ON Items.DepartmentID = Departments.ID
WHERE
Packages.JobID = @ID
GROUP BY
Departments.LocationID
, DATEADD(dd, 0, DATEDIFF(dd, 0, Scans.DateTime));
将返回如下结果的样本:
8 2012-06-08 00:00:00.000 11842
21 2012-06-07 00:00:00.000 502
11 2012-06-12 00:00:00.000 1841
15 2012-06-11 00:00:00.000 4314
16 2012-06-09 00:00:00.000 278
23 2012-06-12 00:00:00.000 1345
6 2012-06-06 00:00:00.000 2005
20 2012-06-08 00:00:00.000 352
14 2012-06-07 00:00:00.000 2408
8 2012-06-11 00:00:00.000 290
19 2012-06-10 00:00:00.000 85
20 2012-06-11 00:00:00.000 5484
7 2012-06-10 00:00:00.000 2389
16 2012-06-06 00:00:00.000 6762
18 2012-06-09 00:00:00.000 4473
14 2012-06-10 00:00:00.000 2364
1 2012-06-11 00:00:00.000 1531
22 2012-06-08 00:00:00.000 14534
5 2012-06-10 00:00:00.000 11908
9 2012-06-12 00:00:00.000 47
19 2012-06-07 00:00:00.000 559
7 2012-06-07 00:00:00.000 2576
这是执行计划(不确定自原始帖子后我改变了什么,但成本%略有不同。但瓶颈似乎仍然在同一区域):
答案 0 :(得分:1)
我对于将此标记为答案我有点怀疑,因为我相信我们仍然可以从查询中挤出一点果汁。但这确实使我的测试运行从22秒降低到6秒(在Scans上添加了索引:OperationID包括DateTime和ItemID):
WITH CTE AS
(
SELECT
Items.ItemID AS ID
, Scans.DateTime AS [DateTime]
, Operations.Cutoff AS Cutoff
, ROW_NUMBER() OVER (PARTITION BY Items.ID ORDER BY Scans.DateTime) AS RN
FROM
Items
INNER JOIN Scans
ON Items.ID = Scans.ItemID
INNER JOIN Operations
ON Scans.OperationID = Operations.ID
INNER JOIN Packages
ON Items.PackageID = Packages.ID
WHERE
Operations.Cutoff IS NOT NULL
AND Packages.JobID = @ID
)
SELECT
Departments.LocationID
, CTE.DateTime
, COUNT(Items.ID) AS COUNT
FROM
Items
INNER JOIN CTE
ON Items.ID = CTE.ID
AND CTE.RN = 1
INNER JOIN Packages
ON Items.PackageID = Packages.ID
INNER JOIN Departments
ON Items.DepartmentID = Departments.ID
WHERE
Packages.JobID = @ID
GROUP BY
Departments.LocationID
, CTE.DateTime
答案 1 :(得分:0)
很难肯定地说,但这样的事情可能表现得更好。我用ROW_NUMBER调用替换了您的嵌套查找。原始查询中的问题是嵌套查找 - 它会杀死你。
注意我面前没有SQL,所以我无法测试它,但我认为它在逻辑上是等价的。
SELECT
Departments.LocationID
, DATEADD(dd, 0, DATEDIFF(dd, 0, Scans.DateTime))
, COUNT(Scans.ItemID) AS [COUNT]
FROM
Items
INNER JOIN Scans
ON Scans.ItemID = Items.ID
INNER JOIN Operations
ON Scans.OperationID = Operations.ID
INNER JOIN Packages
ON Items.PackageID = Packages.ID
INNER JOIN Departments
ON Items.DepartmentID = Departments.ID
WHERE
Operations.Cutoff IS NOT NULL
AND
Packages.JobID = @ID
AND
ROW_NUMBER () OVER (PARTITION BY Items.ID ORDER BY Scans.DateTime) = 1
GROUP BY
Departments.LocationID
, DATEADD(dd, 0, DATEDIFF(dd, 0, Scans.DateTime));
答案 2 :(得分:0)
我很好奇 - 请你运行CROSS APPLY版本吗?
SELECT
Departments.LocationID
, DATEADD(dd, 0, DATEDIFF(dd, 0, CA_Scans.DateTime))
, COUNT(CA_Scans.ItemID) AS [COUNT]
FROM
Items
CROSS APPLY
(
SELECT TOP 1
Scans.ID,
Scans.OperationID,
Scans.DateTime
FROM
Scans
INNER JOIN Operations
ON Scans.OperationID = Operations.ID
WHERE
Operations.Cutoff IS NOT NULL
AND Scans.ItemID = Items.ID
ORDER BY
Scans.DateTime
) CA_Scans
INNER JOIN Operations
ON CA_Scans.OperationID = Operations.ID
INNER JOIN Packages
ON Items.PackageID = Packages.ID
INNER JOIN Departments
ON Items.DepartmentID = Departments.ID
WHERE
Packages.JobID = @ID
GROUP BY
Departments.LocationID
, DATEADD(dd, 0, DATEDIFF(dd, 0, CA_Scans.DateTime));