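I have a pivot query on a table with millions of rows. Run normally, the query finishes in 2 seconds and returns 2983 rows. If I add TOP 1000 to the query, it takes 10 seconds to run.
What would cause this?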
SELECT *
FROM (SELECT l.PatientID,
             l.LabID,
             l.Result
      FROM dbo.Labs l
      JOIN (SELECT MAX(LabDate) maxDate,
                   PatientID,
                   LabID
            FROM dbo.Labs
            GROUP BY PatientID, LabID) s ON l.PatientID = s.PatientID
                                        AND l.LabID = s.LabID
                                        AND l.LabDate = s.maxDate) A
PIVOT(MIN(A.Result) FOR A.LabID IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])) p
Execution plans: [images not available]
This alternative formulation has the same problem:
SELECT *
FROM (
    SELECT l.PatientID,
           l.LabID,
           l.Result
    FROM dbo.Labs l
    WHERE l.LabDate = (
        SELECT MAX(LabDate)
        FROM Labs l2
        WHERE l2.PatientID = l.PatientID
          AND l2.LabID = l.LabID
    )
) A
PIVOT(MIN(A.Result) FOR A.LabID IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])) p
Answer 0 (score: 4)
SELECT TOP 1000
       *
FROM (
    SELECT patientId, labId, result,
           DENSE_RANK() OVER (PARTITION BY patientId, labId ORDER BY labDate DESC) dr
    FROM labs
) q
PIVOT (
    MIN(result)
    FOR
    labId IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])
) p
WHERE dr = 1
ORDER BY
    patientId
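In the query above, dr is not referenced by the PIVOT, so it becomes an extra grouping column (it shows up in the SELECT * output), and logically the dr = 1 filter is applied only after all rows have been pivoted. A sketch that keeps only the latest rows before pivoting, assuming the same labs columns (patientId, labId, result, labDate) used above:

```sql
SELECT TOP 1000 *
FROM (
    -- Keep only the latest result(s) per patient/lab pair before pivoting,
    -- so dr never reaches the PIVOT and is not used as a grouping column
    SELECT patientId, labId, result
    FROM (
        SELECT patientId, labId, result,
               DENSE_RANK() OVER (PARTITION BY patientId, labId
                                  ORDER BY labDate DESC) AS dr
        FROM dbo.labs
    ) ranked
    WHERE dr = 1
) q
PIVOT (
    -- MIN picks one result when several rows share the latest labDate
    MIN(result)
    FOR labId IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])
) p
ORDER BY patientId;
```

The output then contains only patientId and the lab columns, one row per patient.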
You could also try creating an indexed view like this:
CREATE VIEW
    v_labs_patient_lab
WITH SCHEMABINDING
AS
SELECT patientId, labId, COUNT_BIG(*) AS cnt
FROM dbo.labs
GROUP BY
    patientId, labId
GO
-- CREATE VIEW must be the only statement in its batch, hence the GO above
CREATE UNIQUE CLUSTERED INDEX
    ux_labs_patient_lab
ON v_labs_patient_lab (patientId, labId)
and then use it in the query:
SELECT TOP 1000
       *
FROM (
    -- patientId and labId come from the view; lr only exposes result
    SELECT vl.patientId, vl.labId, lr.result
    FROM v_labs_patient_lab vl
    CROSS APPLY
    (
        -- latest result(s) for this patient/lab pair
        SELECT TOP 1 WITH TIES
               result
        FROM labs l
        WHERE l.patientId = vl.patientId
          AND l.labId = vl.labId
        ORDER BY
            l.labDate DESC
    ) lr
) q
PIVOT (
    MIN(result)
    FOR
    labId IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])
) p
ORDER BY
    patientId
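One caveat: only SQL Server Enterprise edition considers indexed views automatically; in other editions the view's clustered index is used only when the query references the view with the NOEXPAND table hint, otherwise the view is expanded back to dbo.labs. A sketch of the same query with the hint added, assuming the view and index created above:

```sql
SELECT TOP 1000 *
FROM (
    SELECT vl.patientId, vl.labId, lr.result
    -- NOEXPAND: read the view's clustered index instead of expanding the view
    FROM v_labs_patient_lab AS vl WITH (NOEXPAND)
    CROSS APPLY (
        -- latest result(s) for this patient/lab pair
        SELECT TOP 1 WITH TIES l.result
        FROM dbo.labs AS l
        WHERE l.patientId = vl.patientId
          AND l.labId = vl.labId
        ORDER BY l.labDate DESC
    ) AS lr
) q
PIVOT (
    MIN(result)
    FOR labId IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])
) p
ORDER BY patientId;
```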
Answer 1 (score: 2)
It comes down to the specific order in which a query is processed.
A normal SQL query is written like this:
SELECT [...]
FROM [table1]
JOIN [table2]
ON [condition]
WHERE [...]
GROUP BY [...]
HAVING [...]
ORDER BY [...]
but the processing order is different:
FROM [table1]
ON [condition]
JOIN [table2]
WHERE [...]
GROUP BY [...]
HAVING [...]
SELECT [...]
ORDER BY [...]
When you use SELECT DISTINCT [...] or SELECT TOP [...], the processing order is as follows:
FROM [table1]
ON [condition]
JOIN [table2]
WHERE [...]
GROUP BY [...]
HAVING [...]
SELECT [...] DISTINCT[...]
ORDER BY [...]
TOP [....]
So SELECT TOP 1000 is processed last, which is why it takes longer.
See this link for more details: http://blogs.msdn.com/b/sqlqueryprocessing/
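The logical order can be observed directly: a column alias defined in SELECT is not yet visible in WHERE, but it is visible in ORDER BY, and TOP is then applied to the ordered rows. This is the logical order only; the physical plan the optimizer builds may differ. A small sketch against the same dbo.Labs table:

```sql
-- Fails with "Invalid column name 'd'." because WHERE is processed before SELECT:
-- SELECT PatientID, LabDate AS d FROM dbo.Labs WHERE d >= '20100101';

-- Works: ORDER BY is processed after SELECT, so the alias d is visible there,
-- and TOP (10) then keeps the first 10 rows of the ordered result.
SELECT TOP (10) PatientID, LabDate AS d
FROM dbo.Labs
ORDER BY d DESC;
```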
Answer 2 (score: 2)
After some googling on suggesting execution plans, I found the solution.
SELECT TOP 1000 *
FROM (SELECT l.PatientID,
             l.LabID,
             l.Result
      FROM dbo.Labs l
      JOIN (SELECT MAX(LabDate) maxDate,
                   PatientID,
                   LabID
            FROM dbo.Labs
            GROUP BY PatientID, LabID) s ON l.PatientID = s.PatientID
                                        AND l.LabID = s.LabID
                                        AND l.LabDate = s.maxDate) A
PIVOT(MIN(A.Result) FOR A.LabID IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])) p
OPTION (HASH JOIN)
OPTION (HASH JOIN) did the trick. The resulting execution plan for the TOP version looks like the original non-TOP plan, with a TOP added at the end.
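Since I was originally doing this in a view, and a view definition cannot contain an OPTION clause, what I actually ended up doing was changing the JOIN to INNER HASH JOIN.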
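A sketch of what such a view might look like with the hint moved into the join itself, assuming the pivot query above is the view body; the view name v_latest_labs_pivot is hypothetical:

```sql
CREATE VIEW dbo.v_latest_labs_pivot  -- hypothetical view name
AS
SELECT *
FROM (SELECT l.PatientID,
             l.LabID,
             l.Result
      FROM dbo.Labs l
      -- join hint in place of OPTION (HASH JOIN)
      INNER HASH JOIN (SELECT MAX(LabDate) maxDate,
                              PatientID,
                              LabID
                       FROM dbo.Labs
                       GROUP BY PatientID, LabID) s ON l.PatientID = s.PatientID
                                                   AND l.LabID = s.LabID
                                                   AND l.LabDate = s.maxDate) A
PIVOT(MIN(A.Result) FOR A.LabID IN ([1],[2],[3],[4],[5],[7],[8],[9],[10],[11],[12],[13],[14],[15],[16],[17])) p;
GO

-- The caller applies TOP outside the view
SELECT TOP 1000 *
FROM dbo.v_latest_labs_pivot;
```

Note that specifying a join hint like this also makes SQL Server enforce the join order as written for all joined tables in the query, so the plan is worth rechecking if the view changes.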