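I need to select the top row for each category from a known set of categories (somewhat similar to this question). The problem is how to make this query efficient on a large number of rows.
For example, let's create a table that stores temperature records for several places.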
CREATE TABLE #t (
placeId int,
ts datetime,
temp int,
PRIMARY KEY (ts, placeId)
)
-- insert some sample data
SET NOCOUNT ON
DECLARE @n int, @ts datetime
SELECT @n = 1000, @ts = '2000-01-01'
WHILE (@n>0) BEGIN
INSERT INTO #t VALUES (@n % 10, @ts, @n % 37)
IF (@n % 10 = 0) SET @ts = DATEADD(hour, 1, @ts)
SET @n = @n - 1
END
Now I need to get the latest record for each of the places 1, 2 and 3.
This approach is efficient, but it doesn't scale well (and it looks messy).
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 1
ORDER BY ts DESC
) t1
UNION ALL
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 2
ORDER BY ts DESC
) t2
UNION ALL
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 3
ORDER BY ts DESC
) t3
The following looks better, but it is much less efficient (30% vs. 70% of the batch cost, according to the optimizer).
SELECT placeId, ts, temp FROM (
SELECT placeId, ts, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum
FROM #t
WHERE placeId IN (1, 2, 3)
) t
WHERE rownum = 1
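The problem is that the execution plan for the latter query performs a clustered index scan on #t: 300 rows are retrieved, sorted, numbered and then filtered, leaving only 3 rows. The former query fetches a single row three times.
Is there a way to run the query efficiently without lots of unions?
Answer 0 (score: 2)
Don't just look at the execution plan; also look at statistics io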
and statistics time
set statistics io on
go
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 1
ORDER BY ts DESC
) t1
UNION ALL
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 2
ORDER BY ts DESC
) t2
UNION ALL
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 3
ORDER BY ts DESC
) t3
SELECT placeId, temp FROM (
SELECT placeId, ts, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum
FROM #t
WHERE placeId IN (1, 2, 3)
) t
WHERE rownum = 1
set statistics io off
go
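Table '#t000000000B99'. Scan count 3, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#t000000000B99'. Scan count 1, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.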
set statistics time on
go
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 1
ORDER BY ts DESC
) t1
UNION ALL
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 2
ORDER BY ts DESC
) t2
UNION ALL
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 3
ORDER BY ts DESC
) t3
SELECT placeId, temp FROM (
SELECT placeId, ts, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum
FROM #t
WHERE placeId IN (1, 2, 3)
) t
WHERE rownum = 1
set statistics time off
go
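For me there is no real difference between the two methods. Load more data and compare again.
Also, when you add an ORDER BY to both queries, the split drops to 40% vs. 60%.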
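One possible way to load more data for that comparison is sketched below. It is not from the original post: it assumes #t has just been created and is still empty (otherwise the primary key would reject duplicates), that cross joining sys.all_objects with itself yields at least 100,000 rows, and the row count itself is an arbitrary choice.

```sql
-- Sketch only: set-based load of 100,000 rows into an EMPTY #t.
-- Assumption: sys.all_objects x sys.all_objects is used purely as a row source.
WITH n AS (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS i
    FROM sys.all_objects a
    CROSS JOIN sys.all_objects b
)
INSERT INTO #t (placeId, ts, temp)
SELECT i % 10,                                            -- 10 places: 0..9
       DATEADD(hour, CAST(i / 10 AS int), '2000-01-01'),  -- one hour per group of 10 rows
       i % 37                                             -- same temp pattern as the sample data
FROM n
WHERE i < 100000;
```

Each (ts, placeId) pair is generated exactly once, so the load satisfies either primary key column order.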
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 1
ORDER BY ts DESC
) t1
UNION ALL
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 2
ORDER BY ts DESC
) t2
UNION ALL
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 3
ORDER BY ts DESC
) t3
ORDER BY placeId
SELECT placeId, temp FROM (
SELECT placeId, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum
FROM #t
WHERE placeId IN (1, 2, 3)
) t
WHERE rownum = 1
ORDER BY placeId
Answer 1 (score: 1)
I loaded 100,000 rows (which still wasn't enough to slow things down) and tried the old-fashioned way:
select t.*
from #t t
inner join (select placeId, max(ts) ts
from #t
where placeId in (1,2,3)
group by placeId) xx
on xx.placeId = t.placeId
and xx.ts = t.ts
and got the same results.
Then I reversed the order of the columns in the index to
CREATE TABLE #t (
placeId int,
ts datetime,
temp int,
PRIMARY KEY (placeId, ts)
)
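and, in all queries, got fewer page reads and index seeks instead of scans.
If your goal is optimization and you can modify the indexes, I would modify the primary key, or possibly add a covering index.
Answer 2 (score: 0)
Just for the record, here is another option using CROSS APPLY. On my configuration it outperforms all of the options mentioned before.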
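If the primary key has to stay as (ts, placeId), the covering index alternative might look roughly like this. The index name IX_t_placeId_ts is made up for illustration, and whether the optimizer actually picks the index depends on your data and version.

```sql
-- Sketch only: a nonclustered covering index for "latest row per place".
-- IX_t_placeId_ts is a hypothetical name, not from the original post.
CREATE NONCLUSTERED INDEX IX_t_placeId_ts
ON #t (placeId, ts DESC)   -- seek on placeId, newest ts first
INCLUDE (temp);            -- temp is carried in the index, so no lookup is needed
```

With this index each TOP 1 ... ORDER BY ts DESC lookup for a given placeId can be answered by a seek that reads a single index row.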
SELECT *
FROM (VALUES (1),(2),(3)) t (placeId)
CROSS APPLY (
SELECT TOP 1 ts, temp
FROM #t
WHERE placeId = t.placeId
ORDER BY ts DESC
) tt
I guess the VALUES list could be replaced with a temp table or a table variable without much difference.
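A minimal sketch of the table variable variant, assuming the same #t table as above; the names @places, p and x are illustrative only:

```sql
-- Sketch only: the same CROSS APPLY, driven by a table variable
-- instead of an inline VALUES list.
DECLARE @places TABLE (placeId int PRIMARY KEY);
INSERT INTO @places (placeId) VALUES (1), (2), (3);

SELECT p.placeId, tt.ts, tt.temp
FROM @places p
CROSS APPLY (
    SELECT TOP 1 x.ts, x.temp
    FROM #t x
    WHERE x.placeId = p.placeId   -- correlate on the outer place
    ORDER BY x.ts DESC            -- newest record first
) tt;
```

Because CROSS APPLY drops outer rows for which the inner query returns nothing, a place with no records in #t simply does not appear in the result; OUTER APPLY would keep it with NULLs.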