Efficiently select the top row for each category in a set

Time: 2010-06-04 14:06:46

Tags: sql-server tsql sql-server-2008 query-optimization

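I need to select the top row for each category from a known set (somewhat similar to this question). The problem is how to make this query efficient on a large number of rows.

For example, let's create a table that stores temperature readings for several places.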

CREATE TABLE #t (
    placeId int,
    ts datetime,
    temp int,
    PRIMARY KEY (ts, placeId)
)

-- insert some sample data

SET NOCOUNT ON

DECLARE @n int, @ts datetime
SELECT @n = 1000, @ts = '2000-01-01'

WHILE (@n>0) BEGIN
    INSERT INTO #t VALUES (@n % 10, @ts, @n % 37)
    IF (@n % 10 = 0) SET @ts = DATEADD(hour, 1, @ts)
    SET @n = @n - 1
END

Now I need to get the latest reading for each of the places 1, 2, 3.

This approach is efficient, but it doesn't scale well (and looks dirty).

SELECT * FROM (
    SELECT TOP 1 placeId, temp
    FROM #t 
    WHERE placeId = 1
    ORDER BY ts DESC
) t1
UNION ALL
SELECT * FROM (
    SELECT TOP 1 placeId, temp
    FROM #t 
    WHERE placeId = 2
    ORDER BY ts DESC
) t2
UNION ALL
SELECT * FROM (
    SELECT TOP 1 placeId, temp
    FROM #t 
    WHERE placeId = 3
    ORDER BY ts DESC
) t3

The following looks better but is much less efficient (30% vs. 70% according to the optimizer).

SELECT placeId, ts, temp FROM (
    SELECT placeId, ts, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum
    FROM #t
    WHERE placeId IN (1, 2, 3)
) t
WHERE rownum = 1

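The problem is that the execution plan for the latter query performs a clustered index scan on #t: 300 rows are retrieved, sorted, numbered, and then filtered, leaving only 3 rows. The former query fetches a single row three times.

Is there a way to perform the query efficiently without lots of unions?

3 answers:

Answer 0: (score: 2)

Don't just look at the execution plan; also look at statistics io and statistics time.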

set statistics io on
go
SELECT * FROM (
    SELECT TOP 1 placeId, temp
    FROM #t 
    WHERE placeId = 1
    ORDER BY ts DESC
) t1
UNION ALL
SELECT * FROM (
    SELECT TOP 1 placeId, temp
    FROM #t 
    WHERE placeId = 2
    ORDER BY ts DESC
) t2
UNION ALL
SELECT * FROM (
    SELECT TOP 1 placeId, temp
    FROM #t 
    WHERE placeId = 3
    ORDER BY ts DESC
) t3

SELECT placeId,  temp FROM (
    SELECT placeId, ts, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum
    FROM #t
    WHERE placeId IN (1, 2, 3)
) t
WHERE rownum = 1

set statistics io off
go

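Table '#t000000000B99'. Scan count 3, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#t000000000B99'. Scan count 1, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.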

set statistics time on
go
SELECT * FROM (
    SELECT TOP 1 placeId, temp
    FROM #t 
    WHERE placeId = 1
    ORDER BY ts DESC
) t1
UNION ALL
SELECT * FROM (
    SELECT TOP 1 placeId, temp
    FROM #t 
    WHERE placeId = 2
    ORDER BY ts DESC
) t2
UNION ALL
SELECT * FROM (
    SELECT TOP 1 placeId, temp
    FROM #t 
    WHERE placeId = 3
    ORDER BY ts DESC
) t3

SELECT placeId,  temp FROM (
    SELECT placeId, ts, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum
    FROM #t
    WHERE placeId IN (1, 2, 3)
) t
WHERE rownum = 1

set statistics time off
go

For me there is no real difference between the two methods; load more data and compare again.
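One way to load more data for that comparison is a set-based insert instead of the row-by-row loop. This is only a sketch: the row count of 100,000, the 2001-01-01 start date (chosen so the new rows do not collide with the primary key values already in #t), and the use of sys.all_objects as a row source are assumptions, not part of the original answer.

```sql
-- Generate 100,000 sequential numbers (0..99999) from a cross join
-- of a system catalog view; any sufficiently large row source would do.
;WITH n AS (
    SELECT TOP (100000)
           CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS int) AS i
    FROM sys.all_objects a
    CROSS JOIN sys.all_objects b
)
INSERT INTO #t (placeId, ts, temp)
SELECT i % 10,                               -- 10 places, as in the question
       DATEADD(hour, i / 10, '2001-01-01'),  -- one new hour per group of 10 rows
       i % 37                                -- same pseudo-temperature pattern
FROM n;
```

Because i / 10 and i % 10 together identify each row, every (ts, placeId) pair is unique and the primary key is not violated.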

Also, when you add an ORDER BY to both queries, it comes down to 40% vs. 60%.

SELECT * FROM (
    SELECT TOP 1 placeId, temp
    FROM #t 
    WHERE placeId = 1
    ORDER BY ts DESC
) t1
UNION ALL
SELECT * FROM (
    SELECT TOP 1 placeId, temp
    FROM #t 
    WHERE placeId = 2
    ORDER BY ts DESC
) t2
UNION ALL
SELECT * FROM (
    SELECT TOP 1 placeId, temp
    FROM #t 
    WHERE placeId = 3
    ORDER BY ts DESC
) t3
ORDER BY placeId

SELECT placeId,  temp FROM (
    SELECT placeId,  temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum
    FROM #t
    WHERE placeId IN (1, 2, 3)
) t
WHERE rownum = 1
ORDER BY placeId

Answer 1: (score: 1)

I loaded 100,000 rows (which still wasn't enough to slow things down) and tried the old-fashioned way:

select t.*
 from #t t
  inner join (select placeId, max(ts) ts
               from #t
               where placeId in (1,2,3)
               group by placeId) xx
   on xx.placeId = t.placeId
    and xx.ts = t.ts

and got the same results.

I then reversed the order of the columns in the index to

CREATE TABLE #t ( 
    placeId int, 
    ts datetime, 
    temp int, 
    PRIMARY KEY (placeId, ts) 
) 

and got, in all queries, fewer page reads and index seeks instead of scans.

If your goal is optimization and you can modify the indexes, I would change the primary key, or perhaps add a covering index.
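If changing the primary key is not an option, a covering index on the original table might look like the following. The index name IX_t_placeId_ts is made up for illustration, and the column choice assumes the queries only need placeId, ts, and temp.

```sql
-- Hypothetical covering index: seek on placeId, read the newest ts first,
-- and carry temp in the leaf level so no lookup into the clustered index is needed.
CREATE NONCLUSTERED INDEX IX_t_placeId_ts
    ON #t (placeId, ts DESC)
    INCLUDE (temp);
```

Whether the optimizer actually picks this index depends on the data and the query, so it is worth re-checking the plan and the statistics io output after creating it.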

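Answer 2: (score: 0)

Just for the record, here is another option using CROSS APPLY. On my configuration, it performs better than all of the previously mentioned ones.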

SELECT *
FROM (VALUES (1),(2),(3)) t (placeId)
CROSS APPLY (
    SELECT TOP 1 ts, temp
    FROM #t 
    WHERE placeId = t.placeId
    ORDER BY ts DESC
) tt

I guess VALUES could be replaced with a temporary table or a table variable without much difference.
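As a rough sketch of that variation, the same query driven by a table variable could look like this. The variable name @places is an assumption for illustration, and whether it performs the same as the VALUES version has not been measured here.

```sql
-- Hypothetical table variable holding the places of interest
DECLARE @places TABLE (placeId int PRIMARY KEY);
INSERT INTO @places (placeId) VALUES (1), (2), (3);

-- For each requested place, fetch only its most recent row
SELECT p.placeId, tt.ts, tt.temp
FROM @places AS p
CROSS APPLY (
    SELECT TOP 1 s.ts, s.temp
    FROM #t AS s
    WHERE s.placeId = p.placeId
    ORDER BY s.ts DESC
) AS tt;
```

A place listed in @places that has no rows in #t is simply dropped from the result; OUTER APPLY would keep it with NULLs instead.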