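Find the row associated with a Min/Max, without an inner loop

Time: 2009-07-31 11:06:08

Tags: sql sql-server tsql

I have a question related to T-SQL and SQL Server.

Let's say I have a table Orders with 3 columns:

  • ProductId int
  • CustomerId int
  • Date datetime

I want the date of the first order for each product, so I run this type of query: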

SELECT ProductId, MIN(Date) AS FirstOrder 
FROM Orders
GROUP BY ProductId

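I have an index on ProductId that includes the CustomerId and Date columns, to speed up the query (IX_Orders). The query plan shows a non-clustered index scan on IX_Orders followed by a stream aggregate (no sort is needed, thanks to the index).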
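For reference, an index matching that description could be declared roughly as follows. This is only a sketch: the exact definition of IX_Orders is not shown in the question, and the INCLUDE form is an assumption based on the word "includes".

```sql
-- Assumed definition of IX_Orders: keyed on ProductId, with CustomerId and
-- Date carried as included (non-key) columns. Not taken from the question.
CREATE NONCLUSTERED INDEX IX_Orders
    ON Orders (ProductId)
    INCLUDE (CustomerId, Date);
```

With this shape the index rows are ordered by ProductId only. That is enough for a stream aggregate computing MIN(Date) per product, but it does not order the rows by Date within a product.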

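Now my problem is that I also want to retrieve the CustomerId associated with the first order of each product (product 26 was first ordered on Tuesday the 25th, by customer 12). The tricky part is that I don't want any inner loop in the execution plan, because it would mean an additional read per ProductId in the table, which is highly inefficient.

This should be possible using the same non-clustered index scan followed by a stream aggregate, but I can't seem to find a query that does it. Any ideas?

Thanks

6 answers:

Answer 0 (score: 3)

This handles products with duplicate dates: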

DECLARE @Orders table (ProductId int
                      ,CustomerId int
                      ,Date datetime
                      )

INSERT INTO @Orders VALUES (1,1,'20090701')
INSERT INTO @Orders VALUES (2,1,'20090703')
INSERT INTO @Orders VALUES (3,1,'20090702')
INSERT INTO @Orders VALUES (1,2,'20090704')
INSERT INTO @Orders VALUES (4,2,'20090701')
INSERT INTO @Orders VALUES (1,3,'20090706')
INSERT INTO @Orders VALUES (2,3,'20090704')
INSERT INTO @Orders VALUES (4,3,'20090702')
INSERT INTO @Orders VALUES (5,5,'20090703')  --duplicate dates for product #5
INSERT INTO @Orders VALUES (5,1,'20090703')  --duplicate dates for product #5
INSERT INTO @Orders VALUES (5,5,'20090703')  --duplicate dates for product #5

;WITH MinOrders AS
(SELECT
     o.ProductId, o.CustomerId, o.Date
         ,row_number() over(partition by o.ProductId order by o.ProductId,o.CustomerId) AS RankValue
     FROM @Orders o
     INNER JOIN (SELECT
                     ProductId
                         ,MIN(Date) MinDate 
                     FROM @Orders 
                     GROUP BY ProductId
                ) dt ON o.ProductId=dt.ProductId AND o.Date=dt.MinDate
 )
SELECT
    m.ProductId, m.CustomerId, m.Date
    FROM MinOrders  m
    WHERE m.RankValue=1
    ORDER BY m.ProductId, m.CustomerId

This returns the same results; just use the same declarations and inserts as in the code above:

;WITH MinOrders AS
(SELECT
     o.ProductId, o.CustomerId, o.Date
         ,row_number() over(partition by o.ProductId order by o.Date,o.CustomerId) AS RankValue  --order by Date so that row 1 is the earliest order
     FROM @Orders o
 )
SELECT
    m.ProductId, m.CustomerId, m.Date
    FROM MinOrders  m
    WHERE m.RankValue=1
    ORDER BY m.ProductId, m.CustomerId

You can try each version and see which one runs faster...

Answer 1 (score: 2)

declare @Orders table (
    ProductId int,
    CustomerId int,
    Date datetime
)

insert into @Orders values (1,1,'20090701')
insert into @Orders values (2,1,'20090703')
insert into @Orders values (3,1,'20090702')
insert into @Orders values (1,2,'20090704')
insert into @Orders values (4,2,'20090701')
insert into @Orders values (1,3,'20090706')
insert into @Orders values (2,3,'20090704')
insert into @Orders values (4,3,'20090702')
insert into @Orders values (5,5,'20090703')

select O.* from @Orders O inner join 
(
    select ProductId,
    MIN(Date) MinDate 
    from @Orders 
    group by ProductId
) FO
on FO.ProductId = O.ProductId and FO.MinDate = O.Date

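The estimated query plan for this is useless, since I mocked it up with table variables, but the anonymous inner join should be preferable to a subselect.

Answer 2 (score: 1)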

SQL Server 2005+

SELECT  oo.*
FROM    (
        SELECT  DISTINCT ProductId
        FROM    Orders
        ) od
CROSS APPLY
        (
        SELECT  TOP 1 ProductID, Date, CustomerID
        FROM    Orders oi
        WHERE   oi.ProductID = od.ProductID
        ORDER BY
                Date DESC  -- DESC returns the latest order; use ASC for the first order
        ) oo

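Nominally, the query plan contains Nested Loops.

However, the outer loop will use an Index Scan with a Stream Aggregate, and the inner loop will contain an Index Seek on ProductID with a Top.

In practice, the second operation is almost free, since the index pages used in the inner loop will most probably reside in the cache, having just been read for the outer loop.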
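To collect measurements of this kind yourself, a test table can be populated and the statistics output switched on before running the queries. The following is a minimal sketch, not the setup used for this answer: the data generator is an assumption (it relies on sys.all_objects cross-joined with itself yielding at least 1,000,000 rows), and it presumes an index like the IX_Orders described in the question is created afterwards.

```sql
-- Hypothetical test table; names follow the question, the data is synthetic.
CREATE TABLE Orders (
    ProductId  int      NOT NULL,
    CustomerId int      NOT NULL,
    Date       datetime NOT NULL
);

-- Build 1,000,000 rows over 100 distinct ProductId values.
WITH n AS (
    SELECT TOP (1000000)
           CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS int) AS i
    FROM sys.all_objects a
    CROSS JOIN sys.all_objects b
)
INSERT INTO Orders (ProductId, CustomerId, Date)
SELECT i % 100 + 1, i % 1000 + 1, DATEADD(minute, i, '20090101')
FROM n;

-- Report logical/physical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
```

The absolute numbers will differ from machine to machine; only the comparison between queries run on the same data is meaningful.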

Here are the test results on 1,000,000 rows (100 DISTINCT ProductID values):

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 1 ms.

(100 row(s) affected)
Table 'Orders'. Scan count 103, logical reads 6020, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 234 ms,  elapsed time = 125 ms.

For comparison, these are the results of the SELECT DISTINCT query alone:

SELECT  od.*
FROM    (
        SELECT  DISTINCT ProductId
        FROM    Orders
        ) od

Statistics:

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 1 ms.

(100 row(s) affected)
Table 'Orders'. Scan count 3, logical reads 5648, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 250 ms,  elapsed time = 125 ms.

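As we can see, the performance is the same, and the CROSS APPLY costs only about 400 extra logical reads (6020 versus 5648), which most probably will never become physical reads.

I don't see how this query could be improved further.

A benefit of this query is that it parallelizes nicely. You may notice that the CPU time is about twice the elapsed time: this is due to parallelization on my old Core Duo.

A 4-core CPU would complete this query in about half the time.

Solutions using window functions do not parallelize: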

SELECT  od.*
FROM    (
        SELECT  ProductId, Date, CustomerID,
                ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY Date DESC) AS rn  -- DESC ranks the latest order first; use ASC for the first order
        FROM    Orders
        ) od
WHERE   rn = 1

Here are the statistics:

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.

(100 row(s) affected)
Table 'Orders'. Scan count 1, logical reads 5123, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 406 ms,  elapsed time = 415 ms.

Answer 3 (score: 0)

SELECT
    o1.productid, 
    o1.date, 
    o1.customerid
FROM
    Orders o1
JOIN
    (select productid, min(date) as orderDate
     from Orders
     group by productid
    ) firstOrder
ON o1.productid = firstOrder.productid
   AND o1.date = firstOrder.orderDate  -- keep only the rows dated at each product's first order

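This is the best I can come up with, but to be honest, I don't know what the performance characteristics of this query are. If it performs badly, I would suggest running two queries to get the information you want.

Answer 4 (score: 0)

Is IX_Orders sorted by ProductId, then CustomerId, then Date, or by ProductId, then Date, then CustomerId? If it is the former, change it to the latter.

In other words, don't use this:

create index IX_Orders on Orders (ProductId, CustomerId, Date) 

Use this instead:

create index IX_Orders on Orders (ProductId, Date, CustomerId)

Then, if you do this: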

SELECT o1.* 
FROM Orders o1
JOIN
    (
        SELECT ProductID, Min(Date) as Date
        FROM Orders
        GROUP BY ProductID
    ) o2
    ON o1.ProductID = o2.ProductID AND o1.Date = o2.Date
ORDER BY o1.ProductID

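You end up with just a single index scan on IX_Orders. However, if two customers can order the same product at the same time, you may get multiple rows per product. You can work around this with the following query, but it is less efficient than the first: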

WITH cte AS
(
    SELECT ProductID, CustomerID, Date, 
        ROW_NUMBER() OVER(PARTITION BY ProductID ORDER BY Date ASC) AS row
    FROM Orders
)
SELECT ProductID, CustomerId, Date
FROM cte
WHERE row = 1
ORDER BY ProductID
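When two orders for a product share the same earliest Date, ROW_NUMBER() ordered by Date alone may number either of them 1, so the CustomerId returned is not guaranteed to be stable. A possible variant, sketched here under the assumption that the lowest CustomerId is an acceptable tie-breaker (the question does not say which customer should win), adds CustomerID to the window ordering:

```sql
WITH cte AS
(
    SELECT ProductID, CustomerID, Date,
        -- CustomerID is an assumed tie-breaker for orders with the same Date
        ROW_NUMBER() OVER (PARTITION BY ProductID
                           ORDER BY Date ASC, CustomerID ASC) AS rn
    FROM Orders
)
SELECT ProductID, CustomerID, Date
FROM cte
WHERE rn = 1
ORDER BY ProductID
```

This ordering matches the key order of an index on (ProductId, Date, CustomerId), so the optimizer may be able to number the rows straight from the index scan without an extra sort.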

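Answer 5 (score: 0)

I don't see a good way of doing this without a subquery or a window function (such as row_number or rank), since max only looks at one column.

However, you can do it in a not-so-nice way.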

SELECT
    productid, 
    min(date), 
cast(
    substring( 
        min(convert(varchar(23),date,21) + cast(customerid as varchar(20)))
              , 24, 44)
    as int) customerid
from 
    orders
group by
    productid 

This only works if your customer ID has fewer than 20 digits.

Edit: added the group by clause.
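To see why the substring starts at position 24, it can help to look at one packed value in isolation. The sketch below uses made-up literal values (a date of 2009-07-03 and customer 12); it only illustrates the packing and unpacking step, not the full grouped query.

```sql
-- Illustration only: the values below are made up.
DECLARE @date datetime, @customerid int
SET @date = '20090703'
SET @customerid = 12

-- Style 21 renders the datetime as 'yyyy-mm-dd hh:mi:ss.mmm', always 23 characters,
-- so the customer ID starts at position 24 of the packed string.
SELECT
    convert(varchar(23), @date, 21) + cast(@customerid as varchar(20)) AS packed,
    cast(substring(convert(varchar(23), @date, 21) + cast(@customerid as varchar(20)), 24, 20) as int) AS unpacked_customerid
```

Because min() then compares strings, orders sharing the same earliest date are resolved by the textual order of the customer ID (for example, '10' sorts before '9'), which still yields a customer from the first order date but not necessarily the numerically smallest one.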