How to select the earliest row from each run of consecutive rows?

Time: 2019-04-11 23:48:57

Tags: sql sql-server sql-server-2008

Consider the following table data, ordered by Date descending.

[Image: the table data]

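If several consecutive rows have the same Description, I want to keep only the one with the earliest date. For example, rows 2 and 3 are both Unknown, so I only want to keep the row from 2014-09-12.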

I have been trying to use a CTE with ROW_NUMBER(), but I can't restrict it to rows whose identical descriptions are consecutive.

;WITH removeConsecutiveRows AS (
  SELECT ph.Description,
       ph.Price,
       ph.Date,
       ROW_NUMBER() OVER (
          PARTITION BY ph.Description
          ORDER BY ph.Date
       ) AS rowNum 
  FROM #PriceHistory ph (NOLOCK)
)
SELECT s.Description,
       s.Price,
       s.Date,
       s.rowNum
FROM removeConsecutiveRows s
WHERE s.rowNum = 1
ORDER BY s.Date DESC
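To see why this attempt cannot work, here is a small Python sketch (an emulation outside SQL Server, not part of the original question) of what `ROW_NUMBER() OVER (PARTITION BY Description ORDER BY Date)` followed by `rowNum = 1` does. It uses the first five rows of the sample data reproduced in the answer; the helper name `first_per_description` is hypothetical.

```python
# Sketch: ROW_NUMBER() OVER (PARTITION BY Description ORDER BY Date)
# followed by WHERE rowNum = 1. Rows are (Description, Date) pairs.
rows = [
    ("Active", "2019-02-27"),
    ("Unknown", "2014-09-24"),
    ("Unknown", "2014-09-12"),
    ("Sold", "2014-09-08"),
    ("Unknown", "2014-08-10"),
]


def first_per_description(rows):
    """Keep the earliest row per Description, ignoring adjacency."""
    earliest = {}
    for descr, dt in rows:
        # ISO-8601 date strings compare correctly as plain strings
        if descr not in earliest or dt < earliest[descr]:
            earliest[descr] = dt
    # Final ORDER BY Date DESC
    return sorted(earliest.items(), key=lambda r: r[1], reverse=True)


print(first_per_description(rows))
# [('Active', '2019-02-27'), ('Sold', '2014-09-08'), ('Unknown', '2014-08-10')]
```

The partition covers every Unknown row, adjacent or not, so rowNum = 1 lands on 2014-08-10 and the run ending on 2014-09-12 vanishes from the result.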

So the end result should look like this:

[Image: the expected result]

I should note that this is SQL Server 2008.


Answer (score: 1)

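This looks like a gaps-and-islands problem with a "top 1 per group" step on top: once the groups (islands) of consecutive rows with the same Descr are detected, keep one row from each.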
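The key observation behind the islands step: if rn1 numbers all rows by dt DESC and rn2 numbers rows within the same Descr by dt DESC, both counters grow by 1 together while a run of one Descr continues, so rn1 - rn2 stays constant inside a run and changes once another description interrupts it. A minimal Python sketch of just that step (`island_keys` is a hypothetical helper; the input is assumed to be already sorted by dt DESC):

```python
from collections import defaultdict


def island_keys(descrs):
    """Return (Descr, rn1 - rn2) for rows already sorted by dt DESC."""
    rn2 = defaultdict(int)  # running ROW_NUMBER() per Descr
    keys = []
    for rn1, descr in enumerate(descrs, start=1):  # rn1: overall ROW_NUMBER()
        rn2[descr] += 1
        keys.append((descr, rn1 - rn2[descr]))
    return keys


# First five rows of the sample data, in dt DESC order
print(island_keys(["Active", "Unknown", "Unknown", "Sold", "Unknown"]))
# [('Active', 0), ('Unknown', 1), ('Unknown', 1), ('Sold', 3), ('Unknown', 2)]
```

The two adjacent Unknown rows share the key ('Unknown', 1), while the later Unknown row gets ('Unknown', 2). The difference alone can coincide across descriptions (for the sequence A, B, A both B and the second A get 1), which is why the query partitions by Descr and rn1 - rn2 together.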

Here is one way to do it.

Sample data

CREATE TABLE #temptable ( Descr varchar(50), [Price] int, dt date )
INSERT INTO #temptable
VALUES
( 'Active', 799900, N'2019-02-27T00:00:00' ), 
( 'Unknown', 629900, N'2014-09-24T00:00:00' ), 
( 'Unknown', 629900, N'2014-09-12T00:00:00' ), 
( 'Sold', 625900, N'2014-09-08T00:00:00' ), 
( 'Unknown', 629900, N'2014-08-10T00:00:00' ), 
( 'Active', 629900, N'2014-07-27T00:00:00' ), 
( 'Pending', 629900, N'2014-07-25T00:00:00' ), 
( 'Pending', 629900, N'2014-07-24T00:00:00' ), 
( 'Unknown', 629900, N'2014-07-20T00:00:00' ), 
( 'Active', 629900, N'2014-07-16T00:00:00' ), 
( 'Active', 629900, N'2014-07-15T00:00:00' ), 
( 'Taking Backup Offers', 629900, N'2014-07-11T00:00:00' ), 
( 'Active', 629900, N'2014-06-28T00:00:00' ), 
( 'Active', 629900, N'2014-06-27T00:00:00' ), 
( 'Taking Backup Offers', 629900, N'2014-06-27T00:00:00' ), 
( 'Active', 629900, N'2014-06-23T00:00:00' ), 
( 'Active', 629900, N'2014-06-11T00:00:00' ), 
( 'Active', 629900, N'2014-06-10T00:00:00' ), 
( 'Sold', 570000, N'2010-01-22T00:00:00' ), 
( 'Sold', 288000, N'2000-09-01T00:00:00' );

Query

WITH
CTE_RN
AS
(
    SELECT
        * 
        ,ROW_NUMBER() OVER (ORDER BY dt DESC) AS rn1
        ,ROW_NUMBER() OVER (PARTITION BY Descr ORDER BY dt DESC) AS rn2
    FROM #temptable
)
,CTE_Groups
AS
(
    SELECT
        *
        ,rn1 - rn2 AS Groups
        ,ROW_NUMBER() OVER (PARTITION BY Descr, rn1 - rn2 ORDER BY dt) AS rn
    FROM CTE_RN
)
SELECT Descr, Price, dt
FROM CTE_Groups
WHERE rn = 1
ORDER BY dt DESC;
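To check the logic outside SQL Server, here is a Python sketch (an emulation, not T-SQL) that applies the same three steps to the sample data: rn1 over dt DESC, rn2 per Descr, then the earliest row of each (Descr, rn1 - rn2) island. `collapse_runs` is a hypothetical name. Python's `sorted()` is stable, so the two rows dated 2014-06-27 keep their insertion order; SQL Server makes no such promise.

```python
from collections import defaultdict

# (Descr, Price, dt) rows from the #temptable sample data
SAMPLE = [
    ("Active", 799900, "2019-02-27"),
    ("Unknown", 629900, "2014-09-24"),
    ("Unknown", 629900, "2014-09-12"),
    ("Sold", 625900, "2014-09-08"),
    ("Unknown", 629900, "2014-08-10"),
    ("Active", 629900, "2014-07-27"),
    ("Pending", 629900, "2014-07-25"),
    ("Pending", 629900, "2014-07-24"),
    ("Unknown", 629900, "2014-07-20"),
    ("Active", 629900, "2014-07-16"),
    ("Active", 629900, "2014-07-15"),
    ("Taking Backup Offers", 629900, "2014-07-11"),
    ("Active", 629900, "2014-06-28"),
    ("Active", 629900, "2014-06-27"),
    ("Taking Backup Offers", 629900, "2014-06-27"),
    ("Active", 629900, "2014-06-23"),
    ("Active", 629900, "2014-06-11"),
    ("Active", 629900, "2014-06-10"),
    ("Sold", 570000, "2010-01-22"),
    ("Sold", 288000, "2000-09-01"),
]


def collapse_runs(rows):
    # rn1: ROW_NUMBER() OVER (ORDER BY dt DESC); ties keep input order here
    ordered = sorted(rows, key=lambda r: r[2], reverse=True)
    rn2 = defaultdict(int)       # ROW_NUMBER() OVER (PARTITION BY Descr ...)
    islands = defaultdict(list)  # (Descr, rn1 - rn2) -> rows of one island
    for rn1, row in enumerate(ordered, start=1):
        rn2[row[0]] += 1
        islands[(row[0], rn1 - rn2[row[0]])].append(row)
    # rn = 1 (ORDER BY dt ASC within the island) is the earliest row
    firsts = [min(island, key=lambda r: r[2]) for island in islands.values()]
    return sorted(firsts, key=lambda r: r[2], reverse=True)  # ORDER BY dt DESC


for row in collapse_runs(SAMPLE):
    print(row)
```

It yields the same 13 rows as the Result table below, although the two 2014-06-27 rows may print in the other order.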

Result

+----------------------+--------+------------+
|        Descr         | Price  |     dt     |
+----------------------+--------+------------+
| Active               | 799900 | 2019-02-27 |
| Unknown              | 629900 | 2014-09-12 |
| Sold                 | 625900 | 2014-09-08 |
| Unknown              | 629900 | 2014-08-10 |
| Active               | 629900 | 2014-07-27 |
| Pending              | 629900 | 2014-07-24 |
| Unknown              | 629900 | 2014-07-20 |
| Active               | 629900 | 2014-07-15 |
| Taking Backup Offers | 629900 | 2014-07-11 |
| Taking Backup Offers | 629900 | 2014-06-27 |
| Active               | 629900 | 2014-06-27 |
| Active               | 629900 | 2014-06-10 |
| Sold                 | 288000 | 2000-09-01 |
+----------------------+--------+------------+

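Note that because two rows share the date 2014-06-27, the server may return them either in the order shown in your expected result or in the order shown here. You most likely have an ID column, which you can use to break the tie.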


To see how it works, run the intermediate query and examine its results (columns rn1, rn2, Groups, rn).

WITH
CTE_RN
AS
(
    SELECT
        * 
        ,ROW_NUMBER() OVER (ORDER BY dt DESC) AS rn1
        ,ROW_NUMBER() OVER (PARTITION BY Descr ORDER BY dt DESC) AS rn2
    FROM #temptable
)
,CTE_Groups
AS
(
    SELECT
        *
        ,rn1 - rn2 AS Groups
        ,ROW_NUMBER() OVER (PARTITION BY Descr, rn1 - rn2 ORDER BY dt) AS rn
    FROM CTE_RN
)
SELECT *
FROM CTE_Groups
ORDER BY dt DESC;

Result

+----------------------+--------+------------+-----+-----+--------+----+
|        Descr         | Price  |     dt     | rn1 | rn2 | Groups | rn |
+----------------------+--------+------------+-----+-----+--------+----+
| Active               | 799900 | 2019-02-27 |   1 |   1 |      0 |  1 |
| Unknown              | 629900 | 2014-09-24 |   2 |   1 |      1 |  2 |
| Unknown              | 629900 | 2014-09-12 |   3 |   2 |      1 |  1 |
| Sold                 | 625900 | 2014-09-08 |   4 |   1 |      3 |  1 |
| Unknown              | 629900 | 2014-08-10 |   5 |   3 |      2 |  1 |
| Active               | 629900 | 2014-07-27 |   6 |   2 |      4 |  1 |
| Pending              | 629900 | 2014-07-25 |   7 |   1 |      6 |  2 |
| Pending              | 629900 | 2014-07-24 |   8 |   2 |      6 |  1 |
| Unknown              | 629900 | 2014-07-20 |   9 |   4 |      5 |  1 |
| Active               | 629900 | 2014-07-16 |  10 |   3 |      7 |  2 |
| Active               | 629900 | 2014-07-15 |  11 |   4 |      7 |  1 |
| Taking Backup Offers | 629900 | 2014-07-11 |  12 |   1 |     11 |  1 |
| Active               | 629900 | 2014-06-28 |  13 |   5 |      8 |  2 |
| Active               | 629900 | 2014-06-27 |  14 |   6 |      8 |  1 |
| Taking Backup Offers | 629900 | 2014-06-27 |  15 |   2 |     13 |  1 |
| Active               | 629900 | 2014-06-23 |  16 |   7 |      9 |  3 |
| Active               | 629900 | 2014-06-11 |  17 |   8 |      9 |  2 |
| Active               | 629900 | 2014-06-10 |  18 |   9 |      9 |  1 |
| Sold                 | 570000 | 2010-01-22 |  19 |   2 |     17 |  2 |
| Sold                 | 288000 | 2000-09-01 |  20 |   3 |     17 |  1 |
+----------------------+--------+------------+-----+-----+--------+----+

Note

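Adding ORDER BY dt DESC, rn1 ASC to the main query does not guarantee the result you expect. The rn1 values 14 and 15 can be swapped, because both rows have the same date (2014-06-27). If dates are not unique, you need an additional unique column in the ORDER BY clauses of the ROW_NUMBER() calls to make the ordering stable and predictable. The sample data has no such column, but tables usually have a unique primary key, and you should use it.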

So for your sample data it is perfectly normal for the query to produce this result:

Intermediate

+----------------------+--------+------------+-----+-----+--------+----+
|        Descr         | Price  |     dt     | rn1 | rn2 | Groups | rn |
+----------------------+--------+------------+-----+-----+--------+----+
| Active               | 799900 | 2019-02-27 |   1 |   1 |      0 |  1 |
| Unknown              | 629900 | 2014-09-24 |   2 |   1 |      1 |  2 |
| Unknown              | 629900 | 2014-09-12 |   3 |   2 |      1 |  1 |
| Sold                 | 625900 | 2014-09-08 |   4 |   1 |      3 |  1 |
| Unknown              | 629900 | 2014-08-10 |   5 |   3 |      2 |  1 |
| Active               | 629900 | 2014-07-27 |   6 |   2 |      4 |  1 |
| Pending              | 629900 | 2014-07-25 |   7 |   1 |      6 |  2 |
| Pending              | 629900 | 2014-07-24 |   8 |   2 |      6 |  1 |
| Unknown              | 629900 | 2014-07-20 |   9 |   4 |      5 |  1 |
| Active               | 629900 | 2014-07-16 |  10 |   3 |      7 |  2 |
| Active               | 629900 | 2014-07-15 |  11 |   4 |      7 |  1 |
| Taking Backup Offers | 629900 | 2014-07-11 |  12 |   1 |     11 |  1 |
| Active               | 629900 | 2014-06-28 |  13 |   5 |      8 |  1 |
| Taking Backup Offers | 629900 | 2014-06-27 |  14 |   2 |     12 |  1 |
| Active               | 629900 | 2014-06-27 |  15 |   6 |      9 |  4 |
| Active               | 629900 | 2014-06-23 |  16 |   7 |      9 |  3 |
| Active               | 629900 | 2014-06-11 |  17 |   8 |      9 |  2 |
| Active               | 629900 | 2014-06-10 |  18 |   9 |      9 |  1 |
| Sold                 | 570000 | 2010-01-22 |  19 |   2 |     17 |  2 |
| Sold                 | 288000 | 2000-09-01 |  20 |   3 |     17 |  1 |
+----------------------+--------+------------+-----+-----+--------+----+

Final

+----------------------+--------+------------+
|        Descr         | Price  |     dt     |
+----------------------+--------+------------+
| Active               | 799900 | 2019-02-27 |
| Unknown              | 629900 | 2014-09-12 |
| Sold                 | 625900 | 2014-09-08 |
| Unknown              | 629900 | 2014-08-10 |
| Active               | 629900 | 2014-07-27 |
| Pending              | 629900 | 2014-07-24 |
| Unknown              | 629900 | 2014-07-20 |
| Active               | 629900 | 2014-07-15 |
| Taking Backup Offers | 629900 | 2014-07-11 |
| Active               | 629900 | 2014-06-28 |
| Taking Backup Offers | 629900 | 2014-06-27 |
| Active               | 629900 | 2014-06-10 |
| Sold                 | 288000 | 2000-09-01 |
+----------------------+--------+------------+

As you can see, this result differs from the first one: two rows share the same date, and the engine is free to put them in either order.

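In this result Active appears with the date 2014-06-28, because Active 2014-06-27 happened to be numbered after Taking Backup Offers 2014-06-27. It therefore joined the following Active island (2014-06-27 to 2014-06-10) instead of the island with Active 2014-06-28.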
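The tie problem can be reproduced with a Python sketch (an emulation, not T-SQL) on the four sample rows around 2014-06-27. Python's stable sort lets the input order decide ties, standing in for the engine's freedom to order equal dates either way. The `id` values are hypothetical, standing in for the unique primary key recommended above; `collapse_runs`, `by_date` and `by_date_then_id` are likewise hypothetical helpers.

```python
from collections import defaultdict

# (id, Descr, dt); id is a hypothetical unique key, not in the sample table
ROWS = [
    (13, "Active", "2014-06-28"),
    (14, "Active", "2014-06-27"),
    (15, "Taking Backup Offers", "2014-06-27"),
    (16, "Active", "2014-06-23"),
]
# Same rows, with the two 2014-06-27 rows in the opposite order
SWAPPED = [ROWS[0], ROWS[2], ROWS[1], ROWS[3]]


def collapse_runs(rows, sort_key):
    """Earliest row of each island; sort_key plays the role of ORDER BY."""
    ordered = sorted(rows, key=sort_key, reverse=True)  # rn1: ORDER BY ... DESC
    rn2 = defaultdict(int)       # per-Descr counter over the same ordering
    islands = defaultdict(list)  # (Descr, rn1 - rn2) -> rows of one island
    for rn1, row in enumerate(ordered, start=1):
        rn2[row[1]] += 1
        islands[(row[1], rn1 - rn2[row[1]])].append(row)
    return {min(island, key=sort_key) for island in islands.values()}


def by_date(r):
    return r[2]  # ORDER BY dt only: ties are unresolved


def by_date_then_id(r):
    return (r[2], r[0])  # ORDER BY dt, id: every row has a distinct key


print(collapse_runs(ROWS, by_date) == collapse_runs(SWAPPED, by_date))  # False
print(collapse_runs(ROWS, by_date_then_id)
      == collapse_runs(SWAPPED, by_date_then_id))  # True
```

With dt as the only sort key, Active 2014-06-28 survives in one ordering and not in the other, as in the two results above; the unique id makes both runs agree. In T-SQL this would correspond to appending such a key column to every ROW_NUMBER() ORDER BY, for example ORDER BY dt DESC, ID DESC, assuming the table has an ID column.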