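Consider the data in a table, sorted by Date desc.
If several consecutive rows have the same Description, I want to keep only the one with the earliest date. For example, rows 2 and 3 are both Unknown, and I want to keep only the row dated September 12, 2014.
I have been trying to combine a CTE with ROW_NUMBER(), but I cannot restrict it to rows that have the same Description consecutively.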
;WITH removeConsecutiveRows AS (
SELECT ph.Description,
ph.Price,
ph.Date,
ROW_NUMBER() OVER (
PARTITION BY ph.Description
ORDER BY ph.Date
) AS rowNum
FROM #PriceHistory ph (NOLOCK)
)
SELECT s.Description,
s.Price,
s.Date,
s.rowNum
FROM removeConsecutiveRows s
WHERE s.rowNum = 1
ORDER BY s.Date DESC
So, in the end, it should look like this:
I should note that this is SQL Server 2008.
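Answer 0 (score: 1)
This looks like a gaps-and-islands problem, with a top-1-per-group step applied once the groups (islands) have been detected.
Here is one way to do it.
Sample data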
CREATE TABLE #temptable ( Descr varchar(50), [Price] int, dt date )
INSERT INTO #temptable
VALUES
( 'Active', 799900, N'2019-02-27T00:00:00' ),
( 'Unknown', 629900, N'2014-09-24T00:00:00' ),
( 'Unknown', 629900, N'2014-09-12T00:00:00' ),
( 'Sold', 625900, N'2014-09-08T00:00:00' ),
( 'Unknown', 629900, N'2014-08-10T00:00:00' ),
( 'Active', 629900, N'2014-07-27T00:00:00' ),
( 'Pending', 629900, N'2014-07-25T00:00:00' ),
( 'Pending', 629900, N'2014-07-24T00:00:00' ),
( 'Unknown', 629900, N'2014-07-20T00:00:00' ),
( 'Active', 629900, N'2014-07-16T00:00:00' ),
( 'Active', 629900, N'2014-07-15T00:00:00' ),
( 'Taking Backup Offers', 629900, N'2014-07-11T00:00:00' ),
( 'Active', 629900, N'2014-06-28T00:00:00' ),
( 'Active', 629900, N'2014-06-27T00:00:00' ),
( 'Taking Backup Offers', 629900, N'2014-06-27T00:00:00' ),
( 'Active', 629900, N'2014-06-23T00:00:00' ),
( 'Active', 629900, N'2014-06-11T00:00:00' ),
( 'Active', 629900, N'2014-06-10T00:00:00' ),
( 'Sold', 570000, N'2010-01-22T00:00:00' ),
( 'Sold', 288000, N'2000-09-01T00:00:00' );
Query
WITH
CTE_RN
AS
(
SELECT
*
,ROW_NUMBER() OVER (ORDER BY dt DESC) AS rn1
,ROW_NUMBER() OVER (PARTITION BY Descr ORDER BY dt DESC) AS rn2
FROM #temptable
)
,CTE_Groups
AS
(
SELECT
*
,rn1 - rn2 AS Groups
,ROW_NUMBER() OVER (PARTITION BY Descr, rn1 - rn2 ORDER BY dt) AS rn
FROM CTE_RN
)
SELECT Descr, Price, dt
FROM CTE_Groups
WHERE rn = 1
ORDER BY dt DESC;
Result
+----------------------+--------+------------+
| Descr                | Price  | dt         |
+----------------------+--------+------------+
| Active               | 799900 | 2019-02-27 |
| Unknown              | 629900 | 2014-09-12 |
| Sold                 | 625900 | 2014-09-08 |
| Unknown              | 629900 | 2014-08-10 |
| Active               | 629900 | 2014-07-27 |
| Pending              | 629900 | 2014-07-24 |
| Unknown              | 629900 | 2014-07-20 |
| Active               | 629900 | 2014-07-15 |
| Taking Backup Offers | 629900 | 2014-07-11 |
| Taking Backup Offers | 629900 | 2014-06-27 |
| Active               | 629900 | 2014-06-27 |
| Active               | 629900 | 2014-06-10 |
| Sold                 | 288000 | 2000-09-01 |
+----------------------+--------+------------+
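Note that two rows share the date 2014-06-27, so the server may return them in the order shown in your expected result, or in the order shown here. You most likely have an ID column, so you can use it to resolve the ordering.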
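As a minimal sketch of that idea, the query below adds a tie-breaker to every ORDER BY. It is not part of the original sample: the table #demo and its unique ID column are hypothetical, and it assumes that a larger ID means the row was recorded later.

```sql
-- Hypothetical table: ID is an assumed unique column, not part of the original sample data.
CREATE TABLE #demo (ID int PRIMARY KEY, Descr varchar(50), Price int, dt date);

INSERT INTO #demo (ID, Descr, Price, dt)
VALUES
(5, 'Active',               629900, '2014-06-28'),
(4, 'Active',               629900, '2014-06-27'),
(3, 'Taking Backup Offers', 629900, '2014-06-27'),
(2, 'Active',               629900, '2014-06-23'),
(1, 'Active',               629900, '2014-06-10');

WITH
CTE_RN
AS
(
    SELECT
        ID, Descr, Price, dt
        -- ID breaks ties between rows that share the same dt
        ,ROW_NUMBER() OVER (ORDER BY dt DESC, ID DESC) AS rn1
        ,ROW_NUMBER() OVER (PARTITION BY Descr ORDER BY dt DESC, ID DESC) AS rn2
    FROM #demo
)
,CTE_Groups
AS
(
    SELECT
        ID, Descr, Price, dt
        -- rn = 1 marks the earliest row of each island
        ,ROW_NUMBER() OVER (PARTITION BY Descr, rn1 - rn2 ORDER BY dt, ID) AS rn
    FROM CTE_RN
)
SELECT Descr, Price, dt
FROM CTE_Groups
WHERE rn = 1
ORDER BY dt DESC, ID DESC;
```

With these five rows every sort key is unique, so the query returns Active 2014-06-27, Taking Backup Offers 2014-06-27, and Active 2014-06-10, in that order, on every run.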
To understand how this works, run the intermediate query and examine its results (columns rn1, rn2, Groups, rn).
WITH
CTE_RN
AS
(
SELECT
*
,ROW_NUMBER() OVER (ORDER BY dt DESC) AS rn1
,ROW_NUMBER() OVER (PARTITION BY Descr ORDER BY dt DESC) AS rn2
FROM #temptable
)
,CTE_Groups
AS
(
SELECT
*
,rn1 - rn2 AS Groups
,ROW_NUMBER() OVER (PARTITION BY Descr, rn1 - rn2 ORDER BY dt) AS rn
FROM CTE_RN
)
SELECT *
FROM CTE_Groups
ORDER BY dt DESC;
Result
+----------------------+--------+------------+-----+-----+--------+----+
| Descr                | Price  | dt         | rn1 | rn2 | Groups | rn |
+----------------------+--------+------------+-----+-----+--------+----+
| Active               | 799900 | 2019-02-27 |   1 |   1 |      0 |  1 |
| Unknown              | 629900 | 2014-09-24 |   2 |   1 |      1 |  2 |
| Unknown              | 629900 | 2014-09-12 |   3 |   2 |      1 |  1 |
| Sold                 | 625900 | 2014-09-08 |   4 |   1 |      3 |  1 |
| Unknown              | 629900 | 2014-08-10 |   5 |   3 |      2 |  1 |
| Active               | 629900 | 2014-07-27 |   6 |   2 |      4 |  1 |
| Pending              | 629900 | 2014-07-25 |   7 |   1 |      6 |  2 |
| Pending              | 629900 | 2014-07-24 |   8 |   2 |      6 |  1 |
| Unknown              | 629900 | 2014-07-20 |   9 |   4 |      5 |  1 |
| Active               | 629900 | 2014-07-16 |  10 |   3 |      7 |  2 |
| Active               | 629900 | 2014-07-15 |  11 |   4 |      7 |  1 |
| Taking Backup Offers | 629900 | 2014-07-11 |  12 |   1 |     11 |  1 |
| Active               | 629900 | 2014-06-28 |  13 |   5 |      8 |  2 |
| Active               | 629900 | 2014-06-27 |  14 |   6 |      8 |  1 |
| Taking Backup Offers | 629900 | 2014-06-27 |  15 |   2 |     13 |  1 |
| Active               | 629900 | 2014-06-23 |  16 |   7 |      9 |  3 |
| Active               | 629900 | 2014-06-11 |  17 |   8 |      9 |  2 |
| Active               | 629900 | 2014-06-10 |  18 |   9 |      9 |  1 |
| Sold                 | 570000 | 2010-01-22 |  19 |   2 |     17 |  2 |
| Sold                 | 288000 | 2000-09-01 |  20 |   3 |     17 |  1 |
+----------------------+--------+------------+-----+-----+--------+----+
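One way to read this output is to collapse it to one row per island. Within a run of consecutive rows with the same Descr, rn1 and rn2 both increase by 1 from row to row, so their difference stays constant and, together with Descr, identifies the island. The sketch below is not part of the original answer. It assumes #temptable from the sample data still exists in the session, and the aliases RowsInIsland, EarliestDate, and LatestDate are made up for illustration.

```sql
WITH
CTE_RN
AS
(
    SELECT
        Descr, dt
        ,ROW_NUMBER() OVER (ORDER BY dt DESC) AS rn1
        ,ROW_NUMBER() OVER (PARTITION BY Descr ORDER BY dt DESC) AS rn2
    FROM #temptable
)
SELECT
    Descr
    ,rn1 - rn2 AS Groups          -- constant within one island of a given Descr
    ,COUNT(*) AS RowsInIsland     -- how many consecutive rows were collapsed
    ,MIN(dt) AS EarliestDate      -- the date that the final query keeps
    ,MAX(dt) AS LatestDate
FROM CTE_RN
GROUP BY Descr, rn1 - rn2
ORDER BY MAX(dt) DESC;
```

Each output row corresponds to one row of the final result, and its EarliestDate is the dt of the row that survives the rn = 1 filter.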
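Adding ORDER BY dt DESC, rn1 ASC to the main query does not guarantee the result you expect. The rn1 values 14 and 15 can be swapped, because the two rows have the same date (2014-06-27). If the dates are not unique, you need an additional unique column to make the sort stable and predictable. The sample data has no such column, but tables usually have a unique primary key, so you should use it.
So, for your sample data, it is perfectly normal for the query to produce this result: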
Intermediate
+----------------------+--------+------------+-----+-----+--------+----+
| Descr                | Price  | dt         | rn1 | rn2 | Groups | rn |
+----------------------+--------+------------+-----+-----+--------+----+
| Active               | 799900 | 2019-02-27 |   1 |   1 |      0 |  1 |
| Unknown              | 629900 | 2014-09-24 |   2 |   1 |      1 |  2 |
| Unknown              | 629900 | 2014-09-12 |   3 |   2 |      1 |  1 |
| Sold                 | 625900 | 2014-09-08 |   4 |   1 |      3 |  1 |
| Unknown              | 629900 | 2014-08-10 |   5 |   3 |      2 |  1 |
| Active               | 629900 | 2014-07-27 |   6 |   2 |      4 |  1 |
| Pending              | 629900 | 2014-07-25 |   7 |   1 |      6 |  2 |
| Pending              | 629900 | 2014-07-24 |   8 |   2 |      6 |  1 |
| Unknown              | 629900 | 2014-07-20 |   9 |   4 |      5 |  1 |
| Active               | 629900 | 2014-07-16 |  10 |   3 |      7 |  2 |
| Active               | 629900 | 2014-07-15 |  11 |   4 |      7 |  1 |
| Taking Backup Offers | 629900 | 2014-07-11 |  12 |   1 |     11 |  1 |
| Active               | 629900 | 2014-06-28 |  13 |   5 |      8 |  1 |
| Taking Backup Offers | 629900 | 2014-06-27 |  14 |   2 |     12 |  1 |
| Active               | 629900 | 2014-06-27 |  15 |   6 |      9 |  4 |
| Active               | 629900 | 2014-06-23 |  16 |   7 |      9 |  3 |
| Active               | 629900 | 2014-06-11 |  17 |   8 |      9 |  2 |
| Active               | 629900 | 2014-06-10 |  18 |   9 |      9 |  1 |
| Sold                 | 570000 | 2010-01-22 |  19 |   2 |     17 |  2 |
| Sold                 | 288000 | 2000-09-01 |  20 |   3 |     17 |  1 |
+----------------------+--------+------------+-----+-----+--------+----+
Final
+----------------------+--------+------------+
| Descr                | Price  | dt         |
+----------------------+--------+------------+
| Active               | 799900 | 2019-02-27 |
| Unknown              | 629900 | 2014-09-12 |
| Sold                 | 625900 | 2014-09-08 |
| Unknown              | 629900 | 2014-08-10 |
| Active               | 629900 | 2014-07-27 |
| Pending              | 629900 | 2014-07-24 |
| Unknown              | 629900 | 2014-07-20 |
| Active               | 629900 | 2014-07-15 |
| Taking Backup Offers | 629900 | 2014-07-11 |
| Active               | 629900 | 2014-06-28 |
| Taking Backup Offers | 629900 | 2014-06-27 |
| Active               | 629900 | 2014-06-10 |
| Sold                 | 288000 | 2000-09-01 |
+----------------------+--------+------------+
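As you can see, this result differs from the first one, because two rows have the same date and the engine is free to put them in either order.
In this result, the Active row is dated 2014-06-28, because Active 2014-06-27 happened to be placed below Taking Backup Offers 2014-06-27. That moved it into the older Active island (Groups = 9) and left 2014-06-28 as a single-row island.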
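To check whether a table is exposed to this kind of ambiguity, a quick sketch is to list the dates that occur more than once. It assumes #temptable from the sample data above, and the alias RowsOnDate is made up for illustration.

```sql
-- Lists every date that appears on more than one row of #temptable.
SELECT
    dt
    ,COUNT(*) AS RowsOnDate   -- illustrative alias
FROM #temptable
GROUP BY dt
HAVING COUNT(*) > 1
ORDER BY dt DESC;
```

For the sample data this returns a single row, 2014-06-27 with a count of 2. If the query returns nothing, ordering by dt alone is already deterministic.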