Filter out records within n days

Time: 2018-08-24 22:12:24

Tags: sql sql-server tsql

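I'm not sure what to call this challenge.

I want to flag (for later filtering) certain records, partitioned by the type ID column (TransTypeID in the DDL below): those that fall within n days (3 days in this example) of the date of the first record in their partition. That much is easy. However, if more records appear in the same partition after the 3-day limit, the first of them becomes the new "first" record and starts a new chain, flagging every subsequent record within 3 days of it. And so on.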

I have illustrated the desired output in this screenshot, where I want to flag/filter out the rows marked in yellow. All other rows are kept.

[Screenshot of the desired output; image not included]

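I have taken a spray-and-pray approach with window functions and the like, but I can't seem to find a clean solution. How would you solve this with T-SQL?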

sqlfiddle is not responding for SQL Server at the moment, so I am posting the DDL code here:

DROP TABLE IF EXISTS [dbo].[testTable];

CREATE TABLE [dbo].[testTable](
    [RowID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY,
    [CustID] [int] NULL,
    [TransTypeID] [int] NULL,
    [Date] [date] NULL
)
GO
SET IDENTITY_INSERT [dbo].[testTable] ON 
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (1, 9362, 1, CAST(N'2018-01-11' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (2, 9362, 1, CAST(N'2018-01-22' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (3, 9362, 2, CAST(N'2018-01-04' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (4, 9362, 2, CAST(N'2018-01-07' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (5, 9362, 2, CAST(N'2018-01-09' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (6, 9362, 2, CAST(N'2018-01-22' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (7, 9362, 2, CAST(N'2018-01-23' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (8, 9362, 2, CAST(N'2018-01-24' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (9, 9362, 2, CAST(N'2018-01-26' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (10, 9362, 3, CAST(N'2018-01-22' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (11, 9362, 5, CAST(N'2018-01-01' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (12, 9362, 5, CAST(N'2018-01-02' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (13, 9362, 5, CAST(N'2018-01-02' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (14, 9362, 5, CAST(N'2018-01-04' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (15, 9362, 5, CAST(N'2018-01-07' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (16, 9362, 5, CAST(N'2018-01-17' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (17, 9362, 5, CAST(N'2018-02-08' AS Date))
GO
INSERT [dbo].[testTable] ([RowID], [CustID], [TransTypeID], [Date]) VALUES (18, 9362, 5, CAST(N'2018-02-18' AS Date))
GO
SET IDENTITY_INSERT [dbo].[testTable] OFF
GO

1 answer:

Answer 0 (score: 1)

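This can be done with a recursive CTE. First, SELECT the row with the minimum date in each group; row_number() can be used for that. Then recursively UNION ALL, for each group, the row with the minimum date among those whose date is greater than the latest date already in the result plus 3 days, thereby skipping the 3 days. Again, row_number() can be used for this, and dateadd() for the date arithmetic. Because the comparison is strict, a row dated exactly 3 days after a chain's first row still counts as falling within the window.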

WITH [cte]
AS
(
-- Anchor member: the earliest row of each (CustID, TransTypeID) group starts the first chain.
SELECT [x].[RowID],
       [x].[CustID],
       [x].[TransTypeId],
       [x].[Date]
       FROM (SELECT [testTable].[RowID],
                    [testTable].[CustID],
                    [testTable].[TransTypeId],
                    [testTable].[Date],
                    row_number() OVER (PARTITION BY [testTable].[CustId],
                                                    [testTable].[TransTypeID]
                                       ORDER BY [testTable].[Date]) [row#]
                    FROM [dbo].[testTable]) [x]
       WHERE [x].[row#] = 1
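UNION ALL
-- Recursive member: per group, the earliest row dated more than 3 days after the previous chain start begins the next chain.
SELECT [x].[RowID],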
       [x].[CustID],
       [x].[TransTypeId],
       [x].[Date]
       FROM (SELECT [testTable].[RowID],
                    [testTable].[CustID],
                    [testTable].[TransTypeId],
                    [testTable].[Date],
                    row_number() OVER (PARTITION BY [testTable].[CustId],
                                                    [testTable].[TransTypeID]
                                       ORDER BY [testTable].[Date]) [row#]
                    FROM [dbo].[testTable]
                         INNER JOIN [cte]
                                    ON [cte].[CustId] = [testTable].[CustId]
                                       AND [cte].[TransTypeId] = [testTable].[TransTypeID]
                                       AND dateadd(day, 3, [cte].[Date]) < [testTable].[Date]) [x]
       WHERE [x].[row#] = 1
)
SELECT *
       FROM [cte]
       ORDER BY [cte].[CustID],
                [cte].[TransTypeID],
                [cte].[Date];

Result:

RowID | CustID | TransTypeId | Date               
----: | -----: | ----------: | :------------------
    1 |   9362 |           1 | 11/01/2018 00:00:00
    2 |   9362 |           1 | 22/01/2018 00:00:00
    3 |   9362 |           2 | 04/01/2018 00:00:00
    5 |   9362 |           2 | 09/01/2018 00:00:00
    6 |   9362 |           2 | 22/01/2018 00:00:00
    9 |   9362 |           2 | 26/01/2018 00:00:00
   10 |   9362 |           3 | 22/01/2018 00:00:00
   11 |   9362 |           5 | 01/01/2018 00:00:00
   15 |   9362 |           5 | 07/01/2018 00:00:00
   16 |   9362 |           5 | 17/01/2018 00:00:00
   17 |   9362 |           5 | 08/02/2018 00:00:00
   18 |   9362 |           5 | 18/02/2018 00:00:00

db<>fiddle

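(I assumed that the groups are defined not only by [TransTypeID] but also by [CustID]. That wasn't really clear to me. If my assumption is wrong, remove [CustID] from the corresponding clauses (in the query above it appears in the PARTITION BY lists and in the join condition).)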
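
The query above returns the rows that start a chain, that is, the rows to keep. Since the question asks for the other rows to be flagged, one possible variation is to join the CTE back to the table and derive a flag column. The following is only a sketch and not part of the original answer: the variable @n, the alias [t], the column names [rn] and [IsFiltered], and the [RowID] tie-breaker in the ORDER BY are all assumptions introduced for illustration, and @n is assumed to be non-negative.

```sql
-- Sketch only: @n, [t], [rn], [IsFiltered] and the RowID tie-breaker are
-- illustrative choices, not taken from the original post.
DECLARE @n int = 3;  -- window length in days, assumed non-negative

WITH [cte] AS
(
    -- Anchor member: earliest row of each (CustID, TransTypeID) group.
    SELECT [x].[RowID], [x].[CustID], [x].[TransTypeID], [x].[Date]
    FROM (SELECT [t].[RowID], [t].[CustID], [t].[TransTypeID], [t].[Date],
                 row_number() OVER (PARTITION BY [t].[CustID], [t].[TransTypeID]
                                    ORDER BY [t].[Date], [t].[RowID]) AS [rn]
          FROM [dbo].[testTable] AS [t]) AS [x]
    WHERE [x].[rn] = 1
    UNION ALL
    -- Recursive member: earliest row dated more than @n days after the
    -- previous chain start in the same group.
    SELECT [x].[RowID], [x].[CustID], [x].[TransTypeID], [x].[Date]
    FROM (SELECT [t].[RowID], [t].[CustID], [t].[TransTypeID], [t].[Date],
                 row_number() OVER (PARTITION BY [t].[CustID], [t].[TransTypeID]
                                    ORDER BY [t].[Date], [t].[RowID]) AS [rn]
          FROM [dbo].[testTable] AS [t]
               INNER JOIN [cte]
                          ON [cte].[CustID] = [t].[CustID]
                             AND [cte].[TransTypeID] = [t].[TransTypeID]
                             AND dateadd(day, @n, [cte].[Date]) < [t].[Date]) AS [x]
    WHERE [x].[rn] = 1
)
-- Every table row is returned; rows that did not start a chain get IsFiltered = 1.
SELECT [t].[RowID], [t].[CustID], [t].[TransTypeID], [t].[Date],
       CASE WHEN [cte].[RowID] IS NULL THEN 1 ELSE 0 END AS [IsFiltered]
FROM [dbo].[testTable] AS [t]
     LEFT JOIN [cte]
               ON [cte].[RowID] = [t].[RowID]
ORDER BY [t].[CustID], [t].[TransTypeID], [t].[Date], [t].[RowID];
```

Against the sample data this should mark RowID 4, 7, 8, 12, 13 and 14, which are exactly the rows absent from the result listed above. Note that SQL Server stops a recursive CTE after 100 recursion levels by default, so a group with a longer chain would need an OPTION (MAXRECURSION ...) hint on the statement.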