Find a pattern in rows ordered by date

Asked: 2013-06-20 07:37:45

Tags: sql-server tsql sql-server-2008-r2

I have a table with 3 columns: id (int), date (date), and Status (bool).

Like this:

id  date        Status
1   2012-10-18  1
1   2012-10-19  1
1   2012-10-20  0
1   2012-10-21  0
1   2012-10-22  0
1   2012-10-23  0
1   2012-10-24  1
1   2012-10-25  0
1   2012-10-26  0
1   2012-10-27  0
1   2012-10-28  1
2   2012-10-19  0
2   2012-10-20  0
2   2012-10-21  0
2   2012-10-22  1
2   2012-10-23  1

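Assume the date column is sequential, with no gaps between dates.

How can I find every run of 3 consecutive zeros (in the Status column), together with the status of the day that follows each run?

Like this: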

id  startDate     endDate       NextDayStatus
1   2012-10-20    2012-10-22         0
1   2012-10-21    2012-10-23         1
1   2012-10-25    2012-10-27         1
2   2012-10-19    2012-10-21         1

Table creation script and sample data:

CREATE TABLE [Table1](
    [ID] [smallint] NOT NULL,
    [Date] [date] NOT NULL,
    [Status] [bit] NULL,
 CONSTRAINT [PK_table1] PRIMARY KEY CLUSTERED  (  [ID] ASC,   [Date] ASC ) )

INSERT INTO [Table1]([ID], [Date], [Status])     
SELECT 1, '2012-10-18', 1    UNION ALL
SELECT 1, '2012-10-19', 1    UNION ALL
SELECT 1, '2012-10-20', 0    UNION ALL
SELECT 1, '2012-10-21', 0    UNION ALL
SELECT 1, '2012-10-22', 0    UNION ALL
SELECT 1, '2012-10-23', 0    UNION ALL
SELECT 1, '2012-10-24', 1    UNION ALL 
SELECT 1, '2012-10-25', 0    UNION ALL
SELECT 1, '2012-10-26', 0    UNION ALL
SELECT 1, '2012-10-27', 0    UNION ALL
SELECT 1, '2012-10-28', 1    UNION ALL
SELECT 2, '2012-10-19', 0    UNION ALL
SELECT 2, '2012-10-20', 0    UNION ALL
SELECT 2, '2012-10-21', 0    UNION ALL
SELECT 2, '2012-10-22', 1    UNION ALL
SELECT 2, '2012-10-23', 1

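Update:

  • If it matters, after this step I only need to filter out the days that are the first, tenth, or twentieth of the month.
  • Many thanks to Tomalak and gnb. In my real task the number of consecutive zeros is 9, not the 3 used in this sample, so using 9 inner joins or cross applies seems inefficient.

3 Answers:

Answer 0: (score: 4)

Edit: updated to partition by ID.

This also works if the dates are not contiguous: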

SELECT        T1.id, T1.[Date] AS startDate, MAX(X.[Date]) AS endDate, Y.[Status] AS NextDayStatus
FROM     Table1 T1       
   CROSS APPLY
   (  SELECT TOP 3 *
   FROM            Table1 T2
   WHERE           T2.id = T1.id AND T2.[Date] >= T1.Date
   ORDER BY        T2.[Date]
   ) X
   CROSS APPLY
   ( SELECT TOP 4 *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY T3.[Date]) AS rn
   FROM            Table1 T3
   WHERE           T3.id = T1.id AND T3.[Date] >= T1.Date
   ORDER BY        T3.[Date]
   ) Y
WHERE        y.rn = 4
GROUP BY     T1.id, T1.[Date], Y.[Status]
HAVING       SUM(CAST(X.[Status] AS tinyint)) = 0;

For completeness, here is the more elegant SQL Server 2012 way. It should also work on any RDBMS with proper window/analytic function support:

SELECT
    X.id, X.startDate, X.endDate, x.nextStatus
FROM
    ( SELECT        T1.id, T1.[Date] AS startDate,
        LEAD(T1.[Date], 2) OVER (PARTITION BY T1.id ORDER BY T1.[Date]) AS endDate,
        LEAD(T1.[Status], 3) OVER (PARTITION BY T1.id ORDER BY T1.[Date]) AS nextStatus,
        SUM(CAST(T1.[Status] AS tinyint)) OVER (PARTITION BY T1.id ORDER BY T1.[Date] ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS SumNext3
    FROM            Table1 T1
    ) X
WHERE        SumNext3 = 0;
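
Since the question's update says the real run length is 9, here is a minimal sketch of how the same window approach might scale without extra joins. It is not part of the original answer: the column names SumNext9 and CountNext9 are made up for illustration, and the offsets are written as literals because the ROWS frame bound must be a literal. The added COUNT is an assumption on my part to reject windows that run off the end of a partition or that contain NULL statuses (the Status column is nullable).

```sql
-- Sketch only: run length 9 instead of 3, same LEAD/SUM technique.
SELECT
    X.id, X.startDate, X.endDate, X.nextStatus
FROM
    ( SELECT T1.id, T1.[Date] AS startDate,
          -- the 9th row of the run is 8 rows ahead of the first
          LEAD(T1.[Date], 8) OVER (PARTITION BY T1.id ORDER BY T1.[Date]) AS endDate,
          -- the row right after the run is 9 rows ahead
          LEAD(T1.[Status], 9) OVER (PARTITION BY T1.id ORDER BY T1.[Date]) AS nextStatus,
          SUM(CAST(T1.[Status] AS tinyint)) OVER (PARTITION BY T1.id ORDER BY T1.[Date]
              ROWS BETWEEN CURRENT ROW AND 8 FOLLOWING) AS SumNext9,
          -- counts only non-NULL statuses inside the 9-row window
          COUNT(T1.[Status]) OVER (PARTITION BY T1.id ORDER BY T1.[Date]
              ROWS BETWEEN CURRENT ROW AND 8 FOLLOWING) AS CountNext9
      FROM Table1 T1
    ) X
WHERE X.SumNext9 = 0
  AND X.CountNext9 = 9;
```

Against the sample data this returns no rows, because the longest run of zeros there is 4 days. nextStatus is NULL when the run ends on the last row for that id.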

Answer 1: (score: 3)

SELECT
  z1.id, z1.[date] AS startDate ,z3.[date] AS endDate, zn.status AS NextDayStatus
FROM 
  Table1 z1
  INNER JOIN Table1 z2 ON z2.[date] = (
    SELECT MIN([date]) FROM Table1 WHERE [date] > z1.[date] AND id = z1.id
  )
  INNER JOIN Table1 z3 ON z3.Date = (
    SELECT MIN([date]) FROM Table1 WHERE [date] > z2.[date] AND id = z1.id
  )
  INNER JOIN Table1 zn ON zn.Date = (
    SELECT MIN([date]) FROM Table1 WHERE [date] > z3.[date] AND id = z1.id
  )
WHERE 
  z1.status = 0
  AND z2.status = 0 AND z2.id = z1.id
  AND z3.status = 0 AND z3.id = z1.id
  AND zn.id = z1.id
ORDER BY
  z1.id, z1.[date]

An index on Table1 (date, status, id) would be optimal.

Answer 2: (score: 2)
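
For reference, the suggested index could be created roughly like this. The index name is hypothetical. Note that the table already has a clustered primary key on (ID, Date), so it is worth comparing execution plans before assuming the extra index helps.

```sql
-- Hypothetical index name; column order follows the suggestion above.
CREATE NONCLUSTERED INDEX IX_Table1_Date_Status_ID
    ON Table1 ([Date], [Status], [ID]);
```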

Here is another solution. It should work in many SQL products (those that support window functions), and in SQL Server specifically from version 2005 onward:

WITH partitioned AS (
  SELECT
    *,
    grp = DATEDIFF(DAY, 0, Date)
        - ROW_NUMBER() OVER (PARTITION BY ID, Status ORDER BY Date)
  FROM Table1
),
grouped AS (
  SELECT
    ID,
    SD = MIN(Date),
    ED = MAX(Date)
  FROM partitioned
  WHERE Status = 0
  GROUP BY
    ID,
    grp
  HAVING COUNT(*) >= 3
)
SELECT
  t.ID,
  StartDate     = t.Date,
  EndDate       = DATEADD(DAY, 2, t.Date),
  NextDayStatus = CASE t.Date WHEN DATEADD(DAY, -2, g.ED) THEN 1 ELSE 0 END
FROM Table1 t
INNER JOIN grouped g
ON t.ID = g.ID AND t.Date BETWEEN g.SD AND DATEADD(DAY, -2, g.ED)
;

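The idea is to detect all "islands" of Status = 0, keep the islands that have at least 3 rows, and then join the aggregated set of islands back to the original table to get the starting row of every qualifying subset of 3 consecutive Status = 0 rows.

One caveat, though: this solution assumes that any 3 consecutive Status 0 rows are followed by at least one more row with the same ID. In other words, the last matching group of Status 0 rows in an island is expected to be followed by a Status 1 row, and that is the status the result set reports for it.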
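
To see why the grp expression identifies islands, it may help to look at the intermediate values for one ID. This is only an illustrative query against the sample table; the rn column is added here for display and is not in the answer's query.

```sql
-- Illustration: within one island the day number and the row number
-- both increase by 1 per row, so their difference stays constant.
SELECT
  ID,
  [Date],
  rn  = ROW_NUMBER() OVER (PARTITION BY ID, [Status] ORDER BY [Date]),
  grp = DATEDIFF(DAY, 0, [Date])
      - ROW_NUMBER() OVER (PARTITION BY ID, [Status] ORDER BY [Date])
FROM Table1
WHERE ID = 1 AND [Status] = 0
ORDER BY [Date];
```

For ID 1, the rows from 2012-10-20 to 2012-10-23 get rn 1 to 4 and share one grp value. The rows from 2012-10-25 to 2012-10-27 get rn 5 to 7 and share a grp value that is one higher, because the skipped day (2012-10-24) advances the day number but not the row number.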
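
If that assumption is a concern, one possible variation is to look up the next day's status instead of inferring it. The sketch below is not the answer author's code: the @N variable and the nxt alias are introduced here for illustration, and it relies on the question's guarantee that dates have no gaps. DECLARE with an initializer needs SQL Server 2008 or later.

```sql
DECLARE @N int = 3;  -- hypothetical parameter: required run length (9 in the real task)

WITH partitioned AS (
  SELECT
    ID, [Date], [Status],
    grp = DATEDIFF(DAY, 0, [Date])
        - ROW_NUMBER() OVER (PARTITION BY ID, [Status] ORDER BY [Date])
  FROM Table1
),
grouped AS (
  SELECT
    ID,
    SD = MIN([Date]),
    ED = MAX([Date])
  FROM partitioned
  WHERE [Status] = 0
  GROUP BY ID, grp
  HAVING COUNT(*) >= @N
)
SELECT
  t.ID,
  StartDate     = t.[Date],
  EndDate       = DATEADD(DAY, @N - 1, t.[Date]),
  NextDayStatus = nxt.[Status]   -- NULL when no following row exists
FROM Table1 t
INNER JOIN grouped g
  ON t.ID = g.ID
 AND t.[Date] BETWEEN g.SD AND DATEADD(DAY, 1 - @N, g.ED)
LEFT JOIN Table1 nxt             -- the row for the day right after the run
  ON nxt.ID = t.ID
 AND nxt.[Date] = DATEADD(DAY, @N, t.[Date])
ORDER BY t.ID, t.[Date];
```

With @N = 3 and the sample data this yields the four rows listed in the question. A run that ends on the last row for an ID is still returned, with NextDayStatus as NULL.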