Grouping rows using a date range

Date: 2014-11-23 01:32:24

Tags: sql sql-server sql-server-2008

I am using SQL Server 2008 and need to create a query that groups rows whose dates fall within a range of each other.

My table looks like this:

ADM_ID    WH_PID     WH_IN_DATETIME    WH_OUT_DATETIME

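My rule is:

  • If a WH_OUT_DATETIME is within 24 hours of, or after, the WH_IN_DATETIME of another ADM_ID with the same WH_P_ID, the rows belong to the same group

I would like another column added to the results that, if possible, identifies each group as EP_ID.

For example: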

ADM_ID    WH_PID    WH_IN_DATETIME         WH_OUT_DATETIME
------    ------    --------------         ---------------
1         9         2014-10-12 00:00:00    2014-10-13 15:00:00
2         9         2014-10-14 14:00:00    2014-10-15 15:00:00
3         9         2014-10-16 14:00:00    2014-10-17 15:00:00
4         9         2014-11-20 00:00:00    2014-11-21 00:00:00
5         5         2014-10-17 00:00:00    2014-10-18 00:00:00

would return these rows:

ADM_ID   WH_PID   EP_ID   EP_IN_DATETIME        EP_OUT_DATETIME       WH_IN_DATETIME        WH_OUT_DATETIME
------   ------   -----   -------------------   -------------------   -------------------   -------------------
1        9        1       2014-10-12 00:00:00   2014-10-17 15:00:00   2014-10-12 00:00:00   2014-10-13 15:00:00
2        9        1       2014-10-12 00:00:00   2014-10-17 15:00:00   2014-10-14 14:00:00   2014-10-15 15:00:00
3        9        1       2014-10-12 00:00:00   2014-10-17 15:00:00   2014-10-16 14:00:00   2014-10-17 15:00:00
4        9        2       2014-11-20 00:00:00   2014-11-20 00:00:00   2014-10-16 14:00:00   2014-11-21 00:00:00
5        5        1       2014-10-17 00:00:00   2014-10-18 00:00:00   2014-10-17 00:00:00   2014-10-18 00:00:00

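EP_OUT_DATETIME will always be the latest date in the group. I hope that clarifies things a bit. This way I can group by EP_ID, find the EP_OUT_DATETIME, and see the start time of any ADM_ID / PID within it.

Each row should roll into the next. That is, if another row's WH_IN_DATETIME follows this row's WH_OUT_DATETIME for the same WH_PID, then that row's WH_OUT_DATETIME becomes the EP_OUT_DATETIME for all rows of that WH_PID within that EP_ID.

I hope this makes sense.

Thanks, MR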

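5 answers:

Answer 0: (score: 4)

Since the question does not specify that the solution must be a single query ;-), here is another approach: use the "quirky update" feature, which updates a variable at the same time as it updates a column. To break down the complexity of this operation, I create a scratch table to hold the hardest part to calculate: EP_ID. Once that is done, the scratch table is joined in a simple query and provides the window for calculating the EP_IN_DATETIME and EP_OUT_DATETIME fields.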

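The steps are:

  1. Create the scratch table
  2. Seed the scratch table with all ADM_ID values, so that the UPDATE can run because all rows already exist
  3. Update the scratch table
  4. Finally, run a simple SELECT that joins the scratch table to the main table

Test setup: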

    SET ANSI_NULLS ON;
    SET NOCOUNT ON;
    
    CREATE TABLE #Table
    (
       ADM_ID INT NOT NULL PRIMARY KEY,
       WH_PID INT NOT NULL,
       WH_IN_DATETIME DATETIME NOT NULL,
       WH_OUT_DATETIME DATETIME NOT NULL
    );
    
    INSERT INTO #Table VALUES (1, 9, '2014-10-12 00:00:00', '2014-10-13 15:00:00');
    INSERT INTO #Table VALUES (2, 9, '2014-10-14 14:00:00', '2014-10-15 15:00:00');
    INSERT INTO #Table VALUES (3, 9, '2014-10-16 14:00:00', '2014-10-17 15:00:00');
    INSERT INTO #Table VALUES (4, 9, '2014-11-20 00:00:00', '2014-11-21 00:00:00');
    INSERT INTO #Table VALUES (5, 5, '2014-10-17 00:00:00', '2014-10-18 00:00:00');
    

Step 1: Create and populate the scratch table

    CREATE TABLE #Scratch
    (
       ADM_ID INT NOT NULL PRIMARY KEY,
       EP_ID INT NOT NULL
       -- Might need WH_PID and WH_IN_DATETIME fields to guarantee proper UPDATE ordering
    );
    
    INSERT INTO #Scratch (ADM_ID, EP_ID)
       SELECT ADM_ID, 0
       FROM   #Table;
    

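Alternate scratch table structure to ensure the proper update order (since the "quirky update" uses the order of the clustered index, as noted at the bottom of this answer):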

    CREATE TABLE #Scratch
    (
       WH_PID INT NOT NULL,
       WH_IN_DATETIME DATETIME NOT NULL,
       ADM_ID INT NOT NULL,
       EP_ID INT NOT NULL
    );
    
    INSERT INTO #Scratch (WH_PID, WH_IN_DATETIME, ADM_ID, EP_ID)
       SELECT WH_PID, WH_IN_DATETIME, ADM_ID, 0
       FROM   #Table;
    
    CREATE UNIQUE CLUSTERED INDEX [CIX_Scratch]
       ON #Scratch (WH_PID, WH_IN_DATETIME, ADM_ID);
    

Step 2: Update the scratch table, using a local variable to keep track of the prior value

    DECLARE @EP_ID INT; -- this is used in the UPDATE
    
    ;WITH cte AS
    (
      SELECT TOP (100) PERCENT
             t1.*,
             t2.WH_OUT_DATETIME AS [PriorOut],
             t2.ADM_ID AS [PriorID],
             ROW_NUMBER() OVER (PARTITION BY t1.WH_PID ORDER BY t1.WH_IN_DATETIME)
                    AS [RowNum]
      FROM   #Table t1
      LEFT JOIN #Table t2
             ON t2.WH_PID = t1.WH_PID
            AND t2.ADM_ID <> t1.ADM_ID
            AND t2.WH_OUT_DATETIME >= (t1.WH_IN_DATETIME - 1)
            AND t2.WH_OUT_DATETIME < t1.WH_IN_DATETIME
      ORDER BY t1.WH_PID, t1.WH_IN_DATETIME
    )
    UPDATE sc
    SET    @EP_ID = sc.EP_ID = CASE
                                   WHEN cte.RowNum = 1 THEN 1
                                   WHEN cte.[PriorOut] IS NULL THEN (@EP_ID + 1)
                                   ELSE @EP_ID
                            END
    FROM   #Scratch sc
    INNER JOIN cte
            ON cte.ADM_ID = sc.ADM_ID
    

Step 3: SELECT joining the scratch table

    SELECT tab.ADM_ID,
           tab.WH_PID,
           sc.EP_ID,
           MIN(tab.WH_IN_DATETIME) OVER (PARTITION BY tab.WH_PID, sc.EP_ID)
               AS [EP_IN_DATETIME],
           MAX(tab.WH_OUT_DATETIME) OVER (PARTITION BY tab.WH_PID, sc.EP_ID)
               AS [EP_OUT_DATETIME],
           tab.WH_IN_DATETIME,
           tab.WH_OUT_DATETIME
    FROM   #Table tab
    INNER JOIN #Scratch sc
        ON sc.ADM_ID = tab.ADM_ID
    ORDER BY tab.ADM_ID;
    

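Resources

  • MSDN page for UPDATE: look for "@variable = column = expression"

  • Performance Analysis of doing Running Totals (not exactly the same as this problem, but not too far off)

    This blog post does mention:

    • PRO: this method is generally pretty fast
    • CON: "The order of the UPDATE is controlled by the order of the clustered index". This behavior might rule out using this method, depending on the circumstances. In this particular case, if the WH_PID values are not at least naturally grouped together by the ordering of the clustered index and ordered by WH_IN_DATETIME, then those two fields simply get added to the scratch table, and the PK (with its implied clustered index) on the scratch table becomes (WH_PID, WH_IN_DATETIME, ADM_ID).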
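
To make the "@variable = column = expression" form concrete, here is a minimal running-total sketch that is separate from the EP_ID logic above. The #Demo table and its columns are hypothetical names used only for illustration. The processing order is not something the UPDATE documentation promises, so the rows sit under a clustered primary key in the desired order, and the result should be treated as typical behavior rather than a guarantee.

```sql
-- Hypothetical demo table; not part of the question's schema.
IF OBJECT_ID('tempdb..#Demo') IS NOT NULL DROP TABLE #Demo;

CREATE TABLE #Demo
(
   Id           INT NOT NULL PRIMARY KEY,  -- primary key is clustered by default
   Val          INT NOT NULL,
   RunningTotal INT NOT NULL DEFAULT (0)
);

INSERT INTO #Demo (Id, Val) VALUES (1, 10);
INSERT INTO #Demo (Id, Val) VALUES (2, 20);
INSERT INTO #Demo (Id, Val) VALUES (3, 5);

DECLARE @Total INT;
SET @Total = 0;  -- initialize first; NULL + Val would stay NULL

-- For each row: evaluate the expression, store it in the column,
-- and keep it in the variable so the next row can read it.
UPDATE #Demo
SET    @Total = RunningTotal = @Total + Val;

SELECT Id, Val, RunningTotal
FROM   #Demo
ORDER BY Id;

DROP TABLE #Demo;
```

The Step 2 UPDATE above uses the same mechanism, except that the carried value is the current EP_ID instead of a sum.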

Answer 1: (score: 3)

I would do this using EXISTS in a correlated subquery:

select t.*,
       (case when exists (select 1
                          from yourtable t2  -- yourtable: placeholder for the real table name
                          where t2.WH_P_ID = t.WH_P_ID and
                                t2.ADM_ID <> t.ADM_ID and  -- compare against other rows, not the row itself
                                t.WH_OUT_DATETIME between t2.WH_IN_DATETIME and dateadd(day, 1, t2.WH_OUT_DATETIME)
                         )
             then 1 else 0
        end) as TimeFrameFlag
from yourtable t;

Answer 2: (score: 3)

Try this query:

;WITH cte
     AS (SELECT t1.ADM_ID AS EP_ID,*
         FROM   @yourtable t1
         WHERE  NOT EXISTS (SELECT 1
                            FROM   @yourtable t2
                            WHERE  t1.WH_PID = t2.WH_PID
                                   AND t1.ADM_ID <> t2.ADM_ID
                                   AND Abs(Datediff(HH, t1.WH_OUT_DATETIME, t2.WH_IN_DATETIME)) <= 24)
         UNION ALL
         SELECT t2.EP_ID,t1.ADM_ID,t1.WH_PID,t1.WH_IN_DATETIME,t1.WH_OUT_DATETIME
         FROM   @yourtable t1
                JOIN cte t2
                  ON t1.WH_PID = t2.WH_PID
                     AND t1.ADM_ID <> t2.ADM_ID
                     AND Abs(( Datediff(HH, t2.WH_IN_DATETIME, t1.WH_OUT_DATETIME) )) <= 24),
     cte_result
     AS (SELECT t1.*,Dense_rank() OVER ( partition BY wh_pid ORDER BY t1.WH_PID, ISNULL(t2.EP_ID, t1.ADM_ID)) AS EP_ID
         FROM   @yourtable t1
                LEFT OUTER JOIN (SELECT DISTINCT ADM_ID,
                                                 EP_ID
                                 FROM   cte) t2
                             ON t1.ADM_ID = t2.ADM_ID)
SELECT ADM_ID,WH_PID,EP_ID,Min(WH_IN_DATETIME)OVER(partition BY wh_pid, ep_id) AS [EP_IN_DATETIME],Max(WH_OUT_DATETIME)OVER(partition BY wh_pid, ep_id) AS [EP_OUT_DATETIME],
       WH_IN_DATETIME,
       WH_OUT_DATETIME
FROM   cte_result
ORDER  BY ADM_ID 

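I assumed the following:

  • Rows that satisfy your rule form a group.
  • For all rows that belong to a group, the group's min(WH_IN_DATETIME) is shown in the EP_IN_DATETIME column. Likewise, for all rows that belong to a group, the group's max(WH_OUT_DATETIME) is shown in the EP_OUT_DATETIME column.
  • EP_ID is assigned to the groups separately for each WH_PID.
  • Your question does not make sense regarding how the EP_OUT_DATETIME and WH_IN_DATETIME of the 4th row come to be 2014-11-20 00:00:00 and 2014-10-16 14:00:00 respectively. Assuming that is a typo, they should be 2014-11-21 00:00:00.000 and 2014-11-20 00:00:00.000.

Explanation:

The first CTE, cte, returns the possible groups based on your rule. The second CTE, cte_result, assigns an EP_ID to each group. Finally, you select min(WH_IN_DATETIME) and Max(WH_OUT_DATETIME) over the partition wh_pid, ep_id.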
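
The same three stages (find where each group starts, number the groups, then take MIN/MAX per group) can also be sketched without recursion. This is only an illustration under stated assumptions: the #Stays table name is hypothetical, the "within 24 hours" test is the one used in the join of answer 0, and stays that overlap each other are not handled. A correlated subquery does the counting because SQL Server 2008 does not support SUM() OVER (ORDER BY ...).

```sql
-- Hypothetical temp table holding the question's sample rows.
IF OBJECT_ID('tempdb..#Stays') IS NOT NULL DROP TABLE #Stays;

CREATE TABLE #Stays
(
   ADM_ID          INT      NOT NULL PRIMARY KEY,
   WH_PID          INT      NOT NULL,
   WH_IN_DATETIME  DATETIME NOT NULL,
   WH_OUT_DATETIME DATETIME NOT NULL
);

INSERT INTO #Stays VALUES (1, 9, '2014-10-12 00:00:00', '2014-10-13 15:00:00');
INSERT INTO #Stays VALUES (2, 9, '2014-10-14 14:00:00', '2014-10-15 15:00:00');
INSERT INTO #Stays VALUES (3, 9, '2014-10-16 14:00:00', '2014-10-17 15:00:00');
INSERT INTO #Stays VALUES (4, 9, '2014-11-20 00:00:00', '2014-11-21 00:00:00');
INSERT INTO #Stays VALUES (5, 5, '2014-10-17 00:00:00', '2014-10-18 00:00:00');

;WITH Flagged AS
(
   -- IsStart = 1 when no other stay of the same WH_PID ended
   -- in the 24 hours before this stay began.
   SELECT s.ADM_ID, s.WH_PID, s.WH_IN_DATETIME, s.WH_OUT_DATETIME,
          CASE WHEN EXISTS (SELECT 1
                            FROM   #Stays p
                            WHERE  p.WH_PID = s.WH_PID
                            AND    p.ADM_ID <> s.ADM_ID
                            AND    p.WH_OUT_DATETIME >= DATEADD(DAY, -1, s.WH_IN_DATETIME)
                            AND    p.WH_OUT_DATETIME <  s.WH_IN_DATETIME)
               THEN 0 ELSE 1
          END AS IsStart
   FROM   #Stays s
),
Numbered AS
(
   -- EP_ID = number of group starts seen so far for this WH_PID.
   SELECT f.ADM_ID, f.WH_PID, f.WH_IN_DATETIME, f.WH_OUT_DATETIME,
          (SELECT SUM(f2.IsStart)
           FROM   Flagged f2
           WHERE  f2.WH_PID = f.WH_PID
           AND    f2.WH_IN_DATETIME <= f.WH_IN_DATETIME) AS EP_ID
   FROM   Flagged f
)
SELECT ADM_ID,
       WH_PID,
       EP_ID,
       MIN(WH_IN_DATETIME)  OVER (PARTITION BY WH_PID, EP_ID) AS EP_IN_DATETIME,
       MAX(WH_OUT_DATETIME) OVER (PARTITION BY WH_PID, EP_ID) AS EP_OUT_DATETIME,
       WH_IN_DATETIME,
       WH_OUT_DATETIME
FROM   Numbered
ORDER BY ADM_ID;

DROP TABLE #Stays;
```

On the five sample rows this should put ADM_ID 1 to 3 in EP_ID 1 and ADM_ID 4 in EP_ID 2 for WH_PID 9, with ADM_ID 5 as EP_ID 1 for WH_PID 5.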

sqlfiddle

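Answer 3: (score: 2)

Here is another alternative, although it may not match your expected results exactly.

I agree with @NoDisplayName that there seems to be an error in your output for ADM_ID 5: the 2 OUT dates should match, or at least that seems logical to me. I can't see why you would want an OUT date column to show an IN date value, but of course there may be a good reason. :)

Also, the wording of your question makes it sound as if this is only part of the problem and that you will take this output further. I am not sure what you are ultimately aiming for, but I have broken the query below into 2 CTEs, and you may find your final information in the second CTE (since it sounds like you want to group the data back together).

Here is the full structure and query on SQL Fiddle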

-- The Cross Join ensures we always have a pair of first and last time pairs
-- The left join matches all overlapping combinations, 
-- allowing the where clause to restrict to just the first and last
-- These first/last pairs are then grouped in the first CTE
-- Data is restricted in the second CTE
-- The final select is then quite simple
With GroupedData AS (
    SELECT
        (Row_Number() OVER (ORDER BY t1.WH_PID, t1.WH_IN_DATETIME) - 1) / 2 Grp,
         t1.WH_IN_DATETIME, t1.WH_OUT_DATETIME, t1.WH_PID
    FROM yourtable t1 
    CROSS JOIN (SELECT 0 AS [First] UNION SELECT 1) SetOrder
    LEFT OUTER JOIN yourtable t2
        ON t1.WH_PID = t2.WH_PID
        AND ((DATEADD(d,1,t1.WH_OUT_DATETIME) BETWEEN t2.WH_IN_DATETIME AND t2.WH_OUT_DATETIME AND [First] = 0)
             OR (DATEADD(d,1,t2.WH_OUT_DATETIME) BETWEEN t1.WH_IN_DATETIME AND t1.WH_OUT_DATETIME AND [First] = 1))
    WHERE t2.WH_PID IS NULL
), RestrictedData AS (
    SELECT WH_PID, MIN(WH_IN_DATETIME) AS WH_IN_DATETIME, MAX(WH_OUT_DATETIME) AS WH_OUT_DATETIME
    FROM GroupedData
    GROUP BY Grp, WH_PID
)
SELECT yourtable.ADM_ID, yourtable.WH_PID, RestrictedData.WH_IN_DATETIME AS EP_IN_DATETIME, RestrictedData.WH_OUT_DATETIME AS EP_OUT_DATETIME, yourtable.WH_IN_DATETIME, yourtable.WH_OUT_DATETIME
FROM RestrictedData
INNER JOIN yourtable
    ON RestrictedData.WH_PID = yourtable.WH_PID
    AND yourtable.WH_IN_DATETIME BETWEEN RestrictedData.WH_IN_DATETIME AND RestrictedData.WH_OUT_DATETIME
ORDER BY yourtable.ADM_ID

Answer 4: (score: 1)

A Left Outer Join and the DateDiff function should help you filter the records. Finally, use a Window Function to create the GroupIDs.

create table #test 
(ADM_ID int,WH_PID int,WH_IN_DATETIME DATETIME,WH_OUT_DATETIME  DATETIME)

INSERT #test
VALUES ( 1,9,'2014-10-12 00:00:00','2014-10-13 15:00:00'),
       (2,9,'2014-10-14 14:00:00','2014-10-15 15:00:00'),
       (3,9,'2014-10-16 14:00:00','2014-10-17 15:00:00'),
       (1,10,'2014-10-16 14:00:00','2014-10-17 15:00:00'),
       (2,10,'2014-10-18 14:00:00','2014-10-19 15:00:00')

SELECT Row_number()OVER(partition by a.WH_PID ORDER BY a.WH_IN_DATETIME) Group_Id,
       a.WH_PID,
       a.WH_IN_DATETIME,
       b.WH_OUT_DATETIME
FROM   #test a
       LEFT JOIN #test b
              ON a.WH_PID = b.WH_PID
                 AND a.ADM_ID <> b.ADM_ID
where  Datediff(hh, a.WH_OUT_DATETIME, b.WH_IN_DATETIME)BETWEEN 0 AND 24 

Output:

Group_Id    WH_PID  WH_IN_DATETIME          WH_OUT_DATETIME
--------    ------  ----------------------- -----------------------
1           9       2014-10-12 00:00:00.000 2014-10-15 15:00:00.000
2           9       2014-10-14 14:00:00.000 2014-10-17 15:00:00.000
1           10      2014-10-16 14:00:00.000 2014-10-19 15:00:00.000
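
One detail worth noting about the query above: because the WHERE clause tests a column of b, rows of a that have no match are filtered out, so the LEFT JOIN behaves like an inner join (which is why only three of the five rows appear). If the unmatched rows should be kept, one option is to move the DATEDIFF test into the ON clause. The sketch below assumes the same sample data in a hypothetical #test_demo table, and the NEXT_OUT_DATETIME alias is also just an illustrative name.

```sql
-- Hypothetical copy of the sample data from this answer.
IF OBJECT_ID('tempdb..#test_demo') IS NOT NULL DROP TABLE #test_demo;

CREATE TABLE #test_demo
(ADM_ID INT, WH_PID INT, WH_IN_DATETIME DATETIME, WH_OUT_DATETIME DATETIME);

INSERT #test_demo
VALUES (1, 9,  '2014-10-12 00:00:00', '2014-10-13 15:00:00'),
       (2, 9,  '2014-10-14 14:00:00', '2014-10-15 15:00:00'),
       (3, 9,  '2014-10-16 14:00:00', '2014-10-17 15:00:00'),
       (1, 10, '2014-10-16 14:00:00', '2014-10-17 15:00:00'),
       (2, 10, '2014-10-18 14:00:00', '2014-10-19 15:00:00');

-- With the DATEDIFF test in ON, rows of a without a follower
-- are kept and show NULL in NEXT_OUT_DATETIME.
SELECT a.ADM_ID,
       a.WH_PID,
       a.WH_IN_DATETIME,
       a.WH_OUT_DATETIME,
       b.WH_OUT_DATETIME AS NEXT_OUT_DATETIME
FROM   #test_demo a
       LEFT JOIN #test_demo b
              ON a.WH_PID = b.WH_PID
                 AND a.ADM_ID <> b.ADM_ID
                 AND DATEDIFF(hh, a.WH_OUT_DATETIME, b.WH_IN_DATETIME) BETWEEN 0 AND 24
ORDER  BY a.WH_PID, a.WH_IN_DATETIME;

DROP TABLE #test_demo;
```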