Select only rows with a time difference of x hours

Date: 2013-09-17 15:16:32

Tags: sql sql-server sql-server-2008

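I have a table passenger_log with four columns: psngr_id (not necessarily unique), arrival_dt_tm, departure_dt_tm and timediff (the difference between a passenger's last departure date and next arrival date).

From this table I want to select only the rows where the time difference is less than 24 hours, together with the row that follows each of them.

I am using SQL Server 2008, so I cannot use the LEAD and LAG functions from newer versions.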

Existing table -

local_passenger_id  arrival_date_time  departure_date_time  timediff
--------------------------------------------------------------------
00F9P0L193          28/07/2013 23:30   28/07/2013 23:36     0.516666
00F9P0L193          29/07/2013 00:07   29/07/2013 03:16     NULL
01MLEMFDUK          12/10/2008 16:43   12/10/2008 20:28     21.116666
01MLEMFDUK          13/10/2008 17:35   13/10/2008 20:19     3889.9
01MLEMFDUK          24/03/2009 22:13   25/03/2009 01:23     32268
01MLEMFDUK          28/11/2012 13:23   28/11/2012 14:10     676.583333
01MLEMFDUK          26/12/2012 18:45   26/12/2012 19:27     20.433333
01MLEMFDUK          27/12/2012 15:53   27/12/2012 17:07     289.2
01MLEMFDUK          08/01/2013 18:19   08/01/2013 19:00     NULL
02CJHH6KAJ          27/02/2011 09:48   27/02/2011 11:56     10167.25
02CJHH6KAJ          26/04/2012 03:11   26/04/2012 06:42     44.566666
02CJHH6KAJ          28/04/2012 03:16   28/04/2012 07:06     23.233333
02CJHH6KAJ          29/04/2012 06:20   29/04/2012 09:45     NULL

Expected output -

local_passenger_id  arrival_date_time  departure_date_time  timediff_mins
-------------------------------------------------------------------------
00F9P0L193          28/07/2013 23:30   28/07/2013 23:36     0.516666
00F9P0L193          29/07/2013 00:07   29/07/2013 03:16     NULL
01MLEMFDUK          12/10/2008 16:43   12/10/2008 20:28     21.116666
01MLEMFDUK          13/10/2008 17:35   13/10/2008 20:19     3889.9
01MLEMFDUK          26/12/2012 18:45   26/12/2012 19:27     20.433333
01MLEMFDUK          27/12/2012 15:53   27/12/2012 17:07     289.2
02CJHH6KAJ          28/04/2012 03:16   28/04/2012 07:06     23.233333
02CJHH6KAJ          29/04/2012 06:20   29/04/2012 09:45     NULL

2 answers:

Answer 0: (score: 1)

Try this

WITH cte(id,pid,adt,ddt,tdif) AS
(SELECT ROW_NUMBER() OVER (ORDER BY pid,adt), pid,adt,ddt,tdif FROM pass),
 hit(id) AS
(SELECT id FROM cte c1 WHERE EXISTS 
  (SELECT 1 FROM cte c2 WHERE c2.id=c1.id+1 AND c2.pid=c1.pid AND c1.tdif<24) )

SELECT pid,adt,ddt,tdif FROM cte WHERE id IN 
  (SELECT id FROM hit UNION SELECT id+1 FROM hit)

based on this table:

INSERT INTO pass
    ([pid], [adt], [ddt], [tdif])
VALUES
    ('00F9P0L193', '28.07.2013 23:30', '28.07.2013 23:36', '0.516666'),
    ('00F9P0L193', '29.07.2013 00:07', '29.07.2013 03:16', NULL),
    ('01MLEMFDUK', '12.10.2008 16:43', '12.10.2008 20:28', '21.116666'),
    ('01MLEMFDUK', '13.10.2008 17:35', '13.10.2008 20:19', '3889.9'),
    ('01MLEMFDUK', '24.03.2009 22:13', '25.03.2009 01:23', '32268'),
    ('01MLEMFDUK', '28.11.2012 13:23', '28.11.2012 14:10', '676.583333'),
    ('01MLEMFDUK', '26.12.2012 18:45', '26.12.2012 19:27', '20.433333'),
    ('01MLEMFDUK', '27.12.2012 15:53', '27.12.2012 17:07', '289.2'),
    ('01MLEMFDUK', '08.01.2013 18:19', '08.01.2013 19:00', NULL),
    ('02CJHH6KAJ', '27.02.2011 09:48', '27.02.2011 11:56', '10167.25'),
    ('02CJHH6KAJ', '26.04.2012 03:11', '26.04.2012 06:42', '44.566666'),
    ('02CJHH6KAJ', '28.04.2012 03:16', '28.04.2012 07:06', '23.233333'),
    ('02CJHH6KAJ', '29.04.2012 06:20', '29.04.2012 09:45', NULL)
;

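The common table expressions:

  • cte is a numbered list (row_number() runs over all rows, regardless of passenger ID).
  • hit contains only the ids of rows whose time difference is < 24 h and that are followed by another row for the same passenger.

In the main SELECT, a subselect with a UNION over hit combines each matching row with the row immediately after it.

Output: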

pid,    adt,    ddt,    tdif
00F9P0L193  28.07.2013 23:30    28.07.2013 23:36    0.516666
00F9P0L193  29.07.2013 00:07    29.07.2013 03:16    
01MLEMFDUK  12.10.2008 16:43    12.10.2008 20:28    21.116666
01MLEMFDUK  13.10.2008 17:35    13.10.2008 20:19    3889.9
01MLEMFDUK  26.12.2012 18:45    26.12.2012 19:27    20.433333
01MLEMFDUK  27.12.2012 15:53    27.12.2012 17:07    289.2
02CJHH6KAJ  28.04.2012 03:16    28.04.2012 07:06    23.233333
02CJHH6KAJ  29.04.2012 06:20    29.04.2012 09:45    
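
To make the hit / UNION step easier to follow, here is a minimal sketch that runs the same two CTEs on the sample rows and prints the intermediate ids. It is not SQL Server: it uses Python's sqlite3 module as a stand-in (assuming the bundled SQLite is 3.25 or later, which is needed for ROW_NUMBER), the dates are rewritten as ISO strings so that text ordering is chronological, and the helper names build_db, hit_ids and selected_ids are made up for this illustration.

```python
import sqlite3

# Sample rows from the question: (pid, adt, ddt, tdif).
# Assumption: dates are stored as ISO text so ORDER BY adt is chronological.
ROWS = [
    ("00F9P0L193", "2013-07-28 23:30", "2013-07-28 23:36", 0.516666),
    ("00F9P0L193", "2013-07-29 00:07", "2013-07-29 03:16", None),
    ("01MLEMFDUK", "2008-10-12 16:43", "2008-10-12 20:28", 21.116666),
    ("01MLEMFDUK", "2008-10-13 17:35", "2008-10-13 20:19", 3889.9),
    ("01MLEMFDUK", "2009-03-24 22:13", "2009-03-25 01:23", 32268),
    ("01MLEMFDUK", "2012-11-28 13:23", "2012-11-28 14:10", 676.583333),
    ("01MLEMFDUK", "2012-12-26 18:45", "2012-12-26 19:27", 20.433333),
    ("01MLEMFDUK", "2012-12-27 15:53", "2012-12-27 17:07", 289.2),
    ("01MLEMFDUK", "2013-01-08 18:19", "2013-01-08 19:00", None),
    ("02CJHH6KAJ", "2011-02-27 09:48", "2011-02-27 11:56", 10167.25),
    ("02CJHH6KAJ", "2012-04-26 03:11", "2012-04-26 06:42", 44.566666),
    ("02CJHH6KAJ", "2012-04-28 03:16", "2012-04-28 07:06", 23.233333),
    ("02CJHH6KAJ", "2012-04-29 06:20", "2012-04-29 09:45", None),
]

# The two CTEs from the answer, reused as a prefix for both queries below.
CTES = """
WITH cte(id, pid, adt, ddt, tdif) AS (
    SELECT ROW_NUMBER() OVER (ORDER BY pid, adt), pid, adt, ddt, tdif FROM pass
),
hit(id) AS (
    SELECT id FROM cte c1 WHERE EXISTS (
        SELECT 1 FROM cte c2
        WHERE c2.id = c1.id + 1 AND c2.pid = c1.pid AND c1.tdif < 24)
)
"""


def build_db():
    """Create an in-memory table named pass and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pass (pid TEXT, adt TEXT, ddt TEXT, tdif REAL)")
    conn.executemany("INSERT INTO pass VALUES (?, ?, ?, ?)", ROWS)
    return conn


def hit_ids(conn):
    """Ids of rows with tdif < 24 that have a following row for the same pid."""
    return [r[0] for r in conn.execute(CTES + "SELECT id FROM hit ORDER BY id")]


def selected_ids(conn):
    """Ids kept by the main SELECT: every hit plus the row right after it."""
    sql = CTES + """
    SELECT id FROM cte
    WHERE id IN (SELECT id FROM hit UNION SELECT id + 1 FROM hit)
    ORDER BY id
    """
    return [r[0] for r in conn.execute(sql)]


if __name__ == "__main__":
    conn = build_db()
    print(hit_ids(conn))       # [1, 3, 7, 12]
    print(selected_ids(conn))  # [1, 2, 3, 4, 7, 8, 12, 13]
```

The eight selected ids correspond to the eight rows in the output above. Rows whose tdif is NULL never enter hit, because NULL < 24 does not evaluate to true.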

This can also be done with a single CTE:

;WITH cte AS 
(SELECT ROW_NUMBER() OVER (ORDER BY pid,adt) id,pid,adt,ddt,tdif FROM pass)
SELECT pid,adt,ddt,tdif FROM cte c0 WHERE EXISTS (
 SELECT 1 FROM cte c1 WHERE c1.id IN(c0.id,c0.id-1) AND EXISTS 
   (SELECT 1 FROM cte c2 WHERE c2.id=c1.id+1 AND c2.pid=c1.pid AND c1.tdif<24) )

It may be harder to read, but it works the same way...
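
One way to check that claim is to run both variants on the same data and compare the results. The sketch below does so with Python's sqlite3 module as a stand-in for SQL Server (assuming SQLite 3.25 or later for ROW_NUMBER). It uses a reduced copy of the sample rows with ISO dates and without the ddt column, since neither query filters on it; the names TWO_CTE, ONE_CTE and run are made up for this illustration.

```python
import sqlite3

# Reduced sample from the question: (pid, adt, tdif); ddt is left out
# because neither query filters on it. Dates are ISO text (assumption).
ROWS = [
    ("01MLEMFDUK", "2008-10-12 16:43", 21.116666),
    ("01MLEMFDUK", "2008-10-13 17:35", 3889.9),
    ("01MLEMFDUK", "2009-03-24 22:13", 32268),
    ("01MLEMFDUK", "2012-11-28 13:23", 676.583333),
    ("01MLEMFDUK", "2012-12-26 18:45", 20.433333),
    ("01MLEMFDUK", "2012-12-27 15:53", 289.2),
    ("01MLEMFDUK", "2013-01-08 18:19", None),
    ("02CJHH6KAJ", "2011-02-27 09:48", 10167.25),
    ("02CJHH6KAJ", "2012-04-26 03:11", 44.566666),
    ("02CJHH6KAJ", "2012-04-28 03:16", 23.233333),
    ("02CJHH6KAJ", "2012-04-29 06:20", None),
]

# Variant with two CTEs (cte and hit).
TWO_CTE = """
WITH cte(id, pid, adt, tdif) AS (
    SELECT ROW_NUMBER() OVER (ORDER BY pid, adt), pid, adt, tdif FROM pass
),
hit(id) AS (
    SELECT id FROM cte c1 WHERE EXISTS (
        SELECT 1 FROM cte c2
        WHERE c2.id = c1.id + 1 AND c2.pid = c1.pid AND c1.tdif < 24)
)
SELECT pid, adt FROM cte
WHERE id IN (SELECT id FROM hit UNION SELECT id + 1 FROM hit)
ORDER BY pid, adt
"""

# Variant with a single CTE and nested EXISTS.
ONE_CTE = """
WITH cte AS (
    SELECT ROW_NUMBER() OVER (ORDER BY pid, adt) AS id, pid, adt, tdif FROM pass
)
SELECT pid, adt FROM cte c0 WHERE EXISTS (
    SELECT 1 FROM cte c1 WHERE c1.id IN (c0.id, c0.id - 1) AND EXISTS (
        SELECT 1 FROM cte c2
        WHERE c2.id = c1.id + 1 AND c2.pid = c1.pid AND c1.tdif < 24))
ORDER BY pid, adt
"""


def run(sql):
    """Load ROWS into a fresh in-memory table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pass (pid TEXT, adt TEXT, tdif REAL)")
    conn.executemany("INSERT INTO pass VALUES (?, ?, ?)", ROWS)
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(run(TWO_CTE) == run(ONE_CTE))  # True
```

In the single-CTE form, c1 stands for either the current row or the row before it, so a row is kept when it is itself a hit or directly follows one.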

Answer 1: (score: 0)

-- rows whose own time difference is below 24 hours
SELECT T.local_passenger_id, T.arrival_date_time, T.departure_date_time, T.timediff
FROM passenger_log T
WHERE T.timediff < 24
UNION
-- plus, per passenger, the row that directly follows each such row
SELECT T2.local_passenger_id, T2.arrival_date_time, T2.departure_date_time, T2.timediff
FROM (SELECT *, RANK() OVER (PARTITION BY local_passenger_id ORDER BY arrival_date_time) AS Ranked
      FROM passenger_log) T1
-- INNER JOIN instead of LEFT JOIN: a matching row without a successor would otherwise add an all-NULL row
INNER JOIN
     (SELECT *, RANK() OVER (PARTITION BY local_passenger_id ORDER BY arrival_date_time) AS Ranked
      FROM passenger_log) T2
  ON T1.local_passenger_id = T2.local_passenger_id AND T1.Ranked + 1 = T2.Ranked
WHERE T1.timediff < 24
ORDER BY local_passenger_id, arrival_date_time