Counting blocks of consecutive sequences in SQL

Time: 2018-08-15 17:10:38

Tags: sql oracle11g

Let's assume this situation:

CAR     TIME
A       1300
A       1301
A       1302
A       1315
A       1316
A       1317
A       1319
A       1320
B       1321
B       1322

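I want to generate another column that numbers each trip of each car. We consider that a new trip starts every time there is a discontinuity in TIME.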

CAR     TIME    TRIP
A       1300     1
A       1301     1
A       1302     1
A       1315     2
A       1316     2
A       1317     2
A       1319     3
A       1320     3
B       1321     1
B       1322     1
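To make the rule concrete before looking at the SQL answers, here is a minimal Python sketch (an illustration only, not Oracle code) that walks the rows of each car in time order and opens a new trip whenever TIME is not exactly one more than the previous TIME. It assumes TIME is a plain integer as in the sample; `ROWS` and `number_trips` are hypothetical names.

```python
# Sample data from the question: (car, time) pairs with TIME as a plain integer.
ROWS = [
    ("A", 1300), ("A", 1301), ("A", 1302),
    ("A", 1315), ("A", 1316), ("A", 1317),
    ("A", 1319), ("A", 1320),
    ("B", 1321), ("B", 1322),
]


def number_trips(rows):
    """Return (car, time, trip) triples, numbering trips per car.

    A new trip starts whenever TIME is not exactly one more than the
    previous TIME of the same car.
    """
    result = []
    prev_car, prev_time, trip = None, None, 0
    for car, time in sorted(rows):
        if car != prev_car:
            trip = 1  # first row of each car opens trip 1
        elif time != prev_time + 1:
            trip += 1  # gap in TIME: open a new trip
        result.append((car, time, trip))
        prev_car, prev_time = car, time
    return result


for car, time, trip in number_trips(ROWS):
    print(car, time, trip)
```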

Is there some SQL function to get this count? Thanks in advance.

5 answers:

Answer 0 (score: 6)

You seem to want a cumulative approach:

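select t.*,
       -- number the trips 1, 2, 3, ... within each car
       dense_rank() over (partition by car order by grp1) as trip
from (select t.*,
             -- running count of gaps larger than one so far
             sum(case when grp > 1 then 1 else 0 end) over (partition by car order by time) as grp1
      from (select t.*,
                   -- difference from the previous TIME of the same car; 1 for its first row
                   coalesce((time - lag(time) over (partition by car order by time)), 1) as grp
            from your_table t
           ) t
     ) t;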
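The three nested levels of this query can be traced step by step. The Python sketch below is a hypothetical re-implementation for illustration (the names are not from the answer); it assumes rows are (car, time) pairs with integer times, as in the question, and reproduces the TRIP column.

```python
from itertools import groupby

# Sample data from the question.
ROWS = [
    ("A", 1300), ("A", 1301), ("A", 1302),
    ("A", 1315), ("A", 1316), ("A", 1317),
    ("A", 1319), ("A", 1320),
    ("B", 1321), ("B", 1322),
]


def trips_cumulative(rows):
    """Trace of the cumulative query: gap, running break count, dense rank."""
    out = []
    for car, group in groupby(sorted(rows), key=lambda r: r[0]):
        times = [time for _, time in group]
        grp1_per_row = []
        grp1, prev = 0, None
        for time in times:
            # Innermost level: grp = time - lag(time), coalesced to 1 on the first row.
            grp = 1 if prev is None else time - prev
            # Middle level: running sum of "grp > 1" flags, i.e. breaks seen so far.
            if grp > 1:
                grp1 += 1
            grp1_per_row.append(grp1)
            prev = time
        # Outer level: dense_rank() over grp1 maps 0, 1, 2, ... to 1, 2, 3, ...
        rank = {v: i + 1 for i, v in enumerate(sorted(set(grp1_per_row)))}
        out.extend((car, time, rank[g]) for time, g in zip(times, grp1_per_row))
    return out


print(trips_cumulative(ROWS))
```

Because `grp1` already counts breaks starting from 0 and grows by at most one per row, the final `dense_rank()` here amounts to `grp1 + 1`.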

Answer 1 (score: 3)

Here is how I would approach this problem:

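with grp as (
  -- trip starts: rows with no row for the same car exactly one minute earlier,
  -- numbered per car (TIME is read as a YYYYMMDDHH24MI value, e.g. 201808151300)
  select row_number() over (partition by CAR order by TIME) rn, a.CAR, a.TIME
  from test a
  where not exists (select * from test b
                    where a.CAR=b.CAR
                    and to_date(b.TIME, 'YYYYmmDDHH24MI')+1/(24*60) = to_date(a.TIME, 'YYYYmmDDHH24MI'))
)
-- each row gets the number of the latest trip start at or before its time
select t.CAR, t.TIME, (
  select max(rn) from grp where t.CAR=grp.CAR and grp.TIME <= t.TIME
) as trip
from test t
order by t.CAR, t.TIME;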

The main idea is to select the start time of each trip (this is done in the CTE grp) and then use the row number as the trip identifier.
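The same idea in a minimal Python sketch, for illustration only: trip starts are rows with no row for the same car one minute earlier, they are numbered per car, and each row takes the highest start number at or before its time. The sketch assumes TIME is a plain integer, so "one minute earlier" is `t - 1` rather than the `to_date()` arithmetic above; the function name is hypothetical.

```python
# Sample data from the question.
ROWS = [
    ("A", 1300), ("A", 1301), ("A", 1302),
    ("A", 1315), ("A", 1316), ("A", 1317),
    ("A", 1319), ("A", 1320),
    ("B", 1321), ("B", 1322),
]


def trips_from_starts(rows):
    """Number trips by looking up the latest trip start, like the CTE approach.

    Assumes integer times where the previous minute is simply t - 1.
    """
    present = set(rows)
    # CTE grp: rows with no row for the same car exactly one minute earlier.
    starts = sorted((car, t) for car, t in rows if (car, t - 1) not in present)
    # row_number() over (partition by car order by time) on the start rows.
    rn, seen = {}, {}
    for car, t in starts:
        seen[car] = seen.get(car, 0) + 1
        rn[(car, t)] = seen[car]
    # Correlated subquery: max(rn) over the starts at or before each row's time.
    return [
        (car, t, max(rn[(c, s)] for c, s in starts if c == car and s <= t))
        for car, t in sorted(rows)
    ]


print(trips_from_starts(ROWS))
```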

Fiddle: http://sqlfiddle.com/#!4/6a327/10

Answer 2 (score: 3)

I would use row_number() and subtraction to define the groups, then dense_rank():

select t.*,
       -- time - seqnum stays constant within a run of consecutive times
       dense_rank() over (partition by car order by time - seqnum) as trip
from (select t.*, row_number() over (partition by car order by time) as seqnum
      from t
     ) t;

I can't readily think of any alternative that uses fewer than two window functions, or that is likely to be faster using join or group by.
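The subtraction trick can be checked in a few lines of Python (a hypothetical sketch, not the answer's code): within a run of consecutive minutes, `time` and `seqnum` both grow by one per row, so their difference stays constant and jumps at every gap. For car A the differences are 1299, 1299, 1299, 1311, 1311, 1311, 1312, 1312. The sketch assumes integer times that are distinct within each car; a repeated time would receive two different differences.

```python
from itertools import groupby

# Sample data from the question.
ROWS = [
    ("A", 1300), ("A", 1301), ("A", 1302),
    ("A", 1315), ("A", 1316), ("A", 1317),
    ("A", 1319), ("A", 1320),
    ("B", 1321), ("B", 1322),
]


def run_keys(times):
    """time - row_number() for a car's times sorted ascending (seqnum starts at 1)."""
    return [t - seqnum for seqnum, t in enumerate(times, start=1)]


def trips_row_number(rows):
    """dense_rank() over the run key within each car."""
    out = []
    for car, group in groupby(sorted(rows), key=lambda r: r[0]):
        times = [t for _, t in group]
        keys = run_keys(times)
        rank = {k: i + 1 for i, k in enumerate(sorted(set(keys)))}
        out.extend((car, t, rank[k]) for t, k in zip(times, keys))
    return out


print(trips_row_number(ROWS))
```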

Answer 3 (score: 1)

Another approach:

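SELECT t.car, t.time AS trip_start, MIN(t3.time) AS trip_end
  FROM test t, test t3
 -- t: rows with no row for the same car one minute earlier (trip starts)
 WHERE NOT EXISTS (SELECT 1
                     FROM test t2
                    WHERE t2.car = t.car
                      AND t2.time = t.time - 1)
   -- t3: rows of the same car at or after that start ...
   AND t3.car = t.car
   AND t3.time >= t.time
   -- ... that have no row one minute later (trip ends)
   AND NOT EXISTS (SELECT 1
                     FROM test t4
                    WHERE t4.car = t3.car
                      AND t4.time = t3.time + 1)
 GROUP BY t.car, t.time
 ORDER BY 1, 2;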

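The first NOT EXISTS finds all rows for which there is no row for the same car in the previous minute, that is, the rows that start a period for that car.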

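The later NOT EXISTS finds the rows for which the same car has no row in the next minute, that is, the rows that end a period. The MIN function then picks the smallest of these (also filtered to be greater than or equal to the start of the period in question). The query therefore returns one row per trip, with its start and end times, rather than a trip number on every row.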
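A Python sketch of the same start/end pairing, for illustration only (the function name is hypothetical and integer times are assumed): set-membership checks stand in for the two NOT EXISTS subqueries, and each start is paired with the smallest end at or after it.

```python
# Sample data from the question.
ROWS = [
    ("A", 1300), ("A", 1301), ("A", 1302),
    ("A", 1315), ("A", 1316), ("A", 1317),
    ("A", 1319), ("A", 1320),
    ("B", 1321), ("B", 1322),
]


def trip_periods(rows):
    """Return (car, trip_start, trip_end) tuples, one per trip."""
    present = set(rows)
    # First NOT EXISTS: no row for the same car one minute earlier -> trip start.
    starts = [(c, t) for c, t in rows if (c, t - 1) not in present]
    # Second NOT EXISTS: no row for the same car one minute later -> trip end.
    ends = [(c, t) for c, t in rows if (c, t + 1) not in present]
    # MIN(t3.time): the earliest end at or after each start.
    return sorted(
        (c, s, min(e for c2, e in ends if c2 == c and e >= s))
        for c, s in starts
    )


print(trip_periods(ROWS))
```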

Answer 4 (score: 1)

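Combining some of the other ideas, handling trips that cross an hour boundary without converting to dates (just in case conversion slows things down noticeably), and allowing the same time to appear more than once: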

-- CTE for sample data
with your_table (car, time) as (
            select 'A', 201808151259 from dual -- extra row to go across hour
  union all select 'A', 201808151300 from dual
  union all select 'A', 201808151301 from dual
  union all select 'A', 201808151302 from dual
  union all select 'A', 201808151315 from dual
  union all select 'A', 201808151316 from dual
  union all select 'A', 201808151317 from dual
  union all select 'A', 201808151319 from dual
  union all select 'A', 201808151319 from dual -- extra row for duplicate time
  union all select 'A', 201808151320 from dual
  union all select 'B', 201808151321 from dual
  union all select 'B', 201808151322 from dual
)
-- actual query
select car,
  time,
  dense_rank() over (partition by car order by trip_start) as trip
from (
  select car,
    time,
    max(case when lag_time = time
               or lag_time = time - case when mod(time, 100) = 00 then 41 else 1 end
             then null else time end
    ) over (partition by car order by time) as trip_start
  from (
    select car,
      time,
      lag(time) over (partition by car order by time) as lag_time
    from your_table
  )
)
order by car, time;

which gives:

CAR         TIME         TRIP
--- ------------ ------------
A   201808151259            1
A   201808151300            1
A   201808151301            1
A   201808151302            1
A   201808151315            2
A   201808151316            2
A   201808151317            2
A   201808151319            3
A   201808151319            3
A   201808151320            3
B   201808151321            1
B   201808151322            1

The innermost query just uses lag() to get the raw data plus the previous time value for each row.

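The next query finds the trip start times by treating duplicate and adjacent times (including across an hour boundary, via the nested case expression) as null, and then taking the highest value so far (which ignores the nulls just generated). All consecutive times end up with the same trip start time: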

select car,
  time,
  max(case when lag_time = time
             or lag_time = time - case when mod(time, 100) = 00 then 41 else 1 end
           then null else time end
  ) over (partition by car order by time) as trip_start
from (
  select car,
    time,
    lag(time) over (partition by car order by time) as lag_time
  from your_table
)
order by car, time;

CAR         TIME   TRIP_START
--- ------------ ------------
A   201808151259 201808151259
A   201808151300 201808151259
A   201808151301 201808151259
A   201808151302 201808151259
A   201808151315 201808151315
A   201808151316 201808151315
A   201808151317 201808151315
A   201808151319 201808151319
A   201808151319 201808151319
A   201808151320 201808151319
B   201808151321 201808151321
B   201808151322 201808151321
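The CASE test and the running maximum can be sketched in Python (hypothetical helper names, for illustration only). Since times are sorted, the running max of the non-null CASE values equals the most recent non-null one, so the sketch simply remembers the last start. Like the CASE expression, it treats `HH00` as following `(HH-1)59`, but it does not handle a crossing of midnight or of a month.

```python
from itertools import groupby

# Sample data from the answer, including the hour crossing and the duplicate time.
ROWS = [
    ("A", 201808151259), ("A", 201808151300), ("A", 201808151301),
    ("A", 201808151302), ("A", 201808151315), ("A", 201808151316),
    ("A", 201808151317), ("A", 201808151319), ("A", 201808151319),
    ("A", 201808151320), ("B", 201808151321), ("B", 201808151322),
]


def continues_trip(lag_time, time):
    """Mirror of the CASE test: a duplicate time, or the previous minute,
    where a time ending in 00 is preceded by one 41 lower (HH59)."""
    if lag_time is None:  # first row of a car: lag() is null, CASE yields time
        return False
    step = 41 if time % 100 == 0 else 1
    return lag_time == time or lag_time == time - step


def trip_starts(rows):
    """Return (car, time, trip_start) like the intermediate query."""
    out = []
    for car, group in groupby(sorted(rows), key=lambda r: r[0]):
        lag_time, trip_start = None, None
        for _, time in group:
            if not continues_trip(lag_time, time):
                # Non-null CASE value; with ascending times it becomes the running max.
                trip_start = time
            out.append((car, time, trip_start))
            lag_time = time
    return out


for row in trip_starts(ROWS):
    print(*row)
```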

Then the outermost query uses dense_rank() to number the trips consecutively, based on their start times.
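The last step, sketched in Python with the intermediate rows from the TRIP_START output copied in as input (the function name is hypothetical): `dense_rank()` gives every distinct trip start within a car the next consecutive number, so the two 201808151319 rows share trip 3 rather than receiving separate numbers.

```python
from itertools import groupby

# (car, time, trip_start) rows copied from the intermediate TRIP_START output.
STARTS = [
    ("A", 201808151259, 201808151259), ("A", 201808151300, 201808151259),
    ("A", 201808151301, 201808151259), ("A", 201808151302, 201808151259),
    ("A", 201808151315, 201808151315), ("A", 201808151316, 201808151315),
    ("A", 201808151317, 201808151315), ("A", 201808151319, 201808151319),
    ("A", 201808151319, 201808151319), ("A", 201808151320, 201808151319),
    ("B", 201808151321, 201808151321), ("B", 201808151322, 201808151321),
]


def dense_rank_trips(rows):
    """dense_rank() over (partition by car order by trip_start).

    Expects rows already grouped by car, as in STARTS.
    """
    out = []
    for car, group in groupby(rows, key=lambda r: r[0]):
        group = list(group)
        distinct = sorted({s for _, _, s in group})
        rank = {s: i + 1 for i, s in enumerate(distinct)}
        out.extend((car, time, rank[s]) for _, time, s in group)
    return out


for row in dense_rank_trips(STARTS):
    print(*row)
```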