Question

I have a table like this:

Journey  HHMM  Chkpt1  Chkpt2
41       1600  AAA     BBB
41       1601  AAA     BBB
41       1602  AAA     BBB
41       1603  CCC     DDD
41       1603  BBB     CCC
41       1604  DDD     EEE

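The points Chkpt1 and Chkpt2 define a path segment. In this case, the vehicle on journey 41 passes through these segments: AAA-BBB, BBB-CCC, CCC-DDD, DDD-EEE.

My problem: I need to get the first and last points of the journey, along with their respective times. In this case, the answer is AAA (1600) and EEE (1604).

To get this answer, there are a few points to consider:

1) The track is recorded every minute. This can produce multiple rows with the same checkpoints.

2) Every segment is tracked. If the vehicle moves from one segment to another within the same minute, multiple rows can be inserted for that minute, and for some reason they may not appear in chronological order.

3) The trickiest point: the vehicle does not necessarily move from Chkpt1 to Chkpt2. It may be moving from Chkpt2 to Chkpt1. The question is how to derive the true direction (there is no direction column in this table, and the table must not be changed).

For example: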

Journey  HHMM  Chkpt1  Chkpt2
42       1700  YYY     ZZZ
42       1701  YYY     ZZZ
42       1702  WWW     XXX
42       1702  XXX     YYY
42       1702  VVV     WWW
42       1703  UUU     VVV

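In this case, the vehicle moves from ZZZ to UUU, and the answer is ZZZ (1700) / UUU (1703).

On every segment it travels from Chkpt2 to Chkpt1. Within the same journey, all rows must be followed in the same direction.

For journey 41, all movement is from Chkpt1 to Chkpt2. We get the track by comparing Chkpt2 (1602) with Chkpt1 (1603), so we see the vehicle move from AAA-BBB to BBB-CCC, and so on.

For journey 42, all movement is from Chkpt2 to Chkpt1. We get the track by comparing Chkpt1 (1700) with Chkpt2 (1702), so we see the vehicle move from ZZZ-YYY to YYY-XXX, and so on.

The desired result would be: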

Journey  ChkptStart  Time1  ChkptEnd  Time2
41       AAA         1600   EEE       1604
42       ZZZ         1700   UUU       1703

Well, I don't have enough SQL experience to handle a query this complex.

Can someone help me solve this?

Answer 1

A new version that handles some of the issues discussed in the comments:

with 
-- to_minutes takes the rows in test and changes their HHMM time format 
-- to minutes. This makes it easier to compare rows to see if they are 
-- 1 minute before or after each other.
to_minutes as
(
select 
journey,
(trunc(hhmm/100)*60)+hhmm-(trunc(hhmm/100)*100) mins,
chkpt1,
chkpt2
from 
test
),
-- before_same lists rows that have a row that has the same values for
-- chkpt1 and chkpt2 and is from the previous minute.
before_same as
(
select 
this_row.journey,
this_row.mins,
this_row.chkpt1,
this_row.chkpt2
from
to_minutes this_row,
to_minutes prev_row
where
this_row.journey = prev_row.journey and
this_row.mins = prev_row.mins+1 and
this_row.chkpt1 = prev_row.chkpt1 and
this_row.chkpt2 = prev_row.chkpt2
),
-- after_same lists rows that have a row that has the same values for
-- chkpt1 and chkpt2 and is from the next minute.
after_same as
(
select 
this_row.journey,
this_row.mins,
this_row.chkpt1,
this_row.chkpt2
from
to_minutes this_row,
to_minutes next_row
where
this_row.journey = next_row.journey and
this_row.mins+1 = next_row.mins and
this_row.chkpt1 = next_row.chkpt1 and
this_row.chkpt2 = next_row.chkpt2
),
-- At this point the subqueries are working on chains that go from
-- left to right which means that chkpt1 is the start of the chain or path.
--
-- lr_before_diff lists rows that have a row from the previous minute or the same minute
-- with chkpt2 of that row = chkpt1 of this row.
lr_before_diff as
(
select 
this_row.journey,
this_row.mins,
this_row.chkpt1,
this_row.chkpt2
from
to_minutes this_row,
to_minutes prev_row
where
this_row.journey = prev_row.journey and
(this_row.mins = prev_row.mins+1 or
this_row.mins = prev_row.mins) and
this_row.chkpt1 = prev_row.chkpt2
),
-- lr_after_diff lists rows that have a row from the next minute or the same minute
-- with chkpt1 of that row = chkpt2 of this row.
lr_after_diff as
(
select 
this_row.journey,
this_row.mins,
this_row.chkpt1,
this_row.chkpt2
from
to_minutes this_row,
to_minutes next_row
where
this_row.journey = next_row.journey and
(this_row.mins+1 = next_row.mins or
this_row.mins = next_row.mins) and
this_row.chkpt2 = next_row.chkpt1
),
-- lr_begin lists the rows that do not have a row before
-- them so they could be start rows for a lr path
lr_begin as
(
select * from to_minutes
minus
(select * from before_same
union
select * from lr_before_diff)
),
-- lr_end lists the rows that do not have a row after
-- them so they could be end rows for a lr path
lr_end as
(
select * from to_minutes
minus
(select * from after_same
union
select * from lr_after_diff)
),
-- lr_beg_count lists number of beginning rows for each journey
-- should be 1 for a lr path
lr_beg_count as
(
select journey,count(*) cnt
from lr_begin
group by journey
),
-- lr_end_count lists number of ending rows for each journey
-- should be 1 for a lr path
lr_end_count as
(
select journey,count(*) cnt
from lr_end
group by journey
),
-- lr_journeys lists the journey numbers of the lr paths
-- only journeys with 1 begin and end row are lr paths
lr_journeys as
(
select lr_beg_count.journey
from lr_beg_count,lr_end_count
where
lr_beg_count.journey = lr_end_count.journey and
lr_beg_count.cnt = 1 and
lr_end_count.cnt = 1
),
-- lr_journey_detail combines the begin and end rows into 
-- one row
lr_journey_detail as
(
select
lr_begin.journey,
lr_begin.chkpt1 start_checkpoint,
lr_begin.mins start_mins,
lr_end.chkpt2 end_checkpoint,
lr_end.mins end_mins
from
lr_begin,
lr_end,
lr_journeys
where
lr_begin.journey=lr_end.journey and
lr_journeys.journey=lr_end.journey
),
-- now do the same for right to left paths
rl_before_diff as
(
select 
this_row.journey,
this_row.mins,
this_row.chkpt1,
this_row.chkpt2
from
to_minutes this_row,
to_minutes prev_row
where
this_row.journey = prev_row.journey and
(this_row.mins = prev_row.mins+1 or
this_row.mins = prev_row.mins) and
this_row.chkpt2 = prev_row.chkpt1
),
rl_after_diff as
(
select 
this_row.journey,
this_row.mins,
this_row.chkpt1,
this_row.chkpt2
from
to_minutes this_row,
to_minutes next_row
where
this_row.journey = next_row.journey and
(this_row.mins+1 = next_row.mins or
this_row.mins = next_row.mins) and
this_row.chkpt1 = next_row.chkpt2
),
rl_begin as
(
select * from to_minutes
minus
(select * from before_same
union
select * from rl_before_diff)
),
rl_end as
(
select * from to_minutes
minus
(select * from after_same
union
select * from rl_after_diff)
),
rl_beg_count as
(
select journey,count(*) cnt
from rl_begin
group by journey
),
rl_end_count as
(
select journey,count(*) cnt
from rl_end
group by journey
),
rl_journeys as
(
select rl_beg_count.journey
from rl_beg_count,rl_end_count
where
rl_beg_count.journey = rl_end_count.journey and
rl_beg_count.cnt = 1 and
rl_end_count.cnt = 1
),
rl_journey_detail as
(
select
rl_begin.journey,
rl_begin.chkpt2 start_checkpoint,
rl_begin.mins start_mins,
rl_end.chkpt1 end_checkpoint,
rl_end.mins end_mins
from
rl_begin,
rl_end,
rl_journeys
where
rl_begin.journey=rl_end.journey and
rl_journeys.journey=rl_end.journey
),
-- now combine the two journey detail rows
journey_detail as
(
select * from lr_journey_detail 
union
select * from rl_journey_detail 
where journey not in
(select journey from lr_journey_detail)
),
-- convert back to hhmm
convert_hhmm as
(
select
journey,
start_checkpoint,
(trunc(start_mins/60)*100) + start_mins - (trunc(start_mins/60)*60) start_hhmm,
end_checkpoint,
(trunc(end_mins/60)*100) + end_mins - (trunc(end_mins/60)*60) end_hhmm
from journey_detail
)
select * from convert_hhmm 
order by journey;

Output for the test data:

   JOURNEY STA START_HHMM END   END_HHMM
---------- --- ---------- --- ----------
        41 AAA       1600 EEE       1604
        42 ZZZ       1700 UUU       1703
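
For readers who find the chain of subqueries hard to follow, the same elimination idea can be sketched outside the database. The Python below is only an illustration under stated assumptions, not part of the query above: the names to_minutes, to_hhmm, linked and endpoints_by_elimination are made up here, and the rows of a single journey are assumed to arrive as (hhmm, chkpt1, chkpt2) tuples. A row is a "begin" row if no other row can precede it, an "end" row if no other row can follow it, and a direction is accepted only when there is exactly one of each.

```python
def to_minutes(hhmm):
    """Convert an HHMM integer such as 1603 to minutes since midnight."""
    hours, minutes = divmod(hhmm, 100)
    return hours * 60 + minutes


def to_hhmm(mins):
    """Convert minutes since midnight back to an HHMM integer."""
    hours, minutes = divmod(mins, 60)
    return hours * 100 + minutes


def linked(earlier, later, left_to_right):
    """True if `later` can directly follow `earlier` in the given direction."""
    gap = later[0] - earlier[0]
    if gap == 1 and earlier[1:] == later[1:]:
        return True  # same segment, reported again one minute later
    if left_to_right:
        joined = earlier[2] == later[1]  # chkpt2 of earlier = chkpt1 of later
    else:
        joined = earlier[1] == later[2]  # chkpt1 of earlier = chkpt2 of later
    return gap in (0, 1) and joined


def endpoints_by_elimination(rows):
    """Return (start, start_hhmm, end, end_hhmm) for one journey, or None.

    rows: iterable of (hhmm, chkpt1, chkpt2) tuples (an assumed input shape).
    """
    rows = {(to_minutes(t), a, b) for t, a, b in rows}
    for left_to_right in (True, False):
        begins = [r for r in rows
                  if not any(linked(p, r, left_to_right) for p in rows)]
        ends = [r for r in rows
                if not any(linked(r, n, left_to_right) for n in rows)]
        # Only a journey with exactly one begin row and one end row qualifies.
        if len(begins) == 1 and len(ends) == 1:
            first, last = begins[0], ends[0]
            if left_to_right:
                start, end = first[1], last[2]
            else:
                start, end = first[2], last[1]
            return start, to_hhmm(first[0]), end, to_hhmm(last[0])
    return None
```

As in the query, the left-to-right direction is tried first and right-to-left is only used when the first attempt does not yield a single begin row and a single end row.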

A hierarchical query might be a better solution. https://docs.oracle.com/database/122/SQLRF/Hierarchical-Queries.htm#SQLRF52332

with 
-- to_minutes takes the rows in test and changes their HHMM time format 
-- to minutes. This makes it easier to compare rows to see if they are 
-- 1 minute before or after each other.
to_minutes as
(
select 
journey,
(trunc(hhmm/100)*60)+hhmm-(trunc(hhmm/100)*100) mins,
chkpt1,
chkpt2
from 
test
),
-- min_max_cnt finds the min and max minute numbers for each journey
-- also counts number of rows in journey
min_max_cnt as
(
select 
journey,
min(mins) min_mins,
max(mins) max_mins,
count(*) cnt
from to_minutes
group by journey
),
lr_paths as
(
select
journey,
chkpt1,
chkpt2,
mins,
level lvl,
SYS_CONNECT_BY_PATH(to_char(mins)||'-'||chkpt1||'-'||chkpt2, '/') path
from to_minutes a
start with a.mins =
(select m.min_mins
from min_max_cnt m
where
a.journey = m.journey)
connect by
(
(prior mins + 1 = mins and
prior journey = journey and
prior chkpt1 = chkpt1 and
prior chkpt2 = chkpt2) or
((prior mins + 1 = mins or 
  prior mins = mins)
and
prior journey = journey and
prior chkpt2 = chkpt1)
)),
max_level_lr as
(
select journey,max(lvl) max_lvl
from lr_paths
group by journey
),
longest_lr_paths as 
(
select 
l.journey,
l.chkpt1,
l.chkpt2,
l.mins,
l.lvl,
l.path
from lr_paths l,max_level_lr m
where
l.journey = m.journey and
l.lvl = m.max_lvl
),
extract_lr as
(
select
journey,
substr(substr(path,instr(path,'-',1,1)+1,instr(path,'-',1,2)-instr(path,'-',1,1)-1),1,3) start_checkpoint,
substr(substr(path,2,instr(path,'-',1,1)-2),1,4) start_mins,
substr(substr(path,instr(path,'-',-1,1)+1,3),1,3) end_checkpoint,
substr(substr(path,instr(path,'/',-1,1)+1,instr(path,'-',-1,2)-instr(path,'/',-1,1)-1),1,4) end_mins,
lvl path_length
from 
longest_lr_paths
),
lr_full_path as
(
select
e.journey,
e.start_checkpoint,
e.start_mins,
e.end_checkpoint,
e.end_mins
from 
extract_lr e,
min_max_cnt m
where 
e.journey = m.journey and
e.path_length = m.cnt
),
rl_paths as
(
select
journey,
chkpt1,
chkpt2,
mins,
level lvl,
SYS_CONNECT_BY_PATH(to_char(mins)||'-'||chkpt1||'-'||chkpt2, '/') path
from to_minutes a
start with a.mins =
(select m.min_mins
from min_max_cnt m
where
a.journey = m.journey)
connect by
(
(prior mins + 1 = mins and
prior journey = journey and
prior chkpt1 = chkpt1 and
prior chkpt2 = chkpt2) or
((prior mins + 1 = mins or 
  prior mins = mins)
and
prior journey = journey and
prior chkpt1 = chkpt2)
)),
max_level_rl as
(
select journey,max(lvl) max_lvl
from rl_paths
group by journey
),
longest_rl_paths as 
(
select 
l.journey,
l.chkpt1,
l.chkpt2,
l.mins,
l.lvl,
l.path
from rl_paths l,max_level_rl m
where
l.journey = m.journey and
l.lvl = m.max_lvl
),
extract_rl as
(
select
journey,
substr(substr(path,instr(path,'-',1,2)+1,3),1,3) start_checkpoint,
substr(substr(path,2,instr(path,'-',1,1)-2),1,4) start_mins,
substr(substr(path,instr(path,'-',-1,2)+1,3),1,3) end_checkpoint,
substr(substr(path,instr(path,'/',-1,1)+1,instr(path,'-',-1,2)-instr(path,'/',-1,1)-1),1,4) end_mins,
lvl path_length
from 
longest_rl_paths
),
rl_full_path as
(
select
e.journey,
e.start_checkpoint,
e.start_mins,
e.end_checkpoint,
e.end_mins
from 
extract_rl e,
min_max_cnt m
where 
e.journey = m.journey and
e.path_length = m.cnt
),
all_paths as
(
select * from lr_full_path 
union
select * from rl_full_path 
where journey not in
(select journey from lr_full_path)
),
convert_hhmm as
(
select
journey,
start_checkpoint,
(trunc(start_mins/60)*100) + start_mins - (trunc(start_mins/60)*60) start_hhmm,
end_checkpoint,
(trunc(end_mins/60)*100) + end_mins - (trunc(end_mins/60)*60) end_hhmm
from all_paths
)
select * from convert_hhmm 
order by
journey;
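
The hierarchical version boils down to this: start from a row at the earliest minute, keep extending the chain with rows that can follow the previous one, and accept a direction only if the longest chain uses every row of the journey. A rough Python sketch of that idea follows. It is not a translation of the CONNECT BY query; the names longest_chain and endpoints_by_chain are hypothetical, and rows are assumed to be (hhmm, chkpt1, chkpt2) tuples for a single journey. The search is a plain depth-first walk, so it is only meant for small inputs.

```python
def longest_chain(rows, left_to_right):
    """Longest chain of rows that starts at the earliest minute.

    rows: list of (mins, chkpt1, chkpt2) tuples, mins = minutes since midnight.
    """
    def follows(prev, nxt):
        gap = nxt[0] - prev[0]
        same_segment = gap == 1 and nxt[1:] == prev[1:]
        if left_to_right:
            joined = prev[2] == nxt[1]
        else:
            joined = prev[1] == nxt[2]
        return same_segment or (gap in (0, 1) and joined)

    best = []

    def extend(path, used):
        nonlocal best
        if len(path) > len(best):
            best = list(path)
        for i, row in enumerate(rows):
            if i not in used and follows(path[-1], row):
                used.add(i)
                path.append(row)
                extend(path, used)
                path.pop()
                used.remove(i)

    first_minute = min(r[0] for r in rows)
    for i, row in enumerate(rows):
        if row[0] == first_minute:
            extend([row], {i})
    return best


def endpoints_by_chain(rows):
    """Return (start, start_hhmm, end, end_hhmm) for one journey, or None."""
    rows = [(t // 100 * 60 + t % 100, a, b) for t, a, b in rows]
    for left_to_right in (True, False):
        chain = longest_chain(rows, left_to_right)
        if len(chain) == len(rows):  # the chain must cover every row
            first, last = chain[0], chain[-1]
            if left_to_right:
                start, end = first[1], last[2]
            else:
                start, end = first[2], last[1]

            def hhmm(mins):
                return mins // 60 * 100 + mins % 60

            return start, hhmm(first[0]), end, hhmm(last[0])
    return None
```

Requiring the chain length to equal the row count plays the same role as the e.path_length = m.cnt condition in the query: a direction that strands some rows is rejected.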

Answer 2

Yes, this is a nasty one (though not as bad as I expected):

WITH Deduplicated AS (SELECT id, checkpoint1, checkpoint2, MIN(hhmm) as startTime, MAX(hhmm) as endTime
                      FROM Journey
                      GROUP BY id, checkpoint1, checkpoint2),
     Path (id, originPoint, originStartTime, originEndTime, checkpoint2, startTime, endTime, lev)
          AS (SELECT id, checkpoint1, startTime, endTime, checkpoint2, startTime, endTime, 0
              FROM Deduplicated
              WHERE NOT EXISTS (SELECT 1
                                FROM Journey b
                                WHERE b.id = Deduplicated.id
                                      AND b.checkpoint2 = Deduplicated.checkpoint1)
              UNION ALL
              SELECT Path.id, Path.originPoint, Path.originStartTime, Path.originEndTime,
                     Deduplicated.checkpoint2, Deduplicated.startTime, Deduplicated.endTime, lev + 1
              FROM Path
              JOIN Deduplicated
                ON Deduplicated.id = Path.id
                   AND Deduplicated.checkpoint1 = Path.checkpoint2)
SELECT id, 
       CASE WHEN originStartTime > startTime 
                 OR originEndTime > endTime
            THEN checkPoint2
            ELSE originPoint END AS checkpointStart, 
       LEAST(originStartTime, startTime) AS time1,
       CASE WHEN originStartTime > startTime 
                 OR originEndTime > endTime
            THEN originPoint
            ELSE checkPoint2 END AS checkpointEnd, 
       GREATEST(originEndTime, endTime) AS endTime
FROM (SELECT Path.*, MAX(lev) OVER(PARTITION BY id) AS lim
      FROM Path) Filtered
WHERE lev = lim

Fiddle Demo

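This passes a mildly pathological case in which there are multiple "start time" segments. Essentially, the best approach is to ignore the times until the recursive walk of the graph has completed successfully, and then check whether the direction of the timestamps matches the direction of the recursion.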
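
To make that idea concrete, here is a small Python sketch of the same steps: collapse repeated rows into one entry per segment with its earliest and latest time, walk the chain from the one checkpoint that never appears as a Chkpt2, and only then compare the timestamps to decide whether the vehicle actually travelled the other way. This is an illustration, not a port of the SQL; the function name endpoints_by_walk is hypothetical, rows are assumed to be (hhmm, chkpt1, chkpt2) tuples for a single journey, and each checkpoint is assumed to start at most one segment (no branching).

```python
def endpoints_by_walk(rows):
    """Return (start, time1, end, time2) for one journey.

    rows: iterable of (hhmm, chkpt1, chkpt2) tuples (an assumed input shape).
    """
    # Deduplicate: earliest and latest time seen on each distinct segment.
    spans = {}
    for hhmm, c1, c2 in rows:
        lo, hi = spans.get((c1, c2), (hhmm, hhmm))
        spans[(c1, c2)] = (min(lo, hhmm), max(hi, hhmm))

    next_point = {c1: c2 for c1, c2 in spans}
    targets = set(next_point.values())
    origins = [c1 for c1 in next_point if c1 not in targets]
    if len(origins) != 1:
        raise ValueError("expected exactly one chain origin")

    # Walk Chkpt1 -> Chkpt2 links, ignoring the times for now.
    origin = origins[0]
    prev, point = origin, next_point[origin]
    seen = {origin}
    while point in next_point:
        if point in seen:
            raise ValueError("segments form a loop")
        seen.add(point)
        prev, point = point, next_point[point]

    origin_lo, origin_hi = spans[(origin, next_point[origin])]
    last_lo, last_hi = spans[(prev, point)]

    # If the origin segment is later than the final one, travel was reversed.
    reversed_travel = origin_lo > last_lo or origin_hi > last_hi
    time1 = min(origin_lo, last_lo)
    time2 = max(origin_hi, last_hi)
    if reversed_travel:
        return point, time1, origin, time2
    return origin, time1, point, time2
```

For journey 42 the walk runs UUU to ZZZ along the Chkpt1-to-Chkpt2 links, but the UUU-VVV segment is stamped later than YYY-ZZZ, so the endpoints are swapped and the result starts at ZZZ.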
