I have a table like this:
Journey HHMM Chkpt1 Chkpt2
41 1600 AAA BBB
41 1601 AAA BBB
41 1602 AAA BBB
41 1603 CCC DDD
41 1603 BBB CCC
41 1604 DDD EEE
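The points Chkpt1 and Chkpt2 define a road segment. In this case, the vehicle on journey 41 passed through these segments: AAA-BBB, BBB-CCC, CCC-DDD, DDD-EEE.
My question: I need to get the first and last points of a journey, along with their respective times. In this case, the answer is AAA (1600) and EEE (1604).
To arrive at this answer, there are a few points to consider:
1) The track is recorded every minute. This can generate multiple rows with the same checkpoints.
2) Every segment is tracked. If the vehicle moves from one segment to another within the same minute, multiple rows can be inserted for that minute, and for some reason they may not appear in chronological order.
3) The trickiest point: the vehicle does not necessarily move from Chkpt1 to Chkpt2. It may be moving from Chkpt2 to Chkpt1. The problem is how to derive the real direction (there is no direction column on this table, and the table must not be changed).
For example: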
Journey HHMM Chkpt1 Chkpt2
42 1700 YYY ZZZ
42 1701 YYY ZZZ
42 1702 WWW XXX
42 1702 XXX YYY
42 1702 VVV WWW
42 1703 UUU VVV
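In this case, the vehicle moves from ZZZ to UUU, and the answer is ZZZ (1700) / UUU (1703).
In every segment it goes from Chkpt2 to Chkpt1. Within the same journey, all rows must be tracked in the same direction.
For journey 41, all movements are from Chkpt1 to Chkpt2. We get the track by comparing Chkpt2 (1602) with Chkpt1 (1603), so we see that the vehicle moved from AAA-BBB to BBB-CCC, and so on.
For journey 42, all movements are from Chkpt2 to Chkpt1. We get the track by comparing Chkpt1 (1700) with Chkpt2 (1702), so we see that the vehicle moved from ZZZ-YYY to YYY-XXX, and so on.
The desired result would be: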
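To make the direction rule concrete, here is a minimal Python sketch (not SQL, and not part of any answer) of the comparison described for journeys 41 and 42. The function name `infer_direction` and the tuple layout are hypothetical, chosen only for illustration. It assumes that rows sharing the same minute carry no ordering information, so only pairs where one row is strictly earlier than the other are compared.

```python
from collections import Counter

# Sample rows from the question: (HHMM, Chkpt1, Chkpt2)
JOURNEY_41 = [
    (1600, "AAA", "BBB"), (1601, "AAA", "BBB"), (1602, "AAA", "BBB"),
    (1603, "CCC", "DDD"), (1603, "BBB", "CCC"), (1604, "DDD", "EEE"),
]
JOURNEY_42 = [
    (1700, "YYY", "ZZZ"), (1701, "YYY", "ZZZ"), (1702, "WWW", "XXX"),
    (1702, "XXX", "YYY"), (1702, "VVV", "WWW"), (1703, "UUU", "VVV"),
]


def infer_direction(rows):
    """Guess the travel direction for the rows of ONE journey.

    Returns "chkpt1->chkpt2", "chkpt2->chkpt1", or None when the rows
    give no evidence or contradictory evidence.
    """
    rows = list(rows)
    votes = Counter()
    for t_a, a1, a2 in rows:
        for t_b, b1, b2 in rows:
            if t_a >= t_b:
                continue  # only a strictly earlier row can precede a later one
            if a2 == b1:
                # earlier row ends where the later row begins
                votes["chkpt1->chkpt2"] += 1
            if a1 == b2:
                # earlier row begins where the later row ends
                votes["chkpt2->chkpt1"] += 1
    if len(votes) != 1:
        return None
    return next(iter(votes))


print(infer_direction(JOURNEY_41))  # chkpt1->chkpt2
print(infer_direction(JOURNEY_42))  # chkpt2->chkpt1
```

A journey consisting of a single segment returns None here, since nothing in the rows distinguishes the two directions.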
Journey ChkptStart Time1 ChkptEnd Time2
41 AAA 1600 EEE 1604
42 ZZZ 1700 UUU 1703
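Well, I don't have enough SQL experience to handle complex queries.
Can anyone help me with this?
Answer 0 (score: 0)
A new version that handles some of the issues discussed in the comments: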
with
-- to_minutes takes the rows in test and changes their HHMM time format
-- to minutes. This makes it easier to compare rows to see if they are
-- 1 minute before or after each other.
to_minutes as
(
select
journey,
(trunc(hhmm/100)*60)+hhmm-(trunc(hhmm/100)*100) mins,
chkpt1,
chkpt2
from
test
),
-- before_same lists rows that have a row that has the same values for
-- chkpt1 and chkpt2 and is from the previous minute.
before_same as
(
select
this_row.journey,
this_row.mins,
this_row.chkpt1,
this_row.chkpt2
from
to_minutes this_row,
to_minutes prev_row
where
this_row.journey = prev_row.journey and
this_row.mins = prev_row.mins+1 and
this_row.chkpt1 = prev_row.chkpt1 and
this_row.chkpt2 = prev_row.chkpt2
),
-- after_same lists rows that have a row that has the same values for
-- chkpt1 and chkpt2 and is from the next minute.
after_same as
(
select
this_row.journey,
this_row.mins,
this_row.chkpt1,
this_row.chkpt2
from
to_minutes this_row,
to_minutes next_row
where
this_row.journey = next_row.journey and
this_row.mins+1 = next_row.mins and
this_row.chkpt1 = next_row.chkpt1 and
this_row.chkpt2 = next_row.chkpt2
),
-- At this point the subqueries are working on chains that go from
-- left to right which means that chkpt1 is the start of the chain or path.
--
-- lr_before_diff lists rows that have a row from the previous minute or the same minute
-- with chkpt2 of that row = chkpt1 of this row.
lr_before_diff as
(
select
this_row.journey,
this_row.mins,
this_row.chkpt1,
this_row.chkpt2
from
to_minutes this_row,
to_minutes prev_row
where
this_row.journey = prev_row.journey and
(this_row.mins = prev_row.mins+1 or
this_row.mins = prev_row.mins) and
this_row.chkpt1 = prev_row.chkpt2
),
-- lr_after_diff lists rows that have a row from the next minute or the same minute
-- with chkpt1 of that row = chkpt2 of this row.
lr_after_diff as
(
select
this_row.journey,
this_row.mins,
this_row.chkpt1,
this_row.chkpt2
from
to_minutes this_row,
to_minutes next_row
where
this_row.journey = next_row.journey and
(this_row.mins+1 = next_row.mins or
this_row.mins = next_row.mins) and
this_row.chkpt2 = next_row.chkpt1
),
-- lr_begin lists the rows that do not have a row before
-- them so they could be start rows for a lr path
lr_begin as
(
select * from to_minutes
minus
(select * from before_same
union
select * from lr_before_diff)
),
-- lr_end lists the rows that do not have a row after
-- them so they could be end rows for a lr path
lr_end as
(
select * from to_minutes
minus
(select * from after_same
union
select * from lr_after_diff)
),
-- lr_beg_count lists number of beginning rows for each journey
-- should be 1 for a lr path
lr_beg_count as
(
select journey,count(*) cnt
from lr_begin
group by journey
),
-- lr_end_count lists number of ending rows for each journey
-- should be 1 for a lr path
lr_end_count as
(
select journey,count(*) cnt
from lr_end
group by journey
),
-- lr_journeys lists the journey numbers of the lr paths
-- only journeys with 1 begin and end row are lr paths
lr_journeys as
(
select lr_beg_count.journey
from lr_beg_count,lr_end_count
where
lr_beg_count.journey = lr_end_count.journey and
lr_beg_count.cnt = 1 and
lr_end_count.cnt = 1
),
-- lr_journey_detail combines the begin and end rows into
-- one row
lr_journey_detail as
(
select
lr_begin.journey,
lr_begin.chkpt1 start_checkpoint,
lr_begin.mins start_mins,
lr_end.chkpt2 end_checkpoint,
lr_end.mins end_mins
from
lr_begin,
lr_end,
lr_journeys
where
lr_begin.journey=lr_end.journey and
lr_journeys.journey=lr_end.journey
),
-- now do the same for right to left paths
rl_before_diff as
(
select
this_row.journey,
this_row.mins,
this_row.chkpt1,
this_row.chkpt2
from
to_minutes this_row,
to_minutes prev_row
where
this_row.journey = prev_row.journey and
(this_row.mins = prev_row.mins+1 or
this_row.mins = prev_row.mins) and
this_row.chkpt2 = prev_row.chkpt1
),
rl_after_diff as
(
select
this_row.journey,
this_row.mins,
this_row.chkpt1,
this_row.chkpt2
from
to_minutes this_row,
to_minutes next_row
where
this_row.journey = next_row.journey and
(this_row.mins+1 = next_row.mins or
this_row.mins = next_row.mins) and
this_row.chkpt1 = next_row.chkpt2
),
rl_begin as
(
select * from to_minutes
minus
(select * from before_same
union
select * from rl_before_diff)
),
rl_end as
(
select * from to_minutes
minus
(select * from after_same
union
select * from rl_after_diff)
),
rl_beg_count as
(
select journey,count(*) cnt
from rl_begin
group by journey
),
rl_end_count as
(
select journey,count(*) cnt
from rl_end
group by journey
),
rl_journeys as
(
select rl_beg_count.journey
from rl_beg_count,rl_end_count
where
rl_beg_count.journey = rl_end_count.journey and
rl_beg_count.cnt = 1 and
rl_end_count.cnt = 1
),
rl_journey_detail as
(
select
rl_begin.journey,
rl_begin.chkpt2 start_checkpoint,
rl_begin.mins start_mins,
rl_end.chkpt1 end_checkpoint,
rl_end.mins end_mins
from
rl_begin,
rl_end,
rl_journeys
where
rl_begin.journey=rl_end.journey and
rl_journeys.journey=rl_end.journey
),
-- now combine the two journey detail rows
journey_detail as
(
select * from lr_journey_detail
union
select * from rl_journey_detail
where journey not in
(select journey from lr_journey_detail)
),
-- convert back to hhmm
convert_hhmm as
(
select
journey,
start_checkpoint,
(trunc(start_mins/60)*100) + start_mins - (trunc(start_mins/60)*60) start_hhmm,
end_checkpoint,
(trunc(end_mins/60)*100) + end_mins - (trunc(end_mins/60)*60) end_hhmm
from journey_detail
)
select * from convert_hhmm
order by journey;
Output for the test data:
   JOURNEY STA START_HHMM END   END_HHMM
---------- --- ---------- --- ----------
        41 AAA       1600 EEE       1604
        42 ZZZ       1700 UUU       1703
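The to_minutes and convert_hhmm steps in the query are plain arithmetic on the HHMM number. As a quick illustration of why the conversion matters, here is a small Python sketch of the same arithmetic (the function names are hypothetical). As integers, 1659 and 1700 differ by 41, but as clock times they are one minute apart.

```python
def hhmm_to_minutes(hhmm: int) -> int:
    """Convert an HHMM number such as 1603 to minutes since midnight."""
    hours, minutes = divmod(hhmm, 100)
    return hours * 60 + minutes


def minutes_to_hhmm(mins: int) -> int:
    """Convert minutes since midnight back to an HHMM number."""
    hours, minutes = divmod(mins, 60)
    return hours * 100 + minutes


print(hhmm_to_minutes(1600))                          # 960
print(hhmm_to_minutes(1700) - hhmm_to_minutes(1659))  # 1
print(minutes_to_hhmm(1020))                          # 1700
```

This is why the "previous minute" and "next minute" comparisons in the query are done on mins rather than on hhmm directly.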
Using a hierarchical query may be a better solution. https://docs.oracle.com/database/122/SQLRF/Hierarchical-Queries.htm#SQLRF52332
with
-- to_minutes takes the rows in test and changes their HHMM time format
-- to minutes. This makes it easier to compare rows to see if they are
-- 1 minute before or after each other.
to_minutes as
(
select
journey,
(trunc(hhmm/100)*60)+hhmm-(trunc(hhmm/100)*100) mins,
chkpt1,
chkpt2
from
test
),
-- min_max_cnt finds the min and max minute numbers for each journey
-- also counts number of rows in journey
min_max_cnt as
(
select
journey,
min(mins) min_mins,
max(mins) max_mins,
count(*) cnt
from to_minutes
group by journey
),
lr_paths as
(
select
journey,
chkpt1,
chkpt2,
mins,
level lvl,
SYS_CONNECT_BY_PATH(to_char(mins)||'-'||chkpt1||'-'||chkpt2, '/') path
from to_minutes a
start with a.mins =
(select m.min_mins
from min_max_cnt m
where
a.journey = m.journey)
connect by
(
(prior mins + 1 = mins and
prior journey = journey and
prior chkpt1 = chkpt1 and
prior chkpt2 = chkpt2) or
((prior mins + 1 = mins or
prior mins = mins)
and
prior journey = journey and
prior chkpt2 = chkpt1)
)),
max_level_lr as
(
select journey,max(lvl) max_lvl
from lr_paths
group by journey
),
longest_lr_paths as
(
select
l.journey,
l.chkpt1,
l.chkpt2,
l.mins,
l.lvl,
l.path
from lr_paths l,max_level_lr m
where
l.journey = m.journey and
l.lvl = m.max_lvl
),
extract_lr as
(
select
journey,
substr(substr(path,instr(path,'-',1,1)+1,instr(path,'-',1,2)-instr(path,'-',1,1)-1),1,3) start_checkpoint,
substr(substr(path,2,instr(path,'-',1,1)-2),1,4) start_mins,
substr(substr(path,instr(path,'-',-1,1)+1,3),1,3) end_checkpoint,
substr(substr(path,instr(path,'/',-1,1)+1,instr(path,'-',-1,2)-instr(path,'/',-1,1)-1),1,4) end_mins,
lvl path_length
from
longest_lr_paths
),
lr_full_path as
(
select
e.journey,
e.start_checkpoint,
e.start_mins,
e.end_checkpoint,
e.end_mins
from
extract_lr e,
min_max_cnt m
where
e.journey = m.journey and
e.path_length = m.cnt
),
rl_paths as
(
select
journey,
chkpt1,
chkpt2,
mins,
level lvl,
SYS_CONNECT_BY_PATH(to_char(mins)||'-'||chkpt1||'-'||chkpt2, '/') path
from to_minutes a
start with a.mins =
(select m.min_mins
from min_max_cnt m
where
a.journey = m.journey)
connect by
(
(prior mins + 1 = mins and
prior journey = journey and
prior chkpt1 = chkpt1 and
prior chkpt2 = chkpt2) or
((prior mins + 1 = mins or
prior mins = mins)
and
prior journey = journey and
prior chkpt1 = chkpt2)
)),
max_level_rl as
(
select journey,max(lvl) max_lvl
from rl_paths
group by journey
),
longest_rl_paths as
(
select
l.journey,
l.chkpt1,
l.chkpt2,
l.mins,
l.lvl,
l.path
from rl_paths l,max_level_rl m
where
l.journey = m.journey and
l.lvl = m.max_lvl
),
extract_rl as
(
select
journey,
substr(substr(path,instr(path,'-',1,2)+1,3),1,3) start_checkpoint,
substr(substr(path,2,instr(path,'-',1,1)-2),1,4) start_mins,
substr(substr(path,instr(path,'-',-1,2)+1,3),1,3) end_checkpoint,
substr(substr(path,instr(path,'/',-1,1)+1,instr(path,'-',-1,2)-instr(path,'/',-1,1)-1),1,4) end_mins,
lvl path_length
from
longest_rl_paths
),
rl_full_path as
(
select
e.journey,
e.start_checkpoint,
e.start_mins,
e.end_checkpoint,
e.end_mins
from
extract_rl e,
min_max_cnt m
where
e.journey = m.journey and
e.path_length = m.cnt
),
all_paths as
(
select * from lr_full_path
union
select * from rl_full_path
where journey not in
(select journey from lr_full_path)
),
convert_hhmm as
(
select
journey,
start_checkpoint,
(trunc(start_mins/60)*100) + start_mins - (trunc(start_mins/60)*60) start_hhmm,
end_checkpoint,
(trunc(end_mins/60)*100) + end_mins - (trunc(end_mins/60)*60) end_hhmm
from all_paths
)
select * from convert_hhmm
order by
journey;
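The nested substr/instr calls in extract_lr and extract_rl pull the first and last steps out of the string built by SYS_CONNECT_BY_PATH. The Python sketch below shows the same extraction in a more readable form. The function name and the two sample paths are illustrative: the paths were written by hand from the test data in the format the query builds ('/' between steps, mins-chkpt1-chkpt2 within a step), and the sketch assumes checkpoint names contain neither '-' nor '/'.

```python
# Hand-written sample paths in the format '/mins-chkpt1-chkpt2/...'
LR_PATH = "/960-AAA-BBB/961-AAA-BBB/962-AAA-BBB/963-BBB-CCC/963-CCC-DDD/964-DDD-EEE"
RL_PATH = "/1020-YYY-ZZZ/1021-YYY-ZZZ/1022-XXX-YYY/1022-WWW-XXX/1022-VVV-WWW/1023-UUU-VVV"


def endpoints_from_path(path, left_to_right=True):
    """Return (start_checkpoint, start_mins, end_checkpoint, end_mins)."""
    steps = [step.split("-") for step in path.strip("/").split("/")]
    first_mins, first_c1, first_c2 = steps[0]
    last_mins, last_c1, last_c2 = steps[-1]
    if left_to_right:
        # travel runs chkpt1 -> chkpt2: start at the first chkpt1, end at the last chkpt2
        return first_c1, int(first_mins), last_c2, int(last_mins)
    # travel runs chkpt2 -> chkpt1: start at the first chkpt2, end at the last chkpt1
    return first_c2, int(first_mins), last_c1, int(last_mins)


print(endpoints_from_path(LR_PATH))                       # ('AAA', 960, 'EEE', 964)
print(endpoints_from_path(RL_PATH, left_to_right=False))  # ('ZZZ', 1020, 'UUU', 1023)
```

Note that the SQL version additionally relies on checkpoint codes being exactly three characters long, which splitting on the separators avoids.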
Answer 1 (score: 0)
Yes, this is a nasty one (although not as bad as I expected):
WITH Deduplicated AS (SELECT id, checkpoint1, checkpoint2, MIN(hhmm) AS startTime, MAX(hhmm) AS endTime
                      FROM Journey
                      GROUP BY id, checkpoint1, checkpoint2),
     Path (id, originPoint, originStartTime, originEndTime, checkpoint2, startTime, endTime, lev)
          AS (SELECT id, checkpoint1, startTime, endTime, checkpoint2, startTime, endTime, 0
              FROM Deduplicated
              WHERE NOT EXISTS (SELECT 1
                                FROM Journey b
                                WHERE b.id = Deduplicated.id
                                      AND b.checkpoint2 = Deduplicated.checkpoint1)
              UNION ALL
              SELECT Path.id, Path.originPoint, Path.originStartTime, Path.originEndTime,
                     Deduplicated.checkpoint2, Deduplicated.startTime, Deduplicated.endTime, lev + 1
              FROM Path
              JOIN Deduplicated
                ON Deduplicated.id = Path.id
                   AND Deduplicated.checkpoint1 = Path.checkpoint2)
SELECT id,
       CASE WHEN originStartTime > startTime
                 OR originEndTime > endTime
            THEN checkpoint2
            ELSE originPoint END AS checkpointStart,
       LEAST(originStartTime, startTime) AS time1,
       CASE WHEN originStartTime > startTime
                 OR originEndTime > endTime
            THEN originPoint
            ELSE checkpoint2 END AS checkpointEnd,
       GREATEST(originEndTime, endTime) AS endTime
FROM (SELECT Path.*, MAX(lev) OVER(PARTITION BY id) AS lim
      FROM Path) Filtered
WHERE lev = lim
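This passes the mildly pathological case in which there are multiple "start time" segments. Essentially, the best approach is to ignore the times until the recursive graph has been completed (successfully), and then check whether the direction of the timestamps matches the direction of the recursion.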
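For readers who find the recursive query hard to follow, the same idea can be sketched in Python: deduplicate the segments, find the one segment whose first checkpoint is nobody's second checkpoint, follow the chain to its end, and only then look at the timestamps to decide whether the vehicle actually travelled the other way. This is a rough sketch under the assumption that each journey forms a single simple chain. The function name and tuple layout are hypothetical, and the sample rows are the ones from the question.

```python
JOURNEY_41 = [
    (1600, "AAA", "BBB"), (1601, "AAA", "BBB"), (1602, "AAA", "BBB"),
    (1603, "CCC", "DDD"), (1603, "BBB", "CCC"), (1604, "DDD", "EEE"),
]
JOURNEY_42 = [
    (1700, "YYY", "ZZZ"), (1701, "YYY", "ZZZ"), (1702, "WWW", "XXX"),
    (1702, "XXX", "YYY"), (1702, "VVV", "WWW"), (1703, "UUU", "VVV"),
]


def journey_endpoints(rows):
    """Return (start_checkpoint, start_hhmm, end_checkpoint, end_hhmm)."""
    # 1. Deduplicate: one entry per segment with its earliest and latest time.
    seg = {}
    for hhmm, c1, c2 in rows:
        lo, hi = seg.get((c1, c2), (hhmm, hhmm))
        seg[(c1, c2)] = (min(lo, hhmm), max(hi, hhmm))

    # 2. The chain origin is the segment whose chkpt1 is never a chkpt2.
    all_c2 = {c2 for _, c2 in seg}
    origins = [s for s in seg if s[0] not in all_c2]
    if len(origins) != 1:
        raise ValueError("journey is not a single simple chain")

    # 3. Follow chkpt2 -> chkpt1 links, ignoring times for now.
    next_by_c1 = {c1: (c1, c2) for c1, c2 in seg}
    first = last = origins[0]
    for _ in range(len(seg)):  # bounded, so a cycle cannot loop forever
        if last[1] not in next_by_c1:
            break
        last = next_by_c1[last[1]]

    # 4. If the chain's origin is later than its end, travel was reversed.
    o_start, o_end = seg[first]
    l_start, l_end = seg[last]
    reverse = o_start > l_start or o_end > l_end
    start_cp, end_cp = (last[1], first[0]) if reverse else (first[0], last[1])
    return start_cp, min(o_start, l_start), end_cp, max(o_end, l_end)


print(journey_endpoints(JOURNEY_41))  # ('AAA', 1600, 'EEE', 1604)
print(journey_endpoints(JOURNEY_42))  # ('ZZZ', 1700, 'UUU', 1703)
```

Steps 1 to 4 correspond roughly to the Deduplicated CTE, the anchor member of Path, the recursive member of Path, and the CASE expressions in the final SELECT.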