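I have a table with stop_id, sched_time and act_time columns. I want to fill in the gaps in the actual time using linear interpolation based on the scheduled time, so that the relative timing between stops is preserved. In other words, I want to go from something like this: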
stop_id | sched_time | act_time | actual
------------------------------------------------
001 | 13:47:00 | 13:45:00 | TRUE
002 | 13:50:00 | null | FALSE
003 | 13:52:00 | 13:53:00 | TRUE
004 | 13:59:00 | null | FALSE
005 | 14:01:00 | null | FALSE
006 | 14:04:00 | 14:04:00 | TRUE
to something like this:
stop_id | sched_time | act_time
-------------------------------------
001 | 13:47:00 | 13:45:00
002 | 13:50:00 | 13:49:48
003 | 13:52:00 | 13:53:00
004 | 13:59:00 | 13:59:25
005 | 14:01:00 | 14:01:15
006 | 14:04:00 | 14:04:00
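If making the interpolation follow the original scheduled timing between stops is too much to ask, a simple linear interpolation on the act_time column would be a good starting point, since the time differences between stops don't vary much.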
Thanks in advance!
Note: the first act_time can be earlier than the first sched_time, and there can be several consecutive rows without an actual time.
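Written as a formula (my reading of the request; the question does not state it this way): for a row i with no actual time, let p be the nearest earlier row and n the nearest later row that do have actual times, with s the scheduled time and a the actual time. The expected output above is consistent with:

```latex
a_i = a_p + \frac{s_i - s_p}{s_n - s_p}\,(a_n - a_p)

% stop 002: p = 001, n = 003
a_{002} = 13{:}45 + \tfrac{3}{5}\,(8\ \text{min}) = 13{:}45 + 4.8\ \text{min} = 13{:}49{:}48

% stop 004: p = 003, n = 006
a_{004} = 13{:}53 + \tfrac{7}{12}\,(11\ \text{min}) = 13{:}53 + 6\ \text{min}\ 25\ \text{s} = 13{:}59{:}25
```

Rows with no actual time on one side (before the first or after the last actual reading) have no p or n, so this formula leaves them undefined.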
Answer (score: 1)
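This is a "third-best" solution: once you have an actual time, it tracks how far ahead of or behind schedule you are and applies that offset to the following scheduled times, instead of working from the actual times: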
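with q1 as (
select
t.stop_id, sched_time, act_time,
-- ahead: minutes ahead of schedule (negative = behind), only where act_time is known
nvl2(act_time, t.sched_time - t.act_time, null) ahead,
-- actual_count: running count of actual times; each row with an actual time starts a new group
sum (nvl2(act_time, 1, 0)) over
(partition by 1 order by stop_id) as actual_count
from schedule t
)
select
stop_id, sched_time,
act_time,
-- where act_time is missing, apply the group's offset to the scheduled time
nvl (act_time, sched_time - min (ahead) over
(partition by actual_count)) as act_time2
from q1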
The result doesn't exactly match what you're after, but it may be something you can build on:
STOP_ID | SCHED_TIME | ACT_TIME | ACT_TIME2
--------------------------------------------
001 | 13:47 | 13:45 | 13:45
002 | 13:50 | | 13:48
003 | 13:52 | 13:53 | 13:53
004 | 13:59 | | 14:00
005 | 14:01 | | 14:02
006 | 14:04 | 14:04 | 14:04
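To make the first query's logic easier to follow, here is a minimal Python sketch of the same idea (not a line-by-line translation of the SQL): remember the offset sched_time - act_time at the most recent actual reading and subtract it from each later scheduled time until the next actual reading. Times are assumed to be minutes since midnight, and the names carry_offset and hhmm are hypothetical.

```python
from typing import List, Optional, Tuple

# (stop_id, scheduled minutes since midnight, actual minutes or None)
Row = Tuple[str, int, Optional[int]]


def carry_offset(rows: List[Row]) -> List[Optional[int]]:
    """Fill missing actual times by reusing the offset from the last actual reading."""
    result: List[Optional[int]] = []
    ahead: Optional[int] = None  # minutes ahead of schedule (negative = behind)
    for _stop_id, sched, act in rows:
        if act is not None:
            ahead = sched - act
            result.append(act)
        elif ahead is not None:
            result.append(sched - ahead)
        else:
            # No earlier actual time, so there is no offset to apply (the SQL gives null here too)
            result.append(None)
    return result


def hhmm(minutes: int) -> str:
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


rows = [
    ("001", 13 * 60 + 47, 13 * 60 + 45),
    ("002", 13 * 60 + 50, None),
    ("003", 13 * 60 + 52, 13 * 60 + 53),
    ("004", 13 * 60 + 59, None),
    ("005", 14 * 60 + 1, None),
    ("006", 14 * 60 + 4, 14 * 60 + 4),
]
print([hhmm(m) for m in carry_offset(rows)])
# ['13:45', '13:48', '13:53', '14:00', '14:02', '14:04']
```

This reproduces the ACT_TIME2 column above. The offset only looks backwards, which is why 004 and 005 come out a little late compared with true interpolation.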
-- Edit 7/24/14 --
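Assuming your times are converted to integers as suggested (30 s = 1), I played around with this a bit. It's a horrible solution, but I think it does what you suggested. I'm not sure whether it's faster than a procedural loop, and I'd be curious to know either way. Oracle's analytic functions are great, but as you can see, I really had to lean on them to do what I think you described: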
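with q1 as (
select
t.stop_id, t.sched_time, t.act_time,
-- group_id: running count of actual times, so each row with an actual time starts a new group
sum (nvl2(act_time, 1, 0)) over
(partition by 1 order by stop_id) as group_id,
-- next_sched: scheduled time of the following stop
lead (sched_time) over (order by stop_id) as next_sched
from schedule2 t
), q2 as (
select
stop_id, sched_time, act_time, group_id, next_sched,
-- scheduled time from this stop to the next one
next_sched - sched_time as elapsed,
row_number() over (partition by group_id order by stop_id) as stops,
-- actual and scheduled time of the group's anchor row (the one with an actual time)
min (act_time) over (partition by group_id) as min_time,
min (sched_time) over (partition by group_id) as min_sched
from q1
), q3 as (
select
stop_id, sched_time, act_time, group_id, stops, min_time,
min_sched, next_sched,
-- cumulative scheduled time within the group
sum (elapsed) over (partition by group_id order by stop_id) as elapsed,
max (stops) over (partition by group_id) as grp_stops,
-- on the group's last row, these hold the next group's anchor values
lead (min_time, 1) over (order by stop_id) as next_grp_actual,
lead (min_sched, 1) over (order by stop_id) as next_grp_sched
from q2
), q4 as (
select
stop_id, sched_time, act_time, stops, grp_stops,
-- lag shifts the cumulative sum by one row: scheduled time from the anchor row to this stop
min_time, lag (elapsed, 1, 0) over
(partition by group_id order by stop_id) as elapsed,
-- scheduled and actual gaps between this anchor and the next one
-- (max() picks the next group's values because times increase by stop)
max (next_grp_sched) over (partition by group_id) - min_sched
as time_btw_sched,
max (next_grp_actual) over (partition by group_id) - min_time
as time_btw_actuals
from q3
)
-- interpolate: anchor time plus the scheduled fraction of the actual gap
select
stop_id, sched_time, act_time,
nvl (act_time, min_time + (elapsed / time_btw_sched) *
time_btw_actuals) as act_time2
from q4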
Here are the results I got from your sample (all times in 30-second units):
id | sched | actual | actual (calc)
--------------------------------------
001 | 1654 | 1650 | 1650
002 | 1660 | | 1659.6
003 | 1664 | 1666 | 1666
004 | 1678 | | 1678.83333333333
005 | 1682 | | 1682.5
006 | 1688 | 1688 | 1688
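Converting the calculated column back from 30-second units (1 unit = 30 s, so 1654 is 13:47:00) shows that it matches the times requested in the question. A small Python sketch of the conversion; the helper names to_units and from_units are hypothetical:

```python
def to_units(hms: str) -> float:
    """Convert 'HH:MM' or 'HH:MM:SS' to 30-second units since midnight."""
    parts = [int(p) for p in hms.split(":")]
    h, m = parts[0], parts[1]
    s = parts[2] if len(parts) > 2 else 0
    return (h * 3600 + m * 60 + s) / 30


def from_units(units: float) -> str:
    """Convert 30-second units back to 'HH:MM:SS', rounded to the nearest second."""
    total = round(units * 30)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


for calc in (1659.6, 1678.83333333333, 1682.5):
    print(from_units(calc))
# 13:49:48
# 13:59:25
# 14:01:15
```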
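I think this could be done more cleanly (and more efficiently) in a programming-language wrapper. I'm only proficient in C# and Perl, but either of them would handle it well.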
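As a rough illustration of the wrapper approach, here is a minimal Python sketch (Python rather than C# or Perl). It assumes the rows have already been fetched from the database, ordered by stop_id, with times in seconds since midnight and scheduled times strictly increasing. The names interpolate_actuals and secs are hypothetical. Each missing actual time is interpolated between the nearest earlier and later actual readings, weighted by scheduled time; rows with no actual reading on one side stay None.

```python
from typing import List, Optional


def interpolate_actuals(sched: List[float], act: List[Optional[float]]) -> List[Optional[float]]:
    """Fill gaps in act by linear interpolation over sched (rows ordered by stop)."""
    known = [i for i, a in enumerate(act) if a is not None]
    result = list(act)
    # Walk each pair of consecutive rows that have actual times.
    for p, n in zip(known, known[1:]):
        span_sched = sched[n] - sched[p]  # assumed > 0 (scheduled times strictly increase)
        span_act = act[n] - act[p]
        for i in range(p + 1, n):
            frac = (sched[i] - sched[p]) / span_sched
            result[i] = act[p] + frac * span_act
    return result


def secs(hms: str) -> int:
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s


sched = [secs(t) for t in ["13:47:00", "13:50:00", "13:52:00", "13:59:00", "14:01:00", "14:04:00"]]
act = [secs("13:45:00"), None, secs("13:53:00"), None, None, secs("14:04:00")]
filled = interpolate_actuals(sched, act)
print([f"{round(t) // 3600:02d}:{round(t) % 3600 // 60:02d}:{round(t) % 60:02d}" for t in filled])
# ['13:45:00', '13:49:48', '13:53:00', '13:59:25', '14:01:15', '14:04:00']
```

A single pass over the rows does the work of the four analytic CTEs, and the leading or trailing rows without a bracketing actual time are easy to spot and handle separately.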