Interpolating missing values in SQL Developer / Oracle 10g

Time: 2014-07-23 14:29:28

Tags: sql oracle oracle10g oracle-sqldeveloper

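I have a table containing stop_id, sched_time, and act_time, and I would like to fill in the blanks in the actual times (using linear interpolation) based on the scheduled times, thereby preserving the relative timing between stops. So I would like to go from something like this: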

  stop_id  |  sched_time  |  act_time  |  actual
------------------------------------------------
  001      |  13:47:00    |  13:45:00  |  TRUE
  002      |  13:50:00    |  null      |  FALSE
  003      |  13:52:00    |  13:53:00  |  TRUE
  004      |  13:59:00    |  null      |  FALSE
  005      |  14:01:00    |  null      |  FALSE
  006      |  14:04:00    |  14:04:00  |  TRUE

to something like this:

  stop_id  |  sched_time  |  act_time
-------------------------------------
  001      |  13:47:00    |  13:45:00
  002      |  13:50:00    |  13:49:48
  003      |  13:52:00    |  13:53:00
  004      |  13:59:00    |  13:59:25
  005      |  14:01:00    |  14:01:15
  006      |  14:04:00    |  14:04:00

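If making the interpolation follow the original timing between stops is asking too much, a simple linear interpolation on the act_time column would be a good starting point, since there is not much variability in the time differences between stops.

Thanks in advance!

Note: the first act_time can be earlier than the first sched_time, and there may be several consecutive rows with no actual time.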

1 answer:

Answer 0 (score: 1)

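This is something of a "third-best" solution: once you have an actual time, it tracks how far ahead of or behind schedule you are, and applies that offset to the scheduled times that follow rather than interpolating between actual times: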

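with q1 as (
  select
    t.stop_id, sched_time, act_time,
    -- offset from schedule (sched - act); null when there is no actual time
    nvl2(act_time, t.sched_time - t.act_time, null) ahead,
    -- running count of actual times: each row with an actual starts a new group
    sum (nvl2(act_time, 1, 0)) over
      (partition by 1 order by stop_id) as actual_count
  from schedule t
)
select
  stop_id, sched_time,
  act_time,
  -- only the first row of a group has an offset, so min() picks that one up
  nvl (act_time, sched_time - min (ahead) over
    (partition by actual_count)) as act_time2
from q1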

The result does not exactly match what you are after, but it may be something you can build on:

STOP_ID   SCHED_TIME  ACT_TIME  ACT_TIME2
001       13:47       13:45     13:45
002       13:50                 13:48
003       13:52       13:53     13:53
004       13:59                 14:00
005       14:01                 14:02
006       14:04       14:04     14:04
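
For readers who find the analytic version hard to follow, the same "carry the last offset forward" idea can be sketched procedurally. This is only an illustration in Python, not part of the original answer; the names rows and carry_offset are made up for the example, and the data is the sample from the question truncated to minutes.

```python
from datetime import datetime

def t(hhmm):
    """Parse an HH:MM string; the date part is irrelevant here."""
    return datetime.strptime(hhmm, "%H:%M")

# (stop_id, sched_time, act_time), ordered by stop_id; None means no actual time
rows = [
    ("001", t("13:47"), t("13:45")),
    ("002", t("13:50"), None),
    ("003", t("13:52"), t("13:53")),
    ("004", t("13:59"), None),
    ("005", t("14:01"), None),
    ("006", t("14:04"), t("14:04")),
]

def carry_offset(rows):
    """Fill each missing actual time with sched_time minus the most recent offset."""
    filled = []
    ahead = None  # sched - act at the last stop that had an actual time
    for stop_id, sched, act in rows:
        if act is not None:
            ahead = sched - act      # refresh the offset at every known actual
        elif ahead is not None:
            act = sched - ahead      # apply the last known offset
        filled.append((stop_id, sched, act))
    return filled

for stop_id, sched, act in carry_offset(rows):
    print(stop_id, sched.strftime("%H:%M"), act.strftime("%H:%M") if act else "")
```

Like the query, this leaves any rows before the first actual time empty, because there is no offset to apply yet.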

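- Edit 7/24/14 -

Assuming your times are converted to integers as suggested (30s = 1), I took a stab at it. This is a hideous solution, but I think it does what you suggested. I am not sure whether it is faster than a procedural loop, and I would be curious to find out either way. Oracle's analytic functions are great, but as you can see I really had to lean on them to do what I think you described: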

with q1 as (
  -- group_id: running count of actual times (each actual starts a new group)
  select
    t.stop_id, t.sched_time, t.act_time,
    sum (nvl2(act_time, 1, 0)) over
        (partition by 1 order by stop_id) as group_id,
    lead (sched_time) over (order by stop_id) as next_sched
  from schedule2 t
), q2 as (
  -- elapsed: scheduled time to the next stop
  -- min_time / min_sched: actual and scheduled time at the start of the group
  select
    stop_id, sched_time, act_time, group_id, next_sched,
    next_sched - sched_time as elapsed,
    row_number() over (partition by group_id order by stop_id) as stops,
    min (act_time) over (partition by group_id) as min_time,
    min (sched_time) over (partition by group_id) as min_sched
  from q1
), q3 as (
  -- running scheduled time within the group, plus the start of the next group
  select
    stop_id, sched_time, act_time, group_id, stops, min_time,
    min_sched, next_sched,
    sum (elapsed) over (partition by group_id order by stop_id) as elapsed,
    max (stops) over (partition by group_id) as grp_stops,
    lead (min_time, 1) over (order by stop_id) as next_grp_actual,
    lead (min_sched, 1) over (order by stop_id) as next_grp_sched
  from q2
), q4 as (
  -- elapsed: scheduled time since the start of the group (0 for its first row)
  -- time_btw_*: scheduled and actual spans between this group's start and the next
  select
    stop_id, sched_time, act_time, stops, grp_stops,
    min_time, lag (elapsed, 1, 0) over
      (partition by group_id order by stop_id) as elapsed,
    max (next_grp_sched) over (partition by group_id) - min_sched
        as time_btw_sched,
    max (next_grp_actual) over (partition by group_id) - min_time
        as time_btw_actuals
  from q3
)
-- linear interpolation: start actual + fraction of scheduled span * actual span
select
  stop_id, sched_time, act_time,
  nvl (act_time, min_time + (elapsed / time_btw_sched) *
      time_btw_actuals) as act_time2
from q4

Here is the result I get from your sample:

id     sched   actual  actual (calc)
001    1654    1650    1650
002    1660            1659.6
003    1664    1666    1666
004    1678            1678.83333333333
005    1682            1682.5
006    1688    1688    1688
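
To compare these numbers with the times asked for in the question, they need to be converted back to clock time. A small Python sketch, assuming one unit is 30 seconds counted from midnight (inferred from 13:47 mapping to 1654); units_to_clock is a hypothetical helper name:

```python
def units_to_clock(units):
    """Convert a count of 30-second units since midnight to HH:MM:SS."""
    total_seconds = round(units * 30)  # round away floating-point noise
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

for units in (1659.6, 1678.83333333333, 1682.5):
    print(units, "->", units_to_clock(units))
# 1659.6 -> 13:49:48
# 1678.83333333333 -> 13:59:25
# 1682.5 -> 14:01:15
```

These match the interpolated act_time values for stops 002, 004, and 005 in the question.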

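I think this could be done more cleanly (and more efficiently) in a programming-language wrapper. I am only proficient in C# and Perl, but either of them would handle this well.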
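
As a rough idea of what such a procedural version might look like, here is a minimal sketch. It is written in Python rather than C# or Perl purely for brevity, and it is not the answerer's code. The function name interpolate_actuals and the tuple layout are assumptions for the example. It also assumes the rows are already ordered by stop and that scheduled times strictly increase between known actuals.

```python
from datetime import datetime

def parse(hms):
    """Parse an HH:MM:SS string; the date part is irrelevant here."""
    return datetime.strptime(hms, "%H:%M:%S")

def interpolate_actuals(rows):
    """Fill missing actual times that lie between two known actual times.

    rows: list of (stop_id, sched_time, act_time_or_None), ordered by stop.
    Each gap is filled in proportion to the scheduled time elapsed.
    """
    result = [list(r) for r in rows]
    known = [i for i, r in enumerate(rows) if r[2] is not None]
    for left, right in zip(known, known[1:]):
        sched_span = rows[right][1] - rows[left][1]
        act_span = rows[right][2] - rows[left][2]
        for i in range(left + 1, right):
            # share of the scheduled span covered when reaching stop i
            fraction = (rows[i][1] - rows[left][1]) / sched_span
            result[i][2] = rows[left][2] + fraction * act_span
    return [tuple(r) for r in result]

rows = [
    ("001", parse("13:47:00"), parse("13:45:00")),
    ("002", parse("13:50:00"), None),
    ("003", parse("13:52:00"), parse("13:53:00")),
    ("004", parse("13:59:00"), None),
    ("005", parse("14:01:00"), None),
    ("006", parse("14:04:00"), parse("14:04:00")),
]

for stop_id, sched, act in interpolate_actuals(rows):
    print(stop_id, sched.strftime("%H:%M:%S"), act.strftime("%H:%M:%S"))
# 002 -> 13:49:48, 004 -> 13:59:25, 005 -> 14:01:15
```

Rows before the first or after the last known actual time are left empty, since there is nothing to interpolate between; extrapolating them would be a separate decision.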