Question

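I have a table containing stop_id, sched_time and act_time, and I'd like to fill in the blanks in the actual time (using linear interpolation) based on the scheduled time, thereby preserving the relative timing between stops. So I'd like to go from something like this: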

  stop_id  |  sched_time  |  act_time  |  actual
------------------------------------------------
  001      |  13:47:00    |  13:45:00  |  TRUE
  002      |  13:50:00    |  null      |  FALSE
  003      |  13:52:00    |  13:53:00  |  TRUE
  004      |  13:59:00    |  null      |  FALSE
  005      |  14:01:00    |  null      |  FALSE
  006      |  14:04:00    |  14:04:00  |  TRUE

To something like this:

  stop_id  |  sched_time  |  act_time
-------------------------------------
  001      |  13:47:00    |  13:45:00
  002      |  13:50:00    |  13:49:48
  003      |  13:52:00    |  13:53:00
  004      |  13:59:00    |  13:59:25
  005      |  14:01:00    |  14:01:15
  006      |  14:04:00    |  14:04:00

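If making the interpolation honor the original timing between stops is too much to ask, a simple linear interpolation on the act_time column would be a good starting point, since there isn't much variability in the time differences between stops.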

Thanks in advance!

Note: the first act_time can be earlier than the first sched_time, and there may be several consecutive rows with no actual time.
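
One way to state the requested interpolation precisely is sketched below. The symbols are introduced here for illustration and do not appear in the original post: s is a scheduled time, a an actual time, "prev" and "next" are the nearest stops before and after stop i that have an actual time.

```latex
a_i = a_{\text{prev}}
    + \frac{s_i - s_{\text{prev}}}{s_{\text{next}} - s_{\text{prev}}}
      \,\bigl(a_{\text{next}} - a_{\text{prev}}\bigr)
```

For stop 002 in the sample this gives 13:45:00 + (3 min / 5 min) x 8 min = 13:45:00 + 4 min 48 s = 13:49:48, which matches the desired output above.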

Answer 1

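This is sort of a "third-best" solution: once you have an actual time, it tracks how far ahead of or behind schedule you are and applies that offset to the scheduled times that follow, rather than working from the actual times: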

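with q1 as (
  select
    t.stop_id, sched_time, act_time,
    -- offset from schedule (positive = ahead), only where an actual time exists
    nvl2(act_time, t.sched_time - t.act_time, null) ahead,
    -- running count of actual times: each row with an actual starts a new group
    sum (nvl2(act_time, 1, 0)) over
      (partition by 1 order by stop_id) as actual_count
  from schedule t
)
select
  stop_id, sched_time,
  act_time,
  -- rows without an actual time get their group's offset applied
  nvl (act_time, sched_time - min (ahead) over
    (partition by actual_count)) as act_time2
from q1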

The results don't exactly match what you're after, but it may be something you can build on:

STOP_ID   SCHED_TIME  ACT_TIME  ACT_TIME2
001       13:47       13:45     13:45
002       13:50                 13:48
003       13:52       13:53     13:53
004       13:59                 14:00
005       14:01                 14:02
006       14:04       14:04     14:04
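
The carry-forward logic of the query above can also be sketched outside the database. The following is a minimal Python sketch and not part of the original answer: the function name `carry_forward_offset` and the in-memory `rows` list are hypothetical, and times are assumed to be plain HH:MM strings.

```python
from datetime import datetime

def hhmm(text):
    """Parse an HH:MM string into a datetime (the date part is ignored)."""
    return datetime.strptime(text, "%H:%M")

def carry_forward_offset(rows):
    """rows: (stop_id, sched_time, act_time or None), ordered by stop.

    Each actual time fixes an offset from the schedule; that offset is then
    applied to the following stops until the next actual time appears.
    """
    offset = None  # sched - act of the most recent stop with an actual time
    result = []
    for stop_id, sched, act in rows:
        sched_t = hhmm(sched)
        if act is not None:
            act_t = hhmm(act)
            offset = sched_t - act_t      # positive means ahead of schedule
        elif offset is not None:
            act_t = sched_t - offset      # shift the schedule by the offset
        else:
            act_t = None                  # no actual time seen yet
        result.append((stop_id, act_t.strftime("%H:%M") if act_t else None))
    return result

rows = [
    ("001", "13:47", "13:45"),
    ("002", "13:50", None),
    ("003", "13:52", "13:53"),
    ("004", "13:59", None),
    ("005", "14:01", None),
    ("006", "14:04", "14:04"),
]

for stop_id, act_time2 in carry_forward_offset(rows):
    print(stop_id, act_time2)
```

On the sample data this yields the same ACT_TIME2 column as the table above.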

- Edit 7/24/14 -

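Assuming your times are converted to integers as suggested (30s = 1), I played around with this a bit. It's a horrible solution, but I think it does what you suggested. I'm not sure whether it's faster than a procedural loop, and I'd be curious to know either way. Oracle's analytic functions are great, but as you can see I used them heavily to do what I think you described: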

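with q1 as (
  -- q1: group_id is a running count of actual times, so each row with an
  -- actual time starts a new group; next_sched is the next stop's schedule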
  select
    t.stop_id, t.sched_time, t.act_time,
    sum (nvl2(act_time, 1, 0)) over 
        (partition by 1 order by stop_id) as group_id,
    lead (sched_time) over (order by stop_id) as next_sched
  from schedule2 t
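), q2 as (
  -- q2: scheduled time to the next stop, plus each group's anchor
  -- (its actual time and the matching scheduled time)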
  select
    stop_id, sched_time, act_time, group_id, next_sched,
    next_sched - sched_time as elapsed,
    row_number() over (partition by group_id order by stop_id) as stops,
    min (act_time) over (partition by group_id) as min_time,
    min (sched_time) over (partition by group_id) as min_sched
  from q1
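), q3 as (
  -- q3: cumulative scheduled time within the group, and the anchor of the
  -- following group (lead reaches it only on the group's last row)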
  select
    stop_id, sched_time, act_time, group_id, stops, min_time,
    min_sched, next_sched,
    sum (elapsed) over (partition by group_id order by stop_id) as elapsed,
    max (stops) over (partition by group_id) as grp_stops,
    lead (min_time, 1) over (order by stop_id) as next_grp_actual,
    lead (min_sched, 1) over (order by stop_id) as next_grp_sched
  from q2
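), q4 as (
  -- q4: lag shifts elapsed so it measures scheduled time since the group's
  -- anchor; the two differences are the scheduled and actual spans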
  select
    stop_id, sched_time, act_time, stops, grp_stops,
    min_time, lag (elapsed, 1, 0) over
      (partition by group_id order by stop_id) as elapsed,
    max (next_grp_sched) over (partition by group_id) - min_sched
        as time_btw_sched,
    max (next_grp_actual) over (partition by group_id) - min_time
        as time_btw_actuals
  from q3
)
select 
  stop_id, sched_time, act_time,
  nvl (act_time, min_time + (elapsed / time_btw_sched) * 
      time_btw_actuals) as act_time2
from q4

Here are the results I got from your sample:

id     sched   actual  actual (calc)
001    1654    1650    1650
002    1660            1659.6
003    1664    1666    1666
004    1678            1678.83333333333
005    1682            1682.5
006    1688    1688    1688

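I think this could be done more cleanly (and more efficiently) in a programming-language wrapper. I'm only proficient in C# and Perl, but either of them would do the job well.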
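
As a rough illustration of the wrapper idea, here is a minimal sketch in Python (swapped in for the C# or Perl mentioned above; it is not the answerer's code). The names `interpolate_actuals`, `to_seconds` and `to_hms` are hypothetical, rows are assumed to be already fetched and ordered by stop, and scheduled times are assumed to be strictly increasing. Stops before the first or after the last actual time are left empty, since nothing brackets them.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

def to_hms(seconds):
    """Convert seconds since midnight back to HH:MM:SS (rounded)."""
    seconds = round(seconds)
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"

def interpolate_actuals(rows):
    """rows: (stop_id, sched_time, act_time or None), ordered by stop.

    Fills each missing actual time by linear interpolation between the
    surrounding actual times, weighted by the scheduled times.
    """
    sched = [to_seconds(s) for _, s, _ in rows]
    act = [to_seconds(a) if a is not None else None for _, _, a in rows]
    known = [i for i, a in enumerate(act) if a is not None]

    # Walk over consecutive pairs of stops that both have an actual time.
    for lo, hi in zip(known, known[1:]):
        sched_span = sched[hi] - sched[lo]   # assumed non-zero (see lead-in)
        act_span = act[hi] - act[lo]
        for i in range(lo + 1, hi):
            fraction = (sched[i] - sched[lo]) / sched_span
            act[i] = act[lo] + fraction * act_span

    return [
        (rows[i][0], to_hms(act[i]) if act[i] is not None else None)
        for i in range(len(rows))
    ]

rows = [
    ("001", "13:47:00", "13:45:00"),
    ("002", "13:50:00", None),
    ("003", "13:52:00", "13:53:00"),
    ("004", "13:59:00", None),
    ("005", "14:01:00", None),
    ("006", "14:04:00", "14:04:00"),
]

for stop_id, act_time in interpolate_actuals(rows):
    print(stop_id, act_time)
```

On the sample data this produces 13:49:48, 13:59:25 and 14:01:15 for stops 002, 004 and 005, the values requested in the question.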

Interpolating missing values in SQL Developer / Oracle 10g
