Suppose I have a Ruby array of time/value pairs, for example:
[
# about 9:00 AM on consecutive days
[<DateTime: 2014-05-15T09:00:00Z>, 56],
[<DateTime: 2014-05-16T09:06:00Z>, 57],
# ... missing data for May 17th, 2014
# ... missing data for May 18th, 2014
[<DateTime: 2014-05-19T08:57:00Z>, 61],
# ...
]
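Note that (1) the values are not collected at the same time every day, and (2) some values are missing.
I would like to normalize the data as follows:
What is the correct way to do this programmatically?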
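How do you want to interpolate? In your example, [58, 59], [58, 60] and [59, 60] are all equally plausible.
The expected values depend on the interpolation strategy used (e.g. linear, quadratic, etc.), so I can't give an exact answer.
I'm open to any interpolation strategy that reproduces the original data points with minimal error (e.g. < 0.1%), and to any normalization strategy that yields equally spaced observations in the time series.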
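As a baseline for the strategies mentioned above, the simplest one is linear interpolation. It can be applied piecewise between neighbouring observations and evaluated on a grid of daily 9:00 AM timestamps. This is a minimal sketch and is not taken from either answer. It assumes all times are UTC and the observations are sorted by time. The names `OBSERVATIONS`, `linear_at` and `daily_grid` are hypothetical. Outside the observed range, the sketch extends the nearest segment, which is plain linear extrapolation.

```ruby
require 'date'

# Sample observations from the question, sorted by time (UTC).
OBSERVATIONS = [
  [DateTime.new(2014, 5, 15, 9, 0), 56],
  [DateTime.new(2014, 5, 16, 9, 6), 57],
  [DateTime.new(2014, 5, 19, 8, 57), 61]
].freeze

# Linear interpolation between the two observations that bracket t.
# Outside the observed range the first or last segment is extended.
def linear_at(obs, t)
  j = obs.index { |time, _| time >= t } || obs.size - 1
  i = (j - 1).clamp(0, obs.size - 2)
  (t0, v0), (t1, v1) = obs[i], obs[i + 1]
  v0 + (v1 - v0) * ((t - t0) / (t1 - t0)) # DateTime differences are Rational days
end

# One timestamp per day at exactly hour:00 UTC, from the first to the last observed day.
def daily_grid(obs, hour: 9)
  first, last = obs.first.first, obs.last.first
  start = DateTime.new(first.year, first.month, first.day, hour)
  stop  = DateTime.new(last.year, last.month, last.day, hour)
  (0..(stop - start).to_i).map { |d| start + d } # DateTime + Integer adds whole days
end

resampled = daily_grid(OBSERVATIONS).map { |t| [t, linear_at(OBSERVATIONS, t).to_f] }
resampled.each { |t, v| puts "#{t}: #{v.round(4)}" }
```

A single straight line from the first point to the last would ignore interior points. This version passes exactly through every observation, so an interior measurement such as May 16 shapes the values around it.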
Answer 0 (score: 5)
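You can use spline interpolation. Here is an example using the Spliner gem:
require 'date'
require 'spliner'
arr = [
  [DateTime.new(2014, 5, 15, 9), 56],
  [DateTime.new(2014, 5, 16, 9, 6), 57],
  [DateTime.new(2014, 5, 19, 8, 57), 61]
]
# Spline keyed on the timestamps; extrapolate: '10%' lets it return values a little outside the first/last key points
spline = Spliner::Spliner.new(arr.to_h, extrapolate: '10%')
# A DateTime range steps one day at a time, so every sampled point is at 09:00
(DateTime.new(2014, 5, 15, 9)..DateTime.new(2014, 5, 19, 9)).each do |date|
  puts "#{date}: #{spline[date]}"
end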
Output:
2014-05-15T09:00:00+00:00: 56.0 # exact value
2014-05-16T09:00:00+00:00: 56.995496729398646 # interpolated value
2014-05-17T09:00:00+00:00: 58.18937752978536 # interpolated value
2014-05-18T09:00:00+00:00: 59.55365781173006 # interpolated value
2014-05-19T09:00:00+00:00: 61.0030489943531 # extrapolated value
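This is a minimal natural cubic spline in plain Ruby, for seeing what the spline does without the gem. It is a sketch under stated assumptions, not Spliner's implementation:

- The class `NaturalCubicSpline` is hypothetical.
- Times are converted to fractional days since the first observation.
- The second derivative is forced to zero at both ends (natural boundary conditions).
- The end polynomials are simply extended for extrapolation.

Spliner's boundary handling and its extrapolate: option may differ, so the numbers need not match the output above.

```ruby
require 'date'

# Hypothetical helper: a natural cubic spline through (xs[i], ys[i]).
# xs must be strictly increasing; at least two points are required.
class NaturalCubicSpline
  def initialize(xs, ys)
    raise ArgumentError, 'need at least two points' if xs.size < 2
    @xs = xs.map(&:to_f)
    @ys = ys.map(&:to_f)
    n = @xs.size - 1
    h = (0...n).map { |i| @xs[i + 1] - @xs[i] }

    # Second derivatives at the knots; natural boundary: @m[0] = @m[n] = 0.
    @m = Array.new(n + 1, 0.0)

    # Thomas algorithm (forward sweep) for the tridiagonal system
    #   h[i-1]*m[i-1] + 2*(h[i-1]+h[i])*m[i] + h[i]*m[i+1] = rhs[i],  i = 1..n-1
    cp = Array.new(n, 0.0) # modified super-diagonal coefficients
    dp = Array.new(n, 0.0) # modified right-hand side
    (1...n).each do |i|
      rhs = 6.0 * ((@ys[i + 1] - @ys[i]) / h[i] - (@ys[i] - @ys[i - 1]) / h[i - 1])
      denom = 2.0 * (h[i - 1] + h[i]) - h[i - 1] * cp[i - 1]
      cp[i] = h[i] / denom
      dp[i] = (rhs - h[i - 1] * dp[i - 1]) / denom
    end
    # Back substitution (relies on @m[n] = 0 at the right boundary).
    (n - 1).downto(1) { |i| @m[i] = dp[i] - cp[i] * @m[i + 1] }
  end

  # Evaluate at x; outside [xs.first, xs.last] the end cubic is extended.
  def [](x)
    x = x.to_f
    n = @xs.size - 1
    i = ((@xs.index { |xi| xi > x } || n + 1) - 1).clamp(0, n - 1)
    h  = @xs[i + 1] - @xs[i]
    dl = x - @xs[i]     # distance from the left knot
    dr = @xs[i + 1] - x # distance to the right knot
    @m[i] * dr**3 / (6 * h) + @m[i + 1] * dl**3 / (6 * h) +
      (@ys[i] / h - @m[i] * h / 6) * dr + (@ys[i + 1] / h - @m[i + 1] * h / 6) * dl
  end
end

obs = [
  [DateTime.new(2014, 5, 15, 9, 0), 56],
  [DateTime.new(2014, 5, 16, 9, 6), 57],
  [DateTime.new(2014, 5, 19, 8, 57), 61]
]
t0 = obs.first.first
days = obs.map { |t, _| (t - t0).to_f } # fractional days since the first observation
spline = NaturalCubicSpline.new(days, obs.map(&:last))

# One point per day, starting from the first observation (09:00 here).
(0..(obs.last.first - t0).round).each do |d|
  t = t0 + d
  puts "#{t}: #{spline[(t - t0).to_f]}"
end
```

The daily grid here starts from the first observation, which happens to be exactly 9:00 AM. For other data, you would snap the start to 9:00 first.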
Answer 1 (score: 1)
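Let a be the first and b the last element of the current array. This solution gets you about 98% of the way there. The last thing you need to do is add or remove a few minutes from the last date so that it falls exactly at 9 AM, i.e. correct for the difference between days and days.round.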
days    = b.first - a.first         # elapsed time in days (a Rational)
per_day = (b.last - a.last) / days  # average change in value per day
(1..days.round).inject([a]) do |arr, i|
  arr << [a.first + i, (a.last + i * per_day).to_f]
end
#=> [[#<DateTime: 2014-05-15T09:00:00+00:00 ((2456793j,32400s,0n),+0s,2299161j)>, 56],
[#<DateTime: 2014-05-16T09:00:00+00:00 ((2456794j,32400s,0n),+0s,2299161j)>, 57.250651380927565],
[#<DateTime: 2014-05-17T09:00:00+00:00 ((2456795j,32400s,0n),+0s,2299161j)>, 58.501302761855136],
[#<DateTime: 2014-05-18T09:00:00+00:00 ((2456796j,32400s,0n),+0s,2299161j)>, 59.7519541427827],
[#<DateTime: 2014-05-19T09:00:00+00:00 ((2456797j,32400s,0n),+0s,2299161j)>, 61.002605523710265]]
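The remaining step described in this answer is making sure every generated point lands at 9 AM. One way to handle it is to evaluate the same straight line through a and b at exact grid times instead of at a.first + i. This is a minimal sketch, assuming UTC timestamps; the local names start, stop and normalized are illustrative. For this data it produces the same values as the output above, because the first observation is already at 9:00. It also works when the first observation is not at 9:00.

```ruby
require 'date'

a = [DateTime.new(2014, 5, 15, 9, 0), 56]  # first observation
b = [DateTime.new(2014, 5, 19, 8, 57), 61] # last observation

per_day = (b.last - a.last) / (b.first - a.first) # slope in value units per day

# Snap the start and end to 9:00 UTC on the first and last days.
start = DateTime.new(a.first.year, a.first.month, a.first.day, 9)
stop  = DateTime.new(b.first.year, b.first.month, b.first.day, 9)

normalized = (0..(stop - start).to_i).map do |i|
  t = start + i # DateTime + Integer adds whole days
  # Evaluate the straight line through a and b at the exact grid time t.
  [t, (a.last + (t - a.first) * per_day).to_f]
end
normalized.each { |t, v| puts "#{t}: #{v}" }
```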