Python time series interpolation

Time: 2017-03-29 01:23:16

Tags: python

I have a time series like this:

timeseries1 = [{'price': 250, 'time': 1.52},
    {'price': 251, 'time': 3.65},
    {'price': 253, 'time': 10.1},
    {'price': 254, 'time': 10.99}]

I would like to interpolate this data so that it moves forward in smaller time steps, giving something like the following:

timeStep = 0.1
timeseries2 = [{'price': 250, 'time': 1.5},
    {'price': 250, 'time': 1.6},
    {'price': 250, 'time': 1.7},
    ...
    {'price': 250, 'time': 3.6},
    {'price': 251, 'time': 3.7},
    {'price': 251, 'time': 3.8},
    {'price': 251, 'time': 3.9},
    ...
    {'price': 251, 'time': 10.0},
    {'price': 253, 'time': 10.1},
    {'price': 253, 'time': 10.2},
    {'price': 253, 'time': 10.3},
    ...
    {'price': 253, 'time': 10.9},
    {'price': 254, 'time': 11.0}]

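I am really not sure how to do this efficiently, and I am hoping there is a nice pythonic way to do it. What I have tried is iterating over timeseries1 and using a while loop to append new values to the end of timeseries2, but with 2 nested loops this seems very inefficient.

Edit: here is the code/algorithm currently used to do this.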

import math

# Snap the first timestamp down to the time grid.
startTime = math.floor(timeseries1[0]['time'] / timeStep) * timeStep
oldPrice = timeseries1[0]['price']
timeseries3 = []
for x in timeseries1[1:]:
    # Repeat the last known price until the next observation is reached.
    while startTime < x['time']:
        timeseries3.append({'price': oldPrice, 'time': startTime})
        startTime += timeStep
    oldPrice = x['price']
# Emit the final observation at the next grid point.
timeseries3.append({'price': oldPrice, 'time': startTime})

As a result, timeseries3 should end up the same as timeseries2.
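
As an illustration of the forward-fill idea described above, here is a minimal sketch that builds the grid from integer step indices instead of repeatedly adding the step. The function name `forward_fill` and the `ndigits` rounding parameter are assumptions made for this example, not part of the original question. It assumes the input is sorted by time.

```python
import math

def forward_fill(points, step, ndigits=6):
    """Resample observations onto a uniform grid, carrying the last
    known price forward. Assumes points are sorted by time."""
    first = math.floor(points[0]['time'] / step)   # first grid index
    last = math.ceil(points[-1]['time'] / step)    # last grid index
    result = []
    pos = 0  # index of the observation currently in effect
    for k in range(first, last + 1):
        # Derive the time from the integer index to avoid float drift.
        t = round(k * step, ndigits)
        # Advance while the next observation has happened by time t.
        while pos + 1 < len(points) and points[pos + 1]['time'] <= t:
            pos += 1
        result.append({'price': points[pos]['price'], 'time': t})
    return result

timeseries1 = [{'price': 250, 'time': 1.52},
               {'price': 251, 'time': 3.65},
               {'price': 253, 'time': 10.1},
               {'price': 254, 'time': 10.99}]
timeseries2 = forward_fill(timeseries1, 0.1)
```

Although there is still a loop inside a loop, the inner pointer only ever moves forward, so the total work is proportional to the number of grid points plus the number of observations.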

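2 answers:

Answer 0 (score: 1)

Try RedBlackPy. The RedBlackPy.Series class is built on a red-black tree and makes it convenient to work with time series. It has interpolation built into the getitem operator (Series[key]).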

import redblackpy as rb

time = [1.52, 3.65, 10.1, 10.99]
price = [250, 251, 253, 254]
# Create a Series with 'floor' interpolation:
# at time t you get the last known value, which is your case
series = rb.Series( index=time, values=price, dtype='float64',
                    interpolate='floor' )
# Now you can access any key through interpolation, without inserting it,
# and create a new series with the required time step
# args of the uniform method: (start, end, step)
new_series = series.uniform(1.5, 11, 0.1)
# The required result
print(new_series)

The output of the final print is as follows (with floating-point arithmetic artifacts):

Series object Untitled
1.5: 0.0
1.6: 250.0
1.7000000000000002: 250.0
1.8000000000000003: 250.0
1.9000000000000004: 250.0
2.0000000000000004: 250.0
2.1000000000000005: 250.0
...
9.89999999999998: 251.0
9.99999999999998: 251.0
10.09999999999998: 251.0
10.19999999999998: 253.0
10.29999999999998: 253.0
10.399999999999979: 253.0
10.499999999999979: 253.0
10.599999999999978: 253.0
10.699999999999978: 253.0
10.799999999999978: 253.0
10.899999999999977: 253.0
10.999999999999977: 254.0
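
The long decimals in this output come from repeatedly adding 0.1, which has no exact binary floating-point representation, so the rounding error grows with every addition. A minimal sketch of the difference, independent of RedBlackPy (the variable names here are illustrative only):

```python
step = 0.1
n = 96  # number of grid points from 1.5 to 11.0 inclusive

# Accumulating the step lets the rounding error grow with every addition.
accumulated = []
t = 1.5
for _ in range(n):
    accumulated.append(t)
    t += step

# Deriving each point from an integer index keeps the error bounded,
# and rounding to a fixed number of decimals gives clean grid values.
indexed = [round(1.5 + k * step, 6) for k in range(n)]
```

Whether the cleaner keys matter depends on the use case. With 'floor' lookups, a key that lands slightly below an observation time returns the previous value, as the 10.09999999999998 line above shows.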

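Remember that with interpolation you can access any key! If you only want to iterate with a uniform time step, you do not have to create a new series. You can do this with RedBlackPy.Series without any extra memory: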

import redblackpy as rb

# Generator that yields uniformly spaced time keys
def grid_generator(start, stop, step):
    it = start
    while it <= stop:
        yield it
        it += step

time = [1.52, 3.65, 10.1, 10.99]
price = [250, 251, 253, 254]
# Create a Series with 'floor' interpolation:
# at time t you get the last known value, which is your case
series = rb.Series( index=time, values=price, dtype='float64',
                    interpolate='floor' )

# Iterate over the grid; the Series itself still holds only 4 elements
for key in grid_generator(1.6, 11, 0.1):
    print(series[key]) # prints the last known value
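
For readers who cannot install a third-party package, the same "last known value" lookup can be approximated with the standard library. This is a rough stand-in built on `bisect` over a sorted list, not how RedBlackPy is implemented, and the helper name `last_known` is hypothetical.

```python
from bisect import bisect_right

times = [1.52, 3.65, 10.1, 10.99]   # must be sorted
prices = [250, 251, 253, 254]

def last_known(t):
    """Return the price of the latest observation at or before t,
    or None when t precedes the first observation."""
    i = bisect_right(times, t)
    return prices[i - 1] if i else None

# Walk a uniform grid (1.6 .. 11.0) without building a second series.
values = [last_known(round(k * 0.1, 6)) for k in range(16, 111)]
```

Each lookup is a binary search, so it costs O(log n) in the number of observations and nothing is copied.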

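Answer 1 (score: 0)

...hoping there is a nice pythonic way.

This is the pythonic way to generate a list: use a generator! However, I have to admit that the following code has a problem: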

def timeseries( t1, t2, p1, coeff, step ):
  # Yield one point per step on the line through (t1, p1) with slope coeff
  t = t1
  while t <= t2:
    yield { 'price' :  int( p1 + ( t - t1 ) * coeff), 'time' : t }
    t += step


print(list(timeseries( 1.5, 11 , 250 , 0.43 , 0.1 ) ))

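So a generator can be a "fun" way to create a time series. However, it needs work because of the floating-point arithmetic problem I hit when running it: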

[{'price': 250, 'time': 1.5}, {'price': 250, 'time': 1.6}, {'price': 250, 'time': 1.7000000000000002}, {'price': 250, 'time': 1.8000000000000003}, {'price': 250, 'time': 1.9000000000000004}, {'price': 250, 'time': 2.0000000000000004}, {'price': 250, 'time': 2.1000000000000005}, {'price': 250, 'time': 2.2000000000000006}, {'price': 250, 't...

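Although I think the code above is easy to read (OK, the variable names could be more descriptive, and a comment or two would be nice), here is more compact Python code that does the same thing. Instead of declaring a generator function, it uses an anonymous generator (a generator expression).

For completeness, I added a line that computes the slope of the data being interpolated.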

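(t1, p1, t2, p2) = (1.52, 250.0, 10.99, 254.0)
# Slope of the straight line through the first and last data points
coeff = (p2 - p1) / (t2 - t1)
# Integer tenths keep the time values clean; the price offset is truncated to two decimals
print(list({'time': i / 10.0, 'price': int((i / 10.0 - t1) * coeff * 100) / 100.0 + p1}
           for i in range(int(t1 * 10), int(t2 * 10))))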

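The code could be generalized further. The 10.0 and 100 values are there to do the math on integers and keep only the significant digits we care about. This is cleaner than the previous code, where simply adding the 0.1 step to the current time t (t += step) makes the time values drift into long, messy decimals. One site discusses using a frange generator built on decimal.Decimal. I could not get it to work in my Python 2.7 environment, so I simply hard-coded the scale/significant digits into the formula (again, not very general).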
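
For reference, a generator along the lines of the decimal.Decimal idea mentioned above might look like the sketch below. This is not the code from the site the answer refers to. The name `frange` and the choice to pass the arguments as strings are assumptions made for illustration.

```python
from decimal import Decimal

def frange(start, stop, step):
    """Yield floats from start to stop (inclusive) in exact decimal steps.
    Arguments are passed as strings so that Decimal sees '0.1' itself,
    not the binary approximation of the float 0.1."""
    current, stop, step = Decimal(start), Decimal(stop), Decimal(step)
    while current <= stop:
        yield float(current)
        current += step

grid = list(frange('1.5', '11', '0.1'))
```

Because the additions happen in decimal arithmetic, every yielded time is the float closest to the intended value, so the scale no longer has to be hard-coded into the formula.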