Convert an irregularly timestamped DataFrame of price and amount data to an equally spaced, volume-weighted average price

Time: 2013-06-05 20:37:51

Tags: python dataframe pandas time-series

I have stock price and volume data whose timestamps are irregularly spaced and whose time index contains duplicates. A simple example of the data:

                       unixtime    price  amount
2011-04-17 01:03:11  1303002191  1.02570       1
2011-04-17 01:03:14  1303002194  1.02570       1
2011-04-17 01:03:17  1303002197  1.02570       1
2011-04-17 01:03:19  1303002199  1.02570       1
2011-04-17 01:03:21  1303002201  1.02570       1
2011-04-17 01:03:23  1303002203  1.02570       1
2011-04-17 01:03:37  1303002217  1.02570       1
2011-04-17 01:03:45  1303002225  1.02570       1
2011-04-17 01:03:57  1303002237  1.02570       1
2011-04-17 01:04:42  1303002282  1.02570       1
2011-04-17 01:04:55  1303002295  1.02570       1
2011-04-17 01:05:00  1303002300  1.02570       1
2011-04-17 01:05:03  1303002303  1.02570       1
2011-04-17 01:05:11  1303002311  1.02570       1
2011-04-17 01:05:24  1303002324  1.02570       1
2011-04-17 01:05:34  1303002334  1.02570       1
2011-04-17 01:05:45  1303002345  1.02570       1
2011-04-17 01:05:56  1303002356  1.02570       1
2011-04-17 01:06:11  1303002371  1.02570       1
2011-04-17 01:06:25  1303002385  1.02570       1
2011-04-17 01:06:28  1303002388  1.02570       1
2011-04-17 01:06:31  1303002391  1.02570       1
2011-04-17 01:06:33  1303002393  1.02570       1
2011-04-17 01:06:34  1303002394  1.02560       1
2011-04-17 01:06:44  1303002404  1.02560       1
2011-04-17 01:07:02  1303002422  1.02560       2
2011-04-17 01:07:21  1303002441  1.02563       2
2011-04-17 01:07:46  1303002466  1.02563       2
2011-04-17 01:08:24  1303002504  1.02563       2
2011-04-17 01:09:55  1303002595  1.02570       2
2011-04-17 01:10:50  1303002650  1.02570       2
2011-04-17 01:11:02  1303002662  1.02570       2
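For anyone who wants to reproduce the setup, here is a minimal sketch of how a frame like this could be built from the `unixtime` column. It uses only a handful of rows copied from the sample; the names `rows`, `gaps` and `has_duplicates` are illustrative and not part of the original data.

```python
import pandas as pd

# A few rows copied from the sample above: (unixtime, price, amount).
rows = [
    (1303002191, 1.02570, 1),
    (1303002194, 1.02570, 1),
    (1303002197, 1.02570, 1),
    (1303002394, 1.02560, 1),
    (1303002422, 1.02560, 2),
    (1303002441, 1.02563, 2),
]
df = pd.DataFrame(rows, columns=["unixtime", "price", "amount"])

# Build the time index from the epoch seconds in the unixtime column.
df.index = pd.to_datetime(df["unixtime"], unit="s")
df.index.name = None

# Time between consecutive trades, which shows the irregular spacing.
gaps = df.index.to_series().diff().dropna()

# True when the same timestamp occurs more than once in the index.
has_duplicates = df.index.has_duplicates
```

This subset happens to contain no repeated timestamps, so `has_duplicates` is False here; on the full data set it would flag the duplicates mentioned above.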

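What I want is an equally spaced series, say at a 30-second frequency, holding the volume (amount) weighted average price. I have already managed to resample to 30-second intervals and get the last price in each interval and the total amount (volume) traded during it, using how="last" and how="sum" respectively. But how do I resample to get the 30-second volume-weighted price?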
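As a rough sketch, the two resamples described above (last price and total amount per interval) might look like this in the method-call style of later pandas releases, which replaced the `how=` keyword. The small frame and the names `bins`, `last_price` and `total_amount` are assumptions made for illustration.

```python
import pandas as pd

# Five trades taken from the sample data, indexed by timestamp.
unixtime = [1303002191, 1303002194, 1303002217, 1303002225, 1303002282]
df = pd.DataFrame(
    {"price": [1.02570] * 5, "amount": [1, 1, 1, 1, 1]},
    index=pd.to_datetime(unixtime, unit="s"),
)

# Group the rows into 30-second bins.
bins = df.resample("30s")

# Last traded price in each bin (NaN where the bin has no trades).
last_price = bins["price"].last()

# Total amount traded in each bin (0 where the bin has no trades).
total_amount = bins["amount"].sum()
```

Neither series is a weighted average: `last_price` ignores every trade but the final one in the bin, which is why a different aggregation is needed.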

1 Answer:

Answer 0: (score: 4)

I think I would create a new column for the total traded value (price times amount) and do two resamples:

In [11]: df['total'] = df['price'] * df['amount']

In [12]: df.total.resample('30S', how='sum') / df.amount.resample('30S', how='sum')
Out[12]:
2011-04-17 01:03:00    1.025700
2011-04-17 01:03:30    1.025700
2011-04-17 01:04:00         NaN
2011-04-17 01:04:30    1.025700
2011-04-17 01:05:00    1.025700
2011-04-17 01:05:30    1.025700
2011-04-17 01:06:00    1.025700
2011-04-17 01:06:30    1.025650
2011-04-17 01:07:00    1.025615
2011-04-17 01:07:30    1.025630
2011-04-17 01:08:00    1.025630
2011-04-17 01:08:30         NaN
2011-04-17 01:09:00         NaN
2011-04-17 01:09:30    1.025700
2011-04-17 01:10:00         NaN
2011-04-17 01:10:30    1.025700
2011-04-17 01:11:00    1.025700
Freq: 30S, dtype: float64

Assuming this is what you're after...
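The session above uses the `how=` keyword that pandas accepted at the time; later releases dropped it in favour of method calls such as `.resample(...).sum()`. Below is a minimal sketch of the same two-resample idea in that newer style. The `vwap` helper is a hypothetical name, and the rows are copied from the question's sample.

```python
import pandas as pd


def vwap(df, freq="30s"):
    """Volume-weighted average price per interval (hypothetical helper)."""
    # Traded value per interval: sum of price * amount.
    total = (df["price"] * df["amount"]).resample(freq).sum()
    # Traded volume per interval.
    volume = df["amount"].resample(freq).sum()
    # Intervals without trades give 0 / 0, which comes out as NaN.
    return total / volume


# Trades between 01:06:31 and 01:07:46 from the question's sample.
unixtime = [1303002391, 1303002393, 1303002394, 1303002404,
            1303002422, 1303002441, 1303002466]
df = pd.DataFrame(
    {
        "price": [1.02570, 1.02570, 1.02560, 1.02560,
                  1.02560, 1.02563, 1.02563],
        "amount": [1, 1, 1, 1, 2, 2, 2],
    },
    index=pd.to_datetime(unixtime, unit="s"),
)

result = vwap(df)
```

For these rows the three intervals starting at 01:06:30, 01:07:00 and 01:07:30 come out to roughly 1.025650, 1.025615 and 1.025630, in line with the corresponding rows of the output above.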