I have stock price and trade volume data that is timestamped at irregular intervals and contains duplicate time indices. A simple example of this data:
unixtime price amount
2011-04-17 01:03:11 1303002191 1.02570 1
2011-04-17 01:03:14 1303002194 1.02570 1
2011-04-17 01:03:17 1303002197 1.02570 1
2011-04-17 01:03:19 1303002199 1.02570 1
2011-04-17 01:03:21 1303002201 1.02570 1
2011-04-17 01:03:23 1303002203 1.02570 1
2011-04-17 01:03:37 1303002217 1.02570 1
2011-04-17 01:03:45 1303002225 1.02570 1
2011-04-17 01:03:57 1303002237 1.02570 1
2011-04-17 01:04:42 1303002282 1.02570 1
2011-04-17 01:04:55 1303002295 1.02570 1
2011-04-17 01:05:00 1303002300 1.02570 1
2011-04-17 01:05:03 1303002303 1.02570 1
2011-04-17 01:05:11 1303002311 1.02570 1
2011-04-17 01:05:24 1303002324 1.02570 1
2011-04-17 01:05:34 1303002334 1.02570 1
2011-04-17 01:05:45 1303002345 1.02570 1
2011-04-17 01:05:56 1303002356 1.02570 1
2011-04-17 01:06:11 1303002371 1.02570 1
2011-04-17 01:06:25 1303002385 1.02570 1
2011-04-17 01:06:28 1303002388 1.02570 1
2011-04-17 01:06:31 1303002391 1.02570 1
2011-04-17 01:06:33 1303002393 1.02570 1
2011-04-17 01:06:34 1303002394 1.02560 1
2011-04-17 01:06:44 1303002404 1.02560 1
2011-04-17 01:07:02 1303002422 1.02560 2
2011-04-17 01:07:21 1303002441 1.02563 2
2011-04-17 01:07:46 1303002466 1.02563 2
2011-04-17 01:08:24 1303002504 1.02563 2
2011-04-17 01:09:55 1303002595 1.02570 2
2011-04-17 01:10:50 1303002650 1.02570 2
2011-04-17 01:11:02 1303002662 1.02570 2
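What I want is an equally spaced series, say at a 30-second frequency, with the volume (amount) weighted average price. I have already been able to get equally spaced 30-second intervals with the last price of each interval and the total volume over that interval, using how="last" and how="sum" respectively. But how do I resample to get the 30-second volume-weighted price?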
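For reference, the starting point described above (last price and total volume per 30-second interval) might look roughly like the sketch below. It assumes a DataFrame named `df` with a DatetimeIndex and `price` and `amount` columns, built here from a few of the sample rows, and it uses the newer `.resample(...).agg(...)` spelling rather than the `how=` argument.

```python
import pandas as pd

# A few rows from the sample data, indexed by timestamp (df is an assumed name).
df = pd.DataFrame(
    {"price": [1.02570, 1.02570, 1.02560, 1.02563],
     "amount": [1, 1, 2, 2]},
    index=pd.to_datetime([
        "2011-04-17 01:06:31",
        "2011-04-17 01:06:33",
        "2011-04-17 01:07:02",
        "2011-04-17 01:07:21",
    ]),
)

# Last price and summed volume for each 30-second bin.
bars = df.resample("30s").agg({"price": "last", "amount": "sum"})
print(bars)
```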
Answer 0 (score: 4)
I think I would create a new column for the total traded value (price times amount) and do two resamples:
In [11]: df['total'] = df['price'] * df['amount']
In [12]: df['total'].resample('30S').sum() / df['amount'].resample('30S').sum()
Out[12]:
2011-04-17 01:03:00 1.025700
2011-04-17 01:03:30 1.025700
2011-04-17 01:04:00 NaN
2011-04-17 01:04:30 1.025700
2011-04-17 01:05:00 1.025700
2011-04-17 01:05:30 1.025700
2011-04-17 01:06:00 1.025700
2011-04-17 01:06:30 1.025650
2011-04-17 01:07:00 1.025615
2011-04-17 01:07:30 1.025630
2011-04-17 01:08:00 1.025630
2011-04-17 01:08:30 NaN
2011-04-17 01:09:00 NaN
2011-04-17 01:09:30 1.025700
2011-04-17 01:10:00 NaN
2011-04-17 01:10:30 1.025700
2011-04-17 01:11:00 1.025700
Freq: 30S, dtype: float64
Assuming this is what you're after...
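A self-contained sketch of the same idea, for anyone who wants to run it directly. The frame below is rebuilt by hand from a handful of the sample rows (the names `df`, `sums` and `vwap` are just illustrative), and both sums are taken in a single resample, which should give the same result as the two separate resamples above.

```python
import pandas as pd

# Subset of the sample rows; the index is the trade timestamp.
df = pd.DataFrame(
    {"price": [1.02570, 1.02570, 1.02560, 1.02560, 1.02560, 1.02563, 1.02570],
     "amount": [1, 1, 1, 1, 2, 2, 2]},
    index=pd.to_datetime([
        "2011-04-17 01:06:31",
        "2011-04-17 01:06:33",
        "2011-04-17 01:06:34",
        "2011-04-17 01:06:44",
        "2011-04-17 01:07:02",
        "2011-04-17 01:07:21",
        "2011-04-17 01:09:55",
    ]),
)

# Traded value per row: price * amount.
df["total"] = df["price"] * df["amount"]

# Sum traded value and volume per 30-second bin, then divide.
sums = df[["total", "amount"]].resample("30s").sum()
vwap = sums["total"] / sums["amount"]
print(vwap)
```

Intervals with no trades have zero volume, so the division yields NaN for them, as in the output above. Whether to leave those gaps or fill them (for example by carrying the previous value forward) depends on what the series is used for.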