I am looking for a way to reindex my data using a custom function. My data looks like this:
AAA BBB CCC DDD
Time
2009-01-30 09:30:00 6407.04 43.90 44.01 85.11
2009-01-30 09:39:00 6403.20 43.82 44.01 84.93
2009-01-30 09:40:00 6400.00 43.90 44.03 84.90
2009-01-30 09:45:00 6396.16 43.97 44.04 84.91
2009-01-30 09:48:00 6393.60 44.02 44.07 84.81
2009-01-30 09:55:00 6400.00 44.31 44.14 84.78
2009-01-30 09:56:00 6406.40 44.36 44.16 84.57
2009-01-30 09:59:00 6426.24 44.36 44.11 84.25
2009-01-30 10:00:00 6438.40 44.32 44.09 84.32
2009-01-30 10:06:00 6495.36 44.43 44.16 84.23
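This is minute-level price data for a few stocks. I want to split the trading day into 5 parts and resample my data accordingly. I started by creating a custom index:
from datetime import datetime

import pandas as pd

index_date = pd.date_range('2009-01-30', '2016-03-01')
index_date = pd.Series(index_date)
index_time = pd.date_range('09:30:00', '16:00:00', freq='78min')
index_time = pd.Series(index_time.time)
# combine every date with every intraday time, then flatten and sort
index = index_date.apply(
    lambda d: index_time.apply(
        lambda t: datetime.combine(d, t)
    )
).unstack().sort_values().reset_index(drop=True)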
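Suppose I want to apply a basic percent-change function:
def percent_change(x):
    # change from the first to the last observation in the window;
    # assumes x is a pandas Series, hence positional access via .iloc
    if len(x):
        return (x.iloc[-1] - x.iloc[0]) / x.iloc[0]
The desired dataset should look like this: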
AAA BBB CCC DDD
2009-01-30 09:30:00 NaN NaN NaN NaN
2009-01-30 10:48:00 y y y y # where y is the output of the
2009-01-30 12:06:00 x x x x percent_change function from
2009-01-30 13:24:00 9:30 to 10:48
2009-01-30 14:42:00 # x is the output of the
2009-01-30 16:00:00 percent_change function
2009-01-31 09:30:00 from 10:49 to 12:06, etc
2009-01-31 10:48:00
A larger data sample is available here:
https://www.dropbox.com/s/h29xlpveb1o7p2u/data.csv?dl=0
How can I do this?
Answer 0 (score: 3)
UPDATE:
In [182]: %paste
(df.groupby(df.index.date)
   .apply(lambda x: x.resample('78T',
                               loffset=pd.Timedelta('24minute')).mean())
   .ffill()
   .pct_change()
)
## -- End pasted text --
Out[182]:
vxxc
Time
2009-02-02 2009-02-02 09:30:00 NaN
2009-02-02 10:48:00 -0.010745
2009-02-02 12:06:00 -0.006372
2009-02-02 13:24:00 -0.003701
2009-02-02 14:42:00 0.001614
2009-02-02 16:00:00 -0.005668
2009-02-03 2009-02-03 09:30:00 -0.009334
2009-02-03 10:48:00 -0.007039
2009-02-03 12:06:00 -0.002014
2009-02-03 13:24:00 -0.002705
2009-02-03 14:42:00 -0.017530
2009-02-03 16:00:00 -0.004704
2009-02-03 17:18:00 -0.001893
2009-02-04 2009-02-04 09:30:00 -0.019076
2009-02-04 10:48:00 -0.002563
2009-02-04 12:06:00 0.002348
2009-02-04 13:24:00 0.010099
2009-02-04 14:42:00 0.013081
2009-02-04 16:00:00 -0.000264
2009-02-04 17:18:00 0.007121
2009-02-05 2009-02-05 09:30:00 0.026527
2009-02-05 10:48:00 -0.013580
2009-02-05 12:06:00 -0.018056
2009-02-05 13:24:00 -0.005020
2009-02-05 14:42:00 -0.006316
2009-02-05 16:00:00 0.003269
2009-02-06 2009-02-06 09:30:00 -0.030773
2009-02-06 10:48:00 0.001088
2009-02-06 12:06:00 0.010469
2009-02-06 13:24:00 -0.008337
... ...
2009-02-23 2009-02-23 09:30:00 0.002312
2009-02-23 10:48:00 0.012162
2009-02-23 12:06:00 0.009785
2009-02-23 13:24:00 0.008687
2009-02-23 14:42:00 0.000421
2009-02-23 16:00:00 0.012550
2009-02-24 2009-02-24 09:30:00 -0.009290
2009-02-24 10:48:00 -0.017526
2009-02-24 12:06:00 -0.004194
2009-02-24 13:24:00 -0.021528
2009-02-24 14:42:00 -0.027898
2009-02-24 16:00:00 -0.012646
2009-02-25 2009-02-25 09:30:00 0.021827
2009-02-25 10:48:00 0.001863
2009-02-25 12:06:00 -0.012693
2009-02-25 13:24:00 -0.006884
2009-02-25 14:42:00 -0.013019
2009-02-25 16:00:00 -0.008020
2009-02-26 2009-02-26 09:30:00 -0.015104
2009-02-26 10:48:00 -0.011319
2009-02-26 12:06:00 0.019160
2009-02-26 13:24:00 0.016271
2009-02-26 14:42:00 0.003807
2009-02-26 16:00:00 0.007333
2009-02-27 2009-02-27 09:30:00 0.023949
2009-02-27 10:48:00 -0.027659
2009-02-27 12:06:00 -0.006932
2009-02-27 13:24:00 -0.003167
2009-02-27 14:42:00 0.005263
2009-02-27 16:00:00 0.010594
[118 rows x 1 columns]
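A note for readers on newer pandas: `loffset` was deprecated in pandas 1.1 and removed in 2.0, so the snippet above will not run there as written. Also, it compares the mean of each bin with the mean of the previous one, which is not quite what `percent_change` in the question computes. If the goal is the change from the first to the last price inside each 78-minute window, a minimal sketch could look like this. It assumes a pandas version that supports the `offset` argument of `resample`; the tiny frame and the names `anchor`, `parts` and `change` are made up for illustration, and bins are assumed to be closed on the left and labelled by their start time.

```python
import pandas as pd

# Made-up minute data for two trading days (stand-in for the real frame).
idx = pd.to_datetime([
    "2009-01-30 09:30", "2009-01-30 10:47",
    "2009-01-30 10:48", "2009-01-30 12:05",
    "2009-02-02 09:30", "2009-02-02 10:00",
])
df = pd.DataFrame({"AAA": [100.0, 110.0, 200.0, 150.0, 50.0, 55.0]}, index=idx)

# Anchor the 78-minute bins at 09:30 instead of midnight.
anchor = pd.Timedelta(hours=9, minutes=30)

parts = []
# Resample each calendar day separately so the bins restart at 09:30 every day.
for day, day_df in df.groupby(df.index.normalize()):
    r = day_df.resample("78min", offset=anchor)
    first, last = r.first(), r.last()
    # (last - first) / first within each bin; empty bins come out as NaN.
    parts.append((last - first) / first)

change = pd.concat(parts)
print(change)
```

Here the 09:30 row of each day holds the change over 09:30 to 10:47, the 10:48 row the change over 10:48 to 12:05, and so on. Passing `label="right"` to `resample` would label each bin by its end instead, which is closer to the layout sketched in the question.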
OLD answer:
You can do it this way:
In [104]: df.resample('18T').pct_change()
C:\envs\py35\Scripts\ipython:1: FutureWarning: .resample() is now a deferred operation
use .resample(...).mean() instead of .resample(...)
Out[104]:
AAA BBB CCC DDD
Time
2009-01-30 09:18:00 NaN NaN NaN NaN
2009-01-30 09:36:00 -0.001373 0.000626 0.000625 -0.002614
2009-01-30 09:54:00 0.005477 0.009755 0.002146 -0.005389
Or, if we want to get rid of the FutureWarning:
In [109]: df.resample('18T').mean().pct_change()
Out[109]:
AAA BBB CCC DDD
Time
2009-01-30 09:18:00 NaN NaN NaN NaN
2009-01-30 09:36:00 -0.001373 0.000626 0.000625 -0.002614
2009-01-30 09:54:00 0.005477 0.009755 0.002146 -0.005389
Note: I used 18 minutes (18T) rather than 78T because your sample data covers less than 78 minutes, so change 18T to 78T for your real dataset.
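For anyone who wants to reproduce the old answer, here is a self-contained sketch using only the AAA column from the question's sample. It uses the `min` alias, since recent pandas versions deprecate `T`; the variable names `times` and `result` are mine.

```python
import pandas as pd

# AAA prices copied from the sample in the question.
times = pd.to_datetime([
    "2009-01-30 09:30", "2009-01-30 09:39", "2009-01-30 09:40",
    "2009-01-30 09:45", "2009-01-30 09:48", "2009-01-30 09:55",
    "2009-01-30 09:56", "2009-01-30 09:59", "2009-01-30 10:00",
    "2009-01-30 10:06",
])
df = pd.DataFrame(
    {"AAA": [6407.04, 6403.20, 6400.00, 6396.16, 6393.60,
             6400.00, 6406.40, 6426.24, 6438.40, 6495.36]},
    index=pd.DatetimeIndex(times, name="Time"),
)

# Average each 18-minute bin, then take the change between consecutive bins.
result = df.resample("18min").mean().pct_change()
print(result.round(6))
```

This should give the same three rows (09:18, 09:36, 09:54) and the same AAA values as the output above. Keep in mind that it measures the change between consecutive bin averages, not between the first and last price within a bin.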