I have a huge DataFrame of open and close prices recorded at 15-minute intervals. Each day starts at 9:45 and ends at 16:15. My current df looks like this:
open_p close_p
date
2013-12-20 09:45:00 -1.14 -1.12
2013-12-20 10:00:00 -1.12 -1.12
2013-12-20 10:15:00 -1.12 -1.11
2013-12-20 10:30:00 -1.11 -1.10
...
2013-12-20 15:30:00 -1.13 -1.14
2013-12-20 15:45:00 -1.14 -1.14
2013-12-20 16:00:00 -1.13 -1.06
2013-12-20 16:15:00 -1.05 -1.01
2013-12-23 09:45:00 -1.02 -1.02
2013-12-23 10:00:00 -1.02 -1.02
2013-12-23 10:15:00 -1.03 -1.07
2013-12-23 10:30:00 -1.06 -1.08
....
2013-12-23 15:30:00 -1.11 -1.14
2013-12-23 15:45:00 -1.13 -1.12
2013-12-23 16:00:00 -1.12 -1.09
2013-12-23 16:15:00 -1.09 -1.13
...
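I want to calculate the difference between each day's close_p at 16:15 and its open_p at 9:45. For example, the daily_change column for 2013-12-20 equals -1.01 - (-1.14) = 0.13. The result should look like this: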
open_p close_p daily_change
date
2013-12-20 09:45:00 -1.14 -1.12 0.13
2013-12-20 10:00:00 -1.12 -1.12 0.13
2013-12-20 10:15:00 -1.12 -1.11 0.13
2013-12-20 10:30:00 -1.11 -1.10 0.13
...
2013-12-20 15:30:00 -1.13 -1.14 0.13
2013-12-20 15:45:00 -1.14 -1.14 0.13
2013-12-20 16:00:00 -1.13 -1.06 0.13
2013-12-20 16:15:00 -1.05 -1.01 0.13
2013-12-23 09:45:00 -1.02 -1.02 -0.11
2013-12-23 10:00:00 -1.02 -1.02 -0.11
2013-12-23 10:15:00 -1.03 -1.07 -0.11
2013-12-23 10:30:00 -1.06 -1.08 -0.11
....
2013-12-23 15:30:00 -1.11 -1.14 -0.11
2013-12-23 15:45:00 -1.13 -1.12 -0.11
2013-12-23 16:00:00 -1.12 -1.09 -0.11
2013-12-23 16:15:00 -1.09 -1.13 -0.11
What is the fastest and most convenient way to do this?
Answer 0 (score: 1)
You can groupby on the date and agg to take the first open_p and the last close_p of each day, then find the difference:
print (df.groupby(pd.Grouper(freq="D"))
         .agg({"open_p":"first", "close_p":"last"})
         .diff(axis=1)["close_p"])
date
2013-12-20 0.13
2013-12-21 NaN
2013-12-22 NaN
2013-12-23 -0.11
Freq: D, Name: close_p, dtype: float64
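The result above has one row per calendar day, while the question asks for the value repeated on every 15-minute row. Here is a minimal sketch of one way to broadcast it back. It assumes a cut-down sample frame (only the first and last bar of each day, values taken from the question), and the names daily and daily_change are illustrative:

```python
import pandas as pd

# Cut-down sample: first and last bar of the two days shown in the question.
idx = pd.DatetimeIndex([
    "2013-12-20 09:45:00", "2013-12-20 16:15:00",
    "2013-12-23 09:45:00", "2013-12-23 16:15:00",
], name="date")
df = pd.DataFrame(
    {"open_p": [-1.14, -1.05, -1.02, -1.09],
     "close_p": [-1.12, -1.01, -1.02, -1.13]},
    index=idx,
)

# One row per calendar day: first open and last close of that day.
daily = df.groupby(pd.Grouper(freq="D")).agg({"open_p": "first", "close_p": "last"})

# Days without any bars (the weekend here) are NaN, so drop them.
daily_change = (daily["close_p"] - daily["open_p"]).dropna()

# Look up each row's day (timestamp floored to midnight) in the per-day series.
df["daily_change"] = daily_change.reindex(df.index.normalize()).to_numpy()
print(df.round(2))
```

The lookup uses reindex on the normalized index, so every intraday row picks up the value for its own day.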
Answer 1 (score: 1)
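Use GroupBy.transform with GroupBy.last and GroupBy.first to broadcast each day's last close_p and first open_p to every row, then subtract them into a new column: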
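# Group rows by calendar day of the DatetimeIndex
g = df.groupby(pd.Grouper(freq='D'))
# Last close minus first open of each day, repeated on every row of that day
df['daily_change'] = g['close_p'].transform('last').sub(g['open_p'].transform('first'))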
print (df)
open_p close_p daily_change
date
2013-12-20 09:45:00 -1.14 -1.12 0.13
2013-12-20 10:00:00 -1.12 -1.12 0.13
2013-12-20 10:15:00 -1.12 -1.11 0.13
2013-12-20 10:30:00 -1.11 -1.10 0.13
2013-12-20 15:30:00 -1.13 -1.14 0.13
2013-12-20 15:45:00 -1.14 -1.14 0.13
2013-12-20 16:00:00 -1.13 -1.06 0.13
2013-12-20 16:15:00 -1.05 -1.01 0.13
2013-12-23 09:45:00 -1.02 -1.02 -0.11
2013-12-23 10:00:00 -1.02 -1.02 -0.11
2013-12-23 10:15:00 -1.03 -1.07 -0.11
2013-12-23 10:30:00 -1.06 -1.08 -0.11
2013-12-23 15:30:00 -1.11 -1.14 -0.11
2013-12-23 15:45:00 -1.13 -1.12 -0.11
2013-12-23 16:00:00 -1.12 -1.09 -0.11
2013-12-23 16:15:00 -1.09 -1.13 -0.11
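To see why transform fits here better than a plain aggregation, a small sketch may help. It assumes a shortened sample (three bars per day, values picked from the question), and per_day and per_row are illustrative names:

```python
import pandas as pd

# Shortened sample: three bars for each of the two days in the question.
idx = pd.DatetimeIndex([
    "2013-12-20 09:45:00", "2013-12-20 10:00:00", "2013-12-20 16:15:00",
    "2013-12-23 09:45:00", "2013-12-23 10:00:00", "2013-12-23 16:15:00",
], name="date")
df = pd.DataFrame(
    {"open_p": [-1.14, -1.12, -1.05, -1.02, -1.02, -1.09],
     "close_p": [-1.12, -1.12, -1.01, -1.02, -1.02, -1.13]},
    index=idx,
)

g = df.groupby(pd.Grouper(freq="D"))

# Aggregation: one value per calendar day, indexed by day.
per_day = g["close_p"].last()

# Transform: the same per-day value, but repeated on every original row.
per_row = g["close_p"].transform("last")

print(per_day)
print(per_row)
```

Because per_row keeps the original index, it can be subtracted and assigned to the frame directly, with no lookup step.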
Another idea is to use Series.at_time, convert the DatetimeIndex to dates by dropping the times, and finally use Series.map:
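# Helper: reduce a Timestamp to its calendar date
f = lambda x: x.date()
# Per-day series indexed by date: 16:15 close minus 09:45 open
s = (df['close_p'].at_time('16:15:00').rename(f)
       .sub(df.at_time('09:45:00').rename(f)['open_p']))
# Map each row's date to that day's change
df['daily_change'] = df.index.to_frame()['date'].dt.date.map(s)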
print (df)
open_p close_p daily_change
date
2013-12-20 09:45:00 -1.14 -1.12 0.13
2013-12-20 10:00:00 -1.12 -1.12 0.13
2013-12-20 10:15:00 -1.12 -1.11 0.13
2013-12-20 10:30:00 -1.11 -1.10 0.13
2013-12-20 15:30:00 -1.13 -1.14 0.13
2013-12-20 15:45:00 -1.14 -1.14 0.13
2013-12-20 16:00:00 -1.13 -1.06 0.13
2013-12-20 16:15:00 -1.05 -1.01 0.13
2013-12-23 09:45:00 -1.02 -1.02 -0.11
2013-12-23 10:00:00 -1.02 -1.02 -0.11
2013-12-23 10:15:00 -1.03 -1.07 -0.11
2013-12-23 10:30:00 -1.06 -1.08 -0.11
2013-12-23 15:30:00 -1.11 -1.14 -0.11
2013-12-23 15:45:00 -1.13 -1.12 -0.11
2013-12-23 16:00:00 -1.12 -1.09 -0.11
2013-12-23 16:15:00 -1.09 -1.13 -0.11
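The two approaches agree on the sample data, but they can differ when a day is missing its 09:45 or 16:15 bar. The sketch below uses hypothetical data in which the second day ends at 16:00. The names by_position and by_clock are illustrative, and normalize is used here instead of date() to keep the index as timestamps:

```python
import pandas as pd

# Hypothetical data: the second day has no 16:15 bar.
idx = pd.DatetimeIndex([
    "2013-12-20 09:45:00", "2013-12-20 16:15:00",
    "2013-12-23 09:45:00", "2013-12-23 16:00:00",
], name="date")
df = pd.DataFrame(
    {"open_p": [-1.14, -1.05, -1.02, -1.12],
     "close_p": [-1.12, -1.01, -1.02, -1.09]},
    index=idx,
)

# transform: uses whichever bars come first and last on each day.
g = df.groupby(pd.Grouper(freq="D"))
by_position = g["close_p"].transform("last") - g["open_p"].transform("first")

# at_time: uses only bars stamped exactly 09:45 and 16:15.
to_day = lambda ts: ts.normalize()
close_1615 = df["close_p"].at_time("16:15").rename(to_day)
open_0945 = df["open_p"].at_time("09:45").rename(to_day)
per_day = close_1615.sub(open_0945)  # NaN for a day missing either bar
by_clock = pd.Series(
    per_day.reindex(df.index.normalize()).to_numpy(), index=df.index
)

print(pd.DataFrame({"by_position": by_position, "by_clock": by_clock}).round(2))
```

For the incomplete day, the transform version falls back to the 16:00 close, while the at_time version leaves the day as NaN. Which behaviour is preferable depends on whether partial days should count.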