I have a DataFrame:
Localmax symbol dvol idx
2016-10-19 09:05:00 st1 5172.159 2016-10-19 09:05:00
2016-10-19 09:05:00 st2 5172.18 2016-10-19 09:05:00
2016-10-19 17:30:00 st1 5000 2016-10-19 17:30:00
2016-10-19 17:40:00 st2 8000 2016-10-19 17:40:00
How can I resample per symbol so that I get the daily sum of dvol, while keeping the timestamp of each day's last observation as the index?
I tried:
> df['idx']=df.index
> dvol_sum = df.groupby(['symbol', Grouper(freq='D')])['dvol', 'idx'].agg(['sum'])
But it produces only a single dvol column, and the index carries 00:00:00 timestamps.
The expected output is:
Localmax symbol dvol
2016-10-19 17:30:00 st1 sum of dvol for 2016-10-19 for st1
2016-10-19 17:40:00 st2 sum of dvol for 2016-10-19 for st2
Answer 0 (score: 0)
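You need to use groupby on the symbol column and the date, then apply sum. After that, use groupby with max to select the latest Localmax timestamp in each group as the index: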
import pandas as pd

df = pd.DataFrame({'Localmax':['2016-10-19 09:05:00','2016-10-19 09:05:00','2016-10-19 17:30:00','2016-10-19 17:40:00'],
'symbol':['st1','st2','st1','st2'], 'dvol':[5172.159,5172.18,5000,8000]})
# Parse the timestamps and derive a calendar date to group on
df['Localmax'] = pd.to_datetime(df['Localmax'])
df['date'] = df['Localmax'].dt.date
# Daily sum of dvol per symbol
df_new = df.groupby(['symbol','date'])['dvol'].sum().reset_index()
# Index by each group's last timestamp; both groupbys sort by the same keys, so the rows line up
df_new.index = df.groupby(['symbol','date'])['Localmax'].max()
print(df_new)
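The two groupby calls above can also be folded into one pass with named aggregation, so the sum and the last timestamp come from the same grouping and no alignment between separate results is needed. This is only a sketch of an alternative, not the approach above; the name daily is illustrative, and it assumes Localmax is an ordinary column rather than the index.

```python
import pandas as pd

df = pd.DataFrame({
    'Localmax': pd.to_datetime(['2016-10-19 09:05:00', '2016-10-19 09:05:00',
                                '2016-10-19 17:30:00', '2016-10-19 17:40:00']),
    'symbol': ['st1', 'st2', 'st1', 'st2'],
    'dvol': [5172.159, 5172.18, 5000, 8000],
})

daily = (
    df.assign(date=df['Localmax'].dt.date)       # calendar day used as a grouping key
      .groupby(['symbol', 'date'])
      .agg(dvol=('dvol', 'sum'),                 # daily total per symbol
           Localmax=('Localmax', 'max'))         # last observation in each group
      .reset_index()
      .set_index('Localmax')                     # index by the last timestamp
)
print(daily)
```

The date column is kept in the result; drop it with daily.drop(columns='date') if only symbol and dvol are wanted.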
Answer 1 (score: 0)
I think there should be a simpler way than this, but this works fine:
In [58]: df
Out[58]:
Localmax symbol dvol idx
0 2016-10-19 09:05:00 st1 5172.159 2016-10-19 09:05:00
1 2016-10-19 09:05:00 st2 5172.180 2016-10-19 09:05:00
2 2016-10-19 17:30:00 st1 5000.000 2016-10-19 17:30:00
3 2016-10-19 17:40:00 st2 8000.000 2016-10-19 17:40:00
4 2016-10-20 17:30:00 st1 6000.000 2016-10-19 17:30:00
5 2016-10-20 17:40:00 st2 9000.000 2016-10-19 17:40:00
In [59]: df['Localmax'] = pd.to_datetime(df['Localmax'])
In [60]: df['date'] = df['Localmax'].dt.date
In [61]: new_df = df.groupby(['date','symbol'],as_index=False)['dvol'].max()
In [62]: new_df['date'] = new_df.date.map(df.groupby(['date'])['Localmax'].max())
In [63]: new_df
Out[63]:
date symbol dvol
0 2016-10-19 17:40:00 st1 5172.159
1 2016-10-19 17:40:00 st2 8000.000
2 2016-10-20 17:40:00 st1 6000.000
3 2016-10-20 17:40:00 st2 9000.000
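Note that the session above takes the max of dvol rather than the sum the question asks for, and it maps each date to the last timestamp across all symbols, which is why the st1 rows show 17:40:00. A possible variant that sums dvol and takes the last timestamp per (date, symbol) pair is sketched below. It rebuilds the six-row frame from the session without the idx column, and the names keys and last_ts are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Localmax': pd.to_datetime(['2016-10-19 09:05:00', '2016-10-19 09:05:00',
                                '2016-10-19 17:30:00', '2016-10-19 17:40:00',
                                '2016-10-20 17:30:00', '2016-10-20 17:40:00']),
    'symbol': ['st1', 'st2', 'st1', 'st2', 'st1', 'st2'],
    'dvol': [5172.159, 5172.18, 5000, 8000, 6000, 9000],
})
df['date'] = df['Localmax'].dt.date

keys = ['date', 'symbol']
# Daily total of dvol per symbol
new_df = df.groupby(keys, as_index=False)['dvol'].sum()
# Last timestamp per (date, symbol), not per date alone
last_ts = df.groupby(keys, as_index=False)['Localmax'].max()
# Join the two results on the grouping keys, then drop the helper column
new_df = new_df.merge(last_ts, on=keys).drop(columns='date')
print(new_df)
```

Merging on the grouping keys makes the pairing explicit instead of relying on row order.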