Resample with sum while keeping the index of the last observation of each day in pandas

Time: 2018-08-09 10:52:43

Tags: python pandas resampling

I have a dataframe:

Localmax              symbol  dvol        idx
2016-10-19 09:05:00   st1     5172.159  2016-10-19 09:05:00
2016-10-19 09:05:00   st2     5172.18   2016-10-19 09:05:00 
2016-10-19 17:30:00   st1     5000      2016-10-19 17:30:00
2016-10-19 17:40:00   st2     8000      2016-10-19 17:40:00

How can I resample per symbol so that I get the daily sum of dvol while keeping the index of the last observation of each day?

I tried:

> df['idx']=df.index 
> dvol_sum = df.groupby(['symbol', Grouper(freq='D')])['dvol', 'idx'].agg(['sum'])

But it produces only a single dvol column, and the index carries 00:00:00 timestamps.

Expected output:
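
A likely explanation, sketched here under assumptions: pd.Grouper(freq='D') labels each daily bin by its start (midnight), and 'sum' has no meaningful result for a timestamp column such as idx, so only dvol survives (how idx is handled varies by pandas version). The frame below is rebuilt from the sample rows, assuming Localmax is a DatetimeIndex; daily and day_labels are illustrative names.

```python
import pandas as pd

# Sample rows from the question, with Localmax assumed to be the index
df = pd.DataFrame(
    {"symbol": ["st1", "st2", "st1", "st2"],
     "dvol": [5172.159, 5172.18, 5000.0, 8000.0]},
    index=pd.DatetimeIndex(
        ["2016-10-19 09:05:00", "2016-10-19 09:05:00",
         "2016-10-19 17:30:00", "2016-10-19 17:40:00"],
        name="Localmax",
    ),
)

# Daily sum per symbol; the time level is labelled with the start of each day
daily = df.groupby(["symbol", pd.Grouper(freq="D")])["dvol"].sum()
day_labels = daily.index.get_level_values("Localmax")

print(daily)
print(day_labels)
```

The sums themselves are as wanted; what is lost is the time of the last observation, which has to be aggregated separately (for example with max).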

Localmax              symbol  dvol
2016-10-19 17:30:00   st1     sum of dvol for 2016-10-19 for st1
2016-10-19 17:40:00   st2     sum of dvol for 2016-10-19 for st2

2 answers:

Answer 0: (score: 0)

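You need to use groupby on the symbol column and the date, take the sum of dvol, and then use max on Localmax to select the latest timestamp in each group as the new index: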

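import pandas as pd

df = pd.DataFrame({'Localmax':['2016-10-19 09:05:00','2016-10-19 09:05:00','2016-10-19 17:30:00','2016-10-19 17:40:00'],
                   'symbol':['st1','st2','st1','st2'], 'dvol':[5172.159,5172.18,5000,8000]})

# Parse the timestamps and derive a calendar-date column to group on
df['Localmax'] = pd.to_datetime(df['Localmax'])
df['date'] = df['Localmax'].dt.date

# Daily sum of dvol per symbol
df_new = df.groupby(['symbol','date'])['dvol'].sum().reset_index()

# Use the latest Localmax of each (symbol, date) group as the index;
# both groupbys sort by the same keys, so the rows line up
df_new.index = df.groupby(['symbol','date'])['Localmax'].max()

print(df_new)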
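
The two groupby passes can probably be folded into one with named aggregation (available since pandas 0.25), which sums dvol and takes the latest Localmax per group in a single call. This is a sketch, not the answerer's code: the helper column date follows the answer, and result is a name chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Localmax": ["2016-10-19 09:05:00", "2016-10-19 09:05:00",
                 "2016-10-19 17:30:00", "2016-10-19 17:40:00"],
    "symbol": ["st1", "st2", "st1", "st2"],
    "dvol": [5172.159, 5172.18, 5000.0, 8000.0],
})
df["Localmax"] = pd.to_datetime(df["Localmax"])
df["date"] = df["Localmax"].dt.date

# One pass: sum dvol and keep the latest timestamp per (symbol, date)
result = (
    df.groupby(["symbol", "date"], as_index=False)
      .agg(dvol=("dvol", "sum"), Localmax=("Localmax", "max"))
      .set_index("Localmax")[["symbol", "dvol"]]
)

print(result)
```

Because both aggregates come from the same groupby, there is no need to rely on two separate results being sorted identically.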

Answer 1: (score: 0)

I think there should be a simpler way than this, but this works fine:

In [58]: df
Out[58]: 
              Localmax symbol      dvol                  idx
0  2016-10-19 09:05:00    st1  5172.159  2016-10-19 09:05:00
1  2016-10-19 09:05:00    st2  5172.180  2016-10-19 09:05:00
2  2016-10-19 17:30:00    st1  5000.000  2016-10-19 17:30:00
3  2016-10-19 17:40:00    st2  8000.000  2016-10-19 17:40:00
4  2016-10-20 17:30:00    st1  6000.000  2016-10-19 17:30:00
5  2016-10-20 17:40:00    st2  9000.000  2016-10-19 17:40:00

In [59]: df['Localmax'] = pd.to_datetime(df['Localmax'])

In [60]: df['date'] = df['Localmax'].dt.date

In [61]: new_df = df.groupby(['date','symbol'],as_index=False)['dvol'].sum()

In [62]: new_df['date'] = df.groupby(['date','symbol'])['Localmax'].max().values

In [63]: new_df
Out[63]: 
                 date symbol       dvol
0 2016-10-19 17:30:00    st1  10172.159
1 2016-10-19 17:40:00    st2  13172.180
2 2016-10-20 17:30:00    st1   6000.000
3 2016-10-20 17:40:00    st2   9000.000
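
On the wish for a simpler route: the sketch below stays close to the question's original attempt, grouping on the DatetimeIndex with pd.Grouper(freq='D') and using named aggregation (pandas 0.25 or later) so that dvol is summed while the idx copy of the index is reduced with max. It assumes Localmax is the index, as in the question; the six sample rows mirror the session above, and daily is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame(
    {"symbol": ["st1", "st2", "st1", "st2", "st1", "st2"],
     "dvol": [5172.159, 5172.18, 5000.0, 8000.0, 6000.0, 9000.0]},
    index=pd.DatetimeIndex(
        ["2016-10-19 09:05:00", "2016-10-19 09:05:00",
         "2016-10-19 17:30:00", "2016-10-19 17:40:00",
         "2016-10-20 17:30:00", "2016-10-20 17:40:00"],
        name="Localmax",
    ),
)
df["idx"] = df.index  # copy of the index, so it can be aggregated

daily = (
    df.groupby(["symbol", pd.Grouper(freq="D")])
      .agg(dvol=("dvol", "sum"), idx=("idx", "max"))
      .reset_index(level="symbol")   # keep symbol as a column
      .set_index("idx")              # replace the midnight labels
      .rename_axis("Localmax")
)

print(daily)
```

The idx copy matters because the index used for grouping cannot also be named as an aggregation column.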