Question

I have the following DataFrame df:

            bucket_value  is_new_bucket
dates                                  
2019-03-07             0              1
2019-03-08             1              0
2019-03-09             2              0
2019-03-10             3              0
2019-03-11             4              0
2019-03-12             5              1
2019-03-13             6              0
2019-03-14             7              1

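I want to apply a particular function (say, the mean) to the column bucket_value for each group of rows where is_new_bucket equals zero, so that the resulting DataFrame looks like this: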

            mean_values
dates             
2019-03-08     2.5
2019-03-13     6.0

In other words, apply a function to each run of consecutive rows where is_new_bucket = 0, taking bucket_value as input.

For example, if I wanted to apply the max function, the resulting DataFrame would look like this:

            max_values
dates             
2019-03-11     4.0
2019-03-13     6.0

Answer 1

Use cumsum together with a filter:

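# Turn the date index into a regular column (named 'date' here; the question's index is named 'dates')
df.reset_index(inplace=True)
# Keep the is_new_bucket == 0 rows and group them by the running count of bucket starts
s=df.loc[df.is_new_bucket==0].groupby(df.is_new_bucket.cumsum()).agg({'date':'first','bucket_value':['mean','max']})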
s
                    date bucket_value    
                   first         mean max
is_new_bucket                            
1             2019-03-08          2.5   4
2             2019-03-13          6.0   6
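
For reference, here is a minimal self-contained sketch of the same idea that rebuilds the question's frame and produces the mean_values table. The frame construction and the names mask, run and flat are my own assumptions for illustration, and named aggregation needs pandas 0.25 or later.

```python
import pandas as pd

# Rebuild the frame from the question (index named 'dates')
df = pd.DataFrame(
    {
        "bucket_value": range(8),
        "is_new_bucket": [1, 0, 0, 0, 0, 1, 0, 1],
    },
    index=pd.date_range("2019-03-07", periods=8, name="dates"),
)

flat = df.reset_index()

# True for the rows that belong to a bucket body
mask = flat["is_new_bucket"] == 0

# Running count of bucket starts: every row after a 1 shares that bucket's id
run = flat["is_new_bucket"].cumsum().rename("run")

mean_values = (
    flat[mask]
    .groupby(run[mask])
    .agg(dates=("dates", "first"), mean_values=("bucket_value", "mean"))
    .set_index("dates")
)
print(mean_values)
```

Because the third bucket (starting 2019-03-14) has no rows with is_new_bucket == 0, it drops out of the result.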

Update

df.loc[df.loc[df.is_new_bucket==0].groupby(df.is_new_bucket.cumsum())['bucket_value'].idxmax()]
        date  bucket_value  is_new_bucket
4 2019-03-11             4              0
6 2019-03-13             6              0
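
If you would rather keep the date index and get exactly the max_values layout from the question, something along these lines should work. This is a sketch under the assumption that the index is a unique DatetimeIndex named dates; mask, run and idx are illustrative names.

```python
import pandas as pd

# Rebuild the frame from the question (index named 'dates')
df = pd.DataFrame(
    {
        "bucket_value": range(8),
        "is_new_bucket": [1, 0, 0, 0, 0, 1, 0, 1],
    },
    index=pd.date_range("2019-03-07", periods=8, name="dates"),
)

mask = df["is_new_bucket"] == 0
run = df["is_new_bucket"].cumsum()

# For each run of zero rows, find the index label (date) of the largest bucket_value
idx = df.loc[mask, "bucket_value"].groupby(run[mask]).idxmax()

# Look those dates up again and rename the column to match the desired output
max_values = df.loc[idx.tolist(), ["bucket_value"]].rename(
    columns={"bucket_value": "max_values"}
)
print(max_values)
```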

Update 2: once you have created the group key Newkey with cumsum, you can do whatever you need based on that key.

df['Newkey']=df.is_new_bucket.cumsum()
df
        date  bucket_value  is_new_bucket  Newkey
0 2019-03-07             0              1       1
1 2019-03-08             1              0       1
2 2019-03-09             2              0       1
3 2019-03-10             3              0       1
4 2019-03-11             4              0       1
5 2019-03-12             5              1       2
6 2019-03-13             6              0       2
7 2019-03-14             7              1       3
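
To show what "whatever you need" might look like, here is a small sketch that wraps the group key in a helper and passes an arbitrary function to it. The helper apply_per_run and its parameter names are hypothetical, not part of pandas.

```python
import pandas as pd

def apply_per_run(df, func, flag="is_new_bucket", value="bucket_value"):
    """Apply func to `value` over each run of rows where `flag` is 0.

    Hypothetical helper: runs are identified by the cumulative sum of `flag`.
    """
    mask = df[flag] == 0
    key = df[flag].cumsum().rename("Newkey")
    return df.loc[mask, value].groupby(key[mask]).agg(func)

# Rebuild the frame from the question (index named 'dates')
df = pd.DataFrame(
    {
        "bucket_value": range(8),
        "is_new_bucket": [1, 0, 0, 0, 0, 1, 0, 1],
    },
    index=pd.date_range("2019-03-07", periods=8, name="dates"),
)

means = apply_per_run(df, "mean")
# Any callable that reduces a Series works, e.g. the spread within each run
spreads = apply_per_run(df, lambda s: s.max() - s.min())
print(means)
print(spreads)
```

The result is indexed by Newkey rather than by date; combine it with the 'first' aggregation shown earlier if you need the starting date of each run.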
