Pandas groupby and calculate percentage change

Time: 2019-01-23 07:52:31

Tags: python pandas

I referred to "How to create rolling percentage for groupby DataFrame".

import pandas as pd

data = [
    ('product_a','1/31/2014',53)
    ,('product_b','1/31/2014',44)
    ,('product_c','1/31/2014',36)
    ,('product_a','11/30/2013',52)
    ,('product_b','11/30/2013',43)
    ,('product_c','11/30/2013',35)
    ,('product_a','3/31/2014',50)
    ,('product_b','3/31/2014',41)
    ,('product_c','3/31/2014',34)
    ,('product_a','12/31/2013',50)
    ,('product_b','12/31/2013',41)
    ,('product_c','12/31/2013',34)
    ,('product_a','2/28/2014',52)
    ,('product_b','2/28/2014',43)
    ,('product_c','2/28/2014',35)]

product_df = pd.DataFrame( data, columns=['prod_desc','activity_month','prod_count'] )
product_df.sort_values('activity_month', inplace = True, ascending=False) 
product_df['pct_ch'] = product_df.groupby('prod_desc')['prod_count'].pct_change() + 1

print(product_df)

However, I cannot reproduce the output shown in the suggested answer.

Output produced:

    prod_desc activity_month  prod_count    pct_ch
0   product_a      1/31/2014          53       NaN
1   product_b      1/31/2014          44  0.830189
2   product_c      1/31/2014          36  0.818182
3   product_a     11/30/2013          52  1.444444
4   product_b     11/30/2013          43  0.826923
5   product_c     11/30/2013          35  0.813953
9   product_a     12/31/2013          50  1.428571
10  product_b     12/31/2013          41  0.820000
11  product_c     12/31/2013          34  0.829268
12  product_a      2/28/2014          52  1.529412
13  product_b      2/28/2014          43  0.826923
14  product_c      2/28/2014          35  0.813953
6   product_a      3/31/2014          50  1.428571
7   product_b      3/31/2014          41  0.820000
8   product_c      3/31/2014          34  0.829268

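The expected output should look something like the following: the percentage change should be calculated separately for each prod_desc (product_a, product_b and product_c), not over the column as a whole.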

 product_desc activity_month  prod_count    pct_ch
0    product_a     2014-01-01          53       NaN
3    product_a     2014-02-01          26  0.490566
6    product_a     2014-03-01          41  1.576923
1    product_b     2014-01-01          42       NaN
4    product_b     2014-02-01          48  1.142857
7    product_b     2014-03-01          35  0.729167
2    product_c     2014-01-01          38       NaN
5    product_c     2014-02-01          39  1.026316
8    product_c     2014-03-01          50  1.282051

Thanks in advance.

2 answers:

Answer 0: (score: 2)

Use GroupBy.apply with Series.pct_change:

product_df['activity_month'] = pd.to_datetime(product_df['activity_month'])
product_df.sort_values(['prod_desc','activity_month'], inplace = True, ascending=[True, False])

product_df['pct_ch'] = (product_df.groupby('prod_desc')['prod_count']
                                  .apply(pd.Series.pct_change) + 1)
print(product_df)
    prod_desc activity_month  prod_count    pct_ch
6   product_a     2014-03-31          50       NaN
12  product_a     2014-02-28          52  1.040000
0   product_a     2014-01-31          53  1.019231
9   product_a     2013-12-31          50  0.943396
3   product_a     2013-11-30          52  1.040000
7   product_b     2014-03-31          41       NaN
13  product_b     2014-02-28          43  1.048780
1   product_b     2014-01-31          44  1.023256
10  product_b     2013-12-31          41  0.931818
4   product_b     2013-11-30          43  1.048780
8   product_c     2014-03-31          34       NaN
14  product_c     2014-02-28          35  1.029412
2   product_c     2014-01-31          36  1.028571
11  product_c     2013-12-31          34  0.944444
5   product_c     2013-11-30          35  1.029412
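
On recent pandas versions the same per-group result can usually be obtained without apply. The sketch below is a variant under stated assumptions, not the code from this answer. It uses a subset of the question's data and parses activity_month into real dates so that sorting is chronological rather than lexicographic. It sorts oldest-first (the opposite direction from the output above) and calls pct_change directly on the grouped column. As far as I know, GroupBy.pct_change ignored group boundaries in some pandas 0.23.x releases, which would explain the output in the question. The sketch assumes a version where that is fixed.

```python
import pandas as pd

# Subset of the question's data (two products, three months).
data = [
    ('product_a', '1/31/2014', 53),
    ('product_b', '1/31/2014', 44),
    ('product_a', '11/30/2013', 52),
    ('product_b', '11/30/2013', 43),
    ('product_a', '12/31/2013', 50),
    ('product_b', '12/31/2013', 41),
]
df = pd.DataFrame(data, columns=['prod_desc', 'activity_month', 'prod_count'])

# Parse the strings so that sorting is chronological, not lexicographic.
df['activity_month'] = pd.to_datetime(df['activity_month'])

# Assumption: oldest month first, so each ratio is relative to the previous month.
df = df.sort_values(['prod_desc', 'activity_month'])

# pct_change on the grouped column restarts at each product; the result
# keeps the original row index, so it aligns when assigned back.
df['pct_ch'] = df.groupby('prod_desc')['prod_count'].pct_change() + 1

print(df)
```

Sorting oldest-first makes each value a ratio to the previous month. Reverse the date order to reproduce the direction used in the output above.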

Answer 1: (score: 0)

If you need a specific period, you can use the following code:

product_df['pct_ch'] = (product_df.groupby('prod_desc')['prod_count']
                                  .apply(lambda dfi : dfi.pct_change(periods=126)) + 1)
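
The periods argument of pct_change sets how many rows back, within each group, the comparison reaches. A group with no more rows than periods yields only NaN. Here is a minimal sketch with made-up numbers; the frame, the column names and periods=2 are illustrative assumptions, not taken from the question:

```python
import pandas as pd

# Hypothetical data: four ordered observations per group.
df = pd.DataFrame({
    'grp': ['x'] * 4 + ['y'] * 4,
    'val': [10, 20, 30, 60, 5, 10, 20, 25],
})

# Compare each row with the row two positions earlier in the same group.
df['ratio_2'] = df.groupby('grp')['val'].pct_change(periods=2) + 1

print(df)
```

The first two rows of each group have no row two positions back, so they come out as NaN.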