Question

I have a DataFrame that looks like this:

         prod_code      month  items      cost
0  040201060AAAIAI 2016-05-01      5    572.20   
1  040201060AAAKAK 2016-05-01    164  14805.19    
2  040201060AAALAL 2016-05-01  13465  14486.07

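I want to first group by the first four characters of prod_code, then sum the total cost of each group for January-February 2016, compare it with the total cost for March-April 2016, and find the group with the largest percentage increase between the two periods.

What is the best way to do this?

Here is my code so far: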

import pandas as pd

d = {
    'prod_code': ['040201060AAAIAI', '040201060AAAIAJ', '040201060AAAIAI',
                  '040201060AAAIAI', '040201060AAAIAI', '040201060AAAIAI',
                  '040301060AAAKAG', '040301060AAAKAK', '040301060AAAKAK',
                  '040301060AAAKAX', '040301060AAAKAK', '040301060AAAKAK'],
    'month': ['2016-01-01', '2016-02-01', '2016-03-01',
              '2016-01-01', '2016-02-01', '2016-03-01',
              '2016-01-01', '2016-02-01', '2016-03-01',
              '2016-01-01', '2016-02-01', '2016-03-01'],
    'cost': [43, 45, 46, 41, 48, 59, 8, 9, 10, 12, 15, 13],
}
df = pd.DataFrame.from_dict(d)
# The first four characters of prod_code identify the group
df['para'] = df.prod_code.str[:4]
# Select 'cost' explicitly so the string column prod_code is not summed
df_para = df.groupby(['para', 'month'])[['cost']].sum()

This gives me a df_para that looks like this:

                 cost
para month
0402 2016-01-01    84
     2016-02-01    93
     2016-03-01   105
0403 2016-01-01    20
     2016-02-01    24
     2016-03-01    23

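Now I need to compute the Jan-Feb and Mar-Apr sums for each group, then the difference between those two periods, and finally sort by that difference. What is the best way to do this?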

Answer 1

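You can create a month-group variable based on whether the month falls in Jan-Feb or Mar-Apr, then group by the code prefix and the month-group variable, sum the cost, and compute the percentage change: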

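import numpy as np
import pandas as pd

# Label each row 1 for Jan-Feb and 2 for any other month (here, Mar-Apr)
df['month_period'] = np.where(pd.to_datetime(df.month).dt.month.isin([1, 2]), 1, 2)
# How the month group variable is created can be adjusted to however you want to cut
# your time; this is a simplified example which assumes you only have data from Jan-Apr

# Sum cost per (prefix, period), then take the percentage change between periods
# within each prefix. DataFrame.sort was removed from pandas; use sort_values.
(df.groupby([df.prod_code.str[:4], df.month_period])[['cost']].sum()
   .groupby(level=0).pct_change()
   .dropna().sort_values('cost', ascending=False))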
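If you also want to see the two period totals next to the percentage change, one option is to pivot the periods into columns. The following is only a sketch on the sample data from the question; the column labels `jan_feb` and `mar_apr` and the `period` column are names made up for illustration, and months outside January-April are simply dropped.

```python
import pandas as pd

df = pd.DataFrame({
    'prod_code': ['040201060AAAIAI', '040201060AAAIAJ', '040201060AAAIAI',
                  '040201060AAAIAI', '040201060AAAIAI', '040201060AAAIAI',
                  '040301060AAAKAG', '040301060AAAKAK', '040301060AAAKAK',
                  '040301060AAAKAX', '040301060AAAKAK', '040301060AAAKAK'],
    'month': ['2016-01-01', '2016-02-01', '2016-03-01'] * 4,
    'cost': [43, 45, 46, 41, 48, 59, 8, 9, 10, 12, 15, 13],
})
df['month'] = pd.to_datetime(df['month'])
df['para'] = df['prod_code'].str[:4]

# Hypothetical period labels: Jan-Feb is the baseline, Mar-Apr the comparison.
# Months not listed here map to NaN and are left out of the pivot.
period_of_month = {1: 'jan_feb', 2: 'jan_feb', 3: 'mar_apr', 4: 'mar_apr'}
df['period'] = df['month'].dt.month.map(period_of_month)

# One row per prefix, one column per period, holding the summed cost
totals = df.pivot_table(index='para', columns='period', values='cost', aggfunc='sum')

# Percentage change from the Jan-Feb total to the Mar-Apr total
totals['pct_change'] = (totals['mar_apr'] - totals['jan_feb']) / totals['jan_feb'] * 100

# Largest percentage increase first
result = totals.sort_values('pct_change', ascending=False)
print(result)
```

The sample data has no April rows, so both prefixes show a decrease here (0402 goes from 177 to 105, 0403 from 44 to 23), and 0402 ends up first because its drop is smaller.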
