Question

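I have a set of accounts of different types with different options, and I am trying to calculate how much each user saved in each month of 2016 compared to the average amount for 2014 and 2015. My DataFrame looks like this: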

key  amount  id  month  opt  type  year
0    100     5   1      M    E     2014
1    200     5   1      M    G     2014
2    300     5   1      R    E     2014
3    400     5   1      R    G     2014
4    105     5   1      M    E     2015
5    205     5   1      M    G     2015
6    305     5   1      R    G     2015
7    405     5   1      R    E     2015
8    90      5   1      M    E     2016
9    180     5   1      M    G     2016
10   310     5   1      R    G     2016
11   350     5   1      R    E     2016

Based on the above, I would expect user '5' to have saved 12.5 in month 1 of 2016 for type 'E', opt 'M', compared to the 2014-2015 average amount of 102.5.

The full answer I expect for month 1 of 2016, across all opt|type combinations, is:

M|E -12.5
M|G -22.5
R|E  -2.5
R|G -42.5

I think groupby() could work for this, but the formula I came up with does not give me the right answer.

df_savings = df.groupby(['id','year','month','type','opt'], group_keys=False).apply(
         lambda s: float(s['amount'][s.year < 2016].sum()/float(2)) - float(s['amount'][s.year == 2016].sum()))

Any help is much appreciated. Here is the code for the example df above:

import pandas as pd

df = pd.DataFrame({'id':[5,5,5,5,5,5,5,5,5,5,5,5],
                   'type':['E','G','E','G','E','G','G','E','E','G','G','E'],
                   'opt':['M','M','R','R','M','M','R','R','M','M','R','R'],
                   'year':[2014,2014,2014,2014,2015,2015,2015,2015,2016,2016,2016,2016],
                   'month':[1,1,1,1,1,1,1,1,1,1,1,1],
                   'amount':[100,200,300,400,105,205,305,405,90,180,310,350]
                   })

Answer 1

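You can split the data into two parts, 2016 and 2014-15, and then group each part. That produces two results with the same index (one value per id, month, opt and type), which you can subtract: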

df[df.year == 2016].groupby(['id', 'month', 'opt', 'type'])['amount'].sum() - df[df.year < 2016].groupby(['id', 'month', 'opt', 'type'])['amount'].mean()
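
For reference, here is the same idea spelled out as a small self-contained sketch using the example data from the question. The names keys, current, baseline and savings are only illustrative. It assumes one row per year for each id/month/opt/type combination, so the mean over the 2014-15 rows is the two-year average; if a group could have several rows in one year, you would probably want to sum per year first and then average.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    'id': [5] * 12,
    'type': ['E', 'G', 'E', 'G', 'E', 'G', 'G', 'E', 'E', 'G', 'G', 'E'],
    'opt': ['M', 'M', 'R', 'R', 'M', 'M', 'R', 'R', 'M', 'M', 'R', 'R'],
    'year': [2014] * 4 + [2015] * 4 + [2016] * 4,
    'month': [1] * 12,
    'amount': [100, 200, 300, 400, 105, 205, 305, 405, 90, 180, 310, 350],
})

# Group on everything except year, so rows from different years
# land in the same group.
keys = ['id', 'month', 'opt', 'type']

# 2016 amount per group.
current = df[df.year == 2016].groupby(keys)['amount'].sum()

# Baseline: mean of the 2014 and 2015 rows in each group
# (assumes one row per year per group).
baseline = df[df.year < 2016].groupby(keys)['amount'].mean()

# Subtracting two Series aligns them on their shared MultiIndex.
savings = (current - baseline).rename('savings')
print(savings)
```

With the sample data this gives -12.5, -22.5, -2.5 and -42.5 for M|E, M|G, R|E and R|G, which matches the expected values in the question. The main difference from the attempt in the question is that year is left out of the groupby keys; with year included, each group only ever contains rows from a single year, so there is nothing to compare against.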
