Pandas groupby with a lambda function on multiple conditions/columns

Time: 2018-10-30 23:31:45

Tags: python pandas lambda group-by

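I am trying to create a new column that groups df by Deal and Month and then applies a percentage (9%) to the Amount column. If all the Amount values for a particular Deal in a particular Month total 20,000 or more, apply the percentage to Amount; otherwise, if TYPE is MONTHLY and the individual Amount is at least 1500, apply the percentage to that Amount; otherwise, multiply by 0.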

df.groupby(['Deal', 'Month'])["Amount"].apply(
    lambda x: x.sum() * 0.09 if x.sum() >= 20000 else (
        x * 0.09 if x >= 1500 and x['TYPE'] == 'MONTHLY' else 0
    )
)

This is what I have tried, but I keep getting errors such as ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). and KeyError: ('TYPE', u'occurred at index 0'). I also tried using transform instead of apply. Any help is much appreciated.

My grouped DF, together with the desired column, looks like this:

   Deal         TYPE    Month        Amount   Desired Column
0   Com A   ANNUAL  April   10021.34   0
1   Com A   MONTHLY April   35.86      0
2   Com B   MONTHLY April   11150.05   1,003.50
3   Com B   ANNUAL  July    661.65     0
4   Com B   ANNUAL  August  303.63     0
5   Com C   ANNUAL  April   25624.59   2,306.21
6   Com D   ANNUAL  June    27309.26   2,457.83  
7   Com D   ANNUAL  July    0.00       0
8   Com D   ANNUAL  August  0.00       0
9   Com E   ANNUAL  April   10.65      0
10  Com E   MONTHLY May     0.00       0
11  Com E   ANNUAL  May     18716.70   1,684.5
12  Com E   MONTHLY June    0.00       0
13  Com E   ANNUAL  June    606.49     0
14  Com E   MONTHLY July    0.00       0
15  Com E   MONTHLY July    8890.17    800.11
16  Com E   MONTHLY August  4000       0
17  Com E   ANNUAL  August  16000      1,800
18  Com E   ANNUAL  September 2157.34  0
19  Com E   ANNUAL  October 3025.24    0

df

2 Answers:

Answer 0 (score: 1)

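You don't need groupby in this case. There are a couple of ways to do this; conceptually, the simplest is to first compute a threshold for each row depending on whether the amount is monthly or annual: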

df['Threshold'] = (df.TYPE=='ANNUAL')*20000 + (df.TYPE=='MONTHLY')*1500

Then you can compute the amount based on whether the threshold is exceeded:

df['Desired Amount'] = (df.Amount>df.Threshold)*0.09*df.Amount
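As a minimal, self-contained sketch of the two steps above: the sample frame below is an assumption, built from four rows of the question's table, and the strict `>` comparison is kept exactly as this answer wrote it (the question itself says "at least 1500").

```python
import pandas as pd

# Assumed sample: four rows copied from the question's table.
df = pd.DataFrame({
    "Deal": ["Com A", "Com A", "Com B", "Com C"],
    "TYPE": ["ANNUAL", "MONTHLY", "MONTHLY", "ANNUAL"],
    "Month": ["April", "April", "April", "April"],
    "Amount": [10021.34, 35.86, 11150.05, 25624.59],
})

# Per-row threshold: 20000 for ANNUAL rows, 1500 for MONTHLY rows.
# A boolean Series times an integer gives the integer where True, 0 elsewhere.
df["Threshold"] = (df.TYPE == "ANNUAL") * 20000 + (df.TYPE == "MONTHLY") * 1500

# 9% of Amount where Amount exceeds its threshold, otherwise 0.
df["Desired Amount"] = (df.Amount > df.Threshold) * 0.09 * df.Amount

print(df)
```

Note that this compares each row against its own threshold; it does not sum the amounts per Deal and Month.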

This only works here because you don't have multiple rows for the same deal, month, and type. If you did, you would first need a groupby to aggregate them:

df = df.groupby(['Deal','Month','TYPE']).sum()
df.reset_index(inplace=True)

Then you can proceed with the steps above.
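The aggregate-first variant might look like the sketch below. The data is hypothetical (Com X and Com Y do not appear in the question), chosen so that two rows share the same Deal, Month and TYPE. Passing `as_index=False` is an alternative to the separate `reset_index` call, and selecting `"Amount"` before `sum()` avoids summing any other columns.

```python
import pandas as pd

# Hypothetical data: the first two rows share Deal, Month and TYPE.
df = pd.DataFrame({
    "Deal": ["Com X", "Com X", "Com Y"],
    "TYPE": ["MONTHLY", "MONTHLY", "ANNUAL"],
    "Month": ["April", "April", "April"],
    "Amount": [1000.0, 800.0, 25000.0],
})

# Collapse duplicate (Deal, Month, TYPE) rows into one row each.
# as_index=False keeps the grouping keys as ordinary columns.
agg = df.groupby(["Deal", "Month", "TYPE"], as_index=False)["Amount"].sum()

# Same threshold logic as before, now applied to the aggregated rows.
agg["Threshold"] = (agg.TYPE == "ANNUAL") * 20000 + (agg.TYPE == "MONTHLY") * 1500
agg["Desired Amount"] = (agg.Amount > agg.Threshold) * 0.09 * agg.Amount

print(agg)
```

Neither 1000 nor 800 exceeds 1500 on its own, but their aggregated total of 1800 does, which is why the order of the two steps matters.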

Answer 1 (score: 1)

I tried to translate your description into something like this:

import numpy as np

df['Sum'] = df.groupby(['Deal','Month'])['Amount'].transform('sum')

df['Desired Column'] = np.where(df['Sum'] > 20000, df['Sum'] * 0.09, np.where((df['Amount'] >= 1500) & (df['TYPE'] == 'MONTHLY'), df['Amount'] * 0.09, 0))

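I did find some discrepancies between the generated result and the "Desired Column" you posted. For example, row 16 is MONTHLY with an amount greater than 1500, so the result should be 0.09 * 4000 = 360; I'm not sure how you got 0. I guess you either made a mistake in the manual calculation or I misunderstood your description. Feel free to explain so that I can update the script, but I think the general idea should solve your problem.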

P.S. The resulting df after running the script:

   Deal     TYPE      Month    Amount       Sum  Desired Column
0     A   ANNUAL      April  10021.34  10057.20          0.0000
1     A  MONTHLY      April     35.86  10057.20          0.0000
2     B  MONTHLY      April  11150.05  11150.05       1003.5045
3     B   ANNUAL       July    661.65    661.65          0.0000
4     B   ANNUAL     August    303.63    303.63          0.0000
5     C   ANNUAL      April  25624.59  25624.59       2306.2131
6     D   ANNUAL       June  27309.26  27309.26       2457.8334
7     D   ANNUAL       July      0.00      0.00          0.0000
8     D   ANNUAL     August      0.00      0.00          0.0000
9     E   ANNUAL      April     10.65     10.65          0.0000
10    E  MONTHLY        May      0.00  18716.70          0.0000
11    E   ANNUAL        May  18716.70  18716.70          0.0000
12    E  MONTHLY       June      0.00    606.49          0.0000
13    E   ANNUAL       June    606.49    606.49          0.0000
14    E  MONTHLY       July      0.00   8890.17          0.0000
15    E  MONTHLY       July   8890.17   8890.17        800.1153
16    E  MONTHLY     August   4000.00  18000.00        360.0000
17    E   ANNUAL     August  14000.00  18000.00          0.0000
18    E   ANNUAL  September   2157.34   2157.34          0.0000
19    E   ANNUAL    October   3025.24   3025.24          0.0000
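To check the row 16 arithmetic, here is a small sketch of the same transform-plus-np.where idea, run on the two Com E August rows only. The helper name `add_desired` and its parameters are hypothetical. It uses `>=` for the group total, following the `x.sum() >= 20000` in the question's own attempt rather than the `>` in the script above, and that is an assumption about the intended rule.

```python
import numpy as np
import pandas as pd

def add_desired(df, rate=0.09, group_min=20000, row_min=1500):
    """Hypothetical helper: names and defaults are illustrative only."""
    out = df.copy()
    # transform('sum') broadcasts each (Deal, Month) total back to every row.
    out["Sum"] = out.groupby(["Deal", "Month"])["Amount"].transform("sum")
    out["Desired Column"] = np.where(
        out["Sum"] >= group_min,          # assumed: "at least" the group minimum
        out["Sum"] * rate,
        np.where(
            (out["Amount"] >= row_min) & (out["TYPE"] == "MONTHLY"),
            out["Amount"] * rate,
            0.0,
        ),
    )
    return out

# Amounts as they appear in the result table above (rows 16 and 17).
rows = pd.DataFrame({
    "Deal": ["Com E", "Com E"],
    "TYPE": ["MONTHLY", "ANNUAL"],
    "Month": ["August", "August"],
    "Amount": [4000.0, 14000.0],
})
print(add_desired(rows))

# Amounts as they appear in the question's table (row 17 is 16000 there).
rows_q = rows.assign(Amount=[4000.0, 16000.0])
print(add_desired(rows_q))
```

With 14000 the group total is 18000, so only the MONTHLY row qualifies and gets 360. With 16000 the total is exactly 20000, so the group rule applies and this sketch writes 1800 on both rows, whereas the question's table shows 1800 on one row only. The two tables therefore start from different Amount values for row 17, which may account for part of the discrepancy.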