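I am trying to create a new column by grouping df by Deal and Month and then applying a percentage (9%) to the Amount column. If all Amount values for a given Deal in a given Month add up to at least 20,000, apply the percentage to Amount; otherwise, if TYPE is MONTHLY and the individual Amount is at least 1500, apply the percentage to that Amount; otherwise, multiply by 0.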
df.groupby(['Deal', 'Month'])["Amount"].apply(
lambda x: x.sum() * 0.09 if x.sum() >= 20000 else (
x * 0.09 if x >= 1500 and x['TYPE'] == 'MONTHLY' else 0
)
)
This is what I have tried, but I keep getting errors such as
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
or
KeyError: ('TYPE', u'occurred at index 0')
I have also tried using transform instead of apply. Any help is greatly appreciated.
My DataFrame, together with the desired column, looks like this:
Deal TYPE Month Amount Desired Column
0 Com A ANNUAL April 10021.34 0
1 Com A MONTHLY April 35.86 0
2 Com B MONTHLY April 11150.05 1,003.50
3 Com B ANNUAL July 661.65 0
4 Com B ANNUAL August 303.63 0
5 Com C ANNUAL April 25624.59 2,306.21
6 Com D ANNUAL June 27309.26 2,457.83
7 Com D ANNUAL July 0.00 0
8 Com D ANNUAL August 0.00 0
9 Com E ANNUAL April 10.65 0
10 Com E MONTHLY May 0.00 0
11 Com E ANNUAL May 18716.70 1,684.5
12 Com E MONTHLY June 0.00 0
13 Com E ANNUAL June 606.49 0
14 Com E MONTHLY July 0.00 0
15 Com E MONTHLY July 8890.17 800.11
16 Com E MONTHLY August 4000 0
17 Com E ANNUAL August 16000 1,800
18 Com E ANNUAL September 2157.34 0
19 Com E ANNUAL October 3025.24 0
Answer 0 (score: 1)
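You don't need groupby in this case. There are two ways to do this; conceptually, the simplest is to first compute the threshold depending on whether the amount is monthly or annual: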
df['Threshold'] = (df.TYPE=='ANNUAL')*20000 + (df.TYPE=='MONTHLY')*1500
Then you can compute the amount depending on whether the threshold is met:
df['Desired Amount'] = (df.Amount>df.Threshold)*0.09*df.Amount
This works here only because you don't have multiple rows for the same deal, month, and type. If you did, you would first need groupby to aggregate them:
df = df.groupby(['Deal','Month','TYPE']).sum()
df.reset_index(inplace=True)
Then you can proceed as above.
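The steps in this answer can be put together as a minimal, self-contained sketch. It follows this answer's interpretation (a 20,000 threshold for ANNUAL rows and 1,500 for MONTHLY rows), which is not necessarily what the question intends. The sample rows are a small subset of the question's table; the variable name agg and the use of map for the threshold lookup are illustrative choices, not part of the original answer.

```python
import pandas as pd

# A few rows taken from the question's table (illustrative subset).
df = pd.DataFrame({
    "Deal": ["Com B", "Com C", "Com E", "Com E", "Com E"],
    "TYPE": ["MONTHLY", "ANNUAL", "MONTHLY", "MONTHLY", "ANNUAL"],
    "Month": ["April", "April", "July", "July", "May"],
    "Amount": [11150.05, 25624.59, 0.00, 8890.17, 18716.70],
})

# Collapse duplicate (Deal, Month, TYPE) rows into a single total each.
agg = df.groupby(["Deal", "Month", "TYPE"], as_index=False)["Amount"].sum()

# Threshold per row, depending on TYPE (this answer's interpretation).
agg["Threshold"] = agg["TYPE"].map({"ANNUAL": 20000, "MONTHLY": 1500})

# 9% of Amount where the threshold is exceeded, otherwise 0.
agg["Desired Amount"] = (agg["Amount"] > agg["Threshold"]) * 0.09 * agg["Amount"]

print(agg)
```

Selecting the Amount column before calling sum keeps the aggregation restricted to the numeric column, and as_index=False makes the separate reset_index call unnecessary.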
Answer 1 (score: 1)
I tried to translate your description into this:
import numpy as np

# Total Amount per (Deal, Month), broadcast back to every row of the group
df['Sum'] = df.groupby(['Deal','Month'])['Amount'].transform('sum')
df['Desired Column'] = np.where(
    df['Sum'] > 20000,
    df['Sum'] * 0.09,
    np.where((df['Amount'] >= 1500) & (df['TYPE'] == 'MONTHLY'), df['Amount'] * 0.09, 0),
)
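However, I found some discrepancies between the generated result and the "Desired Column" you posted. For example, row 16 is MONTHLY with an amount greater than 1500, so the result should be 0.09 * 4000 = 360; I'm not sure how you got 0. I guess you either made a mistake in your manual calculation or I misunderstood your description. Feel free to clarify so I can update the script, but I think the general idea should solve your problem.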
P.S. The resulting df after running the script:
Deal TYPE Month Amount Sum Desired Column
0 A ANNUAL April 10021.34 10057.20 0.0000
1 A MONTHLY April 35.86 10057.20 0.0000
2 B MONTHLY April 11150.05 11150.05 1003.5045
3 B ANNUAL July 661.65 661.65 0.0000
4 B ANNUAL August 303.63 303.63 0.0000
5 C ANNUAL April 25624.59 25624.59 2306.2131
6 D ANNUAL June 27309.26 27309.26 2457.8334
7 D ANNUAL July 0.00 0.00 0.0000
8 D ANNUAL August 0.00 0.00 0.0000
9 E ANNUAL April 10.65 10.65 0.0000
10 E MONTHLY May 0.00 18716.70 0.0000
11 E ANNUAL May 18716.70 18716.70 0.0000
12 E MONTHLY June 0.00 606.49 0.0000
13 E ANNUAL June 606.49 606.49 0.0000
14 E MONTHLY July 0.00 8890.17 0.0000
15 E MONTHLY July 8890.17 8890.17 800.1153
16 E MONTHLY August 4000.00 18000.00 360.0000
17 E ANNUAL August 14000.00 18000.00 0.0000
18 E ANNUAL September 2157.34 2157.34 0.0000
19 E ANNUAL October 3025.24 3025.24 0.0000
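For anyone who wants to reproduce a few of the rows above, here is a minimal, self-contained sketch of the same transform plus np.where approach. The helper name add_desired_column and its parameters are hypothetical and only there for illustration; the sample rows (5, 14, 15, 16, 17) use the values from the output table above.

```python
import numpy as np
import pandas as pd

def add_desired_column(df, rate=0.09, group_min=20000, single_min=1500):
    """Return a copy of df with 'Sum' and 'Desired Column' added (hypothetical helper)."""
    out = df.copy()
    # Group total per (Deal, Month), aligned with the original rows.
    out["Sum"] = out.groupby(["Deal", "Month"])["Amount"].transform("sum")
    # Element-wise conditions: & and np.where instead of Python's `and` / `if`.
    monthly_ok = (out["Amount"] >= single_min) & (out["TYPE"] == "MONTHLY")
    out["Desired Column"] = np.where(
        out["Sum"] > group_min,
        out["Sum"] * rate,
        np.where(monthly_ok, out["Amount"] * rate, 0.0),
    )
    return out

df = pd.DataFrame({
    "Deal": ["C", "E", "E", "E", "E"],
    "TYPE": ["ANNUAL", "MONTHLY", "MONTHLY", "MONTHLY", "ANNUAL"],
    "Month": ["April", "July", "July", "August", "August"],
    "Amount": [25624.59, 0.00, 8890.17, 4000.00, 14000.00],
})

result = add_desired_column(df)
print(result)
```

Note that the question's table lists 16000 for row 17, whereas the output above shows 14000. With 16000 the August total for Com E would be exactly 20000, so the strict > comparison would still not trigger the group rule, while >= (as in the question's own attempt) would.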