I'm trying to perform a groupby aggregation. My dummy DataFrame looks like this:
print(df)
    ID  Industry  Value 1  Value 2
0    1   Finance     0.25       99
1    1   Finance     0.50       73
2    1   Finance     0.25       53
3    1  Teaching     0.75       80
4    1  Teaching     0.25       78
5    1  Teaching     0.50       99
6    2   Finance     0.50       75
7    2   Finance     0.25       56
8    2   Finance     0.25       80
9    2  Teaching     0.50       79
10   3   Finance     0.25       61
11   3   Finance     0.75       87
12   3   Finance     0.75       97
13   3   Finance     0.25       99
14   3   Finance     0.25       76
15   3  Teaching     0.25       73
16   3  Teaching     0.75       68
17   3  Teaching     0.25       59
18   3  Teaching     0.25       60
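I want to group by ID and Industry and create a new field, e.g. "Expected". Expected would be equal to:
I'd like to avoid loops if possible. Any help would be greatly appreciated, since my attempts with iloc, groupby agg, and groupby transform have all come up short.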
Answer 0 (score: 4)
First set the new column with numpy.where and duplicated, then use DataFrameGroupBy.cumsum:
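import numpy as np
# m is False for the first row of each (ID, Industry) group and True for the rest
m = df.duplicated(['ID','Industry'])
df['new'] = np.where(m, -df['Value 1'] * df['Value 2'], df['Value 1'] + df['Value 2'])
# a running total within each group turns the per-row terms into the final value
df['new'] = df.groupby(['ID','Industry'])['new'].cumsum()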
print(df)
    ID  Industry  Value 1  Value 2      new
0    1   Finance     0.25       99    99.25
1    1   Finance     0.50       73    62.75
2    1   Finance     0.25       53    49.50
3    1  Teaching     0.75       80    80.75
4    1  Teaching     0.25       78    61.25
5    1  Teaching     0.50       99    11.75
6    2   Finance     0.50       75    75.50
7    2   Finance     0.25       56    61.50
8    2   Finance     0.25       80    41.50
9    2  Teaching     0.50       79    79.50
10   3   Finance     0.25       61    61.25
11   3   Finance     0.75       87    -4.00
12   3   Finance     0.75       97   -76.75
13   3   Finance     0.25       99  -101.50
14   3   Finance     0.25       76  -120.50
15   3  Teaching     0.25       73    73.25
16   3  Teaching     0.75       68    22.25
17   3  Teaching     0.25       59     7.50
18   3  Teaching     0.25       60    -7.50
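Judging from the answer's code and its output, the rule being computed appears to be: the first row of each (ID, Industry) group gets Value 1 + Value 2, and every later row gets the previous row's result minus Value 1 * Value 2. That reading is an inference, not something the question states here. The sketch below rebuilds only the six ID 1 rows and checks the vectorized approach against a plain loop under that assumption. The names `step`, `expected_by_loop`, and `loop_values` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Subset of the question's data: the two ID 1 groups.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 1],
    "Industry": ["Finance"] * 3 + ["Teaching"] * 3,
    "Value 1": [0.25, 0.50, 0.25, 0.75, 0.25, 0.50],
    "Value 2": [99, 73, 53, 80, 78, 99],
})
keys = ["ID", "Industry"]

# Vectorized version, following the answer:
# first row of a group contributes the sum, later rows contribute minus the product.
is_later_row = df.duplicated(keys)
df["step"] = np.where(
    is_later_row,
    -df["Value 1"] * df["Value 2"],
    df["Value 1"] + df["Value 2"],
)
# A running total per group applies the contributions in row order.
df["new"] = df.groupby(keys)["step"].cumsum()


def expected_by_loop(group):
    """Reference implementation of the assumed recurrence, one group at a time."""
    out = []
    for i, (v1, v2) in enumerate(zip(group["Value 1"], group["Value 2"])):
        out.append(v1 + v2 if i == 0 else out[-1] - v1 * v2)
    return out


# sort=False keeps groups in order of first appearance; the rows of each group
# are contiguous here, so the concatenated results line up with df's row order.
loop_values = []
for _, group in df.groupby(keys, sort=False):
    loop_values.extend(expected_by_loop(group))

print(df[keys + ["new"]])
print(np.allclose(df["new"], loop_values))
```

The recurrence is a running sum of per-row terms, so it can be written as a grouped cumsum and the loop is only there as a cross-check. If the rows of a group were not already in the intended order, they would need sorting first, because both duplicated and cumsum depend on row order.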