Before this gets marked as a duplicate, I have already looked at the following: question1 question2 source3
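For each farmer, I am trying to calculate two things:
1) the percentage of the farmer's ripe fruit that is fruit x: %(ripe fruit x)/(total ripe fruit)
2) the percentage of fruit x that is ripe: %(ripe fruit x)/(total fruit x)
Both are based on the Ripe indicator (1 = ripe, 0 = not ripe).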
Input:
import pandas as pd
df = pd.DataFrame({'Farmer': ['Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Sallys','Tims','Tims','Tims','Tims'],
'Fruit':['Apple','Apple','Apple','Grape','Grape','Grape','Grape','Cherry','Cherry','Cherry','Cherry','Cherry','Cherry','Cherry','Cherry'],
'Type': ['Red','Yellow','Green','Red seedless','Red with seeds','Green','Purple','Montmorency','Morello','Bing','Rainer','Montmorency','Morello','Bing','Rainer'],
'Number':[2,6,2,1,1,6,2,3,1,3,3,3,1,3,3],
'Ripe':[1,1,0,1,0,1,1,0,0,0,1,0,0,0,1]})
df
    Farmer   Fruit  Number  Ripe            Type
0   Sallys   Apple       2     1             Red
1   Sallys   Apple       6     1          Yellow
2   Sallys   Apple       2     0           Green
3   Sallys   Grape       1     1    Red seedless
4   Sallys   Grape       1     0  Red with seeds
5   Sallys   Grape       6     1           Green
6   Sallys   Grape       2     1          Purple
7   Sallys  Cherry       3     0     Montmorency
8   Sallys  Cherry       1     0         Morello
9   Sallys  Cherry       3     0            Bing
10  Sallys  Cherry       3     1          Rainer
11    Tims  Cherry       3     0     Montmorency
12    Tims  Cherry       1     0         Morello
13    Tims  Cherry       3     0            Bing
14    Tims  Cherry       3     1          Rainer
Desired output:
   Farmer   Fruit  %(ripe fruit x)/(total ripe fruit)  %(ripe fruit x)/(total fruit x)
0  Sallys   Apple                                  40                               80
1  Sallys   Grape                                  45                               90
2  Sallys  Cherry                                  15                               30
3    Tims  Cherry                                 100                               30
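To make the two target columns concrete, the ratios can be worked by hand for a single group. The following is a minimal sketch for Sallys's apples, assuming the `df` from the input above and assuming that fruit is counted through the Number column (which is what the desired output implies); the variable names are made up for illustration.

```python
import pandas as pd

# Same data as the input above
df = pd.DataFrame({
    'Farmer': ['Sallys'] * 11 + ['Tims'] * 4,
    'Fruit': ['Apple'] * 3 + ['Grape'] * 4 + ['Cherry'] * 8,
    'Type': ['Red', 'Yellow', 'Green', 'Red seedless', 'Red with seeds',
             'Green', 'Purple', 'Montmorency', 'Morello', 'Bing', 'Rainer',
             'Montmorency', 'Morello', 'Bing', 'Rainer'],
    'Number': [2, 6, 2, 1, 1, 6, 2, 3, 1, 3, 3, 3, 1, 3, 3],
    'Ripe': [1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1],
})

# Restrict to one farmer, then to that farmer's ripe rows
sallys = df[df['Farmer'] == 'Sallys']
sallys_ripe = sallys[sallys['Ripe'] == 1]

# Ripe apples: 2 (Red) + 6 (Yellow) = 8
ripe_apples = sallys_ripe.loc[sallys_ripe['Fruit'] == 'Apple', 'Number'].sum()
# All ripe fruit for Sallys: 8 apples + 9 grapes + 3 cherries = 20
total_ripe = sallys_ripe['Number'].sum()
# All apples, ripe or not: 2 + 6 + 2 = 10
total_apples = sallys.loc[sallys['Fruit'] == 'Apple', 'Number'].sum()

pct_of_ripe = 100 * ripe_apples / total_ripe          # metric 1
pct_ripe_of_fruit = 100 * ripe_apples / total_apples  # metric 2
print(pct_of_ripe, pct_ripe_of_fruit)
```

This gives 40 and 80, matching the first row of the desired output.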
Answer 0 (score: 2)
First aggregate with sum and reshape with unstack, then divide by the relevant totals with div:
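# Sum Number per (Farmer, Fruit, Ripe), then pivot the Ripe values 0/1 into columns
df1 = df.groupby(['Farmer','Fruit','Ripe'], sort=False)['Number'].sum().unstack()
# Ripe fruit x as a percentage of the farmer's total ripe fruit
# (per-farmer totals via groupby(level=0); Series.sum(level=0) was removed in pandas 2.0)
a = df1[1].div(df1[1].groupby(level=0).transform('sum')).mul(100)
# Ripe fruit x as a percentage of all fruit x (ripe + unripe)
b = df1[1].div(df1.sum(axis=1)).mul(100)
keys = ('%(ripe fruit x)/(total ripe fruit)','%(ripe fruit x)/(total fruit x)')
df2 = pd.concat([a,b], axis=1, keys=keys).reset_index()
print(df2)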
   Farmer   Fruit  %(ripe fruit x)/(total ripe fruit)  \
0  Sallys   Apple                                40.0
1  Sallys   Grape                                45.0
2  Sallys  Cherry                                15.0
3    Tims  Cherry                               100.0

   %(ripe fruit x)/(total fruit x)
0                             80.0
1                             90.0
2                             30.0
3                             30.0
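To see what each step of this approach produces, the intermediate results can be inspected one at a time. The sketch below assumes the question's `df`. The names `counts`, `ripe`, `farmer_ripe_total`, `fruit_total` and `result` are illustrative. The per-farmer totals are computed with `groupby(level=0).transform('sum')`, a choice made here so that the totals share the index of the values they divide.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    'Farmer': ['Sallys'] * 11 + ['Tims'] * 4,
    'Fruit': ['Apple'] * 3 + ['Grape'] * 4 + ['Cherry'] * 8,
    'Type': ['Red', 'Yellow', 'Green', 'Red seedless', 'Red with seeds',
             'Green', 'Purple', 'Montmorency', 'Morello', 'Bing', 'Rainer',
             'Montmorency', 'Morello', 'Bing', 'Rainer'],
    'Number': [2, 6, 2, 1, 1, 6, 2, 3, 1, 3, 3, 3, 1, 3, 3],
    'Ripe': [1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1],
})

# Step 1: one row per (Farmer, Fruit), one column per Ripe value (0 and 1).
# For example, Sallys/Apple holds 2 unripe and 8 ripe.
counts = df.groupby(['Farmer', 'Fruit', 'Ripe'], sort=False)['Number'].sum().unstack()

# Step 2: the ripe counts, indexed by (Farmer, Fruit)
ripe = counts[1]

# Step 3: each farmer's total ripe count, repeated on every row of that farmer
# (20 for Sallys, 3 for Tims)
farmer_ripe_total = ripe.groupby(level=0).transform('sum')

# Step 4: total count per (Farmer, Fruit), ripe plus unripe
fruit_total = counts.sum(axis=1)

# Step 5: the two percentages, side by side
result = pd.DataFrame({
    '%(ripe fruit x)/(total ripe fruit)': ripe / farmer_ripe_total * 100,
    '%(ripe fruit x)/(total fruit x)': ripe / fruit_total * 100,
}).reset_index()
print(result)
```

If some (Farmer, Fruit) pair had no ripe rows at all, `unstack` would leave a NaN in column 1 for that pair; calling `unstack(fill_value=0)` is one way to turn that into a 0 percentage instead.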