I have the following data:
group score
a 1 100
b 2 80
c 2 75
d 2 65
e 2 55
f 3 45
g 3 30
h 4 1
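Using pandas, I want to compute cumulative percentages per group. In column k (first = group 1, second = group 2, and so on), each row in groups 1 to k gets its score divided by the total score of those rows, rows in higher groups get 0%, and `sum` adds up the four columns. I want the following result: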
group score first second third fourth sum
a 1 100 100% 27% 22% 22% 171%
b 2 80 0% 21% 18% 17% 57%
c 2 75 0% 20% 17% 16% 53%
d 2 65 0% 17% 14% 14% 46%
e 2 55 0% 15% 12% 12% 39%
f 3 45 0% 0% 10% 10% 20%
g 3 30 0% 0% 7% 7% 13%
h 4 1 0% 0% 0% 2% 2%
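Read from the table above, each percentage column k appears to follow this rule, where s_r is the score of row r and g_r its group (this formula is an interpretation of the table, not something stated in the question):

```latex
p_{r,k} = 100\% \times \frac{s_r \,[\,g_r \le k\,]}{\sum_{j \,:\, g_j \le k} s_j},
\qquad
\mathrm{sum}_r = \sum_{k=1}^{4} p_{r,k}
```

For example, row b in column `second`: 80 / (100 + 80 + 75 + 65 + 55) = 80 / 375 ≈ 21.3%.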
The following code works for a single column (`second`), but is there a better way?
df_second = df[df['group'] <= 2]['score'].to_frame('score')
df_second['second'] = df_second / df_second.sum()
del df_second['score']
df.join(df_second)
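A minimal sketch of one way to generalize the code above to every group, assuming `group` holds sortable integers and the rows are labelled a to h as in the sample. It computes the cumulative totals once with `groupby(...).sum().cumsum()` and uses `Series.where` to put 0 in rows whose group is above the threshold. The column names `first` to `fourth` are hard-coded here and assume exactly four groups.

```python
import pandas as pd

# Sample data from the question (row labels a..h as the index)
df = pd.DataFrame(
    {'group': [1, 2, 2, 2, 2, 3, 3, 4],
     'score': [100, 80, 75, 65, 55, 45, 30, 1]},
    index=list('abcdefgh'),
)

# Total score of all rows with group <= k, for each distinct group k (sorted)
totals = df.groupby('group')['score'].sum().cumsum()

# Assumption: exactly four groups, named to match the desired output
names = ['first', 'second', 'third', 'fourth']

for name, (k, total) in zip(names, totals.items()):
    # Rows in later groups get 0; the rest get their share of the cumulative total
    df[name] = df['score'].where(df['group'] <= k, 0) / total * 100

df['sum'] = df[names].sum(axis=1)
print(df.round(0))
```

Computing `totals` once avoids re-filtering and re-summing the frame for every column.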
Answer 0 (score: 1)
I think a loop is needed:
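for i in df['group'].unique():
    # each row's score as a percentage of the total of all rows with group <= i
    df[i] = (df['score'] / df.loc[df['group'] <= i, 'score'].sum()) * 100
# the new columns start at position 2 (after 'group' and 'score')
df['sum'] = df.iloc[:, 2:].sum(axis=1)
print(df)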
group score 1 2 3 4 sum
a 1 100 100.0 26.666667 22.222222 22.172949 171.061838
b 2 80 80.0 21.333333 17.777778 17.738359 136.849470
c 2 75 75.0 20.000000 16.666667 16.629712 128.296378
d 2 65 65.0 17.333333 14.444444 14.412417 111.190195
e 2 55 55.0 14.666667 12.222222 12.195122 94.084011
f 3 45 45.0 12.000000 10.000000 9.977827 76.977827
g 3 30 30.0 8.000000 6.666667 6.651885 51.318551
h 4 1 1.0 0.266667 0.222222 0.221729 1.710618
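The loop above divides every row's score by the cumulative total, so rows from later groups still get a value (for example row b shows 80.0 in column 1) instead of the 0% asked for in the question. A minimal vectorized sketch that applies that mask with NumPy broadcasting, assuming the same sample data and using the group values as column labels, as the loop does:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'group': [1, 2, 2, 2, 2, 3, 3, 4],
     'score': [100, 80, 75, 65, 55, 45, 30, 1]},
    index=list('abcdefgh'),
)

groups = np.sort(df['group'].unique())  # distinct thresholds k
# mask[r, k] is True when row r belongs to a group <= k
mask = df['group'].to_numpy()[:, None] <= groups[None, :]
# Scores with rows from later groups zeroed out, one column per k
masked = np.where(mask, df['score'].to_numpy()[:, None], 0)
# Divide each column by its own total (sum over rows) to get percentages
pct = masked / masked.sum(axis=0) * 100

out = pd.DataFrame(pct, index=df.index, columns=groups)
out['sum'] = out.sum(axis=1)
df = df.join(out)
print(df)
```

This builds one rows × groups matrix instead of filtering the frame once per group, which may matter when there are many groups.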
Another solution using a list comprehension:
arr = df['group'].unique()
comp = [(df['score'] / df.loc[df['group'] <= i, 'score'].sum()) * 100 for i in arr]
df1 = pd.concat(comp, axis=1, keys=arr)
df1['sum'] = df1.sum(axis=1)
print (df1)
1 2 3 4 sum
a 100.0 26.666667 22.222222 22.172949 171.061838
b 80.0 21.333333 17.777778 17.738359 136.849470
c 75.0 20.000000 16.666667 16.629712 128.296378
d 65.0 17.333333 14.444444 14.412417 111.190195
e 55.0 14.666667 12.222222 12.195122 94.084011
f 45.0 12.000000 10.000000 9.977827 76.977827
g 30.0 8.000000 6.666667 6.651885 51.318551
h 1.0 0.266667 0.222222 0.221729 1.710618
df = df.join(df1)
print (df)
group score 1 2 3 4 sum
a 1 100 100.0 26.666667 22.222222 22.172949 171.061838
b 2 80 80.0 21.333333 17.777778 17.738359 136.849470
c 2 75 75.0 20.000000 16.666667 16.629712 128.296378
d 2 65 65.0 17.333333 14.444444 14.412417 111.190195
e 2 55 55.0 14.666667 12.222222 12.195122 94.084011
f 3 45 45.0 12.000000 10.000000 9.977827 76.977827
g 3 30 30.0 8.000000 6.666667 6.651885 51.318551
h 4 1 1.0 0.266667 0.222222 0.221729 1.710618
Answer 1 (score: 0)
g = df.groupby('group')
g.apply(lambda x: x / x.sum())
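Note that `groupby` normalizes within each group only: each row's score is divided by the total of its own group, not by the cumulative total of all groups up to it, so it produces a different table from the one in the question. A minimal sketch of that within-group share, restricted to the `score` column so the `group` key itself is not divided, and using `transform('sum')` so the group totals stay aligned with the original rows. The column name `within_group` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {'group': [1, 2, 2, 2, 2, 3, 3, 4],
     'score': [100, 80, 75, 65, 55, 45, 30, 1]},
    index=list('abcdefgh'),
)

# Total score of each row's own group, broadcast back to every row
group_total = df.groupby('group')['score'].transform('sum')
# Share of each row within its own group, in percent (hypothetical column name)
df['within_group'] = df['score'] / group_total * 100
print(df)
```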