I want to use the applymap method with a somewhat complex function on the dataset below.
value1 value2 value3 value4 value5 people
147 119 69 92 106 533.0
31 20 12 14 26 103.0
37 22 24 18 19 120.0
10 13 7 13 10 53.0
38 48 18 30 27 161.0
401 409 168 354 338 1670.0
109 92 55 82 69 407.0
5 9 7 11 9 41.0
44 36 21 48 28 177.0
59 40 19 38 27 183.0
8 9 1 7 10 35.0
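The people column holds the sum of the value columns. I want to replace each value with its percentage of the row total. For example, in the first row value1 is 147 and the row total is 533, so I want to replace 147 with (147/533) * 100.
I think it should look something like this, but I can't get it to work.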
df.loc[:, 'value1':'value5'] = df.loc[:, 'value1':'value5'].applymap(lambda x: (x / df['people'])*100)
Answer 0 (score: 2)
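The applymap function processes a DataFrame element by element, so the lambda receives one scalar at a time. Dividing that scalar by df['people'] produces a whole Series instead of a single percentage, which is why the attempt in the question does not work.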
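To make the problem concrete, here is a minimal sketch of what the lambda computes for a single cell. The three-row, two-column frame is a cut-down copy of the question's data, and the names x and result are only illustrative.

```python
import pandas as pd

# Cut-down copy of the question's data (people is the full row total).
df = pd.DataFrame({
    "value1": [147, 31, 37],
    "value2": [119, 20, 22],
    "people": [533.0, 103.0, 120.0],
})

# applymap hands the function one scalar at a time, e.g. the first cell.
x = df.loc[0, "value1"]

# The scalar is divided by the entire 'people' column, so the result
# is a Series with one entry per row, not one percentage.
result = (x / df["people"]) * 100
print(type(result).__name__, len(result))
```

Because every cell would map to a full column of numbers, an element-wise function has no way to pick the matching row total on its own.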
A better approach is a vectorized solution with DataFrame.div:
df.loc[:, 'value1':'value5'] = df.loc[:, 'value1':'value5'].div(df['people'], axis=0) * 100
print (df)
value1 value2 value3 value4 value5 people
0 27.579737 22.326454 12.945591 17.260788 19.887430 533.0
1 30.097087 19.417476 11.650485 13.592233 25.242718 103.0
2 30.833333 18.333333 20.000000 15.000000 15.833333 120.0
3 18.867925 24.528302 13.207547 24.528302 18.867925 53.0
4 23.602484 29.813665 11.180124 18.633540 16.770186 161.0
5 24.011976 24.491018 10.059880 21.197605 20.239521 1670.0
6 26.781327 22.604423 13.513514 20.147420 16.953317 407.0
7 12.195122 21.951220 17.073171 26.829268 21.951220 41.0
8 24.858757 20.338983 11.864407 27.118644 15.819209 177.0
9 32.240437 21.857923 10.382514 20.765027 14.754098 183.0
10 22.857143 25.714286 2.857143 20.000000 28.571429 35.0
Another solution uses numpy broadcasting:
df.loc[:, 'value1':'value5'] = (df.loc[:, 'value1':'value5'].values /
df['people'].values[:, None] * 100)
print (df)
value1 value2 value3 value4 value5 people
0 27.579737 22.326454 12.945591 17.260788 19.887430 533.0
1 30.097087 19.417476 11.650485 13.592233 25.242718 103.0
2 30.833333 18.333333 20.000000 15.000000 15.833333 120.0
3 18.867925 24.528302 13.207547 24.528302 18.867925 53.0
4 23.602484 29.813665 11.180124 18.633540 16.770186 161.0
5 24.011976 24.491018 10.059880 21.197605 20.239521 1670.0
6 26.781327 22.604423 13.513514 20.147420 16.953317 407.0
7 12.195122 21.951220 17.073171 26.829268 21.951220 41.0
8 24.858757 20.338983 11.864407 27.118644 15.819209 177.0
9 32.240437 21.857923 10.382514 20.765027 14.754098 183.0
10 22.857143 25.714286 2.857143 20.000000 28.571429 35.0
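The broadcasting step can be checked on a small frame. The following is a sketch under stated assumptions: it uses only the first three rows of the question's data, and the names cols, values, totals, pct_numpy and pct_div are hypothetical. It shows the array shapes involved and confirms that the numpy result matches the DataFrame.div result.

```python
import numpy as np
import pandas as pd

# First three rows of the question's data.
df = pd.DataFrame({
    "value1": [147, 31, 37],
    "value2": [119, 20, 22],
    "value3": [69, 12, 24],
    "value4": [92, 14, 18],
    "value5": [106, 26, 19],
    "people": [533.0, 103.0, 120.0],
})
cols = ["value1", "value2", "value3", "value4", "value5"]

values = df[cols].to_numpy()               # shape (3, 5)
totals = df["people"].to_numpy()[:, None]  # shape (3, 1): one total per row

# The (3, 1) column is stretched across the 5 value columns.
pct_numpy = values / totals * 100

# Same computation with the pandas method from the first solution.
pct_div = df[cols].div(df["people"], axis=0) * 100

print(values.shape, totals.shape, pct_numpy.shape)
print(np.allclose(pct_numpy, pct_div.to_numpy()))
```

The [:, None] index is what turns the one-dimensional totals into a column, so each row is divided by its own total rather than by position along the columns.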
If you want something similar to applymap, you can use apply, but the solutions above are faster:
df.loc[:, 'value1':'value5'] = (df.loc[:, 'value1':'value5']
                                .apply(lambda x: (x / df['people'])*100))
print (df)
value1 value2 value3 value4 value5 people
0 27.579737 22.326454 12.945591 17.260788 19.887430 533.0
1 30.097087 19.417476 11.650485 13.592233 25.242718 103.0
2 30.833333 18.333333 20.000000 15.000000 15.833333 120.0
3 18.867925 24.528302 13.207547 24.528302 18.867925 53.0
4 23.602484 29.813665 11.180124 18.633540 16.770186 161.0
5 24.011976 24.491018 10.059880 21.197605 20.239521 1670.0
6 26.781327 22.604423 13.513514 20.147420 16.953317 407.0
7 12.195122 21.951220 17.073171 26.829268 21.951220 41.0
8 24.858757 20.338983 11.864407 27.118644 15.819209 177.0
9 32.240437 21.857923 10.382514 20.765027 14.754098 183.0
10 22.857143 25.714286 2.857143 20.000000 28.571429 35.0
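The reason apply works where applymap did not is that apply, by default, passes each column to the function as a Series, so the division lines up row by row with people. A minimal sketch, assuming a cut-down frame and a hypothetical helper named to_percent that records what it receives:

```python
import pandas as pd

# Cut-down copy of the question's data (people is the full row total).
df = pd.DataFrame({
    "value1": [147, 31, 37],
    "value2": [119, 20, 22],
    "people": [533.0, 103.0, 120.0],
})
cols = ["value1", "value2"]

seen = []  # type names of the objects apply passes in

def to_percent(col):
    # col is a whole column; Series / Series aligns on the row index.
    seen.append(type(col).__name__)
    return col / df["people"] * 100

pct = df[cols].apply(to_percent)
print(set(seen))
print(pct)
```

Each call still runs a Python-level function once per column, which is consistent with the answer's note that the div and numpy versions are faster.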