Pandas applymap method with a passed column name as a parameter

Time: 2018-07-27 12:53:02

Tags: python pandas lambda python-applymap

I want to use the applymap method with a somewhat complex function on the dataset below.

value1 value2 value3 value4 value5  people
   147    119     69     92    106   533.0
    31     20     12     14     26   103.0
    37     22     24     18     19   120.0
    10     13      7     13     10    53.0
    38     48     18     30     27   161.0
   401    409    168    354    338  1670.0
   109     92     55     82     69   407.0
     5      9      7     11      9    41.0
    44     36     21     48     28   177.0
    59     40     19     38     27   183.0
     8      9      1      7     10    35.0

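The people column holds the sum of the value columns. I want to replace the value numbers with percentages. For example, in the first row value1 is 147 and the sum of the values in that row is 533, so I want to replace 147 with (147/533) * 100.

I think it should look something like this, but I can't get it to work.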

df.loc[:, 'value1':'value5'] = df.loc[:, 'value1':'value5'].applymap(lambda x: (x / df['people'])*100)

1 answer:

Answer 0: (score: 2)

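The applymap function processes a DataFrame element by element, so the lambda receives one scalar value at a time. It has no access to the row that value came from, and x / df['people'] divides that scalar by the entire people column instead of by the matching row's total.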
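
To make the failure concrete, here is a minimal sketch that evaluates the lambda body from the question by hand for a single cell. The two-row frame is a cut-down sample of the question's data, and the names x and cell_result are only illustrative.

```python
import pandas as pd

# Cut-down sample: the first two rows and two value columns from the question.
df = pd.DataFrame({"value1": [147, 31], "value2": [119, 20], "people": [533.0, 103.0]})

# applymap hands the function one scalar cell at a time, e.g. the top-left cell.
x = df.loc[0, "value1"]

# The lambda body from the question, evaluated for that single cell.
cell_result = (x / df["people"]) * 100

# A scalar divided by a column yields a whole Series, one entry per row,
# so there is no single number to put back into the cell.
print(type(cell_result).__name__)
print(len(cell_result))
```

Only the first entry of cell_result is the percentage that was wanted; the remaining entries divide 147 by the totals of unrelated rows.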

A better approach is a vectorized solution with DataFrame.div:

df.loc[:, 'value1':'value5'] = df.loc[:, 'value1':'value5'].div(df['people'], axis=0) * 100
print (df)
       value1     value2     value3     value4     value5  people
0   27.579737  22.326454  12.945591  17.260788  19.887430   533.0
1   30.097087  19.417476  11.650485  13.592233  25.242718   103.0
2   30.833333  18.333333  20.000000  15.000000  15.833333   120.0
3   18.867925  24.528302  13.207547  24.528302  18.867925    53.0
4   23.602484  29.813665  11.180124  18.633540  16.770186   161.0
5   24.011976  24.491018  10.059880  21.197605  20.239521  1670.0
6   26.781327  22.604423  13.513514  20.147420  16.953317   407.0
7   12.195122  21.951220  17.073171  26.829268  21.951220    41.0
8   24.858757  20.338983  11.864407  27.118644  15.819209   177.0
9   32.240437  21.857923  10.382514  20.765027  14.754098   183.0
10  22.857143  25.714286   2.857143  20.000000  28.571429    35.0
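
As a self-contained sketch of the div approach, the snippet below rebuilds the first three rows of the question's data with only two value columns. The result is stored in a new variable, pct, which is an illustrative name; this is a choice made here because, depending on the pandas version, writing floats back into integer columns through .loc may trigger a dtype warning.

```python
import pandas as pd

# First three rows of the question's data, reduced to two value columns
# for brevity; "people" is still the full row total from the question.
df = pd.DataFrame(
    {"value1": [147, 31, 37], "value2": [119, 20, 22], "people": [533.0, 103.0, 120.0]}
)

cols = ["value1", "value2"]

# axis=0 aligns the divisor with the row index, so each row is divided
# by its own "people" value.
pct = df[cols].div(df["people"], axis=0) * 100

print(pct.round(6))
```

The axis=0 argument matters: with the default axis, div would try to match the index of people against the column labels and the result would be filled with NaN.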

Another solution, using numpy broadcasting:

df.loc[:, 'value1':'value5'] = (df.loc[:, 'value1':'value5'].values / 
                                     df['people'].values[:, None] * 100)
print (df)
       value1     value2     value3     value4     value5  people
0   27.579737  22.326454  12.945591  17.260788  19.887430   533.0
1   30.097087  19.417476  11.650485  13.592233  25.242718   103.0
2   30.833333  18.333333  20.000000  15.000000  15.833333   120.0
3   18.867925  24.528302  13.207547  24.528302  18.867925    53.0
4   23.602484  29.813665  11.180124  18.633540  16.770186   161.0
5   24.011976  24.491018  10.059880  21.197605  20.239521  1670.0
6   26.781327  22.604423  13.513514  20.147420  16.953317   407.0
7   12.195122  21.951220  17.073171  26.829268  21.951220    41.0
8   24.858757  20.338983  11.864407  27.118644  15.819209   177.0
9   32.240437  21.857923  10.382514  20.765027  14.754098   183.0
10  22.857143  25.714286   2.857143  20.000000  28.571429    35.0
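
The role of [:, None] in the numpy version is easier to see on plain arrays. This is a small sketch using the first three rows and two value columns of the question's data; the array names values, people, and pct are illustrative.

```python
import numpy as np

values = np.array([[147, 119], [31, 20], [37, 22]])  # shape (3, 2)
people = np.array([533.0, 103.0, 120.0])             # shape (3,)

# people[:, None] adds a trailing axis, giving shape (3, 1); broadcasting
# then stretches that single column across both value columns.
pct = values / people[:, None] * 100

print(people[:, None].shape)
print(pct.shape)
```

Without the extra axis, shapes (3, 2) and (3,) do not broadcast against each other and numpy raises a ValueError.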

If you want something similar to applymap, you can use apply, but the solutions above are faster:

df.loc[:, 'value1':'value5'] = (df.loc[:, 'value1':'value5']
                                   .apply(lambda x: (x / df['people'])*100))
print (df)
       value1     value2     value3     value4     value5  people
0   27.579737  22.326454  12.945591  17.260788  19.887430   533.0
1   30.097087  19.417476  11.650485  13.592233  25.242718   103.0
2   30.833333  18.333333  20.000000  15.000000  15.833333   120.0
3   18.867925  24.528302  13.207547  24.528302  18.867925    53.0
4   23.602484  29.813665  11.180124  18.633540  16.770186   161.0
5   24.011976  24.491018  10.059880  21.197605  20.239521  1670.0
6   26.781327  22.604423  13.513514  20.147420  16.953317   407.0
7   12.195122  21.951220  17.073171  26.829268  21.951220    41.0
8   24.858757  20.338983  11.864407  27.118644  15.819209   177.0
9   32.240437  21.857923  10.382514  20.765027  14.754098   183.0
10  22.857143  25.714286   2.857143  20.000000  28.571429    35.0
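
The reason apply works where applymap does not is that, with its default axis, apply passes each column to the lambda as a whole Series. The sketch below checks this on a three-row sample of the question's data and compares the result with the div form; via_apply and via_div are illustrative names.

```python
import pandas as pd

df = pd.DataFrame(
    {"value1": [147, 31, 37], "value2": [119, 20, 22], "people": [533.0, 103.0, 120.0]}
)
cols = ["value1", "value2"]

# With the default axis=0, apply passes each column to the lambda as a Series,
# so "x / df['people']" divides row by row, aligned on the index.
via_apply = df[cols].apply(lambda x: (x / df["people"]) * 100)

# The vectorized div form, computed on the same sample for comparison.
via_div = df[cols].div(df["people"], axis=0) * 100

# Largest absolute difference between the two results.
print((via_apply - via_div).abs().max().max())
```

Both forms compute the same percentages; apply simply loops over the columns in Python, which is why the vectorized versions are preferred.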