Pandas: calculating percentage values for multiple columns

Time: 2016-06-27 13:18:09

Tags: python pandas

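I have been unable to iterate over the values of selected DataFrame columns to create new columns that hold percentage values. Reproducible example: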

import pandas as pd

data = {'Respondents': [90, 43, 89, '89', '67', '88', '73', '78', '62', '101'],
        'answer_1': [51, 15, 15, 61, 16, 14, 15, 1, 0, 16],
        'answer_2': [11, 12, 14, 40, 36, 78, 12, 0, 26, 78],
        'answer_3': [3, 8, 4, 0, 2, 7, 10, 11, 6, 7]}
df = pd.DataFrame(data)
df

    Respondents  answer_1   answer_2   answer_3
0   90           51         11         3
1   43           15         12         8
2   89           15         14         4
3   89           61         40         0
4   67           16         36         2
5   88           14         78         7
6   73           15         12         10
7   78           1          0          11
8   62           0          26         6
9   101          16         78         7

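The goal is to express each answer column as a percentage of the total respondents. For example, for a new column derived from answer_1 (let's call it answer_1_perc), the first value would be 57 (because 51 is about 57% of 90) and the next value would be 35 (15 is about 35% of 43). There would then be answer_2_perc and answer_3_perc columns as well.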

I have written many variations of the following code, and my head is spinning.

for columns in df.iloc[:, 1:4]:
    for i in columns:
        i_name = 'percentage_' + str(columns)
        i_group = ([i] / df['Respondents'] * 100)
        df[i_name] = i_group

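What is the best way to do this? I need an iterative approach because my actual data has 25 answer columns rather than the 3 shown in this example.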
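
One likely reason the loop above misbehaves, sketched on a two-row subset of the data: iterating over a DataFrame yields its column labels rather than its values, so the inner loop walks the characters of each label. The names labels, chars, and pct are illustrative only and do not appear in the original post.

```python
import pandas as pd

# Two-row subset of the example data, with numeric Respondents.
df = pd.DataFrame({'Respondents': [90, 43],
                   'answer_1': [51, 15],
                   'answer_2': [11, 12]})

# Iterating over a DataFrame yields column labels, not values.
labels = [col for col in df.iloc[:, 1:3]]
print(labels)  # ['answer_1', 'answer_2']

# Iterating over one of those labels therefore walks its characters.
chars = [ch for ch in labels[0]]
print(chars[:3])  # ['a', 'n', 's']

# Indexing with the label returns the column as a Series, and Series
# division is element-wise, so no inner loop is needed.
pct = df[labels[0]] / df['Respondents'] * 100
print(pct.round(1).tolist())  # [56.7, 34.9]
```

So a single loop over the column labels, with df[label] on the left of the division, should be enough.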

3 Answers:

Answer 0 (score: 4)

You were almost there. Note that you have string values in the Respondents column, which I corrected before running the following:

In [172]:

for col in df.columns[1:4]:
    i_name = 'percentage_' + col
    i_group = (df[col] / df['Respondents']) * 100
    df[i_name] = i_group

df
Out[172]:
   Respondents  answer_1  answer_2  answer_3  percentage_answer_1  \
0           90        51        11         3            56.666667   
1           43        15        12         8            34.883721   
2           89        15        14         4            16.853933   
3           89        61        40         0            68.539326   
4           67        16        36         2            23.880597   
5           88        14        78         7            15.909091   
6           73        15        12        10            20.547945   
7           78         1         0        11             1.282051   
8           62         0        26         6             0.000000   
9          101        16        78         7            15.841584   

   percentage_answer_2  percentage_answer_3  
0            12.222222             3.333333  
1            27.906977            18.604651  
2            15.730337             4.494382  
3            44.943820             0.000000  
4            53.731343             2.985075  
5            88.636364             7.954545  
6            16.438356            13.698630  
7             0.000000            14.102564  
8            41.935484             9.677419  
9            77.227723             6.930693  
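
The correction of the string values mentioned at the start of this answer is not shown. A minimal sketch of one way to do it, assuming pd.to_numeric is acceptable and that every string holds a valid number (the four-row frame here is a shortened stand-in for the question's data):

```python
import pandas as pd

df = pd.DataFrame({'Respondents': [90, 43, '89', '101'],
                   'answer_1': [51, 15, 61, 16]})

# Mixed ints and strings give the column the generic object dtype.
print(df['Respondents'].dtype)  # object

# pd.to_numeric parses each entry into a number. errors='coerce' is an
# assumption about the desired behaviour: entries that cannot be parsed
# become NaN instead of raising an error.
df['Respondents'] = pd.to_numeric(df['Respondents'], errors='coerce')

# The division now works element-wise on numeric data.
df['percentage_answer_1'] = df['answer_1'] / df['Respondents'] * 100
print(df)
```

If the strings are known to be clean, df['Respondents'].astype(int) would probably work just as well.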

Answer 1 (score: 2)

I would suggest using div and concat:

df['Respondents'] = df['Respondents'].astype(float)
df_pct = (df.drop('Respondents', axis=1)
            .div(df['Respondents'], axis=0)
            .mul(100)
            .rename(columns=lambda col: 'percentage_' + col)
          )
pd.concat([df, df_pct], axis=1)

   Respondents  answer_1  answer_2  answer_3  percentage_answer_1  \
0         90.0        51        11         3            56.666667   
1         43.0        15        12         8            34.883721   
2         89.0        15        14         4            16.853933   
3         89.0        61        40         0            68.539326   
4         67.0        16        36         2            23.880597   
5         88.0        14        78         7            15.909091   
6         73.0        15        12        10            20.547945   
7         78.0         1         0        11             1.282051   
8         62.0         0        26         6             0.000000   
9        101.0        16        78         7            15.841584   

   percentage_answer_2  percentage_answer_3  
0            12.222222             3.333333  
1            27.906977            18.604651  
2            15.730337             4.494382  
3            44.943820             0.000000  
4            53.731343             2.985075  
5            88.636364             7.954545  
6            16.438356            13.698630  
7             0.000000            14.102564  
8            41.935484             9.677419  
9            77.227723             6.930693  
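
Since the real data reportedly has 25 answer columns, it may be safer to select them by name rather than by dropping or slicing. A sketch of the same div/concat idea under the assumption that every answer column starts with 'answer_' (the names answers and result are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Respondents': [90, 43, 89],
                   'answer_1': [51, 15, 15],
                   'answer_2': [11, 12, 14],
                   'answer_3': [3, 8, 4]})

# Select every column whose label starts with 'answer_', however many exist.
answers = df.filter(regex=r'^answer_')

# Divide row-wise by Respondents, scale to percent, and rename the columns.
df_pct = (answers.div(df['Respondents'], axis=0)
                 .mul(100)
                 .rename(columns=lambda col: 'percentage_' + col))

result = pd.concat([df, df_pct], axis=1)
print(result.columns.tolist())
```

This way any other non-answer columns in the frame are left out of the division without having to be listed in drop.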

Answer 2 (score: 0)

Another solution is to div the required columns by Respondents and then assign the result to the new column names:

print  ('percentage_' + df.columns[1:4])
Index(['percentage_answer_1', 'percentage_answer_2', 'percentage_answer_3'], dtype='object')

df['percentage_' + df.columns[1:4]] = df.iloc[:, 1:4].div(df.Respondents, axis=0) * 100
print (df)
   Respondents  answer_1  answer_2  answer_3  percentage_answer_1  \
0           90        51        11         3            56.666667   
1           43        15        12         8            34.883721   
2           89        15        14         4            16.853933   
3           89        61        40         0            68.539326   
4           67        16        36         2            23.880597   
5           88        14        78         7            15.909091   
6           73        15        12        10            20.547945   
7           78         1         0        11             1.282051   
8           62         0        26         6             0.000000   
9          101        16        78         7            15.841584   

   percentage_answer_2  percentage_answer_3  
0            12.222222             3.333333  
1            27.906977            18.604651  
2            15.730337             4.494382  
3            44.943820             0.000000  
4            53.731343             2.985075  
5            88.636364             7.954545  
6            16.438356            13.698630  
7             0.000000            14.102564  
8            41.935484             9.677419  
9            77.227723             6.930693
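
The question asked for whole-number percentages, so a rounded variant may be closer to what was wanted. A sketch on the first three rows, assuming Respondents is already numeric and contains no zeros (astype(int) would fail on the resulting inf or NaN values); the name pct is illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Respondents': [90, 43, 89],
                   'answer_1': [51, 15, 15],
                   'answer_2': [11, 12, 14],
                   'answer_3': [3, 8, 4]})

# Divide row-wise, scale to percent, round to whole numbers,
# and prefix the column labels in one chain.
pct = (df.iloc[:, 1:4]
         .div(df['Respondents'], axis=0)
         .mul(100)
         .round()
         .astype(int)
         .add_prefix('percentage_'))

# join aligns on the index and appends the new columns.
df = df.join(pct)
print(df['percentage_answer_1'].tolist())  # [57, 35, 17]
```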