country_name country_code val_code \
United States of America 231 1
United States of America 231 2
United States of America 231 3
United States of America 231 4
United States of America 231 5
y191 y192 y193 y194 y195 \
47052179 43361966 42736682 43196916 41751928
1187385 1201557 1172941 1176366 1192173
28211467 27668273 29742374 27543836 28104317
179000 193000 233338 276639 249688
12613922 12864425 13240395 14106139 15642337
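In the data frame above, I want to compute, for each row, the percentage of the overall total that its val_code accounts for, which should give the following data frame.
That is, sum each row and divide by the sum of all rows.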
country_name country_code val_code \
United States of America 231 1
United States of America 231 2
United States of America 231 3
United States of America 231 4
United States of America 231 5
perc
50.14947129
1.363631254
32.48344744
0.260213146
15.74323688
Right now I am doing the following, but it does not work:
grp_df = df.groupby(['country_name', 'val_code']).agg()
pct_df = grp_df.groupby(level=0).apply(lambda x: 100*x/float(x.sum()))
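Answer 0: (score: 14)
You can use a lambda function to get each value as a fraction of its column total (multiply by 100 for a percentage), as follows: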
>>> df.iloc[:, 3:].apply(lambda x: x / x.sum())
y191 y192 y193 y194 y195
0 0.527231 0.508411 0.490517 0.500544 0.480236
1 0.013305 0.014088 0.013463 0.013631 0.013713
2 0.316116 0.324405 0.341373 0.319164 0.323259
3 0.002006 0.002263 0.002678 0.003206 0.002872
4 0.141342 0.150833 0.151969 0.163455 0.179920
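Your example does not have any duplicate values of val_code, so I am not sure how you want the data displayed (i.e., whether to show the column totals of each val_code group as a percentage of the overall total).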
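If val_code did contain duplicates, one possible reading is to sum the year columns per (country_name, val_code) group and divide by the country's total. The sketch below assumes that reading; the small data frame is made up purely for illustration (two rows share val_code 1), and the names year_cols, group_totals and perc are not from the question.

```python
import pandas as pd

# Hypothetical data: two rows share val_code 1 so that grouping matters.
df = pd.DataFrame({
    "country_name": ["A", "A", "A"],
    "val_code": [1, 1, 2],
    "y191": [10, 20, 30],
    "y192": [10, 20, 10],
})

year_cols = ["y191", "y192"]

# Sum the year columns within each (country, val_code) group,
# then add the years together to get one total per group.
group_totals = (
    df.groupby(["country_name", "val_code"])[year_cols].sum().sum(axis=1)
)

# Divide each group's total by the total of its country.
country_totals = group_totals.groupby(level="country_name").transform("sum")
perc = 100 * group_totals / country_totals
print(perc)
```

Here val_code 1 accounts for 60 of the 100 units and val_code 2 for the remaining 40, so the two percentages add up to 100 within the country.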
Answer 1: (score: 2)
Get the total of all the columns of interest, then add a percentage column:
In [35]:
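import numpy as np
# .ix has been removed from pandas; .loc does the same label-based slice
total = np.sum(df.loc[:, 'y191':'y195'].values)
df['percent'] = df.loc[:, 'y191':'y195'].sum(axis=1) / total * 100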
df
Out[35]:
country_name country_code val_code y191 y192 \
0 United States of America 231 1 47052179 43361966
1 United States of America 231 1 1187385 1201557
2 United States of America 231 1 28211467 27668273
3 United States of America 231 1 179000 193000
4 United States of America 231 1 12613922 12864425
y193 y194 y195 percent
0 42736682 43196916 41751928 50.149471
1 1172941 1176366 1192173 1.363631
2 29742374 27543836 28104317 32.483447
3 233338 276639 249688 0.260213
4 13240395 14106139 15642337 15.743237
So np.sum adds up all the values:
In [32]:
total = np.sum(df.loc[:, 'y191':'y195'].values)
total
Out[32]:
434899243
Then we call .sum(axis=1)/total * 100 on the columns of interest to sum row-wise, divide by the total, and multiply by 100 to get a percentage.
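For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same approach on the question's data. It uses .loc with an explicit end label instead of .ix, so the slice does not pick up the new column if the cell is re-run; the variable name years is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    "country_name": ["United States of America"] * 5,
    "country_code": [231] * 5,
    "val_code": [1, 2, 3, 4, 5],
    "y191": [47052179, 1187385, 28211467, 179000, 12613922],
    "y192": [43361966, 1201557, 27668273, 193000, 12864425],
    "y193": [42736682, 1172941, 29742374, 233338, 13240395],
    "y194": [43196916, 1176366, 27543836, 276639, 14106139],
    "y195": [41751928, 1192173, 28104317, 249688, 15642337],
})

# Label-based slice; both end labels are included.
years = df.loc[:, "y191":"y195"]

# Grand total over every year column and every row.
total = years.to_numpy().sum()

# Row sum divided by the grand total, expressed as a percentage.
df["perc"] = years.sum(axis=1) / total * 100
print(df[["val_code", "perc"]])
```

The grand total comes out to 434899243, matching Out[32] above, and the perc values agree with the expected output in the question.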