我有以下数据:
Shiver Shiver - H2O -Multi Shiver-SHOWME -Multi Shiver - STARWARS - Multi Shiver - UFC - Multi Total no_of_prize
3.30 46.94 14.83 10.76 0.00 575.30 3
4.01 21.31 21.28 10.77 0.00 568.16 3
1.24 0.15 0.46 0.00 0.00 482.74 2
0.90 0.43 0.94 0.00 0.00 128.38 2
2.91 2.93 0.75 0.86 0.00 265.52 3
0.31 8.09 4.70 0.72 0.00 117.81 3
0.25 0.83 2.12 0.00 0.00 55.84 2
0.04 2.43 1.39 1.64 0.00 457.23 3
5.95 12.28 4.33 5.73 0.00 540.54 3
对于每个列(Total和no_of_prize除外)将更改为Total的百分比。 现在列数可能因数据而异,因此我想使用正则表达式来识别像Shiver和Shiver - Multi这样的列。 例如,另一个数据可能是
恐怖恐怖 - FRANKESTIEN - Multi Total no_of_prize
我想通过使用正则表达式或任何其他可行方式自动处理此问题。
预期产出:
Shiver Shiver - H2O -Multi Shiver-SHOWME -Multi Shiver - STARWARS - Multi Shiver - UFC - Multi Total no_of_prem
0.57% 8.16% 2.58% 1.87% 0.00% 575.30 3
0.71% 3.75% 3.75% 1.89% 0.00% 568.16 3
0.26% 0.03% 0.09% 0.00% 0.00% 482.74 2
0.71% 0.34% 0.73% 0.00% 0.00% 128.38 2
1.10% 1.10% 0.28% 0.33% 0.00% 265.52 3
0.27% 6.87% 3.99% 0.61% 0.00% 117.81 3
0.46% 1.48% 3.80% 0.00% 0.00% 55.84 2
0.01% 0.53% 0.30% 0.36% 0.00% 457.23 3
1.10% 2.27% 0.80% 1.06% 0.00% 540.54 3
有人可以帮帮我吗?
答案 0 :(得分:1)
首先按difference
过滤掉不必要的列,然后按div
除以Total
列并分配回来:
cols = df.columns.difference(['Total','no_of_prize'])
print (cols)
Index(['Shiver', 'Shiver - H2O -Multi', 'Shiver - STARWARS - Multi',
'Shiver - UFC - Multi', 'Shiver-SHOWME -Multi'],
dtype='object')
df[cols] = df[cols].div(df['Total'], axis=0).mul(100)
替代解决方案是boolean mask
使用isin
并按~
反转 - 需要所有列Total
和no_of_prize
:
col_mask = ~df.columns.isin(['Total','no_of_prize'])
print (col_mask)
[ True True True True True False False]
df.loc[:, col_mask] = df.loc[:, col_mask].div(df['Total'], axis=0).mul(100)
print (df)
Shiver Shiver - H2O -Multi Shiver-SHOWME -Multi \
0 0.573614 8.159221 2.577786
1 0.705787 3.750704 3.745424
2 0.256867 0.031073 0.095289
3 0.701044 0.334943 0.732201
4 1.095963 1.103495 0.282465
5 0.263136 6.866989 3.989475
6 0.447708 1.486390 3.796562
7 0.008748 0.531461 0.304005
8 1.100751 2.271802 0.801051
Shiver - STARWARS - Multi Shiver - UFC - Multi Total no_of_prize
0 1.870329 0.0 575.30 3
1 1.895593 0.0 568.16 3
2 0.000000 0.0 482.74 2
3 0.000000 0.0 128.38 2
4 0.323893 0.0 265.52 3
5 0.611154 0.0 117.81 3
6 0.000000 0.0 55.84 2
7 0.358682 0.0 457.23 3
8 1.060051 0.0 540.54 3
将数字列转换为字符串有两种方法:
如果.
使用format
之后需要使用两位数的百分比:
cols = df.columns.difference(['Total','no_of_prize'])
df[cols] = df[cols].div(df['Total'], axis=0).mul(100).round(2).applymap('{:,.2f}%'.format)
print (df)
Shiver Shiver - H2O -Multi Shiver-SHOWME -Multi Shiver - STARWARS - Multi \
0 0.57% 8.16% 2.58% 1.87%
1 0.71% 3.75% 3.75% 1.90%
2 0.26% 0.03% 0.10% 0.00%
3 0.70% 0.33% 0.73% 0.00%
4 1.10% 1.10% 0.28% 0.32%
5 0.26% 6.87% 3.99% 0.61%
6 0.45% 1.49% 3.80% 0.00%
7 0.01% 0.53% 0.30% 0.36%
8 1.10% 2.27% 0.80% 1.06%
Shiver - UFC - Multi Total no_of_prize
0 0.00% 575.30 3
1 0.00% 568.16 3
2 0.00% 482.74 2
3 0.00% 128.38 2
4 0.00% 265.52 3
5 0.00% 117.81 3
6 0.00% 55.84 2
7 0.00% 457.23 3
8 0.00% 540.54 3
或者如果不需要两位数:
df[cols] = df[cols].div(df['Total'], axis=0).mul(100).round(2).astype(str) + '%'
print (df)
Shiver Shiver - H2O -Multi Shiver-SHOWME -Multi Shiver - STARWARS - Multi \
0 0.57% 8.16% 2.58% 1.87%
1 0.71% 3.75% 3.75% 1.9%
2 0.26% 0.03% 0.1% 0.0%
3 0.7% 0.33% 0.73% 0.0%
4 1.1% 1.1% 0.28% 0.32%
5 0.26% 6.87% 3.99% 0.61%
6 0.45% 1.49% 3.8% 0.0%
7 0.01% 0.53% 0.3% 0.36%
8 1.1% 2.27% 0.8% 1.06%
Shiver - UFC - Multi Total no_of_prize
0 0.0% 575.30 3
1 0.0% 568.16 3
2 0.0% 482.74 2
3 0.0% 128.38 2
4 0.0% 265.52 3
5 0.0% 117.81 3
6 0.0% 55.84 2
7 0.0% 457.23 3
8 0.0% 540.54 3