Change selected columns to percentages based on a regular expression

Time: 2018-03-07 06:52:07

Tags: python pandas numpy data-munging

I have the following data:

Shiver  Shiver - H2O -Multi Shiver-SHOWME -Multi    Shiver - STARWARS - Multi   Shiver - UFC - Multi    Total   no_of_prize
3.30    46.94               14.83                   10.76                       0.00                    575.30  3
4.01    21.31               21.28                   10.77                       0.00                    568.16  3
1.24    0.15                0.46                    0.00                        0.00                    482.74  2
0.90    0.43                0.94                    0.00                        0.00                    128.38  2
2.91    2.93                0.75                    0.86                        0.00                    265.52  3
0.31    8.09                4.70                    0.72                        0.00                    117.81  3
0.25    0.83                2.12                    0.00                        0.00                    55.84   2
0.04    2.43                1.39                    1.64                        0.00                    457.23  3
5.95    12.28               4.33                    5.73                        0.00                    540.54  3

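Every column except Total and no_of_prize should be converted to a percentage of Total. The number of columns can vary from one dataset to the next, so I would like to use a regular expression to identify columns such as Shiver and Shiver - ... - Multi. For example, another dataset might have these columns: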

Horror  Horror - FRANKESTIEN - Multi  Total  no_of_prize

I would like to handle this automatically, using a regular expression or any other workable approach.

Expected output:

Shiver  Shiver - H2O -Multi Shiver-SHOWME -Multi    Shiver - STARWARS - Multi   Shiver - UFC - Multi    Total   no_of_prize
0.57%   8.16%               2.58%                   1.87%                       0.00%                   575.30  3
0.71%   3.75%               3.75%                   1.89%                       0.00%                   568.16  3
0.26%   0.03%               0.09%                   0.00%                       0.00%                   482.74  2
0.71%   0.34%               0.73%                   0.00%                       0.00%                   128.38  2
1.10%   1.10%               0.28%                   0.33%                       0.00%                   265.52  3
0.27%   6.87%               3.99%                   0.61%                       0.00%                   117.81  3
0.46%   1.48%               3.80%                   0.00%                       0.00%                   55.84   2
0.01%   0.53%               0.30%                   0.36%                       0.00%                   457.23  3
1.10%   2.27%               0.80%                   1.06%                       0.00%                   540.54  3

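Can anyone help me?

1 Answer:

Answer 0: (score: 1)

First select the columns to convert by excluding the unneeded ones with difference, then divide them by the Total column with div and assign the result back: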

cols = df.columns.difference(['Total','no_of_prize'])
print (cols)
Index(['Shiver', 'Shiver - H2O -Multi', 'Shiver - STARWARS - Multi',
       'Shiver - UFC - Multi', 'Shiver-SHOWME -Multi'],
      dtype='object')

df[cols] = df[cols].div(df['Total'], axis=0).mul(100)
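
The axis=0 argument matters in the line above. By default, DataFrame.div matches a Series against the column labels; axis=0 matches it against the row index instead, so each row is divided by its own Total. A minimal sketch with made-up numbers (the columns a and b and the names by_row and by_col are illustrative, not from the question):

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 4.0], 'Total': [10.0, 20.0]})

# axis=0: row i of every selected column is divided by Total[i].
by_row = df[['a', 'b']].div(df['Total'], axis=0)

# Default alignment: the Series index (0, 1) is matched against the
# column labels ('a', 'b'); nothing lines up, so every value is NaN.
by_col = df[['a', 'b']].div(df['Total'])

print(by_row)
print(by_col.isna().all().all())
```

Leaving out axis=0 is an easy mistake to miss, because it raises no error and simply yields a frame of missing values.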

An alternative solution is a boolean mask built with isin and inverted with ~, which selects all columns except Total and no_of_prize:

col_mask = ~df.columns.isin(['Total','no_of_prize'])
print (col_mask)
[ True  True  True  True  True False False]

df.loc[:, col_mask] = df.loc[:, col_mask].div(df['Total'], axis=0).mul(100)
print (df)
     Shiver  Shiver - H2O -Multi  Shiver-SHOWME -Multi  \
0  0.573614             8.159221              2.577786   
1  0.705787             3.750704              3.745424   
2  0.256867             0.031073              0.095289   
3  0.701044             0.334943              0.732201   
4  1.095963             1.103495              0.282465   
5  0.263136             6.866989              3.989475   
6  0.447708             1.486390              3.796562   
7  0.008748             0.531461              0.304005   
8  1.100751             2.271802              0.801051   

   Shiver - STARWARS - Multi  Shiver - UFC - Multi   Total  no_of_prize  
0                   1.870329                   0.0  575.30            3  
1                   1.895593                   0.0  568.16            3  
2                   0.000000                   0.0  482.74            2  
3                   0.000000                   0.0  128.38            2  
4                   0.323893                   0.0  265.52            3  
5                   0.611154                   0.0  117.81            3  
6                   0.000000                   0.0   55.84            2  
7                   0.358682                   0.0  457.23            3  
8                   1.060051                   0.0  540.54            3 
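
Since the question asks specifically about regular expressions, the same column selection could also be sketched with DataFrame.filter(regex=...). This is only one possible approach and assumes that Total and no_of_prize are the only columns to leave untouched. The variable name pattern is illustrative, and the frame below is a shortened copy of the question's data:

```python
import pandas as pd

# First three rows and three of the Shiver columns from the question.
df = pd.DataFrame({
    'Shiver': [3.30, 4.01, 1.24],
    'Shiver - H2O -Multi': [46.94, 21.31, 0.15],
    'Shiver-SHOWME -Multi': [14.83, 21.28, 0.46],
    'Total': [575.30, 568.16, 482.74],
    'no_of_prize': [3, 3, 2],
})

# Negative lookahead: keep every label that is not exactly
# 'Total' or 'no_of_prize', whatever the other columns are called.
pattern = r'^(?!(?:Total|no_of_prize)$)'
cols = df.filter(regex=pattern).columns

df[cols] = df[cols].div(df['Total'], axis=0).mul(100)
print(df.round(2))
```

Unlike difference, filter keeps the columns in their original order. If the prefix is known in advance, a positive pattern such as r'^Shiver' would also work, but the exclusion pattern carries over unchanged to a dataset whose columns start with Horror.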

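There are two ways to convert the numeric columns to strings.

If the percentages must always show two digits after the decimal point, use format: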

cols = df.columns.difference(['Total','no_of_prize'])
df[cols] = df[cols].div(df['Total'], axis=0).mul(100).round(2).applymap('{:,.2f}%'.format)
print (df)
  Shiver Shiver - H2O -Multi Shiver-SHOWME -Multi Shiver - STARWARS - Multi  \
0  0.57%               8.16%                2.58%                     1.87%   
1  0.71%               3.75%                3.75%                     1.90%   
2  0.26%               0.03%                0.10%                     0.00%   
3  0.70%               0.33%                0.73%                     0.00%   
4  1.10%               1.10%                0.28%                     0.32%   
5  0.26%               6.87%                3.99%                     0.61%   
6  0.45%               1.49%                3.80%                     0.00%   
7  0.01%               0.53%                0.30%                     0.36%   
8  1.10%               2.27%                0.80%                     1.06%   

  Shiver - UFC - Multi   Total  no_of_prize  
0                0.00%  575.30            3  
1                0.00%  568.16            3  
2                0.00%  482.74            2  
3                0.00%  128.38            2  
4                0.00%  265.52            3  
5                0.00%  117.81            3  
6                0.00%   55.84            2  
7                0.00%  457.23            3  
8                0.00%  540.54            3   

Or, if a fixed two decimal places is not required (trailing zeros are dropped, so 1.10 becomes 1.1%):

df[cols] = df[cols].div(df['Total'], axis=0).mul(100).round(2).astype(str) + '%'
print (df)
  Shiver Shiver - H2O -Multi Shiver-SHOWME -Multi Shiver - STARWARS - Multi  \
0  0.57%               8.16%                2.58%                     1.87%   
1  0.71%               3.75%                3.75%                      1.9%   
2  0.26%               0.03%                 0.1%                      0.0%   
3   0.7%               0.33%                0.73%                      0.0%   
4   1.1%                1.1%                0.28%                     0.32%   
5  0.26%               6.87%                3.99%                     0.61%   
6  0.45%               1.49%                 3.8%                      0.0%   
7  0.01%               0.53%                 0.3%                     0.36%   
8   1.1%               2.27%                 0.8%                     1.06%   

  Shiver - UFC - Multi   Total  no_of_prize  
0                 0.0%  575.30            3  
1                 0.0%  568.16            3  
2                 0.0%  482.74            2  
3                 0.0%  128.38            2  
4                 0.0%  265.52            3  
5                 0.0%  117.81            3  
6                 0.0%   55.84            2  
7                 0.0%  457.23            3  
8                 0.0%  540.54            3
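
In pandas 2.1 and later, DataFrame.applymap is deprecated in favor of DataFrame.map, so the format-based variant above may emit a FutureWarning. One way to sidestep the version difference, sketched here under that assumption, is to format each column with Series.map. The names pct, formatted and out are illustrative, and the frame is a two-row excerpt of the question's data:

```python
import pandas as pd

# Two rows and two of the Shiver columns from the question.
df = pd.DataFrame({
    'Shiver': [3.30, 0.90],
    'Shiver - H2O -Multi': [46.94, 0.43],
    'Total': [575.30, 128.38],
    'no_of_prize': [3, 2],
})

cols = df.columns.difference(['Total', 'no_of_prize'])
pct = df[cols].div(df['Total'], axis=0).mul(100)

# Series.map is applied column by column; '{:.2f}' already rounds to
# two decimals, so a separate round(2) is not needed here.
formatted = pct.apply(lambda s: s.map('{:.2f}%'.format))

# Work on a copy so the numeric values in df stay available.
out = df.copy()
out[cols] = formatted
print(out)
```

Keeping the numeric frame and formatting a copy is a deliberate choice: once the values are strings such as '0.57%', they can no longer be sorted or aggregated as numbers.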