Pandas: calculating percentages per row

Posted: 2020-10-21 08:32:01

Tags: pandas

My DataFrame looks like this:

import pandas as pd

ratings = {'rating': ['1','2','3','4', '5'],
        'F': [6,4,6,4,8],
        'M': [4,6,14,6,2]   
        }

df = pd.DataFrame(ratings, columns = ['rating', 'F','M'])

print (df)

  rating  F   M
0      1  6   4
1      2  4   6
2      3  6  14
3      4  4   6
4      5  8   2

What I want is two new columns, F_percentage and M_percentage, containing the share (in percent) that the F value and the M value each make up of that row's total. That is:

 rating  F   M  F_percentage  M_percentage
      1  6   4           60%           40%
      2  4   6           40%           60%
      3  6  14           30%           70%
      4  4   6           40%           60%
      5  8   2           80%           20%

In other words, I want each value as a percentage of its row total.

Thanks in advance!
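Written as a formula, for each row i, with F_i and M_i the two counts in that row, the requested columns are:

```latex
\text{F\_percentage}_i = \frac{F_i}{F_i + M_i} \times 100, \qquad
\text{M\_percentage}_i = \frac{M_i}{F_i + M_i} \times 100
```

For rating 3, for example, this gives 6 / 20 × 100 = 30% and 14 / 20 × 100 = 70%, and the two columns always add up to 100% per row.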

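3 answers:

Answer 0 (score: 1)

If performance matters, divide the columns by their row sums with DataFrame.div and append the result to the original DataFrame with join: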

df1 = df[['F','M']]
df = df.join(df1.div(df1.sum(axis=1), axis=0).add_suffix('_percentage').mul(100))
print (df)
  rating  F   M  F_percentage  M_percentage
0      1  6   4          60.0          40.0
1      2  4   6          40.0          60.0
2      3  6  14          30.0          70.0
3      4  4   6          40.0          60.0
4      5  8   2          80.0          20.0
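The axis=0 argument is what makes the division above work row by row: DataFrame.div(series, axis=0) matches the Series against the row index, while the plain / operator matches a Series against the column labels. A small sketch to illustrate, using the question's F and M values (the variable names counts, totals, shares and misaligned are just for illustration):

```python
import warnings

import pandas as pd

counts = pd.DataFrame({'F': [6, 4, 6, 4, 8], 'M': [4, 6, 14, 6, 2]})
totals = counts.sum(axis=1)  # row totals: 10, 10, 20, 10, 10

# Correct: divide every row by its own total (Series aligned on the row index)
shares = counts.div(totals, axis=0) * 100

# Pitfall: '/' aligns the Series index (0..4) with the column labels ('F', 'M'),
# so no labels match and every cell of the result is NaN
with warnings.catch_warnings():
    # pandas may warn that it cannot sort the mixed str/int labels
    warnings.simplefilter("ignore", RuntimeWarning)
    misaligned = counts / totals

print(shares)
print(misaligned.isna().all().all())
```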

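If you need strings with a % sign instead, start again from the original df, convert the values to strings, strip a trailing .0 and finally append the percent sign:

df1 = df[['F','M']]
df = (df.join(df1.div(df1.sum(axis=1), axis=0)
                         .add_suffix('_percentage').mul(100)
                         .round(2)                            # keep non-integer shares short, e.g. 33.33
                         .astype(str)
                         .replace(r'\.0$', '', regex=True)    # '60.0' -> '60'
                         .add('%')))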
print (df)
  rating  F   M F_percentage M_percentage
0      1  6   4          60%          40%
1      2  4   6          40%          60%
2      3  6  14          30%          70%
3      4  4   6          40%          60%
4      5  8   2          80%          20%
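If the % strings are only needed for display, another option is to keep the percentage columns numeric (so they can still be summed, sorted or plotted) and format a copy when printing. A minimal sketis, assuming whole-number percentages are wanted; the '{:.0f}%' format rounds to the nearest integer rather than truncating the way astype(int) does (display_df is a hypothetical name):

```python
import pandas as pd

df = pd.DataFrame({'rating': ['1', '2', '3', '4', '5'],
                   'F': [6, 4, 6, 4, 8],
                   'M': [4, 6, 14, 6, 2]})

counts = df[['F', 'M']]
pct = counts.div(counts.sum(axis=1), axis=0).mul(100).add_suffix('_percentage')
df = df.join(pct)  # the new columns stay float

# Format a copy for display only; df itself keeps the numbers
display_df = df.copy()
for col in pct.columns:
    display_df[col] = display_df[col].map('{:.0f}%'.format)

print(display_df)
```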

Answer 1 (score: 1)

You can write a function for each column and apply it row by row with the DataFrame apply method:

# female percentage
def f_percentage(row):
    tot = row['F'] + row['M']
    # :.0f rounds; int() would truncate (e.g. 29/100*100 == 28.999999999999996 -> 28)
    return f"{row['F'] / tot * 100:.0f}%"

df['F_percentage'] = df.apply(f_percentage, axis=1)

# male percentage
def m_percentage(row):
    tot = row['F'] + row['M']
    return f"{row['M'] / tot * 100:.0f}%"

df['M_percentage'] = df.apply(m_percentage, axis=1)
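The two functions above pass over the DataFrame twice. A variant that fills both columns in a single apply call is to return a pd.Series from the function; apply(axis=1) then assembles those Series into a DataFrame that can be joined back. A sketch, where the helper name row_percentages is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'rating': ['1', '2', '3', '4', '5'],
                   'F': [6, 4, 6, 4, 8],
                   'M': [4, 6, 14, 6, 2]})

def row_percentages(row):
    # Hypothetical helper: both shares of the row total as rounded '%' strings
    tot = row['F'] + row['M']
    return pd.Series({
        'F_percentage': f"{row['F'] / tot * 100:.0f}%",
        'M_percentage': f"{row['M'] / tot * 100:.0f}%",
    })

# apply(axis=1) with a Series-returning function yields a DataFrame
df = df.join(df.apply(row_percentages, axis=1))
print(df)
```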
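
As other users have pointed out, apply has a performance cost, because it calls a Python function once per row. If the DataFrame is small this doesn't matter, but it is worth keeping in mind, for example if the DataFrame is likely to grow in the near future.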
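To get a feel for the difference, here is a rough sketch that builds a larger random frame (the 10,000-row size and the NumPy random generator are arbitrary assumptions), checks that a row-wise apply and vectorized column arithmetic give the same numbers, and times both with timeit. Timings depend on the machine and pandas version, so none are quoted here:

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Counts start at 1 so no row total is zero
big = pd.DataFrame({'F': rng.integers(1, 100, size=10_000),
                    'M': rng.integers(1, 100, size=10_000)})

def with_apply(frame):
    # Row by row: a Python function call for every row
    return frame.apply(lambda row: row['F'] / (row['F'] + row['M']) * 100, axis=1)

def vectorized(frame):
    # Whole-column arithmetic: one operation over all rows
    return frame['F'] / (frame['F'] + frame['M']) * 100

# Both approaches should give the same percentages
assert np.allclose(with_apply(big), vectorized(big))

for fn in (with_apply, vectorized):
    seconds = timeit.timeit(lambda: fn(big), number=3)
    print(f"{fn.__name__}: {seconds / 3:.4f} s per call")
```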

Answer 2 (score: 1)

Here is a complete solution:

import pandas as pd

percentage_F_list = []
percentage_M_list = []

ratings = {'rating': ['1','2','3','4', '5'],
        'F': [6,4,6,4,8],
        'M': [4,6,14,6,2]   
        }

df = pd.DataFrame(ratings, columns = ['rating', 'F','M'])

print (df)


# Compute each row's share of the F + M total; zip walks the columns
# positionally, so this does not depend on the index labels
for f, m in zip(df['F'], df['M']):
    tot = f + m
    percentage_F_list.append(f / tot * 100)
    percentage_M_list.append(m / tot * 100)

df['F_percentage'] = percentage_F_list
df['M_percentage'] = percentage_M_list
print(df)
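The loop above does the arithmetic one row at a time in Python. The same per-row computation can be written with NumPy broadcasting on the underlying array, as a swapped-in alternative to the loop. A minimal sketch, assuming every row total is non-zero (a zero total would give NaN along with a RuntimeWarning):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'rating': ['1', '2', '3', '4', '5'],
                   'F': [6, 4, 6, 4, 8],
                   'M': [4, 6, 14, 6, 2]})

values = df[['F', 'M']].to_numpy(dtype=float)   # shape (5, 2)
totals = values.sum(axis=1, keepdims=True)      # shape (5, 1)
percentages = values / totals * 100             # (5, 2) / (5, 1) broadcasts per row

df['F_percentage'] = percentages[:, 0]
df['M_percentage'] = percentages[:, 1]
print(df)
```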