My DataFrame looks like this:
import pandas as pd
ratings = {'rating': ['1','2','3','4', '5'],
'F': [6,4,6,4,8],
'M': [4,6,14,6,2]
}
df = pd.DataFrame(ratings, columns = ['rating', 'F','M'])
print (df)
rating  F   M
     1  6   4
     2  4   6
     3  6  14
     4  4   6
     5  8   2
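What I want is two new columns, F_percentage and M_percentage. They should contain, for each row, the share of that row's total that the F and M values represent. That is:
rating  F   M  F_percentage  M_percentage
     1  6   4           60%           40%
     2  4   6           40%           60%
     3  6  14      ........
     4  4   6      ........
     5  8   2           80%           20%
I want to compute the percentage of the total for each row.
Thanks in advance!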
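Answer 0 (score: 1)
If performance matters, you can divide the columns by their row-wise sum with DataFrame.div, then add the result to the original DataFrame with join: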
df1 = df[['F','M']]
df = df.join(df1.div(df1.sum(axis=1), axis=0).add_suffix('_percentage').mul(100))
print (df)
  rating  F   M  F_percentage  M_percentage
0      1  6   4          60.0          40.0
1      2  4   6          40.0          60.0
2      3  6  14          30.0          70.0
3      4  4   6          40.0          60.0
4      5  8   2          80.0          20.0
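The `axis=0` argument is what lines the division up row by row: `df1.sum(axis=1)` yields one total per row, and `div(..., axis=0)` matches those totals against the row labels. A minimal sketch on the same data, checking that each row's percentages add up to 100; the names `counts`, `row_totals` and `pct` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'rating': ['1', '2', '3', '4', '5'],
                   'F': [6, 4, 6, 4, 8],
                   'M': [4, 6, 14, 6, 2]})

counts = df[['F', 'M']]
row_totals = counts.sum(axis=1)                  # one total per row
pct = counts.div(row_totals, axis=0).mul(100)    # divide each row by its own total

# Each row of percentages should sum to 100, up to floating-point error.
assert ((pct.sum(axis=1) - 100).abs() < 1e-9).all()
print(pct)
```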
If you need strings with a % sign, convert the values to strings, dropping any trailing .0, and append the percent sign at the end:
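df1 = df[['F','M']]
df = (df.join(df1.div(df1.sum(axis=1), axis=0)
                 .add_suffix('_percentage').mul(100)
                 .round()       # round first: astype(int) alone would truncate 29.999... to 29
                 .astype(int)   # integers carry no trailing .0, so no regex cleanup is needed
                 .astype(str)
                 .add('%')))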
print (df)
  rating  F   M  F_percentage  M_percentage
0      1  6   4           60%           40%
1      2  4   6           40%           60%
2      3  6  14           30%           70%
3      4  4   6           40%           60%
4      5  8   2           80%           20%
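A format string is another way to get the percent strings, and it rounds rather than truncates. The following is a sketch under assumptions, not part of the answer above: `to_percent_strings` is a hypothetical helper name, and `decimals` is an illustrative parameter for the number of digits after the decimal point.

```python
import pandas as pd

def to_percent_strings(counts: pd.DataFrame, decimals: int = 0) -> pd.DataFrame:
    """Return row-wise percentages of `counts` as strings such as '60%'."""
    pct = counts.div(counts.sum(axis=1), axis=0).mul(100)
    fmt = '{:.' + str(decimals) + 'f}%'
    # Format column by column; Series.map calls the formatter on each value.
    return pct.apply(lambda col: col.map(fmt.format)).add_suffix('_percentage')

df = pd.DataFrame({'rating': ['1', '2', '3', '4', '5'],
                   'F': [6, 4, 6, 4, 8],
                   'M': [4, 6, 14, 6, 2]})
out = df.join(to_percent_strings(df[['F', 'M']]))
print(out)
```

Passing `decimals=1` would give strings such as `'60.0%'` instead.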
Answer 1 (score: 1)
You can write a function that does this and apply it with the DataFrame apply method:
# female percentage
def f_percentage(row):
    tot = row['F'] + row['M']
    # round before int() so that 29.999... becomes 30 rather than 29
    return str(int(round(row['F'] / tot * 100))) + '%'
df['F_percentage'] = df.apply(f_percentage, axis=1)
# male percentage
def m_percentage(row):
    tot = row['F'] + row['M']
    return str(int(round(row['M'] / tot * 100))) + '%'
df['M_percentage'] = df.apply(m_percentage, axis=1)
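As other users have pointed out, the apply method has performance problems. If the DataFrame is small, though, this does not matter. It is still worth keeping in mind, for example if the DataFrame is likely to grow in the near future.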
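To make the performance remark concrete, here is a rough sketch that compares a row-wise `apply` with plain column arithmetic on a larger frame. The synthetic data, the frame size, the seed, and the names `with_apply` and `vectorized` are all assumptions for illustration. Timings depend on the machine and the pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic counts in 1..99, so no row total is zero.
big = pd.DataFrame(rng.integers(1, 100, size=(5_000, 2)), columns=['F', 'M'])

def with_apply(frame):
    # Row by row: one Python function call per row.
    return frame.apply(lambda row: row['F'] / (row['F'] + row['M']) * 100, axis=1)

def vectorized(frame):
    # Whole-column arithmetic, no Python-level loop over rows.
    return frame['F'] / (frame['F'] + frame['M']) * 100

# Both approaches should produce the same values.
assert np.allclose(with_apply(big), vectorized(big))

print('apply     :', timeit.timeit(lambda: with_apply(big), number=3))
print('vectorized:', timeit.timeit(lambda: vectorized(big), number=3))
```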
Answer 2 (score: 1)
Here is a complete solution for you:
import pandas as pd
percentage_F_list = []
percentage_M_list = []
ratings = {'rating': ['1','2','3','4', '5'],
'F': [6,4,6,4,8],
'M': [4,6,14,6,2]
}
df = pd.DataFrame(ratings, columns = ['rating', 'F','M'])
print (df)
# walk the rows by position and compute each value's share of its row total
for i in range(df.shape[0]):
    tot = df['F'][i] + df['M'][i]
    percentage_F_list.append((df['F'][i])/tot * 100)
    percentage_M_list.append((df['M'][i])/tot * 100)
df['F_percentage'] = percentage_F_list
df['M_percentage'] = percentage_M_list
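The same two columns can also be produced without the explicit loop or the intermediate lists, by doing the arithmetic on whole columns. A short sketch on the same data; here `tot` is a Series of row totals rather than a scalar:

```python
import pandas as pd

df = pd.DataFrame({'rating': ['1', '2', '3', '4', '5'],
                   'F': [6, 4, 6, 4, 8],
                   'M': [4, 6, 14, 6, 2]})

tot = df['F'] + df['M']                     # row totals, aligned on the index
df['F_percentage'] = df['F'] / tot * 100
df['M_percentage'] = df['M'] / tot * 100
print(df)
```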