I have a pandas DataFrame like this:
subject bool Count
1 False 329232
1 True 73896
2 False 268338
2 True 76424
3 False 186167
3 True 27078
4 False 172417
4 True 113268
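I want to convert Count to a percentage within each subject group. For example, row 1 would be 329232 / (329232 + 73896) = 0.816, and row 2 would be 73896 / (329232 + 73896) = 0.183. The total then changes for group 2, and so on.
Can this be done with groupby? I tried iterating over the rows, with little success.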
Answer 0 (score: 2)
This worked for me:
df['Count'] = df['Count'].div(df.groupby('subject')['Count'].transform(lambda x: x.sum()))
print(df)
This gives:
Count bool subject
0 0.816693 False 1
1 0.183307 True 1
2 0.778328 False 2
3 0.221672 True 2
4 0.873019 False 3
5 0.126981 True 3
6 0.603521 False 4
7 0.396479 True 4
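As a cross-check on the approach above, here is a minimal self-contained sketch. It assumes the data from the question, writes the result to a new column (the name share is my own choice, not from the original post) instead of overwriting Count, and uses the built-in 'sum' aggregation name in place of the lambda. It also verifies that the shares in each subject group add up to 1.

```python
import pandas as pd

# Data copied from the question
df = pd.DataFrame({
    'subject': [1, 1, 2, 2, 3, 3, 4, 4],
    'bool': [False, True] * 4,
    'Count': [329232, 73896, 268338, 76424, 186167, 27078, 172417, 113268],
})

# transform('sum') returns one value per original row: the total of that row's group
group_total = df.groupby('subject')['Count'].transform('sum')

# Share of each row within its subject group ('share' is an illustrative column name)
df['share'] = df['Count'] / group_total

# Sanity check: the shares within each subject should add up to 1
share_sums = df.groupby('subject')['share'].sum()

print(df)
print(share_sums)
```

Keeping Count intact makes it possible to re-run the calculation or inspect the raw values afterwards.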
Answer 1 (score: 1)
My solution looks like this:
Import the relevant libraries
import pandas as pd
import numpy as np
Create the DataFrame df
d = {'subject':[1,1,2,2,3,3],'bool':[False,True,False,True,False,True],
'count':[329232,73896,268338,76424,186167,27078]}
df = pd.DataFrame(d)
Use groupby and reset_index to get the total per subject
table_sum= df.groupby('subject').sum().reset_index()[['subject','count']]
Zip the groupby output into a dictionary, then use map to look up each row's group total and compute the relative frequency
look_1 = (dict(zip(table_sum['subject'],table_sum['count'])))
df['cu_sum'] = df['subject'].map(look_1)
df['relative_frequency'] = df['count']/df['cu_sum']
Output
print(df)
subject bool count cu_sum relative_frequency
0 1 False 329232 403128 0.816693
1 1 True 73896 403128 0.183307
2 2 False 268338 344762 0.778328
3 2 True 76424 344762 0.221672
4 3 False 186167 213245 0.873019
5 3 True 27078 213245 0.126981
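The dictionary step above can probably be skipped, since Series.map also accepts a Series and looks values up by its index. A minimal sketch of that variant, assuming the same six-row data and the same column names as this answer (the variable name totals is mine):

```python
import pandas as pd

# Same data as in this answer
df = pd.DataFrame({
    'subject': [1, 1, 2, 2, 3, 3],
    'bool': [False, True, False, True, False, True],
    'count': [329232, 73896, 268338, 76424, 186167, 27078],
})

# Series of group totals, indexed by subject
totals = df.groupby('subject')['count'].sum()

# map() looks up each subject in the index of `totals`
df['cu_sum'] = df['subject'].map(totals)
df['relative_frequency'] = df['count'] / df['cu_sum']

print(df)
```

Selecting the 'count' column before calling sum() also avoids summing the bool column along the way.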
Answer 2 (score: -1)
import pandas as pd

# create df
d = {'subject': [1, 1, 2, 2, 3, 3, 4, 4],
     'bool': [False, True, False, True, False, True, False, True],
     'Count': [329232, 73896, 268338, 76424, 186167, 27078, 172417, 113268]}
df = pd.DataFrame(d)
# get the sum of Count for each subject group
sums = df.groupby(['subject'])['Count'].sum().reset_index()
sums.columns = ['subject', 'sums']
# merge the sums back onto the original df
df_sums = df.merge(sums, how='left', on='subject')
# calculate each row's share of its group total
df_sums['percent'] = df_sums['Count'] / df_sums['sums']
print(df_sums)