I have the following dataframe:
import pandas as pd

# Create a dataframe
raw_data = {'trial_num': ['1', '1', '2', '2', '3', '3'],
            'area': ['first', 'second', 'first', 'second', 'first', 'second'],
            'counts': [10, 25, 36, 2, 70, 10]}
df = pd.DataFrame(raw_data, columns=['trial_num', 'area', 'counts'])
trial_num area counts
0 1 first 10
1 1 second 25
2 2 first 36
3 2 second 2
4 3 first 70
5 3 second 10
I want to add a new column, "proportion", that gives each count as a proportion of the total for its "trial_num". Like this:
trial_num area counts total_count proportion
0 1 first 10 35 0.2857142857142857
1 1 second 25 35 0.7142857142857143
2 2 first 36 38 0.9473684210526315
3 2 second 2 38 0.05263157894736842
4 3 first 70 80 0.875
5 3 second 10 80 0.125
This is as far as I got:
df.counts.groupby(df.trial_num).sum()
trial_num
1 35
2 38
3 80
Is there an efficient way to do this without breaking up the dataframe? Please help.
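The groupby result above can't simply be assigned back as a column because sum() aggregates to one value per trial, while the frame has one row per trial and area. A minimal sketch, using the question's sample data, comparing the shape of an aggregation with the shape of GroupBy.transform:

```python
import pandas as pd

# Same sample data as in the question
raw_data = {'trial_num': ['1', '1', '2', '2', '3', '3'],
            'area': ['first', 'second', 'first', 'second', 'first', 'second'],
            'counts': [10, 25, 36, 2, 70, 10]}
df = pd.DataFrame(raw_data, columns=['trial_num', 'area', 'counts'])

# sum() collapses each group to a single row, indexed by the unique trial_num values
per_trial = df.groupby('trial_num')['counts'].sum()

# transform('sum') repeats each group's total on every row of that group,
# aligned to df's original index, so it can be used directly in row-wise arithmetic
per_row = df.groupby('trial_num')['counts'].transform('sum')

print(len(per_trial), len(per_row))  # 3 6
print(per_row.tolist())              # [35, 35, 38, 38, 80, 80]
```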
Answer:
You can use div to divide by the Series created by GroupBy.transform, which has the same size as the original df:
df['proportion'] = df['counts'].div(df.groupby(['trial_num'])['counts'].transform('sum'))
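The desired output in the question also shows a total_count column. A sketch that keeps the transform result as its own column before dividing; the name total_count is taken from the question's example output, and the rest follows the transform approach above:

```python
import pandas as pd

raw_data = {'trial_num': ['1', '1', '2', '2', '3', '3'],
            'area': ['first', 'second', 'first', 'second', 'first', 'second'],
            'counts': [10, 25, 36, 2, 70, 10]}
df = pd.DataFrame(raw_data, columns=['trial_num', 'area', 'counts'])

# Per-trial total broadcast back to every row (same length and index as df)
df['total_count'] = df.groupby('trial_num')['counts'].transform('sum')

# Element-wise division; equivalent to df['counts'] / df['total_count']
df['proportion'] = df['counts'].div(df['total_count'])

print(df)
```

Keeping the intermediate column is only useful if you want to display the totals; otherwise the one-liner above avoids the extra column.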
Alternative with map:
s = df.groupby(['trial_num'])['counts'].sum()
df['proportion'] = df['counts'].div(df['trial_num'].map(s))
print(df)
trial_num area counts proportion
0 1 first 10 0.285714
1 1 second 25 0.714286
2 2 first 36 0.947368
3 2 second 2 0.052632
4 3 first 70 0.875000
5 3 second 10 0.125000
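A quick sanity check, on the same sample data, that the transform and map approaches produce identical values and that each trial's proportions add up to 1:

```python
import numpy as np
import pandas as pd

raw_data = {'trial_num': ['1', '1', '2', '2', '3', '3'],
            'area': ['first', 'second', 'first', 'second', 'first', 'second'],
            'counts': [10, 25, 36, 2, 70, 10]}
df = pd.DataFrame(raw_data, columns=['trial_num', 'area', 'counts'])

# Approach 1: transform keeps the original shape
via_transform = df['counts'].div(df.groupby('trial_num')['counts'].transform('sum'))

# Approach 2: aggregate once, then look up each row's total with map
totals = df.groupby('trial_num')['counts'].sum()
via_map = df['counts'].div(df['trial_num'].map(totals))

# Both approaches should agree row by row
same = np.allclose(via_transform, via_map)

# Within each trial, the proportions should sum to 1
df['proportion'] = via_transform
check = df.groupby('trial_num')['proportion'].sum()
print(same, np.allclose(check, 1.0))  # True True
```

map works here because the index of totals holds the same string trial_num values as the column being mapped; if the key types differed (for example, strings versus integers), the lookup would produce NaN.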