How do I add a column containing information aggregated over rows?

Time: 2018-03-23 12:03:10

Tags: python pandas

I have the following dataframe:

# Create a dataframe
import pandas as pd

raw_data = {'trial_num': ['1', '1', '2', '2', '3', '3'],
            'area': ['first', 'second', 'first', 'second', 'first', 'second'],
            'counts': [10, 25, 36, 2, 70, 10]}

df = pd.DataFrame(raw_data, columns=['trial_num', 'area', 'counts'])

  trial_num    area  counts
0         1   first      10
1         1  second      25
2         2   first      36
3         2  second       2
4         3   first      70
5         3  second      10

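I want to add a new column, "proportion", that gives each count as a fraction of the total count for its trial (trial_num), i.e. summed over both areas. Like this: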

  trial_num    area  counts  total_count           proportion
0         1   first      10           35   0.2857142857142857
1         1  second      25           35   0.7142857142857143
2         2   first      36           38   0.9473684210526315
3         2  second       2           38  0.05263157894736842
4         3   first      70           80                0.875
5         3  second      10           80                0.125

This is as far as I have gotten:

df.counts.groupby(df.trial_num).sum()

trial_num
1    35
2    38
3    80

Is there an efficient way to do this without breaking up the dataframe? Please help.

1 Answer:

Answer 0 (score: 1)

You can use div to divide by the Series created by SeriesGroupBy.transform, which has the same length as the original df:

df['proportion'] = df['counts'].div(df.groupby(['trial_num'])['counts'].transform('sum'))
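To make the shape argument concrete, here is a minimal self-contained sketch using the data from the question. It also stores the per-trial total in a total_count column; that name comes from the desired output in the question, and keeping the intermediate column is an optional extra rather than part of the one-liner above.

```python
import pandas as pd

df = pd.DataFrame({
    'trial_num': ['1', '1', '2', '2', '3', '3'],
    'area': ['first', 'second', 'first', 'second', 'first', 'second'],
    'counts': [10, 25, 36, 2, 70, 10],
})

# transform('sum') broadcasts each group's sum back onto every row of that
# group, so the result has the same index and length as df.
df['total_count'] = df.groupby('trial_num')['counts'].transform('sum')

# Row-wise division now lines up without any merge or reshape.
df['proportion'] = df['counts'] / df['total_count']
print(df)
```

If the total is not needed afterwards, it can be dropped with df.drop(columns='total_count').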

An alternative with map:

s = df.groupby(['trial_num'])['counts'].sum()
df['proportion'] = df['counts'].div(df['trial_num'].map(s))
print (df)
  trial_num    area  counts  proportion
0         1   first      10    0.285714
1         1  second      25    0.714286
2         2   first      36    0.947368
3         2  second       2    0.052632
4         3   first      70    0.875000
5         3  second      10    0.125000
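As a quick sanity check, the sketch below (again on the question's data) compares the two approaches and confirms that the proportions within each trial add up to 1. The variable names via_transform, via_map, same and per_trial are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'trial_num': ['1', '1', '2', '2', '3', '3'],
    'area': ['first', 'second', 'first', 'second', 'first', 'second'],
    'counts': [10, 25, 36, 2, 70, 10],
})

# Approach 1: broadcast the group sums with transform.
via_transform = df['counts'].div(
    df.groupby('trial_num')['counts'].transform('sum'))

# Approach 2: compute one sum per trial, then look it up per row with map.
totals = df.groupby('trial_num')['counts'].sum()
via_map = df['counts'].div(df['trial_num'].map(totals))

# Both approaches should agree row by row (up to floating-point tolerance).
same = bool(((via_transform - via_map).abs() < 1e-12).all())

# Proportions within each trial should sum to 1.
per_trial = via_transform.groupby(df['trial_num']).sum()

print(same)
print(per_trial)
```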