Agg and groupby with a specific condition

Time: 2019-12-27 10:46:34

Tags: python-3.x pandas pandas-groupby

I have this DataFrame:

import numpy as np
import pandas as pd

index = [1, 2, 3, 4, 5, 6, 7, 8]
a = [1247, 1247, 1539, 1247, 1539, 1539, 1539, 1247]
b = ['Group_A', 'Group_A', 'Group_B', 'Group_C', 'Group_B', 'Group_B', 'Group_C', 'Group_B']
c = [np.nan, 23, 30, 27, np.nan, 42, 40, 62]
df = pd.DataFrame({'ID': a, 'Group': b, 'Unit_sold': c})

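Now I want to compute the units sold for groups A and B combined, and for group C separately, grouped by ID. The result should look like this: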

      ID    Sum_AB  Sum_C   
0   1247    85.0    27.0
1   1539    72.0    40.0

2 answers:

Answer 0 (score: 3)

Use Series.replace inside assign to overwrite the Group column, then groupby() and unstack:

(df.assign(Group=df['Group'].replace(['A','B'],['AB','AB'],regex=True))
      .groupby(['ID','Group'],sort=False)['Unit_sold'].sum().unstack()
      .add_suffix('_sum').reset_index().rename_axis(None,axis=1))

     ID  Group_AB_sum  Group_C_sum
0  1247          85.0         27.0
1  1539          72.0         40.0
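
For readers who find the one-liner dense, here is a minimal sketch of the same three steps written out separately. It rebuilds the question's df and, as a deliberate swap, uses an exact-match dict in replace rather than the regex form above. The names merged, sums and wide are only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'ID': [1247, 1247, 1539, 1247, 1539, 1539, 1539, 1247],
    'Group': ['Group_A', 'Group_A', 'Group_B', 'Group_C',
              'Group_B', 'Group_B', 'Group_C', 'Group_B'],
    'Unit_sold': [np.nan, 23, 30, 27, np.nan, 42, 40, 62],
})

# Step 1: collapse the two labels into one (exact match, no regex).
merged = df.assign(
    Group=df['Group'].replace({'Group_A': 'Group_AB', 'Group_B': 'Group_AB'})
)

# Step 2: sum per (ID, Group); sum() skips NaN values by default.
sums = merged.groupby(['ID', 'Group'])['Unit_sold'].sum()

# Step 3: move the Group level into columns and tidy the labels.
wide = sums.unstack().add_suffix('_sum').reset_index().rename_axis(None, axis=1)
print(wide)
```

The exact-match dict only touches labels that are literally 'Group_A' or 'Group_B', which may be safer than a regex if other group names happen to contain the letters A or B.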

Answer 1 (score: 2)

Use np.where and pd.crosstab:

df['Group'] = np.where(df['Group'].isin(['Group_A','Group_B']),'Sum_AB','Sum_C')
df2 = pd.crosstab(df.ID,df.Group,df.Unit_sold,aggfunc='sum').reset_index()
print(df2)

Group    ID  Sum_AB  Sum_C
0      1247    85.0   27.0
1      1539    72.0   40.0
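
Note that the first line of this answer overwrites df['Group']. If the original labels are still needed, one possible variant (a sketch, not the answerer's code) keeps df intact by putting the new labels in a temporary column and using pivot_table instead of crosstab. The column name Bucket and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'ID': [1247, 1247, 1539, 1247, 1539, 1539, 1539, 1247],
    'Group': ['Group_A', 'Group_A', 'Group_B', 'Group_C',
              'Group_B', 'Group_B', 'Group_C', 'Group_B'],
    'Unit_sold': [np.nan, 23, 30, 27, np.nan, 42, 40, 62],
})

# Label each row without touching df['Group'].
# Anything that is not Group_A or Group_B falls into 'Sum_C'.
labels = np.where(df['Group'].isin(['Group_A', 'Group_B']), 'Sum_AB', 'Sum_C')

result = (
    df.assign(Bucket=labels)  # temporary column on a copy, df is unchanged
      .pivot_table(index='ID', columns='Bucket', values='Unit_sold', aggfunc='sum')
      .reset_index()
      .rename_axis(None, axis=1)
)
print(result)
```

Because assign returns a new DataFrame, df['Group'] still holds 'Group_A', 'Group_B' and 'Group_C' afterwards.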