I have this DataFrame:
import numpy as np
import pandas as pd

index = [1, 2, 3, 4, 5, 6, 7, 8]
a = [1247, 1247, 1539, 1247, 1539, 1539, 1539, 1247]
b = ['Group_A', 'Group_A', 'Group_B', 'Group_C', 'Group_B', 'Group_B', 'Group_C', 'Group_B']
c = [np.nan, 23, 30, 27, np.nan, 42, 40, 62]
df = pd.DataFrame({'ID': a, 'Group': b, 'Unit_sold': c})
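Now I want to sum the units sold for Group_A and Group_B together, grouped by ID, with Group_C summed separately. The result should look like this: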
     ID  Sum_AB  Sum_C
0  1247    85.0   27.0
1  1539    72.0   40.0
Answer 0 (score: 3)
Use Series.replace inside assign to relabel the Group column, then groupby() and unstack:
(df.assign(Group=df['Group'].replace(['A','B'],['AB','AB'],regex=True))
.groupby(['ID','Group'],sort=False)['Unit_sold'].sum().unstack()
.add_suffix('_sum').reset_index().rename_axis(None,axis=1))
     ID  Group_AB_sum  Group_C_sum
0  1247          85.0         27.0
1  1539          72.0         40.0
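
The same relabel-then-reshape idea can be sketched without regex, assuming an explicit lookup table is acceptable. The name `label_map` and the target labels `Sum_AB` / `Sum_C` are illustrative choices taken from the question's desired output, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    'ID': [1247, 1247, 1539, 1247, 1539, 1539, 1539, 1247],
    'Group': ['Group_A', 'Group_A', 'Group_B', 'Group_C',
              'Group_B', 'Group_B', 'Group_C', 'Group_B'],
    'Unit_sold': [np.nan, 23, 30, 27, np.nan, 42, 40, 62],
})

# Hypothetical mapping: fold Group_A and Group_B into a single label.
label_map = {'Group_A': 'Sum_AB', 'Group_B': 'Sum_AB', 'Group_C': 'Sum_C'}

result = (
    df.assign(Group=df['Group'].map(label_map))      # relabel on a copy
      .groupby(['ID', 'Group'])['Unit_sold'].sum()   # sum() skips NaN by default
      .unstack()                                     # Group labels become columns
      .reset_index()
      .rename_axis(None, axis=1)                     # drop the 'Group' columns name
)
print(result)
```

Because the mapping is explicit, any group name missing from `label_map` becomes NaN and falls out of the groupby, so the dictionary should cover every value in the Group column.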
Answer 1 (score: 2)
Use np.where to relabel the Group column (this overwrites it in place), then pd.crosstab:
df['Group'] = np.where(df['Group'].isin(['Group_A','Group_B']),'Sum_AB','Sum_C')
df2 = pd.crosstab(df.ID,df.Group,df.Unit_sold,aggfunc='sum').reset_index()
print(df2)
Group    ID  Sum_AB  Sum_C
0      1247    85.0   27.0
1      1539    72.0   40.0
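
If the original Group column needs to stay intact, a minimal variant is to build the labels as a separate Series and pass that to pd.crosstab. The names `labels` and `out` are illustrative, and this is a sketch of the same approach rather than the answer's own code.

```python
import numpy as np
import pandas as pd

# Data from the question.
df = pd.DataFrame({
    'ID': [1247, 1247, 1539, 1247, 1539, 1539, 1539, 1247],
    'Group': ['Group_A', 'Group_A', 'Group_B', 'Group_C',
              'Group_B', 'Group_B', 'Group_C', 'Group_B'],
    'Unit_sold': [np.nan, 23, 30, 27, np.nan, 42, 40, 62],
})

# Separate label Series (hypothetical name); df['Group'] is left untouched.
labels = pd.Series(
    np.where(df['Group'].isin(['Group_A', 'Group_B']), 'Sum_AB', 'Sum_C'),
    index=df.index,
    name='Group',
)

out = (
    pd.crosstab(df['ID'], labels, values=df['Unit_sold'], aggfunc='sum')
      .reset_index()
      .rename_axis(None, axis=1)   # remove the 'Group' header shown in the printout
)
print(out)
```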