Question

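I want to do a groupby in pandas where the values in column B are grouped as follows. Two: anything prefixed with 2. Three: anything prefixed with 3. Otherwise, leave the cell as it is.

For example, given this DataFrame df: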

index   A    B    Count  Value
x       abc  1-a    1      1
x       abc  2-a    2      2
x       abc  2-b    1      4
x       xyz  3-b    2      0
x       xyz  3-a    3      2
y       abc  1-b    1      5
y       abc  1-c    0      3
y       ijk  3-a    2      1
y       ijk  2-c    1      2

The result would be:

index   A    B    Count  Value    (Count: sum by group, Value: average by group)
x       abc  1-a    1      1
x       abc  Two    3      3
x       xyz  Three  5      1
y       abc  1-b    1      5
y       abc  1-c    0      3
y       ijk  Three  2      1
y       ijk  Two    1      2

Answer 1

Use str.split + agg:

import numpy as np
prefix = df['B'].str.split('-', expand=True)[0]  # text before the first '-'
df['B'] = np.where(prefix != '1', prefix, df['B'])  # keep 1-* cells, collapse the rest to their prefix
df.groupby(['index', 'A', 'B']).agg({'Count': 'sum', 'Value': 'mean'}).reset_index()
Out[1628]: 
  index    A    B  Count  Value
0     x  abc  1-a      1      1
1     x  abc    2      3      3
2     x  xyz    3      5      1
3     y  abc  1-b      1      5
4     y  abc  1-c      0      3
5     y  ijk    2      1      2
6     y  ijk    3      2      1
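
The output above labels the collapsed groups 2 and 3 rather than Two and Three as the question asked. A minimal, self-contained sketch of one way to get the word labels follows. It assumes 'index' is an ordinary column, and the `labels` dict is a hypothetical mapping introduced here for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question; 'index' is treated as a regular column.
df = pd.DataFrame({
    'index': ['x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y'],
    'A':     ['abc', 'abc', 'abc', 'xyz', 'xyz', 'abc', 'abc', 'ijk', 'ijk'],
    'B':     ['1-a', '2-a', '2-b', '3-b', '3-a', '1-b', '1-c', '3-a', '2-c'],
    'Count': [1, 2, 1, 2, 3, 1, 0, 2, 1],
    'Value': [1, 2, 4, 0, 2, 5, 3, 1, 2],
})

# Hypothetical prefix-to-label mapping; prefixes not listed keep their original cell.
labels = {'2': 'Two', '3': 'Three'}

prefix = df['B'].str.split('-').str[0]        # text before the first '-'
df['B'] = prefix.map(labels).fillna(df['B'])  # unmapped prefixes fall back to the original value

result = (
    df.groupby(['index', 'A', 'B'], as_index=False)
      .agg({'Count': 'sum', 'Value': 'mean'})
)
print(result)
```

Because groupby sorts the keys, Three comes before Two within each (index, A) pair.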

For your case, you can also pick the aggregation from the column name:

df.groupby(['index','A','B']).agg(lambda x : x.mean() if x.name.startswith('Value') else x.sum()).reset_index()
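
The lambda relies on each column reaching the function as a Series whose `.name` is the column label. A small sketch to check that it agrees with the dict-based `agg`, using a made-up frame in which B has already been collapsed to its prefix (the data here is hypothetical and only for illustration):

```python
import pandas as pd

# Small hypothetical frame; B is already collapsed to its prefix where needed.
df = pd.DataFrame({
    'index': ['x', 'x', 'x', 'y'],
    'A':     ['abc', 'abc', 'abc', 'ijk'],
    'B':     ['1-a', '2', '2', '3'],
    'Count': [1, 2, 1, 2],
    'Value': [1, 2, 4, 1],
})
keys = ['index', 'A', 'B']

# Lambda version: choose mean or sum from the name of the column being aggregated.
by_name = df.groupby(keys).agg(
    lambda x: x.mean() if x.name.startswith('Value') else x.sum()
).reset_index()

# Dict version: state the aggregation for each column explicitly.
by_dict = df.groupby(keys).agg({'Count': 'sum', 'Value': 'mean'}).reset_index()

print(by_name)
print(by_dict)
```

The dict form is usually easier to read. The lambda form may be handy when there are many columns that share a naming pattern.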

Answer 2

You can also use str.partition:

# make new 'B'
df['B'] = df['B'].where(df['B'].str.startswith('1'), other=df['B'].str.partition('-')[0])

# group and agg
df.groupby([
    'index',
    'A',
    'B'
]).agg({
    'Count' : 'sum',
    'Value' : 'mean'
}).reset_index()
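
For reference, a self-contained sketch of the str.partition approach on the x rows from the question. It assumes 'index' is an ordinary column and assigns the result back to the column instead of using inplace; the variable names `parts` and `keep` are introduced here for illustration.

```python
import pandas as pd

# The 'x' rows from the question; 'index' is treated as a regular column.
df = pd.DataFrame({
    'index': ['x', 'x', 'x', 'x', 'x'],
    'A':     ['abc', 'abc', 'abc', 'xyz', 'xyz'],
    'B':     ['1-a', '2-a', '2-b', '3-b', '3-a'],
    'Count': [1, 2, 1, 2, 3],
    'Value': [1, 2, 4, 0, 2],
})

# str.partition('-') yields three columns: text before '-', the '-' itself, text after.
parts = df['B'].str.partition('-')
keep = parts[0] == '1'                   # rows whose prefix is exactly '1' stay untouched
df['B'] = df['B'].where(keep, parts[0])  # all other rows collapse to their prefix

result = (
    df.groupby(['index', 'A', 'B'])
      .agg({'Count': 'sum', 'Value': 'mean'})
      .reset_index()
)
print(result)
```

Comparing the prefix with '1' avoids matching values such as 2-1, which a plain substring test on '1' would also keep.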
