How to group and aggregate in pandas

Time: 2020-02-20 12:47:31

Tags: python pandas

I have a df like this. The values are projections, so there are many value columns.

Customer seg value0  value1   
A         a   10      60
A         b   20      50
A         c   30      40
B         a   40      30
B         b   50      20
B         c   60      10

I want to compute a result from the values, selecting rows by the seg column:

a - b - c (a minus b minus c)

for each customer:

customer value0 value1 
A        -40     -30   
B        -70      0

How can I compute each value by grouping by customer?

df.groupby('Customer')

Thanks

3 answers:

Answer 0 (score: 4)

The idea is to multiply the values in the rows to be subtracted by -1, then aggregate with sum:

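import numpy as np

# keep only the a, b, c rows; copy so the in-place multiply below does not touch df
df1 = df[df['seg'].isin(['a','b','c'])].copy()

# +1 for 'a' rows, -1 for every other row
a = np.where(df1['seg'].eq('a'), 1, -1)
# flip the sign of the value columns (third column onward), row by row
df1.iloc[:, 2:] *= a[:, None]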

print (df1)
  Customer seg  value0  value1
0        A   a      10      60
1        A   b     -20     -50
2        A   c     -30     -40
3        B   a      40      30
4        B   b     -50     -20
5        B   c     -60     -10

df2 = df1.groupby('Customer', as_index=False).sum()
print (df2)
  Customer  value0  value1
0        A     -40     -30
1        B     -70       0

Or, to multiply all numeric columns instead of selecting them by position:

df1 = df[df['seg'].isin(['a','b','c'])].copy()
# names of all numeric columns
c = df1.select_dtypes(np.number).columns

a = np.where(df1['seg'].eq('a'), 1, -1)
df1[c] *= a[:, None]

df2 = df1.groupby('Customer', as_index=False).sum()
print (df2)
  Customer  value0  value1
0        A     -40     -30
1        B     -70       0
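
The same sign-flip idea can also be written without modifying a filtered copy in place. The sketch below is not from the answer above: the `signs` mapping and the variable names are assumptions chosen for illustration, and the frame is rebuilt from the question's sample data.

```python
import pandas as pd

# sample data copied from the question
df = pd.DataFrame({
    'Customer': ['A', 'A', 'A', 'B', 'B', 'B'],
    'seg': ['a', 'b', 'c', 'a', 'b', 'c'],
    'value0': [10, 20, 30, 40, 50, 60],
    'value1': [60, 50, 40, 30, 20, 10],
})

# hypothetical mapping: the sign each seg contributes to a - b - c
signs = {'a': 1, 'b': -1, 'c': -1}
value_cols = ['value0', 'value1']

# one weight per row, aligned on the row index
weights = df['seg'].map(signs)
# multiply every value column by its row's weight
signed = df[value_cols].mul(weights, axis=0)
# sum the signed values per customer
result = signed.groupby(df['Customer']).sum()
print(result)
```

Changing the formula, say to a + b - c, would only require editing the mapping.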

Answer 1 (score: 2)

How about this?

In [42]: df
Out[42]:
  Customer seg  value0  value1
0        A   a      10      60
1        A   b      20      50
2        A   c      30      40
3        B   a      40      30
4        B   b      50      20
5        B   c      60      10

In [43]: df.pivot(index='seg', columns='Customer').T.eval('a - b - c').unstack(level=0)
Out[43]:
          value0  value1
Customer
A            -40     -30
B            -70       0

If you prefer groupby, here is another solution:

In [44]: df.groupby('Customer').apply(lambda x: 
            x.set_index('seg')[['value0', 'value1']].T.eval('a - b - c'))
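
No output is shown for the groupby variant. Here is a self-contained sketch along the same lines, assuming each seg label occurs exactly once per customer. It uses plain label lookups instead of eval, and the helper name `a_minus_b_minus_c` is hypothetical.

```python
import pandas as pd

# sample data copied from the question
df = pd.DataFrame({
    'Customer': ['A', 'A', 'A', 'B', 'B', 'B'],
    'seg': ['a', 'b', 'c', 'a', 'b', 'c'],
    'value0': [10, 20, 30, 40, 50, 60],
    'value1': [60, 50, 40, 30, 20, 10],
})

def a_minus_b_minus_c(group):
    # index the group's rows by seg so each row can be looked up by label
    by_seg = group.set_index('seg')[['value0', 'value1']]
    # each lookup is one row (a Series over the value columns)
    return by_seg.loc['a'] - by_seg.loc['b'] - by_seg.loc['c']

# select the needed columns explicitly so the grouping column is not passed to the helper
result = (df.groupby('Customer')[['seg', 'value0', 'value1']]
            .apply(a_minus_b_minus_c))
print(result)
```

For the sample data this should reproduce the table requested in the question, with Customer as the index.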

Answer 2 (score: 0)

Another approach: use numpy's subtract combined with reduce:

(df.groupby('Customer')
   .agg(value0=('value0',np.subtract.reduce),
        value1=('value1',np.subtract.reduce))
 )


          value0  value1
Customer
A            -40     -30
B            -70       0
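
One thing to keep in mind: np.subtract.reduce folds from the left, so the result depends on row order, and the a row has to come first within each customer. The sketch below illustrates this under the assumption that sorting by seg is enough to put a first (true here only because 'a' sorts before 'b' and 'c'). The lambda wrapper and variable names are illustrative choices, not part of the answer.

```python
import numpy as np
import pandas as pd

# reduce folds from the left: (10 - 20) - 30
in_order = np.subtract.reduce(np.array([10, 20, 30]))
# the same numbers with b first give (20 - 10) - 30
b_first = np.subtract.reduce(np.array([20, 10, 30]))
print(in_order, b_first)

# sample data copied from the question
df = pd.DataFrame({
    'Customer': ['A', 'A', 'A', 'B', 'B', 'B'],
    'seg': ['a', 'b', 'c', 'a', 'b', 'c'],
    'value0': [10, 20, 30, 40, 50, 60],
    'value1': [60, 50, 40, 30, 20, 10],
})

# simulate input that arrives in a different row order
shuffled = df.iloc[::-1]

# restore the a, b, c order inside each customer before reducing
result = (
    shuffled.sort_values(['Customer', 'seg'])
            .groupby('Customer')[['value0', 'value1']]
            .agg(lambda s: np.subtract.reduce(s.to_numpy()))
)
print(result)
```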

See also: numpy.ufunc.reduce, numpy.subtract