I have a df like this. The values are projections, so there are many value columns.
Customer seg value0 value1
A a 10 60
A b 20 50
A c 30 40
B a 40 30
B b 50 20
B c 60 10
For each customer, I want to compute a - b - c (a minus b minus c), using the seg column to identify which row is which:
customer value0 value1
A -40 -30
B -70 0
How can I compute each value by grouping on the customer?
df.groupby('Customer')
Thanks
Answer 0 (score: 4)
The idea is to multiply the values that should be subtracted by -1, then aggregate with sum:
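import numpy as np
# keep only the a, b, c rows; .copy() avoids SettingWithCopyWarning on the in-place multiply
df1 = df[df['seg'].isin(['a','b','c'])].copy()
# +1 for the 'a' rows, -1 for every other segment
a = np.where(df1['seg'].eq('a'), 1, -1)
# flip the sign of all value columns (everything after Customer and seg)
df1.iloc[:, 2:] *= a[:, None]
print(df1)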
Customer seg value0 value1
0 A a 10 60
1 A b -20 -50
2 A c -30 -40
3 B a 40 30
4 B b -50 -20
5 B c -60 -10
# numeric_only=True keeps the string column seg out of the sum
df2 = df1.groupby('Customer', as_index=False).sum(numeric_only=True)
print(df2)
Customer value0 value1
0 A -40 -30
1 B -70 0
Or, if you want to pick out all numeric columns instead of slicing by position:
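df1 = df[df['seg'].isin(['a','b','c'])].copy()
# names of all numeric columns
c = df1.select_dtypes(np.number).columns
a = np.where(df1['seg'].eq('a'), 1, -1)
df1[c] *= a[:, None]
# numeric_only=True keeps the string column seg out of the sum
df2 = df1.groupby('Customer', as_index=False).sum(numeric_only=True)
print(df2)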
Customer value0 value1
0 A -40 -30
1 B -70 0
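For reference, here is a self-contained sketch of the same sign-then-sum idea that leaves the original frame untouched. The data is copied from the question; the names sign, value_cols, signed and result are only illustrative, and the mapping assumes that a is the segment to keep and b and c are the ones to subtract.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Customer": ["A", "A", "A", "B", "B", "B"],
    "seg": ["a", "b", "c", "a", "b", "c"],
    "value0": [10, 20, 30, 40, 50, 60],
    "value1": [60, 50, 40, 30, 20, 10],
})

# +1 for the segment that is kept, -1 for the segments that are subtracted.
# A segment missing from the mapping would become NaN, which sum() skips.
sign = df["seg"].map({"a": 1, "b": -1, "c": -1})

# Every column except the two key columns is treated as a value column.
value_cols = [col for col in df.columns if col not in ("Customer", "seg")]

# mul(..., axis=0) scales each row by its sign and returns a new frame.
signed = df[value_cols].mul(sign, axis=0)

result = (
    signed.assign(Customer=df["Customer"])
    .groupby("Customer", as_index=False)
    .sum()
)
print(result)
```

Because seg never enters the frame that gets summed, there is no need to worry about how sum treats string columns.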
Answer 1 (score: 2)
How about this?
In [42]: df
Out[42]:
Customer seg value0 value1
0 A a 10 60
1 A b 20 50
2 A c 30 40
3 B a 40 30
4 B b 50 20
5 B c 60 10
In [43]: df.pivot(index='seg', columns='Customer').T.eval('a - b - c').unstack(level=0)
Out[43]:
value0 value1
Customer
A -40 -30
B -70 0
If you prefer groupby, there is another way:
In [44]: df.groupby('Customer').apply(lambda x:
x.set_index('seg')[['value0', 'value1']].T.eval('a - b - c'))
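If the one-liners are hard to follow, the same reshape can be spelled out step by step. This is only a sketch on the question's sample data: it pivots so that each customer is one row, then subtracts the b and c blocks from the a block. The variable names wide, a, b, c and result are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Customer": ["A", "A", "A", "B", "B", "B"],
    "seg": ["a", "b", "c", "a", "b", "c"],
    "value0": [10, 20, 30, 40, 50, 60],
    "value1": [60, 50, 40, 30, 20, 10],
})

# One row per customer; the columns become a (value column, seg) MultiIndex.
wide = df.pivot(index="Customer", columns="seg")

# Select each segment's block of value columns from the MultiIndex.
a = wide.xs("a", axis=1, level="seg")
b = wide.xs("b", axis=1, level="seg")
c = wide.xs("c", axis=1, level="seg")

# The blocks share the same index and column labels, so this aligns cleanly.
result = a - b - c
print(result)
```

Note that pivot requires each (Customer, seg) pair to appear at most once; duplicates would raise an error.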
Answer 2 (score: 0)
Another approach: use numpy's subtract combined with reduce:
import numpy as np
# relies on the rows being in a, b, c order within each customer
(df.groupby('Customer')
.agg(value0=('value0',np.subtract.reduce),
value1=('value1',np.subtract.reduce))
)
value0 value1
Customer
A -40 -30
B -70 0
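One caveat with reduce: it subtracts in row order, so the result is only a - b - c when the a row comes first within each customer. Below is a minimal sketch, using the question's data with the rows deliberately shuffled, that sorts by seg before reducing. The helper subtract_in_order is a made-up name, and it converts to a plain NumPy array so that it does not depend on how pandas dispatches ufunc methods.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Customer": ["A", "A", "A", "B", "B", "B"],
    "seg": ["a", "b", "c", "a", "b", "c"],
    "value0": [10, 20, 30, 40, 50, 60],
    "value1": [60, 50, 40, 30, 20, 10],
})

# Shuffle the rows so that 'a' is no longer first within each customer.
shuffled = df.iloc[[2, 0, 1, 5, 4, 3]]

def subtract_in_order(s):
    """Return first - second - third - ... for the values of one group."""
    return np.subtract.reduce(s.to_numpy())

value_cols = ["value0", "value1"]
named = {col: (col, subtract_in_order) for col in value_cols}

# Without sorting, the subtraction starts from whichever row happens to be first.
unsorted = shuffled.groupby("Customer").agg(**named)

# Sorting by seg puts the rows back in a, b, c order before reducing.
result = (
    shuffled.sort_values(["Customer", "seg"])
    .groupby("Customer")
    .agg(**named)
)
print(result)
```

Sorting works here only because a happens to sort before b and c; for other segment names an explicit sign mapping is safer.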