Given this pandas DataFrame df:
CategoryA | CategoryB | Count
1 A 0
1 A -1
2 B 1
2 B 1
3 C 1
3 C -1
I basically want to delete every CategoryA / CategoryB group whose Count sum is not greater than 0. I tried:
df['decision'] = np.where(df.groupby(['CategoryA', 'CategoryB'])['Count'].sum()>0, 'keep', 'delete')
But I get this error: ValueError: Length of values does not match length of index
The expected output would be:
CategoryA | CategoryB | Count | decision
1 A 0 delete
1 A -1 delete
2 B 1 keep
2 B 1 keep
3 C 1 delete
3 C -1 delete
I would prefer to do this with df.loc, but I'm not sure how.
Answer 0 (score: 3)
In [67]: df['decision'] = \
np.where(df.groupby(['CategoryA', 'CategoryB'])['Count'].transform('sum') > 0,
'keep', 'delete')
In [68]: df
Out[68]:
CategoryA CategoryB Count decision
0 1 A 0 delete
1 1 A -1 delete
2 2 B 1 keep
3 2 B 1 keep
4 3 C 1 delete
5 3 C -1 delete
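If the goal is to actually drop the rows rather than label them, the same transform-based mask can index the frame directly. The following is a minimal sketch, assuming the sample data from the question; the names `group_sum` and `kept` are illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'CategoryA': [1, 1, 2, 2, 3, 3],
    'CategoryB': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Count': [0, -1, 1, 1, 1, -1],
})

# Per-row group total, aligned with df's index
group_sum = df.groupby(['CategoryA', 'CategoryB'])['Count'].transform('sum')

# Keep only the rows whose group total is strictly positive
kept = df[group_sum > 0]
print(kept)
```

Only the two rows of group (2, B) should remain, since it is the only group with a positive total.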
Answer 1 (score: 3)
You're on the right track.
m = df.groupby(['CategoryA', 'CategoryB'])['Count'].transform('sum').gt(0)
df['decision'] = np.where(m, 'keep', 'delete')
df
CategoryA CategoryB Count decision
0 1 A 0 delete
1 1 A -1 delete
2 2 B 1 keep
3 2 B 1 keep
4 3 C 1 delete
5 3 C -1 delete
Use transform to get a result that is the same size as the original DataFrame.
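To see why the original attempt raised the ValueError, it may help to compare the two results side by side. This is a small sketch assuming the sample data from the question; `per_group` and `per_row` are illustrative names.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'CategoryA': [1, 1, 2, 2, 3, 3],
    'CategoryB': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Count': [0, -1, 1, 1, 1, -1],
})

grouped = df.groupby(['CategoryA', 'CategoryB'])['Count']

per_group = grouped.sum()           # one value per group, indexed by the group keys
per_row = grouped.transform('sum')  # one value per original row, indexed like df

print(len(per_group), len(per_row), len(df))
```

The aggregated result has one entry per group, so it cannot be assigned to a column of a six-row frame, whereas the transformed result lines up with df row by row.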
Answer 2 (score: 3)
df['decision']=df['CategoryB'].map(df.groupby('CategoryB')['Count'].\
apply(lambda x :np.where(x.sum()>0,'keep','delete')))
df
Out[573]:
CategoryA CategoryB Count decision
0 1 A 0 delete
1 1 A -1 delete
2 2 B 1 keep
3 2 B 1 keep
4 3 C 1 delete
5 3 C -1 delete
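The question asks for a df.loc-based version, which none of the answers above spell out. One possible sketch, assuming the sample data from the question and reusing the transform idea from the earlier answers (`keep_mask` is an illustrative name):

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'CategoryA': [1, 1, 2, 2, 3, 3],
    'CategoryB': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Count': [0, -1, 1, 1, 1, -1],
})

# Boolean mask, one entry per row: True where the row's group sums to more than 0
keep_mask = df.groupby(['CategoryA', 'CategoryB'])['Count'].transform('sum') > 0

df['decision'] = 'delete'               # default label for every row
df.loc[keep_mask, 'decision'] = 'keep'  # overwrite rows in groups with a positive sum
print(df)
```

This should produce the same decision column as the np.where versions; df.loc is mainly a matter of preference here.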