I need to run a groupby on df and then, within each group, check whether every element has the same value in column A. If so, I want to drop that group:
df['cluster_id'] = df.groupby(['B', 'C', 'D'])['B'].transform('size')
df = df.loc[
    df['cluster_id'] > 1 &
    df['cluster_id'] == df['cluster_id'] &
    df['A'] != df['A']]
But I get this error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I would like to know how to fix it.
Answer 0 (score: 1)
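I think the parentheses () are missing. The & operator binds more tightly than comparison operators such as >, == and !=, so each comparison has to be wrapped in its own parentheses:
df = df[(df['cluster_id'] > 1) & (df['cluster_id'] == df['cluster_id']) & (df['A'] != df['A'])]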
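To see why the unparenthesized version fails, here is a minimal sketch on a made-up Series (the name s and its values are only for illustration). Python evaluates 1 & s first and then treats the rest as a chained comparison, which needs the truth value of a whole Series:

```python
import pandas as pd

# Made-up data, for illustration only.
s = pd.Series([1, 2, 3])

# `s > 1 & s == s` parses as `s > (1 & s) == s`, a chained comparison that
# Python expands to `(s > (1 & s)) and ((1 & s) == s)`.
# The implicit `and` calls bool() on a Series, which raises ValueError.
try:
    s > 1 & s == s
    raised = False
except ValueError:
    raised = True

# With parentheses, each comparison runs first and `&` combines the
# resulting boolean Series element-wise.
mask = (s > 1) & (s == s)

print(raised)
print(mask.tolist())
```

The same parsing happens with DataFrame columns, which is where the ValueError in the question comes from.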
The second condition compares the column with itself, so it is always True here and seems unnecessary:
df = df[(df['cluster_id'] > 1) & (df['A'] != df['A'])]
A new column is not needed either; you can compare against the Series directly:
cluster_id = df.groupby(['B', 'C', 'D'])['B'].transform('size')
df = df[(cluster_id > 1) & (cluster_id == cluster_id) & (df['A'] != df['A'])]
df = df[(cluster_id > 1) & (df['A'] != df['A'])]
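Note that df['A'] != df['A'] also compares a column with itself, so it is False for every row where A is not missing, and the filter would keep almost nothing. If the goal is what the question describes (drop every group whose rows all share the same value in A), one possible sketch is to count the distinct A values per group with transform('nunique'). The sample data below is made up, and treating single-row groups as "all the same" (and therefore dropping them) is an assumption:

```python
import pandas as pd

# Made-up sample data; only the column names follow the question.
df = pd.DataFrame({
    'A': [1, 1, 1, 2, 3, 3],
    'B': ['x', 'x', 'x', 'x', 'y', 'y'],
    'C': [0, 0, 1, 1, 0, 0],
    'D': [0, 0, 0, 0, 0, 0],
})

# Number of distinct values of A within each (B, C, D) group,
# broadcast back to every row of that group.
n_unique = df.groupby(['B', 'C', 'D'])['A'].transform('nunique')

# Keep only the rows whose group contains more than one distinct A value.
result = df[n_unique > 1]
print(result)
```

Here only the group (B='x', C=1, D=0) survives, because it is the only one where A takes more than one value.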