pandas: delete groups in which every element has the same value in a column

Posted: 2018-05-10 16:35:48

Tags: python-3.x pandas dataframe pandas-groupby

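I need to run groupby on df and then, within each group, check whether every element in that group has the same value in column A. If it does, I want to delete that group: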

 df['cluster_id'] = df.groupby(['B', 'C', 'D'])['B'].transform('size')

 df = df.loc[
        df['cluster_id'] > 1 &
        df['cluster_id'] == df['cluster_id'] &
        df['A'] != df['A']]

But I get this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can I fix it?

1 Answer

Answer 0 (score: 1)

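I think the parentheses are missing. In Python, `&` binds more tightly than comparison operators such as `>`, `==` and `!=`, so without parentheses the condition is parsed as one chained comparison, and evaluating a chained comparison calls `bool()` on a Series, which raises the error. Wrap each comparison in parentheses: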

df = df[(df['cluster_id'] > 1) & (df['cluster_id'] == df['cluster_id']) & (df['A'] != df['A'])]
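The precedence problem can be reproduced on a small frame. This is a sketch with hypothetical sample data (only the column names come from the question). It shows that the unparenthesized expression raises `ValueError`, while the parenthesized version produces an element-wise boolean mask:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "A": [1, 1, 2, 3, 4],
    "B": ["x", "x", "y", "y", "z"],
    "C": [0, 0, 0, 0, 0],
    "D": [0, 0, 0, 0, 0],
})
df["cluster_id"] = df.groupby(["B", "C", "D"])["B"].transform("size")

# Without parentheses, `&` is applied first, so this parses as the chained
# comparison  df["cluster_id"] > (1 & df["cluster_id"]) == df["cluster_id"].
# A chained comparison implicitly calls bool() on the first Series -> ValueError.
try:
    df[df["cluster_id"] > 1 & df["cluster_id"] == df["cluster_id"]]
    raised = False
except ValueError:
    raised = True

# With parentheses, each comparison yields a boolean Series,
# and `&` combines them element-wise.
mask = (df["cluster_id"] > 1) & (df["cluster_id"] == df["cluster_id"])

print(raised)         # True
print(mask.tolist())  # [True, True, True, True, False]
```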

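The second condition seems unnecessary: a column compared with itself is True for every non-NaN value, and a group size is never NaN. (By the same logic, `df['A'] != df['A']` is False for every non-NaN value, so these filters avoid the error but keep only rows where `A` is NaN; they do not yet drop the groups described in the question.) Without the second condition: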

df = df[(df['cluster_id'] > 1) & (df['A'] != df['A'])]
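Comparing a column with itself is worth checking directly. This minimal sketch uses a hypothetical float Series containing one NaN. It shows that `==` is True everywhere except at NaN, and `!=` is True only at NaN, because NaN never compares equal to anything, including itself:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value.
s = pd.Series([1.0, 2.0, np.nan])

same = s == s  # True wherever the value is not NaN
diff = s != s  # True only where the value is NaN

print(same.tolist())  # [True, True, False]
print(diff.tolist())  # [False, False, True]
```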

A new column is not needed either; you can compare against the Series returned by `transform` directly:

cluster_id = df.groupby(['B', 'C', 'D'])['B'].transform('size')

# with the redundant second condition:
df = df[(cluster_id > 1) & (cluster_id == cluster_id) & (df['A'] != df['A'])]
# or, equivalently, without it:
df = df[(cluster_id > 1) & (df['A'] != df['A'])]
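To drop the groups described in the question (groups whose `A` values are all identical), one option is to count the distinct `A` values per group with `transform('nunique')` and keep only groups with more than one. This is a sketch using hypothetical sample data, not the answerer's code. Single-row groups have exactly one distinct value, so they are removed as well, which matches the `cluster_id > 1` condition in the question:

```python
import pandas as pd

# Hypothetical sample data; only the column names A, B, C, D come from the question.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 5],
    "B": ["x", "x", "y", "y", "z"],
    "C": [0, 0, 0, 0, 0],
    "D": [0, 0, 0, 0, 0],
})

# For every row, count the distinct (non-NaN) values of A within its (B, C, D) group.
# transform() returns a Series aligned with df's index, so it can serve as a mask.
n_distinct = df.groupby(["B", "C", "D"])["A"].transform("nunique")

# Keep only the groups whose A values are not all identical.
result = df[n_distinct > 1]
print(result)
```

Here group `x` (A = 1, 1) and the single-row group `z` are dropped, and only group `y` (A = 1, 2) remains. `df.groupby(['B', 'C', 'D']).filter(lambda g: g['A'].nunique() > 1)` selects the same rows, but it calls a Python function once per group.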