Question

Given the following data frame:

 col1 col2
a  0  True 
b  0  True
c  1  True
d  1  False
e  2  False
f  2  False
g  3  True

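For each unique value in col1, I want to check whether all the corresponding values in col2 match. If they do not, all rows for that col1 value should be dropped, producing: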

 col1 col2
a  0  True 
b  0  True
e  2  False
f  2  False
g  3  True

Answer 1

You want nunique:

df[df.groupby('col1')['col2'].transform('nunique').eq(1)]

Output:

   col1   col2
a     0   True
b     0   True
e     2  False
f     2  False
g     3   True
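To show why this works, here is a minimal sketch that splits the one-liner into steps. The frame is rebuilt from the table in the question; the names counts, mask and result are only illustrative.

```python
import pandas as pd

# Input frame rebuilt from the question (index labels a to g are read off the table)
df = pd.DataFrame(
    {
        "col1": [0, 0, 1, 1, 2, 2, 3],
        "col2": [True, True, True, False, False, False, True],
    },
    index=list("abcdefg"),
)

# Number of distinct col2 values in each col1 group, broadcast back to every row
counts = df.groupby("col1")["col2"].transform("nunique")

# True for rows whose group holds exactly one distinct col2 value
mask = counts.eq(1)

result = df[mask]
print(result)
```

Note that nunique ignores NaN by default, so a group containing one value plus missing entries would still count as 1.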

Answer 2

One possible solution: group by col1 and filter each group, checking whether all of its col2 values are True or all are False:

df.groupby('col1').filter(lambda x: x.col2.all() | (~x.col2).all())
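As a runnable sketch of the same idea, the lambda can be written as a named function. The helper name is_uniform is made up for illustration, and the frame is rebuilt from the table in the question. filter calls the function once per group and keeps only the groups for which it returns True.

```python
import pandas as pd

# Input frame rebuilt from the question (index labels a to g are read off the table)
df = pd.DataFrame(
    {
        "col1": [0, 0, 1, 1, 2, 2, 3],
        "col2": [True, True, True, False, False, False, True],
    },
    index=list("abcdefg"),
)


def is_uniform(group: pd.DataFrame) -> bool:
    """Return True when every col2 value in the group is True, or every one is False."""
    col2 = group["col2"]
    return bool(col2.all() or (~col2).all())


# Keep only the col1 groups whose col2 values all agree
result = df.groupby("col1").filter(is_uniform)
print(result)
```

This check relies on col2 being boolean. For other dtypes, a test on the number of distinct values per group would be the more general choice.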

Answer 3

What have you tried? This seems like a fairly simple problem. I would use shape and drop_duplicates():

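import pandas as pd

df = pd.DataFrame(
    {"col1": [0, 0, 1, 1, 2, 2, 3],
     "col2": [True, True, True, False, False, False, True]},
    index=list("abcdefg"),
)

keep = []
for value in df["col1"].unique():
    df1 = df[df["col1"] == value]
    # col1 is constant within df1, so the number of rows left after
    # drop_duplicates() equals the number of distinct col2 values
    if df1.drop_duplicates().shape[0] == 1:
        keep.append(df1)

result = pd.concat(keep)

Since col1 is constant within each subset, exactly one row remaining after dropping duplicates means all of its col2 values match, and the subset is kept. Otherwise the whole subset is dropped and its rows are not used to build the new data frame.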

Pandas: iterate over duplicate rows to check for unique values
