Question

Using this sample data, I want to create a subset of all rows where the string values in columns A, B and C are not all identical.

    A    B       C      names
0   cat  cat     cat    mark
1   dog  dog     dog    kate
2   dog  dog     rat    james
3   rat  cat     dog    joe

The subset would look like this:

2   dog  dog     rat    james
3   rat  cat     dog    joe

Rows 2 and 3 are returned because each of them has one or more differing values across columns A, B and C.

Answer 1

Assuming the DataFrame above is df, you can select these rows by checking whether the values in B and C both equal the value in A:

In [56]: mask = df[['A', 'B', 'C']].eq(df['A'], axis=0).all(axis=1)

In [57]: mask
Out[57]:
0     True
1     True
2    False
3    False
dtype: bool

In [59]: df[~mask]
Out[59]:
     A    B    C  names
2  dog  dog  rat  james
3  rat  cat  dog    joe
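
For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample data in the question, and the helper name rows_not_all_equal is hypothetical, used only for illustration.

```python
import pandas as pd

# Sample data rebuilt by hand from the question.
df = pd.DataFrame({
    "A": ["cat", "dog", "dog", "rat"],
    "B": ["cat", "dog", "dog", "cat"],
    "C": ["cat", "dog", "rat", "dog"],
    "names": ["mark", "kate", "james", "joe"],
})


def rows_not_all_equal(frame, cols):
    """Hypothetical helper: keep rows whose values in `cols` are not all identical."""
    # Compare every column in `cols` to the first one, row by row (axis=0),
    # then mark the rows where all of those comparisons are True.
    all_equal = frame[cols].eq(frame[cols[0]], axis=0).all(axis=1)
    # Invert the mask to keep only the rows with at least one differing value.
    return frame[~all_equal]


subset = rows_not_all_equal(df, ["A", "B", "C"])
print(subset)
```

Comparing against the first column is enough here: if every column equals A, all of them are equal to each other.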

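We have to use eq rather than df[['B', 'C']] == df['A'] because the latter expression tries to match the index of df['A'] against the columns of the DataFrame df[['B', 'C']]. Passing axis=0 to eq makes the comparison align on the row index instead.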
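
The difference can be seen in a small sketch, again with the sample data rebuilt by hand. What the plain == does depends on the pandas version, which is an assumption worth checking locally: as far as I know, recent releases raise a ValueError for misaligned operands, while older ones silently reindex and return a frame that is False everywhere. The try/except below covers both cases.

```python
import pandas as pd

# Sample data rebuilt by hand from the question.
df = pd.DataFrame({
    "A": ["cat", "dog", "dog", "rat"],
    "B": ["cat", "dog", "dog", "cat"],
    "C": ["cat", "dog", "rat", "dog"],
    "names": ["mark", "kate", "james", "joe"],
})

# Row-wise comparison: axis=0 aligns the index of df['A'] (0..3)
# with the row index of df[['B', 'C']].
rowwise = df[["B", "C"]].eq(df["A"], axis=0)
print(rowwise)

# Plain == matches the index of df['A'] (0..3) against the column
# labels ('B', 'C'). No labels overlap, so no useful comparison happens.
try:
    naive = df[["B", "C"]] == df["A"]
    print(naive)
except ValueError as exc:
    # Assumed behavior of recent pandas versions: misaligned operands are rejected.
    naive = None
    print("ValueError:", exc)
```

Either way, the plain == never compares the values row by row, which is why the answer uses eq with axis=0.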

Pandas: subset rows where the values are not all the same