Question

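I have a DataFrame with several columns, and a row is valid only if every column in it holds a different string (in the example below I have 4 columns, so each row must contain 4 unique values). I therefore want to drop every row that contains a repeated string.

It feels like this should be simple, but I can't figure it out. Any help would be much appreciated!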

import pandas as pd

df = pd.DataFrame(
    [['a', 'b', 'c', 'd'],
     ['a', 'c', 'd', 'c'],
     ['b', 'a', 'e', 'g'],
     ['a', 'a', 'c', 'f'],
     ['b', 'c', 'b', 'd']],
    columns=['Pos1', 'Pos2', 'Pos3', 'Pos4'],
)


print(df)

  Pos1 Pos2 Pos3 Pos4
0    a    b    c    d
1    a    c    d    c
2    b    a    e    g
3    a    a    c    f
4    b    c    b    d


The output I want drops row index 1 ('c' is repeated), row index 3 ('a' is repeated), and row index 4 ('b' is repeated):


  Pos1 Pos2 Pos3 Pos4
0    a    b    c    d
2    b    a    e    g

Answer 1

Use DataFrame.nunique to count the unique values in each row, compare that count with the number of columns using Series.eq (==), and filter with boolean indexing:

# Keep only the rows whose count of distinct values equals the number of columns
df = df[df.nunique(axis=1).eq(len(df.columns))]
print(df)
  Pos1 Pos2 Pos3 Pos4
0    a    b    c    d
2    b    a    e    g
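
To make the one-liner easier to follow, here is a minimal sketch that splits it into its intermediate steps on the question's sample data. The variable names (distinct_per_row, all_unique, filtered, all_unique_alt) are my own and not part of the original answer, and the set-based check at the end is an assumed alternative rather than the answerer's method. Note that DataFrame.nunique skips NaN by default (dropna=True), so a row containing a missing value would count as having fewer unique values and be dropped.

```python
import pandas as pd

df = pd.DataFrame(
    [['a', 'b', 'c', 'd'],
     ['a', 'c', 'd', 'c'],
     ['b', 'a', 'e', 'g'],
     ['a', 'a', 'c', 'f'],
     ['b', 'c', 'b', 'd']],
    columns=['Pos1', 'Pos2', 'Pos3', 'Pos4'],
)

# Number of distinct values in each row (NaN is not counted by default).
distinct_per_row = df.nunique(axis=1)

# True only for rows in which every column holds a different value.
all_unique = distinct_per_row.eq(df.shape[1])

# Keep the valid rows without overwriting the original frame.
filtered = df[all_unique]
print(filtered)

# Assumed alternative (not from the answer): compare the size of each
# row's set of values with the row length. Slower on large frames,
# because it runs a Python function once per row.
all_unique_alt = df.apply(lambda row: len(set(row)) == len(row), axis=1)
```

Both masks select rows 0 and 2 for this data. The nunique version is vectorised and is usually the better choice; the apply version mainly helps when the validity rule needs custom logic.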
