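I have a dataframe with several columns, and a row is valid only if every value in it is a unique string (in the example below there are 4 columns, so each row must contain 4 unique values). I therefore want to drop any row that contains a repeated string.
This feels like it should be simple, but I can't figure it out. Any help would be much appreciated!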
import pandas as pd
df = pd.DataFrame([['a','b','c','d'],['a','c','d','c'],['b','a','e','g'],['a','a','c','f'],['b','c','b','d']],columns=['Pos1','Pos2','Pos3','Pos4'])
print(df)
  Pos1 Pos2 Pos3 Pos4
0    a    b    c    d
1    a    c    d    c
2    b    a    e    g
3    a    a    c    f
4    b    c    b    d
The output I want drops row 1 ('c' is repeated), row 3 ('a' is repeated) and row 4 ('b' is repeated):
  Pos1 Pos2 Pos3 Pos4
0    a    b    c    d
2    b    a    e    g
Answer 0 (score: 2)
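Use DataFrame.nunique with axis=1 to count the unique values in each row, compare that count with the number of columns using Series.eq (==), and filter with boolean indexing: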
df = df[df.nunique(axis=1).eq(len(df.columns))]
print(df)
  Pos1 Pos2 Pos3 Pos4
0    a    b    c    d
2    b    a    e    g
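To make it easier to see what the one-liner does, here is a sketch that splits it into its intermediate steps using the same sample data. The variable names unique_counts, mask and result are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [['a', 'b', 'c', 'd'],
     ['a', 'c', 'd', 'c'],
     ['b', 'a', 'e', 'g'],
     ['a', 'a', 'c', 'f'],
     ['b', 'c', 'b', 'd']],
    columns=['Pos1', 'Pos2', 'Pos3', 'Pos4'],
)

# Step 1: count the distinct values in each row (axis=1 means "across columns").
unique_counts = df.nunique(axis=1)

# Step 2: a row is valid when it has as many distinct values as there are columns.
mask = unique_counts.eq(len(df.columns))

# Step 3: boolean indexing keeps only the rows where the mask is True.
result = df[mask]

print(unique_counts.tolist())  # [4, 3, 4, 3, 3]
print(result.index.tolist())   # [0, 2]
```

One thing to keep in mind: nunique ignores missing values by default (dropna=True), so a row containing NaN would have fewer unique values than columns and would be dropped as well. If your data can contain NaN, check whether that is the behaviour you want.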