Question

I have a correlation matrix that I melted into a DataFrame, so now I have something like this:

First      Second       Value
A          B            0.5
B          A            0.5
A          C            0.2

I want to remove only one of the first two rows, since they describe the same pair. How can I do that?
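For anyone reproducing the setup, here is a minimal sketch of how a long-format frame like the one above might come from a correlation matrix. The 3x3 matrix corr, its values, and the choice to drop the diagonal (self-correlations) are assumptions for illustration; the question does not show the original matrix.

```python
import pandas as pd

# Hypothetical 3x3 correlation matrix over labels A, B, C
corr = pd.DataFrame(
    [[1.0, 0.5, 0.2],
     [0.5, 1.0, 0.1],
     [0.2, 0.1, 1.0]],
    index=["A", "B", "C"],
    columns=["A", "B", "C"],
)

# Turn the row labels into a "First" column, then melt the remaining columns into "Second"/"Value"
long_df = (
    corr.rename_axis("First")
        .reset_index()
        .melt(id_vars="First", var_name="Second", value_name="Value")
)

# Drop the diagonal; every off-diagonal pair now appears twice, once in each orientation
long_df = long_df[long_df["First"] != long_df["Second"]].reset_index(drop=True)
print(long_df)
```

Because the matrix is symmetric, each unordered pair such as (A, B) shows up as both (A, B) and (B, A), which is exactly the duplication the question wants to remove.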

Answer 1

You can call np.sort on the two label columns and then drop the rows whose sorted pair is duplicated:

import numpy as np
import pandas as pd
df = df.loc[~pd.DataFrame(np.sort(df.iloc[:, :2], axis=1)).duplicated()]
df

  First Second  Value
0     A      B    0.5
2     A      C    0.2

Details

np.sort(df.iloc[:, :2])

array([['A', 'B'],
       ['A', 'B'],
       ['A', 'C']], dtype=object)

~pd.DataFrame(np.sort(df.iloc[:, :2], axis=1)).duplicated()

0     True
1    False
2     True
dtype: bool

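Sort the two columns within each row, then find which rows are duplicates. The resulting mask is used to filter the DataFrame through boolean indexing.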
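A self-contained sketch of these steps follows. The sample frame is rebuilt from the question, and the non-default index [10, 11, 12] is an assumption added to show one detail: pd.DataFrame(pairs) gets a fresh 0..n-1 index, so the mask is converted with .to_numpy() to make the filter positional instead of relying on index alignment.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"First": ["A", "B", "A"], "Second": ["B", "A", "C"], "Value": [0.5, 0.5, 0.2]},
    index=[10, 11, 12],  # assumed non-default index
)

# Sort each row's pair so ("B", "A") becomes ("A", "B")
pairs = np.sort(df[["First", "Second"]].to_numpy(), axis=1)

# True for the first occurrence of each unordered pair, as a plain boolean array
mask = ~pd.DataFrame(pairs).duplicated().to_numpy()

result = df[mask]
print(result)
```

With a default RangeIndex the .to_numpy() call makes no difference; it only matters when the frame's index has been changed, for example after earlier filtering.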

To reset the index, use reset_index:

df.reset_index(drop=True)

  First Second  Value
0     A      B    0.5
1     A      C    0.2
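If you prefer to call drop_duplicates directly, one alternative sketch (not the answer's exact code) is to overwrite the two label columns with their row-wise sorted values and deduplicate on them. The trade-off is that surviving rows show the sorted orientation, so a kept ("B", "A") row would appear as ("A", "B"), whereas the mask approach keeps the original rows untouched.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"First": ["A", "B", "A"], "Second": ["B", "A", "C"], "Value": [0.5, 0.5, 0.2]})

# Replace each row's label pair with its sorted version: ("B", "A") -> ("A", "B")
sorted_df = df.copy()
sorted_df[["First", "Second"]] = np.sort(df[["First", "Second"]].to_numpy(), axis=1)

# Ordinary drop_duplicates now sees identical pairs; reset_index renumbers from 0
result = sorted_df.drop_duplicates(subset=["First", "Second"]).reset_index(drop=True)
print(result)
```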

Answer 2

Use:

# if you want to select the columns by name
m = ~pd.DataFrame(np.sort(df[['First','Second']], axis=1)).duplicated()
# if you want to select the columns by position
#m = ~pd.DataFrame(np.sort(df.iloc[:,:2], axis=1)).duplicated()
print (m)

0     True
1    False
2     True
dtype: bool

df = df[m]
print (df)
  First Second  Value
0     A      B    0.5
2     A      C    0.2
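When the long frame comes from melting a full symmetric matrix, every pair appears in both orientations, so a simpler filter can keep just one of them. This is a different technique from the answers here, and it assumes each pair is present as both (X, Y) and (Y, X); a pair that appears only once, in descending order, would be dropped entirely.

```python
import pandas as pd

df = pd.DataFrame({"First": ["A", "B", "A"], "Second": ["B", "A", "C"], "Value": [0.5, 0.5, 0.2]})

# Keep only the orientation where the labels are in ascending order.
# Assumes every pair also exists reversed (true for a melted symmetric matrix).
result = df[df["First"] < df["Second"]]
print(result)
```

A side effect worth noting: the strict < comparison also removes self-pairs such as (A, A), which is usually what you want for a correlation matrix's diagonal.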

Answer 3

You can also use the following approach:

# create a key column by sorting the 'First' and 'Second' values and joining them with '-':
df['newcol'] = df.apply(lambda x: "-".join(sorted([x['First'], x['Second']])), axis=1)
print(df)

  First Second  Value newcol
0     A      B    0.5    A-B
1     B      A    0.5    A-B
2     A      C    0.2    A-C

# get its non-duplicated indexes and remove the new column: 
df = df[~df.newcol.duplicated()].iloc[:,:3]
print(df)

  First Second  Value
0     A      B    0.5
2     A      C    0.2
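A variant of this idea that avoids both apply and string concatenation is sketched below: the key is a tuple of the sorted values, built with a plain list comprehension. Tuples compare element by element, so labels such as ("AB", "C") and ("A", "BC") cannot produce the same key, which can happen when values are concatenated into one string, and the comprehension is typically faster than a row-wise apply. The frame is rebuilt from the question so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({"First": ["A", "B", "A"], "Second": ["B", "A", "C"], "Value": [0.5, 0.5, 0.2]})

# One order-independent key per row: ("B", "A") -> ("A", "B")
key = pd.Series(
    [tuple(sorted(pair)) for pair in zip(df["First"], df["Second"])],
    index=df.index,
)

# Keep the first row for each key; no helper column is added to df
result = df[~key.duplicated()]
print(result)
```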

Remove duplicates based on the contents of two columns, regardless of their order
