How do I remove duplicates across two columns?

Time: 2019-12-05 13:14:57

Tags: python pandas

I have a data frame like this:

import pandas

df = pandas.DataFrame({"X1":["a","b","c"], "X2":["b","c","d"], "X3":[500,200,10]})

I only want to keep the first and third rows.

Expected output:

        X1  X2  X3
    0   a   b   500
    2   c   d   10

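To me, (X1, X2) = (b, c) is a duplicate of (a, b), because the X1 of the second row equals the X2 of the first row.

(c, d) is likewise a duplicate of (b, c), but (b, c) will already have been removed by then, so (c, d) stays.

How can I do this in general?

1 answer:

Answer 0: (score: 0)

If I understand correctly, a = b = c = d. In that case your data would be: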

import pandas as pd

df = pd.DataFrame({"X1": ["a", "a", "a"], "X2": ["a", "a", "a"], "X3": [500, 200, 10]})

# Keep first duplicate
df_first = df.drop_duplicates(subset=['X1', 'X2'], keep='first')

# Keep last duplicate
df_last = df.drop_duplicates(subset=['X1', 'X2'], keep='last')

# Concatenate DataFrames
df_clean = pd.concat([df_first, df_last], axis=0)

...or you can do it in a single statement:

df_clean = pd.concat([df.drop_duplicates(subset=['X1', 'X2'], keep='first'), df.drop_duplicates(subset=['X1', 'X2'], keep='last')], axis=0)
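One caveat with concatenating the two results: a row whose (X1, X2) pair occurs only once is both the "first" and the "last" of its group, so it would appear twice in df_clean. A possible way around that, sketched here with an extra made-up row ("z", "z", 7) that is not in the question, is to build two boolean masks with DataFrame.duplicated and combine them:

```python
import pandas as pd

# Sample data from the answer plus one hypothetical unique row ("z", "z", 7)
df = pd.DataFrame({
    "X1": ["a", "a", "a", "z"],
    "X2": ["a", "a", "a", "z"],
    "X3": [500, 200, 10, 7],
})
cols = ["X1", "X2"]

# The concat approach repeats the unique row, because it is both first and last
concat_version = pd.concat(
    [df.drop_duplicates(subset=cols, keep="first"),
     df.drop_duplicates(subset=cols, keep="last")],
    axis=0,
)

# Mask approach: True where the row is the first (or last) of its group
is_first = ~df.duplicated(subset=cols, keep="first")
is_last = ~df.duplicated(subset=cols, keep="last")

# Keep rows that are first or last; each row is selected at most once
df_clean = df[is_first | is_last]
print(df_clean)
```

This also preserves the original row order, which the concatenation does not guarantee.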

Does this answer your question?
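If the values are actually distinct, as in the data frame from the question, the rule described there can also be read as a chain: a row is dropped when its X1 equals the X2 of the row directly above it, but only if that row above was itself kept. That reading is an assumption on my part (the question does not say whether only adjacent rows count), and the helper name drop_chained_rows is made up for this sketch:

```python
import pandas as pd

def drop_chained_rows(df, left="X1", right="X2"):
    """Drop a row when its `left` value equals the `right` value of the
    immediately preceding row, provided that preceding row was kept.

    Assumption: only adjacent rows are compared, in the frame's current order.
    """
    keep = []
    prev_kept = False   # was the previous row kept?
    prev_right = None   # `right` value of the previous row
    for left_val, right_val in zip(df[left], df[right]):
        is_dup = prev_kept and left_val == prev_right
        keep.append(not is_dup)
        prev_kept = not is_dup
        prev_right = right_val
    # Build an index-aligned boolean mask so empty frames are handled too
    mask = pd.Series(keep, index=df.index, dtype=bool)
    return df[mask]

df = pd.DataFrame({"X1": ["a", "b", "c"], "X2": ["b", "c", "d"], "X3": [500, 200, 10]})
result = drop_chained_rows(df)
print(result)
```

On the question's data this keeps rows 0 and 2: (b, c) is dropped because it follows the kept row (a, b), and (c, d) survives because the row above it was already removed. The loop is sequential because each decision depends on the previous one, so it is not easily vectorized.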