Pandas: remove duplicate values within a row

Time: 2021-02-03 10:50:15

Tags: python pandas dataframe duplicates

I have a CSV file in which each row contains several repeated values. I want to remove the repeats so that each row keeps only its unique values.

DataFrame:

 1                            2          3                   4           5                              6    
Bypass User Account Control  T3431      Elevated Execution   T3424      Bypass User Account Control    T3431
Local Account                T3523      Domain Account       T4252      Local Account                  T3523

Expected DataFrame:

  1                            2          3                   4           5                              6    
Bypass User Account Control  T3431      Elevated Execution   T3424      
Local Account                T3523      Domain Account       T4252                         

There are 100 duplicated entries across the rows, and I only want to see the unique values.

2 answers:

Answer 0: (score: 1)

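Apply unique to each row to keep only its distinct values. unique returns an array, so wrap the result in a Series to get a DataFrame back: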

import pandas as pd

df1 = df.apply(lambda x: pd.Series(x.unique()), axis=1)
print(df1)
                             0      1                   2      3
0  Bypass User Account Control  T3431  Elevated Execution  T3424
1                Local Account  T3523      Domain Account  T4252

Or:

# drop repeats in each row, then renumber the remaining positions from 0
df1 = df.apply(lambda x: x.drop_duplicates().reset_index(drop=True), axis=1)
print(df1)
                             0      1                   2      3
0  Bypass User Account Control  T3431  Elevated Execution  T3424
1                Local Account  T3523      Domain Account  T4252

Finally, to restore the original column names:

df1.columns = df.columns[:len(df1.columns)]
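For reference, the steps in this answer can be put together as one runnable sketch. The sample frame below is rebuilt by hand from the question, and the integer column labels 1 to 6 are an assumption about how the CSV was loaded.

```python
import pandas as pd

# Sample data copied from the question; the labels 1..6 are assumed
df = pd.DataFrame(
    [
        ["Bypass User Account Control", "T3431", "Elevated Execution", "T3424",
         "Bypass User Account Control", "T3431"],
        ["Local Account", "T3523", "Domain Account", "T4252",
         "Local Account", "T3523"],
    ],
    columns=[1, 2, 3, 4, 5, 6],
)

# Keep each row's unique values in order of first appearance
df1 = df.apply(lambda x: pd.Series(x.unique()), axis=1)

# Reuse the leading original column labels for the columns that remain
df1.columns = df.columns[:len(df1.columns)]
print(df1)
```

If rows end up with different numbers of unique values, the shorter rows are padded with NaN on the right.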

Answer 1: (score: 1)

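Use stack to reshape the rows into one long Series, drop duplicates within each original row, then unstack: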

(df.stack()
  .groupby(level=0).apply(lambda x: x.drop_duplicates())
  .unstack()
  .reset_index(drop=True))

Result:

                             1      2                   3      4
0  Bypass User Account Control  T3431  Elevated Execution  T3424
1                Local Account  T3523      Domain Account  T4252
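Both answers return only the columns that still hold values, while the expected frame in the question keeps columns 5 and 6 as blanks. If that layout matters, one possible alternative (a different technique from the answers above, not taken from them) is to blank out repeated values in place with a row-wise duplicated mask. The sample data and the column labels 1 to 6 are assumptions reconstructed from the question.

```python
import pandas as pd

# Sample data copied from the question; the labels 1..6 are assumed
df = pd.DataFrame(
    [
        ["Bypass User Account Control", "T3431", "Elevated Execution", "T3424",
         "Bypass User Account Control", "T3431"],
        ["Local Account", "T3523", "Domain Account", "T4252",
         "Local Account", "T3523"],
    ],
    columns=[1, 2, 3, 4, 5, 6],
)

# True where a value already appeared earlier in the same row
is_repeat = df.apply(lambda row: row.duplicated(), axis=1).astype(bool)

# Keep first occurrences, replace repeats with an empty string
out = df.where(~is_repeat, "")
print(out)
```

Values stay in their original positions here, so a repeat in the middle of a row leaves a blank in the middle rather than shifting later values left.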