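I have a CSV file in which each row contains several duplicated values. I want to remove those duplicates so that only the unique values remain in each row.
DataFrame: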
1                            2      3                   4      5                            6
Bypass User Account Control  T3431  Elevated Execution  T3424  Bypass User Account Control  T3431
Local Account                T3523  Domain Account      T4252  Local Account                T3523
Expected DataFrame:
1                            2      3                   4      5  6
Bypass User Account Control  T3431  Elevated Execution  T3424
Local Account                T3523  Domain Account      T4252
There are 100 rows with duplicated data like this, and I only want to see the unique values.
Answer 0: (score: 1)
Use unique to reduce each row to its unique values. The output is an array, so convert it to a Series:
df1 = df.apply(lambda x: pd.Series(x.unique()), axis=1)
print(df1)
   0                            1      2                   3
0  Bypass User Account Control  T3431  Elevated Execution  T3424
1  Local Account                T3523  Domain Account      T4252
Or:
df1 = df.apply(lambda x: x.drop_duplicates().reset_index(drop=True), axis=1)
print(df1)
   0                            1      2                   3
0  Bypass User Account Control  T3431  Elevated Execution  T3424
1  Local Account                T3523  Domain Account      T4252
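To try both variants end to end, here is a minimal sketch that rebuilds the sample frame by hand. The values and the column labels 1 to 6 are copied from the question; the names via_unique and via_drop are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question; columns are labelled 1..6.
df = pd.DataFrame(
    [
        ["Bypass User Account Control", "T3431", "Elevated Execution", "T3424",
         "Bypass User Account Control", "T3431"],
        ["Local Account", "T3523", "Domain Account", "T4252",
         "Local Account", "T3523"],
    ],
    columns=range(1, 7),
)

# Variant 1: unique() returns the distinct values in order of first
# appearance; wrapping them in a Series lets apply() expand them into columns.
via_unique = df.apply(lambda x: pd.Series(x.unique()), axis=1)

# Variant 2: drop_duplicates() keeps the first occurrence of each value;
# reset_index(drop=True) renumbers the survivors so they start at column 0.
via_drop = df.apply(lambda x: x.drop_duplicates().reset_index(drop=True), axis=1)

print(via_unique)
print(via_drop)
```

On this data both variants should give the same four columns, labelled 0 to 3.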
Finally, to restore the original column names:
df1.columns = df.columns[:len(df1.columns)]
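The slice in that line matters when rows end up with different numbers of unique values: the result is as wide as the longest row, and shorter rows are padded with NaN. A small sketch with made-up data (the frame demo and its values are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical data: row 0 has three distinct values, row 1 only two.
demo = pd.DataFrame(
    [["a", "b", "a", "c"],
     ["x", "x", "x", "y"]],
    columns=[1, 2, 3, 4],
)

out = demo.apply(lambda x: pd.Series(x.unique()), axis=1)

# Reuse as many of the original labels as there are result columns.
out.columns = demo.columns[:len(out.columns)]
print(out)
```

Here the result has three columns labelled 1, 2 and 3, and the last cell of the second row is NaN.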
Answer 1: (score: 1)
Use:
(df.stack()
.groupby(level=0).apply(lambda x: x.drop_duplicates())
.unstack()
.reset_index(drop=True))
Result:
   1                            2      3                   4
0  Bypass User Account Control  T3431  Elevated Execution  T3424
1  Local Account                T3523  Domain Account      T4252
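A self-contained sketch of the same stack-based idea on the question's data follows. One deliberate difference from the snippet above: it passes group_keys=False to groupby, an addition of mine so that the group key is not prepended to the index; the name result is also just illustrative.

```python
import pandas as pd

# Sample data copied from the question; columns are labelled 1..6.
df = pd.DataFrame(
    [
        ["Bypass User Account Control", "T3431", "Elevated Execution", "T3424",
         "Bypass User Account Control", "T3431"],
        ["Local Account", "T3523", "Domain Account", "T4252",
         "Local Account", "T3523"],
    ],
    columns=range(1, 7),
)

result = (
    df.stack()                               # long Series indexed by (row, column)
      .groupby(level=0, group_keys=False)    # one group per original row
      .apply(lambda x: x.drop_duplicates())  # keep the first occurrence in each row
      .unstack()                             # back to wide, one column per surviving label
      .reset_index(drop=True)
)
print(result)
```

Values keep their original column labels in this approach, so a duplicate sitting in the middle of a row would leave a NaN gap rather than shifting later values to the left, unlike the apply-based variants.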