Question

I need to create several filtered DataFrames that contain only unique rows.

Dataset 1

Account     Verified     Paid   Col1 Col2 Col3
1234        True        True     ...  ...  ...
1237        False       True    
1234        True        True
4211        True        True
1237        False       True
312         False       False

Dataset 2

Account          Verified   Paid   Col1 Col2 Col3
41                True      True    ... ... ...
314               False     False
41                True      True
65                False     False

The DataFrames are named dtf[i], where i runs from 1 to 2. The expected output is:

Filtered 1

Account     Verified     Paid
1234        True        True
1237        False       True
4211        True        True
312         False       False

Filtered 2

Account          Verified   Paid
41                True      True
314               False     False
65                False     False

How can I extract these unique rows?

Answer 1

If you want to remove duplicates, use pd.DataFrame.drop_duplicates, as in the following code:

import pandas as pd
df = pd.DataFrame({'col_1':['A','B','A','B','C'], 'col_2':[3,4,3,5,6], 'col_3':[0,0.1,0.2,0.3,0.4]})
print(df)
df.drop_duplicates(['col_1','col_2'], inplace = True)
print(df)

If you want every column to count toward uniqueness, pass df.columns:

df.drop_duplicates(df.columns, inplace = True)
print(df)
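
As a side note, drop_duplicates already compares all columns when subset is omitted, so passing df.columns should be equivalent to passing nothing. A minimal check, assuming a small made-up frame (the names by_default and by_columns are illustrative only):

```python
import pandas as pd

# Small made-up frame; rows 0 and 2 are identical in every column.
df = pd.DataFrame({"col_1": ["A", "B", "A"], "col_2": [3, 4, 3]})

by_default = df.drop_duplicates()            # subset omitted: all columns are compared
by_columns = df.drop_duplicates(df.columns)  # all columns passed explicitly

# Both calls return a new frame and leave df itself unchanged.
print(by_default.equals(by_columns))
```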

Edit:

To loop over all the DataFrames in a list without modifying the originals, use the following code and leave inplace = False (the default):

lst_df = [pd.DataFrame({'col_1':['A','B','A','B','C'], 'col_2':[3,4,3,5,6], 'col_3':[0,0.1,0.2,0.3,0.4]}), pd.DataFrame({'col_1':['A','B','A','B','C'], 'col_2':[3,4,3,5,6], 'col_3':[0,0.1,0.2,0.3,0.4]})]
# Build the new list directly; each original DataFrame is left untouched
new_lst_df = [df.drop_duplicates(['col_1', 'col_2']) for df in lst_df]
print(new_lst_df)
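
Applied to the data in the question, the same pattern might look like the sketch below. The two frames are reconstructed from the question's tables (Col1 to Col3 are left out because their values are elided there), and the names frames, key_cols and filtered are illustrative, not part of the original post.

```python
import pandas as pd

# Sample data reconstructed from the question; Col1..Col3 are omitted
# because their values are elided in the original tables.
dtf1 = pd.DataFrame({
    "Account":  [1234, 1237, 1234, 4211, 1237, 312],
    "Verified": [True, False, True, True, False, False],
    "Paid":     [True, True, True, True, True, False],
})
dtf2 = pd.DataFrame({
    "Account":  [41, 314, 41, 65],
    "Verified": [True, False, True, False],
    "Paid":     [True, False, True, False],
})

frames = [dtf1, dtf2]  # hypothetical stand-in for dtf[1] and dtf[2]
key_cols = ["Account", "Verified", "Paid"]

# Keep only the key columns, drop repeated rows (the first occurrence stays),
# and renumber the index. The original frames are not modified.
filtered = [
    df[key_cols].drop_duplicates().reset_index(drop=True)
    for df in frames
]

for f in filtered:
    print(f)
```

Because inplace is left at its default, dtf1 and dtf2 still hold all of their rows afterwards.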

Answer 2

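If you only want to drop repeated account numbers:

# keep="first" retains one row per account; keep=False would drop every repeated account entirely
dtf[i].drop_duplicates(subset="Account",
                       keep="first", inplace=True)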

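Or, if you want to drop rows that are exact duplicates across these columns:

# subset takes a flat list of column labels, not a nested list
dtf[i].drop_duplicates(subset=["Account", "Verified", "Paid"],
                       keep="first", inplace=True)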
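
The keep argument decides which of the repeated rows survive, and it changes the result noticeably. A minimal sketch on a cut-down version of the question's first table (the variable names are illustrative only):

```python
import pandas as pd

# Cut-down version of dataset 1; account 1234 appears twice.
df = pd.DataFrame({
    "Account":  [1234, 1237, 1234, 4211],
    "Verified": [True, False, True, True],
    "Paid":     [True, True, True, True],
})

# keep="first": one row per account, the first occurrence is retained.
first_kept = df.drop_duplicates(subset="Account", keep="first")

# keep=False: every account that occurs more than once is removed completely.
none_kept = df.drop_duplicates(subset="Account", keep=False)

print(first_kept["Account"].tolist())  # [1234, 1237, 4211]
print(none_kept["Account"].tolist())   # [1237, 4211]
```

The expected output in the question keeps one row for each repeated account, which corresponds to keep="first".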

Hope this helps.

Extracting unique rows from multiple DataFrames
