I need to create several filtered DataFrames that contain only unique values.
Dataset 1
Account Verified Paid Col1 Col2 Col3
1234 True True ... ... ...
1237 False True
1234 True True
4211 True True
1237 False True
312 False False
Dataset 2
Account Verified Paid Col1 Col2 Col3
41 True True ... ... ...
314 False False
41 True True
65 False False
The DataFrames are called dtf[i], where i runs from 1 to 2.
The expected output is:
Filtered 1
Account Verified Paid
1234 True False
1237 False True
4211 True True
312 False False
Filtered 2
Account Verified Paid
41 True True
314 False False
65 False False
How can I extract these unique values?
Answer 0: (score: 0)
If you want to remove duplicates, use pd.DataFrame.drop_duplicates, as in the following code:
import pandas as pd
df = pd.DataFrame({'col_1':['A','B','A','B','C'], 'col_2':[3,4,3,5,6], 'col_3':[0,0.1,0.2,0.3,0.4]})
print(df)
df.drop_duplicates(['col_1','col_2'], inplace = True)
print(df)
To use every column to define uniqueness, pass df.columns (this is also what happens by default when subset is omitted):
df.drop_duplicates(df.columns, inplace = True)
print(df)
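As a quick check of the all-columns case, here is a minimal sketch on a small made-up frame (the data and the variable names are assumptions, not from the question). It suggests that calling drop_duplicates() with no subset gives the same result as listing every column explicitly.

```python
import pandas as pd

# Hypothetical sample data: rows 0 and 2 are identical in every column.
df = pd.DataFrame({
    "col_1": ["A", "B", "A", "C"],
    "col_2": [3, 4, 3, 6],
})

# With no subset, every column is used to decide whether two rows are duplicates.
all_cols_default = df.drop_duplicates()

# Listing all columns explicitly should behave the same way.
all_cols_explicit = df.drop_duplicates(subset=list(df.columns))

print(all_cols_default)
print(all_cols_default.equals(all_cols_explicit))
```

Because inplace is left at its default here, df itself is not modified and both results are new DataFrames.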
Edit:
To loop over all the DataFrames in a list without replacing the originals, use the code below and leave inplace=False (the default):
lst_df = [pd.DataFrame({'col_1':['A','B','A','B','C'], 'col_2':[3,4,3,5,6], 'col_3':[0,0.1,0.2,0.3,0.4]}), pd.DataFrame({'col_1':['A','B','A','B','C'], 'col_2':[3,4,3,5,6], 'col_3':[0,0.1,0.2,0.3,0.4]})]
# Build the de-duplicated list directly instead of appending inside a comprehension
new_lst_df = [df.drop_duplicates(['col_1', 'col_2']) for df in lst_df]
print(new_lst_df)
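Applied to the data in the question, the same idea might look like the sketch below. It assumes the frames are stored in a dict named dtf keyed by 1 and 2 (the question only says dtf[i]), and it leaves out Col1 to Col3 because their values are elided in the question. The name filtered is hypothetical.

```python
import pandas as pd

# Data copied from the question; Col1..Col3 are omitted since their values are not shown.
dtf = {
    1: pd.DataFrame({
        "Account": [1234, 1237, 1234, 4211, 1237, 312],
        "Verified": [True, False, True, True, False, False],
        "Paid": [True, True, True, True, True, False],
    }),
    2: pd.DataFrame({
        "Account": [41, 314, 41, 65],
        "Verified": [True, False, True, False],
        "Paid": [True, False, True, False],
    }),
}

cols = ["Account", "Verified", "Paid"]

# Keep only the three columns of interest, drop repeated rows,
# and renumber the index so each result starts at 0.
filtered = {
    i: frame[cols].drop_duplicates().reset_index(drop=True)
    for i, frame in dtf.items()
}

for i, frame in filtered.items():
    print(f"Filtered {i}")
    print(frame)
```

The original frames in dtf are left untouched, so the full data is still available if the other columns are needed later.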
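Answer 1: (score: 0)
If you only want to remove rows with a repeated account number, keeping the first occurrence of each:
# keep="first" retains one row per account; keep=False would drop every repeated account entirely
dtf[i].drop_duplicates(subset="Account", keep="first", inplace=True)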
Or, if you want to remove exact duplicate rows:
# subset takes a flat list of column names, not a nested list
dtf[i].drop_duplicates(subset=["Account", "Verified", "Paid"], keep="first", inplace=True)
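The keep argument decides which of the repeated rows survive, which matters for the output the question asks for. A small sketch using the account numbers from Dataset 1 (the variable names are assumptions):

```python
import pandas as pd

# Account numbers from Dataset 1 in the question.
accounts = pd.DataFrame({"Account": [1234, 1237, 1234, 4211, 1237, 312]})

# keep="first" (the default) retains the first row of each repeated account.
first_kept = accounts.drop_duplicates(subset="Account", keep="first")

# keep=False drops every row whose account appears more than once.
none_kept = accounts.drop_duplicates(subset="Account", keep=False)

print(first_kept["Account"].tolist())  # [1234, 1237, 4211, 312]
print(none_kept["Account"].tolist())   # [4211, 312]
```

Only the keep="first" result still contains accounts 1234 and 1237, as the expected output in the question does.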
Hope this helps.