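I want to remove duplicates from a DataFrame, but I'm stuck because I need three conditions to hold at the same time. How can I do this with pandas?
Here is my code: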
import pandas as pd
list_name = ['Joe', 'Sarina', 'Paul', 'Ana', 'Joe', 'Sarina']
list_surname = ['Day', 'Summers', 'Smith', 'Baker', 'Day', 'Brown']
list_letter = ['a','b','c','d','a','b']
df_profiles = pd.DataFrame({'Name': list_name, 'Surname': list_surname, 'Letter':list_letter})
# Checking for duplicates using isin() and duplicated()
# Sorting them by index order
name = df_profiles['Name']
surname = df_profiles['Surname']
letter = df_profiles['Letter']
# Delete a row only if ALL three of the checks below are true for it
df_profiles[name.isin(name[name.duplicated()])].sort_values('Name').sort_index()
df_profiles[surname.isin(surname[surname.duplicated()])].sort_values('Surname').sort_index()
df_profiles[letter.isin(letter[letter.duplicated()])].sort_values('Letter').sort_index()
Answer 0 (score: 1)
We can use all for this:
df = df_profiles[~df_profiles.apply(pd.Series.duplicated, keep=False).all(axis=1)]
df
Out[84]:
     Name  Surname Letter
1  Sarina  Summers      b
2    Paul    Smith      c
3     Ana    Baker      d
5  Sarina    Brown      b
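To see why this works, it can help to look at the intermediate boolean mask. The following is only a minimal sketch that rebuilds the DataFrame from the question; the names `mask`, `drop` and `result` are illustrative and not part of the original answer.

```python
import pandas as pd

df_profiles = pd.DataFrame({
    'Name': ['Joe', 'Sarina', 'Paul', 'Ana', 'Joe', 'Sarina'],
    'Surname': ['Day', 'Summers', 'Smith', 'Baker', 'Day', 'Brown'],
    'Letter': ['a', 'b', 'c', 'd', 'a', 'b'],
})

# For each column separately, flag every value that occurs more than once
# in that column (keep=False marks all occurrences, not only the later ones).
mask = df_profiles.apply(lambda col: col.duplicated(keep=False))

# A row is dropped only when all three of its columns are flagged.
drop = mask.all(axis=1)  # rows 0 and 4 are flagged

result = df_profiles[~drop]
print(mask)
print(result)
```

Row 1 survives because 'Summers' appears only once, even though 'Sarina' and 'b' are repeated elsewhere.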
Answer 1 (score: 1)
You can combine the conditions with & and negate the result with ~:
df_profiles = df_profiles[~(name.isin(name[name.duplicated()]) &
surname.isin(surname[surname.duplicated()]) &
letter.isin(letter[letter.duplicated()])
)]
Result:
     Name  Surname Letter
1  Sarina  Summers      b
2    Paul    Smith      c
3     Ana    Baker      d
5  Sarina    Brown      b
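One caveat: the approaches in this thread check each column independently, so a row can be removed even when its three values are repeated in three different rows rather than in one identical row. If the goal is to remove rows that match on all three columns together, `DataFrame.duplicated` with `subset` may be closer to what is wanted. The sketch below assumes that reading of the question; `df_demo` and its extra seventh row are hypothetical and were added only to make the difference visible.

```python
import pandas as pd

# Hypothetical data: the question's rows plus one extra row ('Joe', 'Brown', 'b')
df_demo = pd.DataFrame({
    'Name': ['Joe', 'Sarina', 'Paul', 'Ana', 'Joe', 'Sarina', 'Joe'],
    'Surname': ['Day', 'Summers', 'Smith', 'Baker', 'Day', 'Brown', 'Brown'],
    'Letter': ['a', 'b', 'c', 'd', 'a', 'b', 'b'],
})
cols = ['Name', 'Surname', 'Letter']

# Per-column check: drop a row when each of its values is repeated
# somewhere in its own column.
per_column_mask = df_demo[cols].apply(lambda col: col.duplicated(keep=False)).all(axis=1)
per_column = df_demo[~per_column_mask]

# Row-wise check: drop a row only when another row has the same
# (Name, Surname, Letter) combination.
row_wise = df_demo[~df_demo.duplicated(subset=cols, keep=False)]

# Variant that keeps the first copy of each duplicated row.
keep_first = df_demo.drop_duplicates(subset=cols, keep='first')

print(per_column)
print(row_wise)
print(keep_first)
```

Here rows 5 and 6 are dropped by the per-column check but kept by the row-wise check, since neither has an identical twin.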