Python / pandas: drop duplicates only if three conditions are met

Time: 2020-05-21 21:37:35

Tags: python pandas for-loop if-statement duplicates

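I want to drop duplicates from a DataFrame, but I'm stuck because a row should only be dropped when three conditions are met at the same time. How can I do this with pandas?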

Please find my code below:

import pandas as pd

list_name = ['Joe', 'Sarina', 'Paul', 'Ana', 'Joe', 'Sarina']
list_surname = ['Day', 'Summers', 'Smith', 'Baker', 'Day', 'Brown']
list_letter = ['a','b','c','d','a','b']

df_profiles = pd.DataFrame({'Name': list_name, 'Surname': list_surname, 'Letter':list_letter})

# Checking for duplicates using isin() and duplicated()
# Sorting them by index order
name = df_profiles['Name']
surname = df_profiles['Surname']
letter = df_profiles['Letter']

# If ALL these three are true delete duplicate
df_profiles[name.isin(name[name.duplicated()])].sort_values('Name').sort_index()
df_profiles[surname.isin(surname[surname.duplicated()])].sort_values('Surname').sort_index()
df_profiles[letter.isin(letter[letter.duplicated()])].sort_values('Letter').sort_index()

2 answers:

Answer 0 (score: 1)

We can use all across the per-column duplicated flags:

df=df_profiles[~df_profiles.apply(pd.Series.duplicated, keep=False).all(1)]

df
Out[84]: 
     Name  Surname Letter
1  Sarina  Summers      b
2    Paul    Smith      c
3     Ana    Baker      d
5  Sarina    Brown      b
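
To make the one-liner easier to follow, here is a minimal sketch that splits it into steps on the question's sample data. The intermediate names `dup_flags`, `drop_mask`, and `result` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df_profiles = pd.DataFrame({
    'Name': ['Joe', 'Sarina', 'Paul', 'Ana', 'Joe', 'Sarina'],
    'Surname': ['Day', 'Summers', 'Smith', 'Baker', 'Day', 'Brown'],
    'Letter': ['a', 'b', 'c', 'd', 'a', 'b'],
})

# Per-column flags: True where the value occurs more than once in that column.
# keep=False marks every occurrence, not only the later ones.
dup_flags = df_profiles.apply(pd.Series.duplicated, keep=False)

# A row is dropped only when all three of its values are flagged.
drop_mask = dup_flags.all(axis=1)

result = df_profiles[~drop_mask]
print(result)
```

Rows 1 and 5 survive because 'Summers' and 'Brown' each occur only once, so not all three flags are True for them.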

Answer 1 (score: 1)

You can combine the conditions with & and negate the result with ~:

df_profiles = df_profiles[~(name.isin(name[name.duplicated()]) &  
                            surname.isin(surname[surname.duplicated()]) &  
                            letter.isin(letter[letter.duplicated()]) 
                           )] 

Result:

     Name  Surname Letter
1  Sarina  Summers      b
2    Paul    Smith      c
3     Ana    Baker      d
5  Sarina    Brown      b
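
As a side note, `s.isin(s[s.duplicated()])` should select the same rows as `s.duplicated(keep=False)`, so the mask can probably be shortened. The sketch below rebuilds the question's sample frame and also shows `drop_duplicates(subset=..., keep=False)`, which is a different technique: it compares the whole (Name, Surname, Letter) combination instead of each column separately. The names `cols`, `mask`, `per_column`, and `per_row` are illustrative and not from the original.

```python
import pandas as pd

# Sample data copied from the question.
df_profiles = pd.DataFrame({
    'Name': ['Joe', 'Sarina', 'Paul', 'Ana', 'Joe', 'Sarina'],
    'Surname': ['Day', 'Summers', 'Smith', 'Baker', 'Day', 'Brown'],
    'Letter': ['a', 'b', 'c', 'd', 'a', 'b'],
})
cols = ['Name', 'Surname', 'Letter']

# Column-wise check: every value of the row repeats somewhere in its own column.
mask = (df_profiles['Name'].duplicated(keep=False)
        & df_profiles['Surname'].duplicated(keep=False)
        & df_profiles['Letter'].duplicated(keep=False))
per_column = df_profiles[~mask]

# Row-wise check (different semantics): the full combination of the three
# columns repeats; keep=False drops every copy of such a row.
per_row = df_profiles.drop_duplicates(subset=cols, keep=False)

print(per_column)
print(per_row)
```

On this sample both approaches keep the same rows. They can differ on other data, for example when each value of a row repeats somewhere in its column but the three values never appear together in another row.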