   Title  URL  Price  Address  Rental_Type
0  House  URL   $600   Auburn    Apartment
1  House  URL   $600   Auburn    Apartment
2  House  URL   $900       NY    Apartment
3  Room!  URL  $1018      NaN       Office
4  Room!  URL   $910      NaN       Office
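I am trying to drop duplicates under Title, but only for rows where Rental_Type == 'Office'. I also have a second constraint: I want to drop duplicate rows where Rental_Type == 'Apartment' as well, but in that case I want to keep the first duplicate. So here, rows 3 and 4 would both be dropped, and of rows 0/1 only row 1 would be dropped.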
Answer 0 (score: 1)
I would build this up step by step, constructing a list of the rows you want to drop.
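import pandas as pd

# Boolean masks for the two rental types
offices = df['Rental_Type'] == 'Office'
apts = df['Rental_Type'] == 'Apartment'
# Offices: flag every row whose Title is duplicated (keep=False)
dup_offices = df[offices].duplicated('Title', keep=False)
# Apartments: flag every duplicate except the first occurrence
dup_apts = df[apts].duplicated('Title', keep='first')
# Collect the index labels of all flagged rows and drop them
to_drop = pd.Index(dup_apts[dup_apts].index.tolist()
                   + dup_offices[dup_offices].index.tolist())
df = df.drop(to_drop)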
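The same step-by-step idea can also be written as a single boolean mask over the whole frame, which avoids collecting index labels. This is only a sketch: the sample frame is rebuilt from the question (with 'URL' as the literal placeholder and Price kept as strings, both assumptions), and it relies on the fact that, within one rental type, duplicates by Title are the same rows as duplicates by (Rental_Type, Title).

```python
import pandas as pd

# Sample data rebuilt from the question; 'URL' is the placeholder used there.
df = pd.DataFrame({
    "Title": ["House", "House", "House", "Room!", "Room!"],
    "URL": ["URL"] * 5,
    "Price": ["$600", "$600", "$900", "$1018", "$910"],
    "Address": ["Auburn", "Auburn", "NY", None, None],
    "Rental_Type": ["Apartment", "Apartment", "Apartment", "Office", "Office"],
})

keys = ["Rental_Type", "Title"]
is_office = df["Rental_Type"] == "Office"
is_apt = df["Rental_Type"] == "Apartment"

# Office rows: drop every member of a duplicated Title group.
drop_office = is_office & df.duplicated(keys, keep=False)
# Apartment rows: drop every repeat but keep the first occurrence.
drop_apt = is_apt & df.duplicated(keys, keep="first")

# Keep everything that neither rule flagged.
result = df[~(drop_office | drop_apt)]
print(result)
```

On the sample data this leaves only row 0, because row 2 shares the Title House with rows 0 and 1 and is therefore treated as a repeat too.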
Answer 1 (score: 0)
You can drop duplicates with constraints this way:
import pandas as pd

# Keep only the office rows whose Title is unique (keep=False drops every duplicate)
df1 = df[df.Rental_Type == 'Office'].drop_duplicates(['Title'], keep=False)
# Flag the apartment rows whose Title appears again later (keep='last' leaves the last occurrence unflagged)
apts = df[df.Rental_Type == 'Apartment']
df2 = apts.duplicated(['Title'], keep='last')
# Select the flagged apartment rows and skip the first of them
df3 = apts[df2.values][1:]
# Put them together
df_final = pd.concat([df1, df3])
In [1]: df_final
Out[1]:
   Title  URL  Price  Address  Rental_Type
1  House  URL    600   Auburn    Apartment
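If the goal is to keep the first apartment row per Title, as the question asks, the same filter-and-concatenate idea could be sketched with drop_duplicates on both subsets. The helper name dedupe_by_rental_type and the rebuilt sample frame (Price kept as strings) are assumptions for illustration, not part of the answer. Passing rows of any other rental type through unchanged is an assumption as well.

```python
import pandas as pd

# Sample data rebuilt from the question; 'URL' is the placeholder used there.
df = pd.DataFrame({
    "Title": ["House", "House", "House", "Room!", "Room!"],
    "URL": ["URL"] * 5,
    "Price": ["$600", "$600", "$900", "$1018", "$910"],
    "Address": ["Auburn", "Auburn", "NY", None, None],
    "Rental_Type": ["Apartment", "Apartment", "Apartment", "Office", "Office"],
})


def dedupe_by_rental_type(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply a different duplicate rule per rental type."""
    # Offices: discard every row whose Title occurs more than once.
    offices = frame[frame["Rental_Type"] == "Office"].drop_duplicates("Title", keep=False)
    # Apartments: keep only the first row for each Title.
    apts = frame[frame["Rental_Type"] == "Apartment"].drop_duplicates("Title", keep="first")
    # Assumption: any other rental type is left untouched.
    others = frame[~frame["Rental_Type"].isin(["Office", "Apartment"])]
    # Restore the original row order after concatenating the pieces.
    return pd.concat([offices, apts, others]).sort_index()


df_final = dedupe_by_rental_type(df)
print(df_final)
```

On the sample data this leaves only row 0: rows 1 and 2 repeat the Title House, and both Room! office rows are discarded.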