Here is a sample df:
import pandas as pd
df = pd.DataFrame({'ID':[1, 2, 2, 2, 3, 3], 'Test':[0,0,1,2,3,2], 'Name':['ID stored','ID stored', 'ID not stored', 'ID not stored', 'ID not stored', 'ID stored']})
   ID           Name  Test
0   1      ID stored     0
1   2      ID stored     0
2   2  ID not stored     1
3   2  ID not stored     2
4   3  ID not stored     3
5   3      ID stored     2
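What I would like to achieve is to drop duplicates based on the Name column, so that only the rows whose value in that column is ID stored remain.
This is the final result: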
   ID       Name  Test
0   1  ID stored     0
1   2  ID stored     0
5   3  ID stored     2
Answer 0 (score: 1)
You are not asking to drop duplicates, but rather to filter.
If you want to get the last row for a particular ID, you can groupby and call last:
df.loc[df['Name'] == 'ID stored'].groupby('ID', as_index=False).last()
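As a minimal sketch of the two steps above, run against the sample df from the question: the intermediate names `stored` and `last_per_id` are introduced here only for illustration and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 2, 2, 3, 3],
    'Test': [0, 0, 1, 2, 3, 2],
    'Name': ['ID stored', 'ID stored', 'ID not stored',
             'ID not stored', 'ID not stored', 'ID stored'],
})

# Step 1: boolean filter, keep only rows whose Name is 'ID stored'.
stored = df.loc[df['Name'] == 'ID stored']

# Step 2: collapse to one row per ID, taking the last value in each column.
# as_index=False keeps ID as a regular column instead of the index.
last_per_id = stored.groupby('ID', as_index=False).last()

print(last_per_id)
```

Two things to keep in mind with this route: the result gets a fresh 0..n-1 index, so the original row labels are lost, and `last()` takes the last non-null value per column, which can differ from the last row if the data contains NaN.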
Answer 1 (score: 1)
You need boolean indexing with drop_duplicates:
print (df.loc[df['Name'] == 'ID stored'].drop_duplicates('ID', keep='last'))
   ID       Name  Test
0   1  ID stored     0
1   2  ID stored     0
5   3  ID stored     2
A better sample DataFrame:
df = pd.DataFrame({'ID':[1, 2, 2, 2, 3, 3],
                   'Test':[0,0,1,2,3,4],
                   'Name':['ID stored','ID stored', 'ID not stored',
                           'ID stored', 'ID not stored', 'ID stored']})
print (df)
   ID           Name  Test
0   1      ID stored     0
1   2      ID stored     0
2   2  ID not stored     1
3   2      ID stored     2
4   3  ID not stored     3
5   3      ID stored     4
print (df.loc[df['Name'] == 'ID stored'])
   ID       Name  Test
0   1  ID stored     0
1   2  ID stored     0 <-duplicate ID 2
3   2  ID stored     2 <-duplicate ID 2
5   3  ID stored     4
print (df.loc[df['Name'] == 'ID stored'].drop_duplicates('ID', keep='last'))
   ID       Name  Test
0   1  ID stored     0
3   2  ID stored     2
5   3  ID stored     4
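To make the effect of `keep` easier to see, here is a small sketch on the same better sample that compares the three values `drop_duplicates` accepts. The variable names (`stored`, `first`, `last`, `unique_only`) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 2, 2, 3, 3],
    'Test': [0, 0, 1, 2, 3, 4],
    'Name': ['ID stored', 'ID stored', 'ID not stored',
             'ID stored', 'ID not stored', 'ID stored'],
})

# Boolean indexing: keep only the 'ID stored' rows (ID 2 appears twice here).
stored = df[df['Name'] == 'ID stored']

# keep='first' keeps the first row of each duplicated ID,
# keep='last' keeps the last one, keep=False drops every duplicated ID.
first = stored.drop_duplicates(subset='ID', keep='first')
last = stored.drop_duplicates(subset='ID', keep='last')
unique_only = stored.drop_duplicates(subset='ID', keep=False)

print(first.index.tolist())        # [0, 1, 5]
print(last.index.tolist())         # [0, 3, 5]
print(unique_only.index.tolist())  # [0, 5]
```

Unlike the groupby approach, `drop_duplicates` preserves the original index labels, which is why rows 3 and 5 keep their numbers in the output.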