Select rows based on a column value among duplicate entries in a different column

Time: 2017-02-22 11:26:05

Tags: python pandas duplicates

Here is a sample df:

import pandas as pd

df = pd.DataFrame({'ID':[1, 2, 2, 2, 3, 3], 'Test':[0,0,1,2,3,2], 'Name':['ID stored','ID stored', 'ID not stored', 'ID not stored', 'ID not stored', 'ID stored']})
   ID           Name  Test
0   1      ID stored     0
1   2      ID stored     0
2   2  ID not stored     1
3   2  ID not stored     2
4   3  ID not stored     3
5   3      ID stored     2

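What I want to achieve is to remove the duplicate values based on the Name column, so that only the rows where this column has the value ID stored remain.

This is the desired final result: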

   ID       Name  Test
0   1  ID stored     0
1   2  ID stored     0
5   3  ID stored     2

2 answers:

Answer 0: (score: 1)

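You are not asking to drop duplicates, but rather to filter:

df.loc[df['Name'] == 'ID stored']

If you want to get the last row for each ID, you can use groupby and call last:

df.loc[df['Name'] == 'ID stored'].groupby('ID', as_index=False).last()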
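As a minimal, self-contained sketch of the filter-then-groupby approach described in this answer, run on the question's sample data (the variable names `stored` and `last_per_id` are illustrative and do not come from the answer):

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 2, 2, 3, 3],
    'Test': [0, 0, 1, 2, 3, 2],
    'Name': ['ID stored', 'ID stored', 'ID not stored',
             'ID not stored', 'ID not stored', 'ID stored'],
})

# Step 1: a boolean mask keeps only the rows whose Name is 'ID stored'.
stored = df.loc[df['Name'] == 'ID stored']

# Step 2: collapse to one row per ID, taking the last value of each column
# within the group. as_index=False keeps ID as a regular column.
last_per_id = stored.groupby('ID', as_index=False).last()
print(last_per_id)
```

Two things are worth keeping in mind with this variant: the groupby result gets a fresh 0..n-1 index, so the original row labels are not preserved, and `last()` picks the last non-null value per column rather than the last row as a whole, which can matter if the data contains missing values.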

Answer 1: (score: 1)

You need boolean indexing with drop_duplicates:

print (df.loc[df['Name'] == 'ID stored'].drop_duplicates('ID', keep='last'))
   ID       Name  Test
0   1  ID stored     0
1   2  ID stored     0
5   3  ID stored     2

A better sample DataFrame:

df = pd.DataFrame({'ID':[1, 2, 2, 2, 3, 3],
                   'Test':[0,0,1,2,3,4],
                   'Name':['ID stored','ID stored', 'ID not stored',
                           'ID stored', 'ID not stored', 'ID stored']})
print (df)
   ID           Name  Test
0   1      ID stored     0
1   2      ID stored     0
2   2  ID not stored     1
3   2      ID stored     2
4   3  ID not stored     3
5   3      ID stored     4

print (df.loc[df['Name'] == 'ID stored'])
   ID       Name  Test
0   1  ID stored     0
1   2  ID stored     0 <-duplicate ID 2
3   2  ID stored     2 <-duplicate ID 2
5   3  ID stored     4
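To find such leftover duplicates programmatically instead of spotting them by eye, one option is `DataFrame.duplicated`. The following is only a sketch on the same sample data; the names `stored` and `dup_mask` are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 2, 2, 3, 3],
    'Test': [0, 0, 1, 2, 3, 4],
    'Name': ['ID stored', 'ID stored', 'ID not stored',
             'ID stored', 'ID not stored', 'ID stored'],
})

# Rows that survive the boolean filter.
stored = df.loc[df['Name'] == 'ID stored']

# keep=False flags every row whose ID occurs more than once,
# not only the second and later occurrences.
dup_mask = stored.duplicated('ID', keep=False)
print(stored[dup_mask])
```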

print (df.loc[df['Name'] == 'ID stored'].drop_duplicates('ID', keep='last'))
   ID       Name  Test
0   1  ID stored     0
3   2  ID stored     2
5   3  ID stored     4
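Which of the duplicated rows survives depends on the `keep` argument. The sketch below compares `keep='first'` and `keep='last'` on the same data. It also shows one possible way to keep the row with the largest Test per ID by sorting first; that selection rule is an assumption for illustration and is not something the question asks for.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 2, 2, 3, 3],
    'Test': [0, 0, 1, 2, 3, 4],
    'Name': ['ID stored', 'ID stored', 'ID not stored',
             'ID stored', 'ID not stored', 'ID stored'],
})

stored = df.loc[df['Name'] == 'ID stored']

# Keep the first or the last occurrence of each ID, in original row order.
first_per_id = stored.drop_duplicates('ID', keep='first')
last_per_id = stored.drop_duplicates('ID', keep='last')

# Assumed rule: keep the row with the highest Test per ID.
# Sorting by Test puts that row last within each ID, so keep='last' selects it.
max_test_per_id = (stored.sort_values('Test')
                         .drop_duplicates('ID', keep='last')
                         .sort_index())

print(first_per_id)
print(last_per_id)
print(max_test_per_id)
```

Unlike the groupby variant, drop_duplicates preserves the original index labels, which makes it easy to trace each surviving row back to the source DataFrame.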