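Good afternoon,
I have duplicates (by `id` and `ticker` combination) in my df, which I want to display and then remove based on a condition in another column. I have looked at many sorting solutions, but would prefer to solve this through filtering.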
display(df)

```
id      ticker               state
396219  ACGB 3 1/4 04/21/29  Ended
396496  NULL                 Done
396496  ACGB 5 3/4 05/15/21  Done
396521  ACGB 4 1/2 04/15/20  Ended
396523  ACGB 1 3/4 11/21/20  Ended
396581  TCV 5 1/2 11/15/18   Order Sent
396588  TCV 5 1/2 11/15/18   Order Sent
396588  TCV 5 1/2 11/15/18   Done
396680  KBN 3.4 07/24/28     Done
```
I would like to display the duplicates:

```
id      ticker              state
396588  TCV 5 1/2 11/15/18  Order Sent
396588  TCV 5 1/2 11/15/18  Done
```
I tried `df[df.duplicated(['id','ticker'])]`, but it only returns one of the duplicate rows:

```
id      ticker              state
396588  TCV 5 1/2 11/15/18  Done
```
After removing the duplicate based on `state` (keeping the `Done` row), the result should be:

```
id      ticker               state
396219  ACGB 3 1/4 04/21/29  Ended
396496  NULL                 Done
396496  ACGB 5 3/4 05/15/21  Done
396521  ACGB 4 1/2 04/15/20  Ended
396523  ACGB 1 3/4 11/21/20  Ended
396581  TCV 5 1/2 11/15/18   Order Sent
396588  TCV 5 1/2 11/15/18   Done
396680  KBN 3.4 07/24/28     Done
```
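For the display step, here is a minimal sketch. It assumes `ticker` holds the literal string `'NULL'`, since the question doesn't say whether that value is a string or a missing value. `duplicated` defaults to `keep='first'`, which flags only the second and later rows of each group, so the attempt above returns just the `Done` row. Passing `keep=False` flags every row in a duplicated group instead.

```python
import pandas as pd

# Sample data rebuilt from the question's display(df) output
# (assumption: 'NULL' is a literal string, not a missing value)
df = pd.DataFrame({
    'id': [396219, 396496, 396496, 396521, 396523,
           396581, 396588, 396588, 396680],
    'ticker': ['ACGB 3 1/4 04/21/29', 'NULL', 'ACGB 5 3/4 05/15/21',
               'ACGB 4 1/2 04/15/20', 'ACGB 1 3/4 11/21/20',
               'TCV 5 1/2 11/15/18', 'TCV 5 1/2 11/15/18',
               'TCV 5 1/2 11/15/18', 'KBN 3.4 07/24/28'],
    'state': ['Ended', 'Done', 'Done', 'Ended', 'Ended',
              'Order Sent', 'Order Sent', 'Done', 'Done'],
})

# Default keep='first': only later occurrences are flagged
later_only = df[df.duplicated(['id', 'ticker'])]

# keep=False: every row of a duplicated (id, ticker) group is flagged
dupes = df[df.duplicated(['id', 'ticker'], keep=False)]
print(dupes)
```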
Answer (score: 0)
One simple way to do this is to order your dataframe so that the rows where `state` equals "Done" come first, then drop duplicates by `id` and `ticker`. Because `drop_duplicates` keeps the first occurrence by default, the `Done` row is the one that survives.
This reorders your data, but you can re-sort by `id` afterwards if required.
Here is one way:
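```python
import pandas as pd

# bring Done rows to the top so they are kept when duplicates are dropped
res = pd.concat([df[df['state'] == 'Done'], df[df['state'] != 'Done']])

# drop duplicates (keeping the first, i.e. the Done row) and sort by id
res = (res.drop_duplicates(subset=['id', 'ticker'])
          .sort_values('id')
          .reset_index(drop=True))
```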
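Since the question prefers filtering over sorting, here is a filter-only alternative. It is a sketch and not part of the original answer. It assumes each duplicated `(id, ticker)` group contains exactly one `Done` row. A group with no `Done` row would be dropped entirely, and a group with two `Done` rows would keep both.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: 'NULL' is a string)
df = pd.DataFrame({
    'id': [396219, 396496, 396496, 396521, 396523,
           396581, 396588, 396588, 396680],
    'ticker': ['ACGB 3 1/4 04/21/29', 'NULL', 'ACGB 5 3/4 05/15/21',
               'ACGB 4 1/2 04/15/20', 'ACGB 1 3/4 11/21/20',
               'TCV 5 1/2 11/15/18', 'TCV 5 1/2 11/15/18',
               'TCV 5 1/2 11/15/18', 'KBN 3.4 07/24/28'],
    'state': ['Ended', 'Done', 'Done', 'Ended', 'Ended',
              'Order Sent', 'Order Sent', 'Done', 'Done'],
})

# True for every row whose (id, ticker) pair appears more than once
is_dup = df.duplicated(['id', 'ticker'], keep=False)

# Keep all non-duplicated rows, plus duplicated rows whose state is 'Done'
res = df[~is_dup | df['state'].eq('Done')]
print(res)
```

Unlike the concat-and-sort approach, this keeps the original row order and index, so no re-sorting is needed.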