I have the following DataFrame df:
Location ID Item Qty Time
(...)
42666 381 202546661 995820 1 06:55:07
42667 761 202547268 995820 1 07:12:44
42668 494 202546857 995822 1 06:58:30
42669 455 202546771 999810 1 06:56:52 <- head
42670 730 202547225 999810 1 07:11:57 <- to be deleted
42671 761 202547268 999810 1 07:13:04 <- tail
42672 494 202546857 999825 2 06:58:52
42673 424 202546723 999942 1 06:55:36 <- head
42674 487 202546848 999942 1 06:57:47 <- to be deleted
42675 514 202546891 999942 1 06:59:23 <- to be deleted
42676 587 202547004 999942 1 07:01:03 <- to be deleted
42677 654 202547101 999942 1 07:01:42 <- tail
(...)
I am trying to keep only the head and tail rows of each group and delete the rows between them, so that the result looks like this:
Location ID Item Qty Time
(...)
42666 381 202546661 995820 1 06:55:07
42667 761 202547268 995820 1 07:12:44
42668 494 202546857 995822 1 06:58:30
42669 455 202546771 999810 1 06:56:52 <- head
42670 761 202547268 999810 1 07:13:04 <- tail
42671 494 202546857 999825 2 06:58:52
42672 424 202546723 999942 1 06:55:36 <- head
42673 654 202547101 999942 1 07:01:42 <- tail
(...)
How can I achieve this result?
Thanks in advance!
Answer 0 (score: 2)
You can use groupby.nth to keep only the first and last row of each Item group:
df = df.groupby('Item').nth([0,-1]).reset_index()
Item Location ID Qty Time
0 995820 381 202546661 1 06:55:07
1 995820 761 202547268 1 07:12:44
2 995822 494 202546857 1 06:58:30
3 999810 455 202546771 1 06:56:52
4 999810 761 202547268 1 07:13:04
5 999825 494 202546857 2 06:58:52
6 999942 424 202546723 1 06:55:36
7 999942 654 202547101 1 07:01:42
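
As far as I know, groupby.nth behaves as a filter in pandas 2.0 and later, so the output there keeps the original index and column order, and reset_index() adds an extra index column. A minimal sketch that does not depend on that behavior is to build a boolean mask with DataFrame.duplicated. The sample frame below is rebuilt from the rows shown in the question, and the names is_first, is_last and result are only illustrative:

```python
import pandas as pd

# Sample rows copied from the question (the original index labels are not reproduced).
df = pd.DataFrame(
    {
        "Location": [381, 761, 494, 455, 730, 761, 494, 424, 487, 514, 587, 654],
        "ID": [202546661, 202547268, 202546857, 202546771, 202547225, 202547268,
               202546857, 202546723, 202546848, 202546891, 202547004, 202547101],
        "Item": [995820, 995820, 995822, 999810, 999810, 999810,
                 999825, 999942, 999942, 999942, 999942, 999942],
        "Qty": [1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1],
        "Time": ["06:55:07", "07:12:44", "06:58:30", "06:56:52", "07:11:57", "07:13:04",
                 "06:58:52", "06:55:36", "06:57:47", "06:59:23", "07:01:03", "07:01:42"],
    }
)

# duplicated(keep="first") flags every row except the first occurrence of its Item,
# so negating it marks the head of each group; keep="last" does the same for the tail.
is_first = ~df.duplicated("Item", keep="first")
is_last = ~df.duplicated("Item", keep="last")

# Keep heads and tails, drop everything in between, and renumber the rows.
result = df[is_first | is_last].reset_index(drop=True)
print(result)
```

Groups with a single row (such as 995822 and 999825) are both head and tail, so they are kept once. Drop the reset_index(drop=True) call if you want to preserve the original index labels instead of renumbering.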