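How to drop duplicates while keeping the first event, but only for one category, in pandas
The event_name column has two categories, process_now and fast_order, but dropping duplicates comes with some special rules:
1. Only drop duplicates in the fast_order category.
2. If fast_order occurs several times in a row, keep only one row per consecutive run (not one per User_id).
3. When dropping duplicates, keep the first occurrence.
Data: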
User_id event_name timestamp
1 process_now 08:00:01
1 process_now 08:00:02
1 process_now 08:00:03
1 fast_order 08:00:04
1 fast_order 08:00:05
1 process_now 08:00:06
2 process_now 08:00:01
2 process_now 08:00:02
2 fast_order 08:00:03
2 fast_order 08:00:04
2 fast_order 08:00:05
2 process_now 08:00:06
2 fast_order 08:00:07
2 fast_order 08:00:08
2 process_now 08:00:09
What I need to get is:
User_id event_name timestamp
1 process_now 08:00:01
1 process_now 08:00:02
1 process_now 08:00:03
1 fast_order 08:00:04
1 process_now 08:00:06
2 process_now 08:00:01
2 process_now 08:00:02
2 fast_order 08:00:03
2 process_now 08:00:06
2 fast_order 08:00:07
2 process_now 08:00:09
How should I do this?
Answer 0 (score: 2)
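Use DataFrame.duplicated on both columns, User_id and a helper Series g that labels consecutive groups, then invert that condition and chain it with | (bitwise OR) to a test for rows whose event_name is not equal to fast_order: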
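# Label consecutive runs: the counter increases whenever event_name changes.
g = df['event_name'].ne(df['event_name'].shift()).cumsum()
# Keep every row that is not fast_order, plus the first row of each (User_id, run) pair.
df = df[df['event_name'].ne('fast_order') | ~df.assign(g=g).duplicated(['User_id','g'])]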
print (df)
User_id event_name timestamp
0 1 process_now 08:00:01
1 1 process_now 08:00:02
2 1 process_now 08:00:03
3 1 fast_order 08:00:04
5 1 process_now 08:00:06
6 2 process_now 08:00:01
7 2 process_now 08:00:02
8 2 fast_order 08:00:03
11 2 process_now 08:00:06
12 2 fast_order 08:00:07
14 2 process_now 08:00:09
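To reproduce this result end to end, the sample data can be rebuilt by hand. This is a minimal sketch, assuming the timestamps are kept as plain strings; the function name drop_consecutive_fast_order is a hypothetical wrapper around the two lines above, not part of pandas.

```python
import pandas as pd

# Sample data from the question; timestamps are kept as plain strings (an assumption).
df = pd.DataFrame({
    "User_id": [1] * 6 + [2] * 9,
    "event_name": [
        "process_now", "process_now", "process_now",
        "fast_order", "fast_order", "process_now",
        "process_now", "process_now",
        "fast_order", "fast_order", "fast_order", "process_now",
        "fast_order", "fast_order", "process_now",
    ],
    "timestamp": [
        "08:00:01", "08:00:02", "08:00:03", "08:00:04", "08:00:05", "08:00:06",
        "08:00:01", "08:00:02", "08:00:03", "08:00:04", "08:00:05", "08:00:06",
        "08:00:07", "08:00:08", "08:00:09",
    ],
})


def drop_consecutive_fast_order(frame, event="fast_order"):
    """Hypothetical helper: keep only the first row of each consecutive run of `event`."""
    # Run id: increases by one each time event_name differs from the previous row.
    g = frame["event_name"].ne(frame["event_name"].shift()).cumsum()
    # True for rows that repeat an earlier (User_id, run id) pair.
    repeated = frame.assign(g=g).duplicated(["User_id", "g"])
    # Keep all rows of other categories, and the first row of each `event` run.
    return frame[frame["event_name"].ne(event) | ~repeated]


result = drop_consecutive_fast_order(df)
print(result.index.tolist())
```

Returning a new frame instead of reassigning df keeps the original 15 rows available for inspection.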
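Details (printed after the filter above was applied, so the dropped rows 4, 9, 10 and 13 no longer appear):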
print (df.assign(g=g))
User_id event_name timestamp g
0 1 process_now 08:00:01 1
1 1 process_now 08:00:02 1
2 1 process_now 08:00:03 1
3 1 fast_order 08:00:04 2
5 1 process_now 08:00:06 3
6 2 process_now 08:00:01 3
7 2 process_now 08:00:02 3
8 2 fast_order 08:00:03 4
11 2 process_now 08:00:06 5
12 2 fast_order 08:00:07 6
14 2 process_now 08:00:09 7
print (df.assign(g=g).duplicated(['User_id','g']))
0 False
1 True
2 True
3 False
5 False
6 False
7 True
8 False
11 False
12 False
14 False
dtype: bool
print (~df.assign(g=g).duplicated(['User_id','g']))
0 True
1 False
2 False
3 True
5 True
6 True
7 False
8 True
11 True
12 True
14 True
dtype: bool
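To see the rows that get dropped, the same two masks can be evaluated on the unfiltered 15-row data. This is a minimal sketch that rebuilds only the two columns the condition uses; the names raw, is_other, first_in_run and keep are illustrative and do not come from the answer.

```python
import pandas as pd

# Only the two columns used by the condition, in the order of the question's sample data.
raw = pd.DataFrame({
    "User_id": [1] * 6 + [2] * 9,
    "event_name": ["process_now"] * 3 + ["fast_order"] * 2 + ["process_now"] * 3
                  + ["fast_order"] * 3 + ["process_now"]
                  + ["fast_order"] * 2 + ["process_now"],
})

# Run id: increases by one each time event_name differs from the previous row.
g = raw["event_name"].ne(raw["event_name"].shift()).cumsum()
# Rows of any category other than fast_order are always kept.
is_other = raw["event_name"].ne("fast_order")
# First row of each (User_id, run id) pair.
first_in_run = ~raw.assign(g=g).duplicated(["User_id", "g"])
# Final condition: the bitwise OR of the two masks.
keep = is_other | first_in_run

print(raw.assign(g=g, is_other=is_other, first_in_run=first_in_run, keep=keep))
dropped = raw.index[~keep].tolist()
print(dropped)
```

Only indices 4, 9, 10 and 13 end up with keep equal to False: each is a fast_order row that repeats an earlier (User_id, g) pair. Repeated process_now rows such as 1, 2 and 7 are also flagged by duplicated, but is_other keeps them.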