I have the following DataFrame representing a service history:
   Start                End                  ContactName        Agente     Code
0  2020-05-05 11:52:34  2020-05-05 18:03:15  473000 Carlos      Pedro BRA  473000
1  2020-05-05 15:39:06  2020-05-05 18:03:09  580000 Rosineia    Pedro BRA  580000
2  2020-05-05 17:47:59  2020-05-05 18:03:06  2038000 Mauricio   Pedro BRA  2038000
3  2020-05-05 17:43:46  2020-05-05 18:02:58  3975000 - Sergio   Pedro BRA  3975000
4  2020-05-05 15:34:44  2020-05-05 17:52:17  3388000 Rodrigo    Pedro BRA  3388000
5  2020-05-05 15:34:43  2020-05-05 17:52:14  4077000            Pedro BRA  4077000
6  2020-05-05 17:45:24  2020-05-05 17:52:08  2064000 Cleberson  Pedro BRA  2064000
7  2020-05-05 18:20:24  2020-05-05 18:25:00  2064000 Cleberson  Pedro BRA  2064000
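I want to remove calls from the same customer to the same agent that happen within one hour of each other. For example:
Cleberson (the last two rows) started a service with agent Pedro at 17:45 and finished at 17:52.
Shortly afterwards (less than an hour later), he started another service, which also finished in under an hour.
If several records occur within one hour, I want to keep only one of them.
Thanks for your help. I have tried everything I could think of, but I could not get it to work.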
Answer 0 (score: 1)
I think the following strategy should solve your problem.
Loosely based on your data, in pseudocode:
import pandas as pd

df['Start'] = pd.to_datetime(df['Start'])  # make sure Start is a datetime column, not strings
df = df.sort_values(by=['ContactName', 'Agente', 'Start'])
# True for rows that have the same contact and agent as the previous row and started less than 1 hour after it
mask = (df['ContactName'] == df['ContactName'].shift(1)) & (df['Agente'] == df['Agente'].shift(1)) & ((df['Start'] - df['Start'].shift(1)) < pd.Timedelta('1 hour'))
df = df[~mask]  # filter out the redundant rows
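
To make the idea easier to check, here is a minimal self-contained sketch of the same strategy on four of the rows from the question. It uses `groupby(...).diff()` instead of the chained `shift` comparisons, which is a swapped-in but equivalent way to measure the gap to the previous call of the same contact and agent. The split of each row into `ContactName` and `Agente` values is my assumption about how the posted data is laid out, and `drop_repeat_calls` is a hypothetical helper name, not something from pandas.

```python
import pandas as pd

# Four rows copied from the question; the column split is an assumption.
rows = [
    ("2020-05-05 11:52:34", "2020-05-05 18:03:15", "473000 Carlos", "Pedro BRA"),
    ("2020-05-05 15:39:06", "2020-05-05 18:03:09", "580000 Rosineia", "Pedro BRA"),
    ("2020-05-05 17:45:24", "2020-05-05 17:52:08", "2064000 Cleberson", "Pedro BRA"),
    ("2020-05-05 18:20:24", "2020-05-05 18:25:00", "2064000 Cleberson", "Pedro BRA"),
]
df = pd.DataFrame(rows, columns=["Start", "End", "ContactName", "Agente"])
df["Start"] = pd.to_datetime(df["Start"])
df["End"] = pd.to_datetime(df["End"])


def drop_repeat_calls(frame, window="1 hour"):
    """Drop calls that start less than `window` after the previous call
    of the same contact to the same agent (hypothetical helper)."""
    ordered = frame.sort_values(["ContactName", "Agente", "Start"])
    # Gap to the previous call within each (contact, agent) pair.
    # The first call of each pair has a NaT gap, which compares as False
    # below, so it is always kept.
    gap = ordered.groupby(["ContactName", "Agente"])["Start"].diff()
    is_repeat = gap < pd.Timedelta(window)
    return ordered[~is_repeat].sort_index()


result = drop_repeat_calls(df)
print(result)
```

Note that each call is compared with the call immediately before it, so a long chain of calls that are each less than an hour apart collapses to the first one, even if the chain as a whole spans more than an hour. If you need a fixed one-hour window counted from the last kept record instead, that requires a different approach.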