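I pulled the text messages from my group chat, and they look something like the table below (without the third column).
How can I separate the first conversation from the second based on time and assign each message a session_id? If it helps, I'm happy to assume that once an hour has passed with no reply to a message, the next message starts a new conversation.
Given the first two columns, I would ideally be able to work out the third column across nearly 45,000 messages, with conversations of varying length.
Once my messages are split into conversations, I figure I can train a chatbot to take part in our chat! I don't really know what I'm doing here, so any help is appreciated :)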
| created_at | message | conversation_id |
| --- | --- | --- |
| 2018-07-03 02:12:33 | knock knock | 1 |
| 2018-07-03 02:12:35 | who's there | 1 |
| 2018-07-03 02:12:40 | Europe | 1 |
| 2018-07-03 02:12:45 | Europe who? | 1 |
| 2018-07-03 02:12:48 | No - you're a poo | 1 |
| 2018-07-03 03:15:17 | knock knock | 2 |
| 2018-07-03 03:15:20 | who's there | 2 |
| 2018-07-03 03:15:23 | the KGB | 2 |
| 2018-07-03 03:15:28 | the KGB who? | 2 |
| 2018-07-03 03:15:33 | SLAP the KGB will ask the questions! | 2 |
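Answer 0 (score: 1)
This should do the trick. I don't think you will improve much on iterating over the dataframe to assign the IDs, since each one depends on the previous value in the column: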
import pandas as pd
import numpy as np

#Sample data: the two conversations from the question plus two later, isolated messages
d = {"created_at": pd.to_datetime([
    "2018-07-03 02:12:33", "2018-07-03 02:12:35", "2018-07-03 02:12:40",
    "2018-07-03 02:12:45", "2018-07-03 02:12:48", "2018-07-03 03:15:17",
    "2018-07-03 03:15:20", "2018-07-03 03:15:23", "2018-07-03 03:15:28",
    "2018-07-03 03:15:33", "2018-08-03 09:00:00", "2018-09-03 10:15:00"]),
"message": ["knock knock", "who's there", "Europe", "Europe who?", "No - you're a poo",
    "knock knock", "who's there", "the KGB", "the KGB who?", "SLAP the KGB will ask q's!",
    "Hello?", "Hello, again?"]}
#60mins in secs
thresh = 60*60
df = pd.DataFrame(data=d)
#Creating time delta (in seconds) from previous message; the first row has no predecessor, so it gets 0
df["delta"] = df["created_at"].diff().dt.total_seconds().fillna(0)
#Normalising delta based on threshold as a flag for new convos
df["id"] = np.where(df["delta"] < thresh, 0, 1)
df = df.drop(["delta"], axis=1)
#Assigning ID's to each convo
for i in range(1, len(df)):
    df.loc[i, 'id'] += df.loc[i-1, 'id']
print(df)
             created_at                     message  id
0   2018-07-03 02:12:33                 knock knock   0
1   2018-07-03 02:12:35                 who's there   0
2   2018-07-03 02:12:40                      Europe   0
3   2018-07-03 02:12:45                 Europe who?   0
4   2018-07-03 02:12:48           No - you're a poo   0
5   2018-07-03 03:15:17                 knock knock   1
6   2018-07-03 03:15:20                 who's there   1
7   2018-07-03 03:15:23                     the KGB   1
8   2018-07-03 03:15:28                the KGB who?   1
9   2018-07-03 03:15:33  SLAP the KGB will ask q's!   1
10  2018-08-03 09:00:00                      Hello?   2
11  2018-09-03 10:15:00               Hello, again?   3
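Each ID in the output above is the running count of "new conversation" flags seen so far, so the row-by-row loop can probably be replaced with a cumulative sum, which may matter at around 45,000 messages. The following is a minimal sketch rather than part of the original answer: the helper name `assign_conversation_ids`, its `gap` and `time_col` parameters, and the four-row `sample` frame are all made up for illustration. It assumes the timestamp column is already a datetime dtype.

```python
import pandas as pd


def assign_conversation_ids(df, gap=pd.Timedelta(hours=1), time_col="created_at"):
    """Return a sorted copy of df with an integer 'id' column per conversation.

    Hypothetical helper: a new conversation starts whenever the time since
    the previous message is at least `gap` (same rule as `delta < thresh`).
    """
    # Sort by time so diff() compares each message with its predecessor;
    # mergesort is stable, so messages with equal timestamps keep their order.
    out = df.sort_values(time_col, kind="mergesort").reset_index(drop=True)
    # True where the gap reaches the threshold. The first row's gap is NaT,
    # which compares as False, so the first message never opens a new ID.
    new_convo = out[time_col].diff() >= gap
    # Running total of the flags: 0 for the first conversation, then 1, 2, ...
    out["id"] = new_convo.cumsum()
    return out


# Illustrative data only (a subset of the timestamps used above)
sample = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2018-07-03 02:12:33", "2018-07-03 02:12:48",
        "2018-07-03 03:15:17", "2018-08-03 09:00:00"]),
    "message": ["knock knock", "No - you're a poo", "knock knock", "Hello?"],
})
result = assign_conversation_ids(sample)
print(result)
```

Passing a different `gap`, for example `pd.Timedelta(minutes=30)`, changes the cutoff without touching the rest of the logic. Add 1 to the `id` column if the numbering should start at 1, as in the question's table.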