I am trying to create a flag for questions that have been answered. Here is a sample data frame.
userid  message                    type
1       hi                         incoming
1       how may I help you         outgoing
1       looking for a job          incoming
1       whats your name            outgoing
1       nitin                      incoming
1       kansal                     incoming
1       whats your age             outgoing
2       hi                         incoming
2       how may I help you         outgoing
3       hi                         incoming
3       how may I help you         outgoing
3       looking for a restaurant   incoming
3       can you suggest something  incoming
3       whats your name            outgoing
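So an outgoing question should be flagged 1 if it is followed by an incoming message from the same userid, and 0 otherwise. The output data frame would look like this: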
userid  message                    type      got_response
1       hi                         incoming
1       how may I help you         outgoing  1
1       looking for a job          incoming
1       whats your name            outgoing  1
1       nitin                      incoming
1       kansal                     incoming
1       whats your age             outgoing  0
2       hi                         incoming
2       how may I help you         outgoing  0
3       hi                         incoming
3       how may I help you         outgoing  1
3       looking for a restaurant   incoming
3       can you suggest something  incoming
3       whats your name            outgoing  0
I am looking for a numpy-based solution. I have already done this with a for loop, but the actual database has millions of rows, so it takes hours to finish.
Answer 0 (score: 1)
df['Flag'] = ((df['userid'] == df['userid'].shift(-1)) & (df['type'].eq('outgoing') & df['type'].shift(-1).eq('incoming')))
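The one-liner above yields True/False for every row. As a minimal sketch of how the same idea might reproduce the got_response column from the question, the snippet below works on plain numpy arrays. It assumes the rows are already ordered so that each user's messages are contiguous and chronological; the column name got_response and the use of NaN for incoming rows are taken from the expected output in the question, not from this answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "userid": [1, 1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3],
    "message": [
        "hi", "how may I help you", "looking for a job", "whats your name",
        "nitin", "kansal", "whats your age",
        "hi", "how may I help you",
        "hi", "how may I help you", "looking for a restaurant",
        "can you suggest something", "whats your name",
    ],
    "type": [
        "incoming", "outgoing", "incoming", "outgoing",
        "incoming", "incoming", "outgoing",
        "incoming", "outgoing",
        "incoming", "outgoing", "incoming", "incoming", "outgoing",
    ],
})

userid = df["userid"].to_numpy()
is_out = df["type"].eq("outgoing").to_numpy()
is_in = df["type"].eq("incoming").to_numpy()

# For each row, describe the row that follows it.
# The last row has no successor, so its entries stay False.
same_user_next = np.zeros(len(df), dtype=bool)
same_user_next[:-1] = userid[:-1] == userid[1:]
next_is_in = np.zeros(len(df), dtype=bool)
next_is_in[:-1] = is_in[1:]

# An outgoing message counts as answered when the next row is an
# incoming message from the same user.
answered = is_out & same_user_next & next_is_in

# 1/0 on outgoing rows, NaN on incoming rows (assumed from the expected output).
df["got_response"] = np.where(is_out, answered.astype(float), np.nan)
print(df)
```

Everything here is a vectorized array operation, so it should scale to millions of rows far better than a Python loop. If the data is not already ordered by user and time, it would need to be sorted first.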