My DataFrame contains a chat transcript column:
ID Chat
1 P1: Please call me soon P2 - will call you
2 P2: Please call me soon P1 - will call you
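I want to create a new column with Yes/No (or 1/0) that flags whether the chat contains the phrase "call me soon" between P1 and P2, but not between P2 and P1.
The output would look like: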
ID Chat Call me soon
1 P1: Please call me soon P2 - will call you Yes
2 P2: Please call me soon P1 - will call you No
I need to do this in Python. Please suggest a suitable approach.
Answer (score: 2)
Use str.contains + np.where:
import numpy as np

# 'Yes' where "call me soon" appears after P1 and before a later P2, else 'No'
df['Call me soon'] = np.where(
    df.Chat.str.contains(r'(?<=P1).*?call me soon.*?(?=P2)'), 'Yes', 'No'
)
df
ID Chat Call me soon
0 1 P1: Please call me soon P2 - will call you Yes
1 2 P2: Please call me soon P1 - will call you No
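The question accepts either Yes/No or 1/0. Below is a minimal self-contained sketch, assuming the sample data from the question: it computes the str.contains mask once and derives both forms from it. The second column name, 'Call me soon (1/0)', is made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    'ID': [1, 2],
    'Chat': [
        'P1: Please call me soon P2 - will call you',
        'P2: Please call me soon P1 - will call you',
    ],
})

# Boolean mask: True where "call me soon" sits after P1 and before a later P2.
mask = df['Chat'].str.contains(r'(?<=P1).*?call me soon.*?(?=P2)', regex=True)

# Yes/No column, as in the answer.
df['Call me soon'] = np.where(mask, 'Yes', 'No')

# 1/0 variant (hypothetical column name): True/False cast to 1/0.
df['Call me soon (1/0)'] = mask.astype(int)

print(df)
```

Computing the mask once means the regex runs only once per row, and mask.astype(int) turns True/False into 1/0 directly, so np.where is only needed for the string labels.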
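Regex details
(?<=P1)       # lookbehind: the match must start right after "P1"
.*?           # any characters except newlines, non-greedy
call me soon  # the literal phrase (case-sensitive)
.*?           # any characters except newlines, non-greedy
(?=P2)        # lookahead: "P2" must follow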
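To check the pattern outside pandas, the sketch below runs it through Python's re module on the two sample chats plus a third, made-up chat that uses capital letters and a line break. It compares the answer's pattern with a plain P1.*?call me soon.*?P2 pattern, which gives the same yes/no result for a containment test (only the matched span differs), and with a lenient variant using re.IGNORECASE and re.DOTALL. The lenient variant is an assumption about what real transcripts might need, not something the question asks for.

```python
import re

# The answer's pattern: "call me soon" after a "P1" marker and before a later "P2".
PATTERN = re.compile(r'(?<=P1).*?call me soon.*?(?=P2)')

# Same containment test without lookarounds; only the matched text differs.
PLAIN = re.compile(r'P1.*?call me soon.*?P2')

# Assumed variant: ignore case and let '.' also match line breaks.
LENIENT = re.compile(r'P1.*?call me soon.*?P2', re.IGNORECASE | re.DOTALL)

chats = [
    'P1: Please call me soon P2 - will call you',
    'P2: Please call me soon P1 - will call you',
    'P1: Please CALL ME SOON\nP2 - will call you',  # made-up example
]

def check(chat):
    """Return (answer pattern, plain pattern, lenient pattern) match flags."""
    return (
        bool(PATTERN.search(chat)),
        bool(PLAIN.search(chat)),
        bool(LENIENT.search(chat)),
    )

results = [check(chat) for chat in chats]
print(results)
# [(True, True, True), (False, False, False), (False, False, True)]
```

The third chat is only caught by the lenient pattern, because the original pattern is case-sensitive and its `.` stops at the newline. Series.str.contains takes a flags argument, so the same flags should carry over, e.g. df.Chat.str.contains(pattern, flags=re.IGNORECASE | re.DOTALL).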