Matching text in a pandas DataFrame

Asked: 2020-07-16 08:18:24

Tags: python python-3.x pandas

My DataFrame looks like this:

id                  text
1           i am interested.
2           don't call me.I am bzy.
3           pls help me regarding this product.
4           donot call me.
5           I have some req.please mail me.

The DataFrame I want to end up with looks like this:

id                  text                                 results
1           i am interested.                                yes
2           don't call me.I am bzy.                         no
3           pls help me regarding this product.             yes
4           donot call me.                                  no      
5           I have some req.please mail me.                 yes

I wrote the following code:

d1 = {'no': ['not interested','don't', 'donot']}

# create regex 
reg = '|'.join([f'\\b{x}\\b' for x in list(d1.values())[0]])

# apply function
df['results'] = df['text'].str.lower().str.contains(reg).map({True: 'no', False: 'yes'})

I get this error:

File "<ipython-input-59-62192b95d669>", line 1
    d1 = {'irrelevant': ['not interested','don't','donot']}
                                               ^
SyntaxError: invalid syntax

1 Answer:

Answer 0: (score: 0)

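This happens because the apostrophe in 'don't' closes the single-quoted string literal early, so Python cannot parse the rest of the line as a valid list of strings.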
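To see the cause in isolation, here is a minimal sketch that feeds the line from the question to Python's built-in compile() and catches the failure. The variable names (bad_line, parsed, escaped, double_quoted) are made up for illustration. It also checks that a backslash-escaped apostrophe and a double-quoted string produce the same value.

```python
# The line from the question, held as text so we can try to compile it.
bad_line = "d1 = {'no': ['not interested','don't', 'donot']}"

try:
    compile(bad_line, "<example>", "exec")
    parsed = True
except SyntaxError:
    # The apostrophe in don't ends the string that began at 'don
    parsed = False

# Two equivalent ways to write the same string literal safely.
escaped = 'don\'t'
double_quoted = "don't"
same = escaped == double_quoted

print(parsed, same)
```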

Escape the apostrophe with a backslash so that it is treated as part of the string:

d1 = {'no': ['not interested','don\'t', 'donot']}
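For completeness, here is a sketch of the whole pipeline from the question with the quoting fixed. It rebuilds the sample DataFrame by hand and makes two changes that are not in the original post: it writes the keywords in double quotes (equivalent to the backslash escape), and it passes each keyword through re.escape as a precaution in case a keyword ever contains regex metacharacters.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "text": [
        "i am interested.",
        "don't call me.I am bzy.",
        "pls help me regarding this product.",
        "donot call me.",
        "I have some req.please mail me.",
    ],
})

# Double quotes let the apostrophe appear without a backslash.
d1 = {"no": ["not interested", "don't", "donot"]}

# Build one alternation pattern; \b keeps matches on word boundaries.
reg = "|".join(rf"\b{re.escape(x)}\b" for x in d1["no"])

# Rows whose lowercased text matches any keyword get 'no', the rest 'yes'.
matches = df["text"].str.lower().str.contains(reg, regex=True)
df["results"] = matches.map({True: "no", False: "yes"})

print(df)
```

With the sample rows above, ids 2 and 4 are labelled no and the others yes, which matches the desired output in the question.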
