I am trying to strip all special characters from a string while keeping everything else, including punctuation.
mystring = "Q18. On a scale from 0 to 10 where 0 means ‘not at all interested' and 10 means ‘very interested', how interested are you in helping to address problems that affect poor people in poor countries?"
My attempt so far:
newlabel = re.sub('[^A-Za-z0-9]+', ' ', mystring)
Output:
Q18 On a scale from 0 to 10 where 0 means not at all interested and 10 means very interested how interested are you in helping to address problems that affect poor people in poor countries
How can I keep the punctuation with the regex I currently have, or is there a better solution?
Answer 0 (score: 4)
Solved:
print (newstring.decode('unicode_escape').encode('ascii','ignore'))
Output:
Q18. On a scale from 0 to 10 where 0 means not at all interested' and 10 means very interested', how interested are you in helping to address problems that affect poor people in poor countries?
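The `decode`/`encode` chain above works on Python 2 byte strings. On Python 3, where `str` has no `decode` method, a rough equivalent is to encode to ASCII with `errors='ignore'` and decode back. This is only a sketch, assuming the goal is simply to drop every non-ASCII character; the name `cleaned` is illustrative:

```python
mystring = "Q18. On a scale from 0 to 10 where 0 means ‘not at all interested' and 10 means ‘very interested', how interested are you in helping to address problems that affect poor people in poor countries?"

# Encoding with errors='ignore' silently drops characters that ASCII cannot
# represent (here, the curly quote ‘); decoding turns the bytes back into a str.
cleaned = mystring.encode('ascii', 'ignore').decode('ascii')
print(cleaned)
```

Note that this also removes accented letters and any other non-ASCII text, so it only fits input that is meant to be plain English.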
Answer 1 (score: 1)
If all you need to change is to keep the period, adding it to the regex's character class will solve this.
re.sub(r'[^A-Za-z0-9.]+', ' ', mystring)
Answer 2 (score: 0)
Just add each punctuation mark you want to keep to the regex, with a backslash in front of it.
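Escaping each mark by hand gets tedious. As a sketch, `re.escape` can add the backslashes and `string.punctuation` can supply the ASCII punctuation set. The names `keep` and `pattern` are illustrative, and both keeping the space in the class and replacing with an empty string are assumptions, chosen so the existing spacing is left untouched:

```python
import re
import string

mystring = "Q18. On a scale from 0 to 10 where 0 means ‘not at all interested' and 10 means ‘very interested', how interested are you in helping to address problems that affect poor people in poor countries?"

# re.escape backslash-escapes the characters that are special in a regex,
# so the punctuation can be dropped safely into a character class.
keep = re.escape(string.punctuation)

# Remove anything that is not an ASCII letter, digit, space, or ASCII punctuation.
pattern = '[^A-Za-z0-9 ' + keep + ']+'
newlabel = re.sub(pattern, '', mystring)
print(newlabel)
```

With this input only the curly quotes are removed; periods, commas, apostrophes, and the question mark stay in place.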