I am trying to remove quoted sequences from a string. For the example below, my script works fine:
import re
doc = ' Doc = "This is a quoted string: this is cool!" '
cleanr = re.compile('\".*?\"')
doc = re.sub(cleanr, '', doc)
print(doc)
Result (as expected):
' Doc = '
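However, when the quoted sentence itself contains escaped quotes, I cannot remove the escaped sequence with the pattern I thought was appropriate: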
import re
doc = ' Doc = "This is a quoted string: \"this is cool!\" " '
cleanr = re.compile('\\".*?\\"') # new pattern
doc = re.sub(cleanr, '', doc)
print(doc)
Result:
'Doc = this is cool!'
Expected:
'Doc = "This is a quoted string: " '
Does anyone know what is going on? If the pattern '\\".*?\\"' is wrong, what would the correct one be?
Answer 0 (score: 2)
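doc does not contain any backslashes: in an ordinary string literal, Python converts \" to a plain ", so there is no escaped sequence left for your regex to match. The pattern is processed the same way and ends up matching ordinary quote pairs.
Add the r prefix to both strings. It marks them as raw strings, so backslashes are kept literally instead of being interpreted as escape codes.
Try this: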
>>> doc = r' Doc = "This is a quoted string: \"this is cool!\" " '
>>> cleanr = re.compile(r'\\".*?\\"')
>>> re.sub(cleanr, '', doc)
' Doc = "This is a quoted string:  " '
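To make the difference visible, here is a minimal sketch that builds the same text with and without the r prefix and checks whether a backslash actually ends up in the string. The variable names (plain, raw, escaped_quote_span) are only illustrative and do not come from the question.

```python
import re

# Without the r prefix, Python turns \" into a plain double quote.
plain = ' Doc = "This is a quoted string: \"this is cool!\" " '
# With the r prefix, the backslashes stay in the string.
raw = r' Doc = "This is a quoted string: \"this is cool!\" " '

print('\\' in plain)  # is there a backslash in the ordinary string?
print('\\' in raw)    # is there a backslash in the raw string?

# Raw pattern: a literal backslash, a quote, a non-greedy run, then
# another literal backslash and quote.
escaped_quote_span = re.compile(r'\\".*?\\"')

cleaned_plain = escaped_quote_span.sub('', plain)
cleaned_raw = escaped_quote_span.sub('', raw)

print(cleaned_plain == plain)  # nothing was removed from the ordinary string
print(repr(cleaned_raw))       # the escaped span is gone from the raw string
```

If the text comes from a file or another source rather than a literal in your code, the backslashes are already part of the data, and only the pattern needs the r prefix.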