Get all double-quote characters outside of strings enclosed by the special characters « and »

Time: 2019-07-24 05:38:52

Tags: regex python-3.x

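I want to find every double quote in the substrings that lie outside the enclosing characters « and », and replace each one with an escape character followed by the double quote, i.e. \". For example:

Input string: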

'The first generally recognized "wiki" application,«"WikiWikiWeb"», was created by American computer programmer "Ward Cunningham" in 1994'

Expected output:

'The first generally recognized \"wiki\" application,«"WikiWikiWeb"», was created by American computer programmer \"Ward Cunningham\" in 1994'

I tried the following code.

string = '''The first generally recognized "wiki" application,«"WikiWikiWeb"», was created by American computer programmer "Ward Cunningham" in 1994'''

import re
arr = re.findall(r'(.*?)\«.*?\»', string)
for tag in arr :
 new_tag = tag.replace('"','\\"')
 string = string.replace(tag, new_tag)

Output: The first generally recognized \"wiki\" application,«"WikiWikiWeb"», was created by American computer programmer "Ward Cunningham" in 1994

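The problem with this code is that the regex does not give me all of the substrings; in this case it misses the second one. The expected result should be: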

['The first generally recognized "wiki" application,', ', was created by American computer programmer "Ward Cunningham" in 1994']

I want a regex that gives me all the quotes from those substrings, rather than the substrings outside the enclosing special characters themselves.

2 Answers:

Answer 0: (score: 2)

string = '''The first generally recognized "wiki" application,«blah"WikiWikiWeb"blah», was created by American computer programmer "Ward Cunningham" in 1994'''

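import re
# The first alternative consumes each «...» span without capturing it, so
# findall yields '' for those matches and the quoted text for all others.
arr = re.findall(r'«.*?»|(".+?")', string)
for tag in arr:
    new_tag = tag.replace('"', '\\"')
    string = string.replace(tag, new_tag)

print(string)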

Output:

The first generally recognized \"wiki\" application,«blah"WikiWikiWeb"blah», was created by American computer programmer \"Ward Cunningham\" in 1994
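One caveat with the loop above: str.replace rewrites every occurrence of a matched tag, so a quoted phrase that also appears inside «...» would be escaped there too, and a phrase that appears twice outside would be escaped twice. A minimal sketch of a single-pass variant follows, assuming the « and » delimiters are balanced and never nested. The function name escape_outside_guillemets is hypothetical and not part of the original answer.

```python
import re

def escape_outside_guillemets(text):
    """Backslash-escape double quotes that are not inside «...»."""
    def repl(match):
        # Group 1 is set only when a bare quote outside «...» matched.
        if match.group(1) is not None:
            return '\\"'
        # Otherwise a whole «...» span matched; return it unchanged.
        return match.group(0)

    # The first alternative consumes an entire «...» span, so the quotes
    # inside it are never seen by the second alternative.
    return re.sub(r'«[^»]*»|(")', repl, text)

string = '''The first generally recognized "wiki" application,«blah"WikiWikiWeb"blah», was created by American computer programmer "Ward Cunningham" in 1994'''
print(escape_outside_guillemets(string))
```

Because the replacement is a function, its return value is inserted literally, so no extra escaping of the backslash is needed for re.sub.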

Answer 1: (score: 1)

You can use this pattern for the regex:

string = re.sub(r'(?<!\«)"(?!\»)','\\"',string)

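(?<!«) is a negative lookbehind: it matches a " only if it is not immediately preceded by «. (?!») is a negative lookahead, which has the same effect in the forward direction: the " must not be immediately followed by ».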
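Lookarounds inspect only the single character next to the quote, so this pattern appears to work when the quotes sit directly against « and », but not when other text separates them. A small sketch illustrating the difference; the sample strings and variable names are made up for illustration:

```python
import re

pattern = r'(?<!«)"(?!»)'

# Quotes inside «...» touch the delimiters directly.
adjacent = 'say "hi" to «"WikiWikiWeb"»'
# Quotes inside «...» are separated from the delimiters by other text.
padded = 'say "hi" to «blah"WikiWikiWeb"blah»'

# A function replacement is inserted literally: a backslash plus a quote.
adjacent_result = re.sub(pattern, lambda m: '\\"', adjacent)
padded_result = re.sub(pattern, lambda m: '\\"', padded)

print(adjacent_result)
print(padded_result)
```

In the second case the quotes inside «...» are escaped as well, so the lookaround approach fits only inputs shaped like the one in the question.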