Python Regex Engine - "look-behind requires fixed-width pattern" Error

Time: 2013-11-20 07:31:50

Tags: python regex

I am trying to handle unmatched double quotes within a string in CSV format.

To be precise,

"It "does "not "make "sense", Well, "Does "it"

should be corrected to

"It" "does" "not" "make" "sense", Well, "Does" "it"

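So basically what I am trying to do is

replace every '"' that is

  1. not preceded by the beginning of a line or a comma (and)
  2. not followed by a comma or the end of a line

with '" "'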

For that I use the regex below

(?<!^|,)"(?!,|$)

The problem is that while the Ruby regex engine (http://www.rubular.com/) is able to parse the regex, the Python regex engines (https://pythex.org/, http://www.pyregex.com/) throw the following error

Invalid regular expression: look-behind requires fixed-width pattern

and Python 2.7.3 throws

sre_constants.error: look-behind requires fixed-width pattern

Can anyone tell me what vexes Python here?
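The error can be reproduced without an online tester. The following is a minimal sketch, assuming Python 3, where the exception class is exposed as `re.error` (the traceback above shows the older `sre_constants.error` name from Python 2.7). The variable names are only illustrative.

```python
import re

# The pattern from the question: the lookbehind alternates between
# `^` (zero characters wide) and `,` (one character wide).
PATTERN = r'(?<!^|,)"(?!,|$)'

try:
    re.compile(PATTERN)
except re.error as exc:
    # The alternatives have different widths, so compilation is refused.
    message = str(exc)
else:
    message = None

print(message)
```

The failure happens at compile time, before any input string is involved.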

=============================================== ===================================

Edit:

Following Tim's response, I got the following output for a multiline string
>>> str = """ "It "does "not "make "sense", Well, "Does "it"
... "It "does "not "make "sense", Well, "Does "it"
... "It "does "not "make "sense", Well, "Does "it"
... "It "does "not "make "sense", Well, "Does "it" """
>>> re.sub(r'\b\s*"(?!,|$)', '" "', str)
' "It" "does" "not" "make" "sense", Well, "Does" "it" "\n"It" "does" "not" "make" "sense", Well, "Does" "it" "\n"It" "does" "not" "make" "sense", Well, "Does" "it" "\n"It" "does" "not" "make" "sense", Well, "Does" "it" " '

At the end of every line, two double quotes were added next to 'it'.

So I made a very small change to the regex to handle newlines.

re.sub(r'\b\s*"(?!,|$)', '" "', str,flags=re.MULTILINE)

But this gives the output

>>> re.sub(r'\b\s*"(?!,|$)', '" "', str,flags=re.MULTILINE)
' "It" "does" "not" "make" "sense", Well, "Does" "it"\n... "It" "does" "not" "make" "sense", Well, "Does" "it"\n... "It" "does" "not" "make" "sense", Well, "Does" "it"\n... "It" "does" "not" "make" "sense", Well, "Does" "it" " '

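Now only the last 'it' is followed by two double quotes.

But I wonder why the '$' end-of-line anchor does not recognize that the line has ended.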

=============================================== ===================================

The final answer is

re.sub(r'\b\s*"(?!,|[ \t]*$)', '" "', str,flags=re.MULTILINE)
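As a rough illustration of why `[ \t]*` helps: with `re.MULTILINE`, `$` matches just before each newline, but in the sample a space still sits between the closing quote and the end of the last line (the literal ends with `"it" """`), so `$` is not directly after that quote. The sketch below uses a hypothetical two-line string whose first line ends with a trailing space.

```python
import re

# Hypothetical two-line sample; the first line ends with a trailing space.
text = ('"It "does "not "make "sense", Well, "Does "it" \n'
        '"It "does "not "make "sense", Well, "Does "it"')

# `$` alone: the trailing space sits between the quote and the line end,
# so the lookahead does not block the match on the first line.
naive = re.sub(r'\b\s*"(?!,|$)', '" "', text, flags=re.MULTILINE)

# Allowing optional spaces/tabs before `$` treats that quote as line-final.
fixed = re.sub(r'\b\s*"(?!,|[ \t]*$)', '" "', text, flags=re.MULTILINE)

print(repr(naive))
print(repr(fixed))
```

Only the first variant leaves a stray `" "` after the last word of the first line.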

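2 Answers:

Answer 0: (score: 25)

Python lookbehinds really do need to be fixed-width. When a lookbehind contains an alternation whose branches differ in length, there are several ways to handle the situation: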

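  • Rewrite the pattern so that you do not need the alternation (e.g. Tim's answer, which uses a word boundary; or you can use an exact equivalent of your current pattern, (?<=[^,])"(?!,|$), which requires a character other than a comma before the double quote; or the common pattern for matching words enclosed in whitespace, (?<=\s|^)\w+(?=\s|$), which can be written as (?<!\S)\w+(?!\S)), or
  • Split the lookbehinds:
    • Positive lookbehinds need to be alternated inside a group (e.g. (?<=a|bc) should be rewritten as (?:(?<=a)|(?<=bc)))
    • Negative lookbehinds can simply be concatenated (e.g. (?<!^|,)"(?!,|$) should look like (?<!^)(?<!,)"(?!,|$)).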
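The rewrites listed above can be checked with a short sketch. The test strings and variable names here are illustrative assumptions, and only the standard `re` module is used.

```python
import re

s = '"It "does "not "make "sense", Well, "Does "it"'

# Alternation of different widths inside one lookbehind is rejected.
try:
    re.compile(r'(?<=a|bc)x')
    alternation_ok = True
except re.error:
    alternation_ok = False

# Positive lookbehinds: alternate whole lookbehinds inside a group.
positive = re.findall(r'(?:(?<=a)|(?<=bc))x', 'ax bcx cx')

# Negative lookbehinds: concatenate them, since both conditions must hold.
split_neg = [m.start() for m in re.finditer(r'(?<!^)(?<!,)"(?!,|$)', s)]

# Rewritten form that avoids the alternation altogether.
rewritten = [m.start() for m in re.finditer(r'(?<=[^,])"(?!,|$)', s)]

print(alternation_ok, positive, split_neg == rewritten)
```

Both the split and the rewritten pattern select the same quote positions in this sample, which is what makes them interchangeable for the question's input.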

Answer 1: (score: 16)

Python lookbehind assertions need to be fixed-width, but you can try this:

>>> s = '"It "does "not "make "sense", Well, "Does "it"'
>>> re.sub(r'\b\s*"(?!,|$)', '" "', s)
'"It" "does" "not" "make" "sense", Well, "Does" "it"'

Explanation:

\b      # Start the match at the end of a "word"
\s*     # Match optional whitespace
"       # Match a quote
(?!,|$) # unless it's followed by a comma or end of string
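The commented breakdown above can also be kept inside the pattern itself. This is a sketch of the same regex compiled with `re.VERBOSE`; the name `QUOTE_FIX` is a hypothetical choice, not part of the answer.

```python
import re

# Same pattern as above, with the explanation embedded as comments.
QUOTE_FIX = re.compile(r'''
    \b        # start the match at the end of a "word"
    \s*       # match optional whitespace
    "         # match a quote
    (?!,|$)   # unless it is followed by a comma or the end of the string
''', re.VERBOSE)

s = '"It "does "not "make "sense", Well, "Does "it"'
result = QUOTE_FIX.sub('" "', s)
print(result)
```

In verbose mode unescaped whitespace and `#` comments in the pattern are ignored, so the behavior should match the one-line version.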