Empty string instead of unmatched group error

Question

I have this code:

for n in (range(1,10)):
    new = re.sub(r'(regex(group)regex)?regex', r'something'+str(n)+r'\1', old, count=1)

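It throws an "unmatched group" error. If the group does not match, I want an empty string inserted there instead of an error. How can I achieve this?

Note: my full code is much more complicated than this example. If you know a better way to iterate over the matches and add a number inside each replacement, feel free to share it. My full code: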

for n in (range(1,(text.count('soutez')+1))):
    text = re.sub(r'(?i)(\s*\{{2}infobox medaile reprezentant(ka)?\s*\|\s*([^\}]*)\s*\}{2}\s*)?\{{2}infobox medaile soutez\s*\|\s*([^\}]*)\s*\}{2}\s*', r"\n | reprezentace"+str(n)+r" = \3\n | soutez"+str(n)+r" = \4\n | medaile"+str(n)+r" = \n", text, count=1)

Answer 1

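Root cause

Before Python 3.5, a backreference to a failed capture group in Python re.sub was not populated with an empty string. This is described in Bug 1519638 at bugs.python.org. As a result, using a backreference to a group that did not participate in the match raises an error.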
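The version dependence can be checked with a small sketch. The helper name `try_sub` is hypothetical and only used for illustration; it catches `re.error`, which is the exception reported as `sre_constants.error` on older interpreters.

```python
import re

def try_sub(pattern, repl, string):
    """Run re.sub and report an error message instead of raising."""
    try:
        return re.sub(pattern, repl, string)
    except re.error as exc:
        # Reached on interpreters older than Python 3.5.
        return 'error: %s' % exc

# The optional group does not participate in the match here.
result = try_sub(r'regex(group)?regex', r'something\1something', 'regexregex')
print(result)  # 'somethingsomething' on Python 3.5 and later
```

On Python 3.5 and later the unmatched group is replaced with an empty string, so the workarounds below matter mainly for older interpreters or for code that must run on both.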

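There are two ways to fix the issue.

Solution 1: Add an empty alternative to make the optional group obligatory

You can replace every optional capturing group (a construct like (\d+)?) with an obligatory one that has an empty alternative (i.e. (\d+|)).

Here is an example of the failure: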

import re
old = 'regexregex'
new = re.sub(r'regex(group)?regex', r'something\1something', old)
print(new)

Replacing one line with

new = re.sub(r'regex(group|)regex', r'something\1something', old)

works.
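The difference between the two forms shows up directly on the match object. The following is a minimal sketch using the same sample string; the variable names are illustrative.

```python
import re

old = 'regexregex'

# Optional group: it does not participate, so group(1) is None.
optional = re.match(r'regex(group)?regex', old)
print(optional.group(1))          # None

# Obligatory group with an empty alternative: group(1) is ''.
obligatory = re.match(r'regex(group|)regex', old)
print(repr(obligatory.group(1)))  # ''

# With the empty alternative the backreference is always defined.
new = re.sub(r'regex(group|)regex', r'something\1something', old)
print(new)                        # somethingsomething
```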

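Solution 2: Use a lambda expression in the replacement and check whether the group is `None`

This approach is necessary if you have an optional group nested inside another optional group.

You can use a lambda in the replacement part to check whether the group is initialized (not None): lambda m: m.group(n) or ''. Use this solution in your case, because the replacement pattern contains two backreferences, #3 and #4, but some matches (see Match 1 and 3) do not have capture group 3 initialized. This happens because the whole first part, (\s*\{{2}funcA(ka|)\s*\|\s*([^}]*)\s*\}{2}\s*|), does not participate in the match, and the inner capture group 3 (i.e. ([^}]*)) is simply not populated, even after the empty alternative is added.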

re.sub(r'(?i)(\s*\{{2}funcA(ka|)\s*\|\s*([^\}]*)\s*\}{2}\s*|)\{{2}funcB\s*\|\s*([^\}]*)\s*\}{2}\s*', r"\n | funcA"+str(n)+r" = \3\n | funcB"+str(n)+r" = \4\n | string"+str(n)+r" = \n", text, count=1)

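should be rewritten as

re.sub(r'(?i)(\s*{{funcA(ka|)\s*\|\s*([^}]*)\s*}}\s*|){{funcB\s*\|\s*([^}]*)\s*}}\s*', lambda m: "\n | funcA"+str(n)+" = " + (m.group(3) or '') + "\n | funcB" + str(n) + " = " + (m.group(4) or '') + "\n | string" + str(n) + " = \n", text, count=1)

Note that the return value of a replacement function is inserted literally, so the newlines must be real "\n" characters rather than raw-string r"\n" escapes.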
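Why the empty alternative alone does not help with nested groups can be seen on a reduced pattern. This is a sketch with a made-up pattern, (x(\d*)|)y, chosen only because it has the same shape: an outer group with an empty alternative and an inner group that is reached only through the first alternative.

```python
import re

pattern = r'(x(\d*)|)y'

# The outer group takes its empty alternative, so the inner group
# is never entered and stays None.
groups = re.match(pattern, 'y').groups()
print(groups)  # ('', None)

# Guarding the inner group with "or ''" makes the replacement safe.
safe = re.sub(pattern, lambda m: '<' + (m.group(2) or '') + '>', 'y x12y')
print(safe)    # <> <12>
```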

See the IDEONE demo:

import re

text = '''
{{funcB|param1}}
*some string*
{{funcA|param2}}
{{funcB|param3}}
*some string2*
{{funcB|param4}}
*some string3*
{{funcAka|param5}}
{{funcB|param6}}
*some string4*
'''

pattern = r'(?i)(\s*\{{2}funcA(ka|)\s*\|\s*([^\}]*)\s*\}{2}\s*|)\{{2}funcB\s*\|\s*([^\}]*)\s*\}{2}\s*'

for n in range(1, text.count('funcB') + 1):
    # Groups 3 and 4 may be None; "or ''" turns them into empty strings.
    text = re.sub(pattern,
                  lambda m: "\n | funcA" + str(n) + " = " + (m.group(3) or '')
                            + "\n | funcB" + str(n) + " = " + (m.group(4) or '')
                            + "\n | string" + str(n) + " = \n",
                  text, count=1)

assert text == (
    "\n\n | funcA1 = \n | funcB1 = param1\n | string1 = \n*some string*"
    "\n | funcA2 = param2\n | funcB2 = param3\n | string2 = \n*some string2*\n"
    "\n | funcA3 = \n | funcB3 = param4\n | string3 = \n*some string3*"
    "\n | funcA4 = param5\n | funcB4 = param6\n | string4 = \n*some string4*\n"
)
print('ok')
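The question also asks for a better way to number the matches. One possible approach, offered only as a sketch, is a single re.sub call whose replacement function draws the number from a counter. The names `PATTERN` and `number_infoboxes` are hypothetical, and the output may differ from the loop version in edge cases, because a single pass never rescans text that has already been replaced.

```python
import itertools
import re

PATTERN = re.compile(
    r'(?i)(\s*\{{2}funcA(ka|)\s*\|\s*([^}]*)\s*\}{2}\s*|)'
    r'\{{2}funcB\s*\|\s*([^}]*)\s*\}{2}\s*'
)

def number_infoboxes(text):
    """Replace every match in one pass, numbering the matches from 1."""
    counter = itertools.count(1)

    def repl(m):
        n = next(counter)
        # Unmatched groups are None, so fall back to an empty string.
        return ("\n | funcA{0} = {1}\n | funcB{0} = {2}\n | string{0} = \n"
                .format(n, m.group(3) or '', m.group(4) or ''))

    return PATTERN.sub(repl, text)

sample = "{{funcB|p1}}\nA\n{{funcA|p2}}\n{{funcB|p3}}\nB\n"
numbered = number_infoboxes(sample)
print(numbered)
```

This avoids counting occurrences up front and rescanning the text from the start for each one.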

Answer 2

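I looked at this again. Note that it is unfortunate you have to deal with NULLs, but these are the rules you have to follow.

The patterns below all successfully match nothing. You have to do this to find out the rules.

It is not as simple as you might think. Take a good look at the results: there is no obvious, cut-and-dried way to tell from the form of the pattern whether you will get NULL or EMPTY.

However, on closer inspection the rules emerge, and they are fairly simple. You must follow them if you care about NULL.

There are just two rules:

Rule #1 - Any GROUP that cannot be reached will result in NULL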

   (?<Alt_1>                     # (1 start)
        (?<a> a )?                    # (2)
        (?<b> b? )                    # (3)
   )?                            # (1 end)
|  
   (?<Alt_2>                     # (4 start)
        (?<c> c? )                    # (5)
        (?<d> d? )                    # (6)
   )                             # (4 end)

 **  Grp 0         -  ( pos 0 , len 0 )  EMPTY 
 **  Grp 1 [Alt_1] -  ( pos 0 , len 0 )  EMPTY 
 **  Grp 2 [a]     -  NULL 
 **  Grp 3 [b]     -  ( pos 0 , len 0 )  EMPTY 
 **  Grp 4 [Alt_2] -  NULL 
 **  Grp 5 [c]     -  NULL

Rule #2 - Any GROUP that cannot be matched on the INSIDE will result in NULL

 (?<A_1>                       # (1 start)
      (?<a1> a? )                   # (2)
 )?                            # (1 end)
 (?<A_2>                       # (3 start)
      (?<a2> a )?                   # (4)
 )?                            # (3 end)
 (?<A_3>                       # (5 start)
      (?<a3> a )                    # (6)
 )?                            # (5 end)
 (?<A_4>                       # (7 start)
      (?<a4> a )?                   # (8)
 )                             # (7 end)

**  Grp 0       -  ( pos 0 , len 0 )  EMPTY 
**  Grp 1 [A_1] -  ( pos 0 , len 0 )  EMPTY 
**  Grp 2 [a1]  -  ( pos 0 , len 0 )  EMPTY 
**  Grp 3 [A_2] -  ( pos 0 , len 0 )  EMPTY 
**  Grp 4 [a2]  -  NULL 
**  Grp 5 [A_3] -  NULL 
**  Grp 6 [a3]  -  NULL 
**  Grp 7 [A_4] -  ( pos 0 , len 0 )  EMPTY 
**  Grp 8 [a4]  -  NULL
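In Python the same distinction appears as None (NULL) versus '' (EMPTY). The listings above use the (?<name>...) group syntax of another engine, so the following sketch uses shorter, unnamed patterns of my own choosing that exercise the same two rules with Python's re module.

```python
import re

# Rule 1: an alternative that is never reached leaves its group as None.
unreached = re.match(r'(a?)|(b?)', '').groups()

# Rule 2: a group whose contents cannot match is None, even when the
# enclosing group succeeds with an empty match.
inner_fails = re.match(r'((a)?)', '').groups()

# If the enclosing group itself is skipped, both groups are None.
outer_skipped = re.match(r'((a))?', '').groups()

print(unreached)      # ('', None)
print(inner_fails)    # ('', None)
print(outer_skipped)  # (None, None)
```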

Answer 3

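To simplify:

The problem

You are getting the error "sre_constants.error: unmatched group" from a Python 2.7 regex.
You have a regex pattern with optional groups (with or without nested expressions) and are trying to use those groups in the sub replacement argument (re.sub(pattern, *repl*, string) or compiled.sub(*repl*, string)).

The solution:

For the result, return match.group(1) instead of \1 (or \2, \3, etc.). That's it; no `or` is needed. The group result can be returned with a function or a lambda.

Example

You are using a common regex to strip C-style comments. Its design uses an optional group 1 to pass through pseudo-comments that should not be deleted (if they exist).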

pattern = r'//.*|/\*[\s\S]*?\*/|("(\\.|[^"])*")'
regex = re.compile(pattern)

Using \1 fails with the error "sre_constants.error: unmatched group":

return regex.sub(r'\1', string)

Using .group(1) succeeds:

return regex.sub(lambda m: m.group(1), string)

For those unfamiliar with lambda, this solution is equivalent to:

def optgroup(match):
    return match.group(1)
return regex.sub(optgroup, string)

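For a detailed discussion of why \1 fails because of Bug 1519638, see the accepted answer. While the accepted answer is authoritative, it has two shortcomings: 1) the example from the original question is so convoluted that it makes the example solution hard to read, and 2) it suggests returning either the group or an empty string, which is not required; you can simply call .group() on each match.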
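The fragments above can be assembled into a self-contained sketch. The function name `strip_comments` and the sample input are illustrative. This version returns `match.group(1) or ''` so that the callback always returns a string; according to this answer, returning `match.group(1)` directly works as well.

```python
import re

# Line comments, block comments, or (group 1) a double-quoted string.
PATTERN = re.compile(r'//.*|/\*[\s\S]*?\*/|("(\\.|[^"])*")')

def strip_comments(source):
    """Remove C-style comments but keep string literals untouched."""
    def keep_strings(match):
        # group(1) is None when a comment matched rather than a string.
        return match.group(1) or ''
    return PATTERN.sub(keep_strings, source)

code = 'int x = 1; // comment\nchar *s = "// not a comment"; /* block */\n'
stripped = strip_comments(code)
print(stripped)
```

The string literal survives because the third alternative matches it as a whole before the comment alternatives can match the "//" inside it.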
