Question

I know there are already several questions about this issue, but none of them helped me solve my problem...

I have to replace names in a CSV document when they follow the tag {SPEAKER} or {GROUP OF SPEAKERS}. However, I get the following error message:

File "/usr/lib/python2.7/re.py", line 291, in filter
    return sre_parse.expand_template(template, match)
  File "/usr/lib/python2.7/sre_parse.py", line 831, in expand_template
    raise error, "unmatched group"
sre_constants.error: unmatched group

The relevant part of my script is:

list_speakers = re.compile(r'^\{GROUP OF SPEAKERS\}\t(.*)|^\{SPEAKER\}\t(.*)')

usernames = set()
for f in corpus:
    with open(f, "r", encoding=encoding) as fin:
        line = fin.readline()
        while line:
            line = line.rstrip()
            if not line:
                line = fin.readline()
                continue

            if not list_speakers.match(line):
                line = fin.readline()
                continue

            names = list_speakers.sub(r'\1', line)
            names = names.split(", ")
            for name in names:
                usernames.add(name)

            line = fin.readline()

Answer 1

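The issue is a known one: if a group did not participate in the match, a backreference to it is not replaced with an empty string in Python versions before 3.5; re.sub raises "unmatched group" instead.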
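To make the cause concrete, here is a small sketch using the pattern from the question. The sample line is made up for illustration, since the real corpus is not shown. On a {SPEAKER} line only the second alternative matches, so group 1 never participates and \1 has nothing to expand to:

```python
import re

# Pattern copied from the question: two alternatives, each with its own group.
list_speakers = re.compile(r'^\{GROUP OF SPEAKERS\}\t(.*)|^\{SPEAKER\}\t(.*)')

# Hypothetical sample line (assumption; not taken from the asker's data).
line = "{SPEAKER}\tAlice"

m = list_speakers.match(line)
# Only the second alternative matched, so group 1 is unset.
print(m.group(1))  # None
print(m.group(2))  # Alice
```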

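You need to make sure there is only one capturing group, or use a lambda expression as the replacement argument to implement custom replacement logic.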
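A minimal sketch of the callable-replacement option, keeping the original two-group pattern. The helper name extract_names and the sample lines are assumptions made for illustration. The callable picks whichever group actually matched, so no unmatched backreference is ever expanded:

```python
import re

# Original pattern from the question, unchanged.
list_speakers = re.compile(r'^\{GROUP OF SPEAKERS\}\t(.*)|^\{SPEAKER\}\t(.*)')

def extract_names(line):
    # Hypothetical helper: return the text of whichever group participated
    # in the match, or an empty string if both groups are unset or empty.
    return list_speakers.sub(lambda m: m.group(1) or m.group(2) or '', line)

# Made-up sample lines.
print(extract_names("{GROUP OF SPEAKERS}\tAlice, Bob"))  # Alice, Bob
print(extract_names("{SPEAKER}\tCarol"))                 # Carol
```

Because the replacement is a function rather than a template string, this should behave the same way on Python 2.7 and on Python 3.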

Here, you can easily rewrite the regex as a pattern with a single capturing group:

r'^\{(?:GROUP OF SPEAKERS|SPEAKER)\}\t(.*)'

See the regex demo.

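Details

^ - start of the string
\{ - a {
(?:GROUP OF SPEAKERS|SPEAKER) - a non-capturing group matching either GROUP OF SPEAKERS or SPEAKER
\} - a } (you can also write a plain }, it does not need to be escaped)
\t - a tab character
(.*) - Group 1: any zero or more characters other than line break characters, as many as possible (the rest of the line).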
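Putting the single-group pattern to work, the loop from the question could be reduced to something like the sketch below. The function name collect_usernames and the sample lines are assumptions. It reads the captured text straight from the match object instead of calling sub, and takes any iterable of lines, so an open file object would work as well:

```python
import re

# Single capturing group: the names always end up in group 1.
speaker_line = re.compile(r'^\{(?:GROUP OF SPEAKERS|SPEAKER)\}\t(.*)')

def collect_usernames(lines):
    # Hypothetical helper: gather every name listed after either tag.
    usernames = set()
    for line in lines:
        m = speaker_line.match(line.rstrip())
        if m:
            # One line may list several names separated by ", ".
            usernames.update(m.group(1).split(", "))
    return usernames

# Made-up sample input standing in for one corpus file.
sample = [
    "{GROUP OF SPEAKERS}\tAlice, Bob\n",
    "\n",
    "some other text\n",
    "{SPEAKER}\tAlice\n",
]
print(sorted(collect_usernames(sample)))  # ['Alice', 'Bob']
```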
