Question

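I am trying to write a Python program that checks whether the phrases listed in a file occur in a document. My program works fine until it reaches a phrase such as "happy (+) feet". I think the error is related to the "(+)" in the phrase; however, I don't know how to modify my regular expression to make it work.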

Here is my code:

import re
handle = open('document.txt', 'r')
text = handle.read()

lst = list()
with open('phrases.txt', 'r') as phrases:
    for phrase in phrases:
        phrase = phrase.rstrip()
        if len(phrase) > 0 and phrase not in lst:
            lst.append(phrase)

counts = {}
for each_phrase in lst:
    word = each_phrase.rsplit()
    pattern = re.compile(r'%s' % '\s+'.join(word), re.IGNORECASE)
    counts[each_phrase] = len(pattern.findall(text))

for key, value in counts.items():
    if value > 0:
        print key, ',', value

handle.close()
phrases.close()

Answer 1

You need to use re.escape when declaring word:

word = map(re.escape, each_phrase.rsplit())

You may also want to change \s+ to \s* to make the whitespace optional:

pattern = re.compile(r'%s' % '\s*'.join(word), re.IGNORECASE)
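To see what switching from \s+ to \s* changes, here is a minimal sketch. It assumes Python 3, and the helper name build_pattern and the sample text are made up for illustration; they are not part of the original program.

```python
import re

def build_pattern(phrase, sep):
    # Escape each word so regex metacharacters match literally,
    # then join the words with the given whitespace separator.
    words = [re.escape(w) for w in phrase.split()]
    return re.compile(sep.join(words), re.IGNORECASE)

# Hypothetical sample text: one occurrence with spaces, one without.
sample = "Happy (+) feet and happy(+)feet"

strict = build_pattern("happy (+) feet", r"\s+")  # whitespace required
loose = build_pattern("happy (+) feet", r"\s*")   # whitespace optional

strict_count = len(strict.findall(sample))
loose_count = len(loose.findall(sample))
print(strict_count, loose_count)
```

With \s+ only the spaced occurrence is counted; with \s* both are. Keep in mind that \s* also lets ordinary words run together, so a phrase like "a b" would match "ab" as well, which may or may not be what you want.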

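The parentheses ( and ) and the plus sign + are special regex characters. Outside a character class they must be escaped in a regular expression in order to match the literal characters.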
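As a rough illustration of why the unescaped phrase breaks, the sketch below (Python 3 assumed; the variable names are only for this example) compiles the phrase without escaping and then shows what re.escape produces:

```python
import re

phrase = "happy (+) feet"

# Unescaped: "(" opens a group and "+" then has nothing to repeat,
# so compiling the pattern raises re.error.
try:
    re.compile(r"\s+".join(phrase.split()))
    compiled_ok = True
except re.error:
    compiled_ok = False

# Escaped: re.escape puts a backslash in front of each metacharacter,
# so the pattern matches the literal text "(+)".
escaped = re.escape("(+)")
match = re.search(escaped, phrase)
print(compiled_ok, escaped, match.group(0))
```

Escaping each word before joining keeps the \s+ separators as real regex syntax while everything that came from the phrase file is treated as literal text.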

Sample IDEONE demo
