Question

我需要用Python处理数千篇文章，我使用正则表达式替换50多个短语，而每篇文章的短语列表都不同。

我的代码有一个for循环，遍历列表并使用re.finditer()

查找这些短语

for item in phrases:
    for match in re.finditer(re.escape(item), article):
        process ..

示例：

phrases = ['apples', 'oranges', 'apple', 'orange', 'other types']

匹配

我正在考虑通过删除循环并使用一种模式来提高性能：

apples|oranges|apple|orange|other types

但是由于我有一个不断变化的长列表，我不确定正则表达式引擎是否会提供更好的性能。欢迎任何有关此事的说明。

Answer 1

如果您已有列表，则可以尝试if语句。它是否改变是无关紧要的。

尝试不同的方式并对每一个进行基准确认..

phrases = ['apples', 'oranges', 'apple', 'orange', 'other types']

match = 'apples'

if match in phrases:       
    print(match)