Question

I have a list of regular expressions, like this:

import re

regexes = [
    re.compile(r"((intrigued)(.*)(air))"),
    re.compile(r"(air|ipadair)(.*)(wishlist|wish)"),
    re.compile(r"(replac(ed|ing|es|.*)(.*)(with)(.*)(air))"),
    re.compile(r"(upgrade)"),
]
# post is one string from the list of strings
for regex in regexes:
    if regex.search(post):
        print(1)
        break

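Suppose I have a long list of strings and I want to search each one for these regular expressions: if any regex matches, return 1 and break, then do the same for the next string. My current approach is extremely slow, so if there is a better option, please let me know.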

Thanks,

Answer 1

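As some of the comments mentioned, this may not be a job for regular expressions. I think it is worth looking at what you are actually trying to do here. Take one of the regexes: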

"(air|ipadair)(.*)(wishlist|wish)"

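Here we are matching "air" or "ipadair", but "air" alone matches both, because any string containing "ipadair" also contains "air". The same goes for "wish" and "wishlist". Since we are not using the capture groups, the pattern can be simplified to: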

"air.*wish"

The same is true of all the other patterns, which raises the question: what are these regexes actually doing?
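As a quick sanity check of that simplification, here is a small sketch comparing the original and simplified patterns. The sample strings are made up for illustration, and agreement on a handful of samples is only a spot check, not a proof.

```python
import re

original = re.compile(r"(air|ipadair)(.*)(wishlist|wish)")
simplified = re.compile(r"air.*wish")

# Hypothetical sample posts, invented for this check.
samples = [
    "my ipadair is on my wishlist",
    "air on wish",
    "wish I had an air",
    "nothing relevant here",
]

def agree(text):
    # Only whether a match exists matters here, not the captured groups.
    return bool(original.search(text)) == bool(simplified.search(text))

results = [bool(simplified.search(s)) for s in samples]
all_agree = all(agree(s) for s in samples)
print(results, all_agree)
```

Note that the third sample does not match either pattern, because "wish" has to come after "air".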

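It looks like you just want to check whether certain words appear in the post in a certain order. If so, we can do that faster in Python without regular expressions: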

def has_phrases(in_string, phrases):
    """Return True if, for any phrase, all its words occur in in_string in order."""
    for words in phrases:
        start = 0
        match = True

        # Match all words
        for word in words:
            # Each word must come after the ones before
            pos = in_string.find(word, start)
            if pos == -1:
                match = False
                break
            # Continue searching after the end of this word
            start = pos + len(word)

        if match:
            return True

    # No phrase matched
    return False

phrases = [
    ['upgrade'],
    ['air', 'wish'],
    ['intrigued', 'air'],
    ['replac', 'with', 'air'],
]

print(has_phrases("... air ... wish ...", phrases))      # True
print(has_phrases("... horse ... magic ...", phrases))   # False

Of course, if this was just a simplified example and you plan to use much more complex regular expressions, this won't cut it.
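If the patterns do need to stay as regexes, one option worth trying is to join them into a single alternation, so each string is scanned with one compiled pattern instead of looping over several. This is only a sketch: the patterns are the simplified forms discussed above, `any_match` and the sample posts are hypothetical, and whether it is actually faster depends on your data, so measure it.

```python
import re

# Simplified forms of the four patterns from the question.
patterns = [
    r"intrigued.*air",
    r"air.*wish",
    r"replac.*with.*air",
    r"upgrade",
]

# Wrap each pattern in a non-capturing group, then join them with "|".
combined = re.compile("|".join("(?:%s)" % p for p in patterns))

def any_match(text):
    # True if at least one of the patterns matches somewhere in text.
    return combined.search(text) is not None

# Hypothetical posts, invented for illustration.
posts = [
    "thinking about an upgrade",
    "replacing my laptop with an air",
    "no keywords in this one",
]
flags = [1 if any_match(p) else 0 for p in posts]
print(flags)
```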

Hope that helps!

Matching a list of regular expressions in Python
