Question

I have a simple regular expression (used in C#):

\becua(?:[a-zA-ZáéíóúñÑÑäëïöü])*\b(.(?!embajada))*\s+embajada

1) a word that starts with "ecua"
2) whatever comes after that
3) the word "embajada" after the "whatever"

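But it takes too many steps. How can I prevent that? I just want it to consume characters until it finds the word "embajada", rather than backtracking at every character. It looks like a simple regex, but when I run it on a larger text and the pattern fails, it runs into catastrophic backtracking (or a timeout).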

Example: https://regex101.com/r/tQ7mM9/4

Thanks in advance.

Answer 1

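You can write your pattern in a greedy way, but this time with every quantified part enclosed in an atomic group. To do that you obviously need a lookahead test, but to limit the cost of running that test too often, you can use a character class ([^e] here) so that the regex engine only performs the test at interesting positions: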

\becua(?>\w*[^e]*(?:\Be[^e]*|e(?!mbajada\b)[^e]*)*)embajada

Details:

\becua
(?>
    \w*      # last part of "ecua..."

    [^e]*    # all that is not an "e"
    (?:
        \Be            # an "e" not at the start of a word
        [^e]*
      |
        e(?!mbajada\b) # an "e" that is not the start of "embajada"
        [^e]*
    )*       # repeat as many times as possible
)   # close the atomic group (backtracking is no longer possible)
embajada
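
The annotated pattern above can be tried from C# roughly as follows. This is only a sketch: the sample strings and the one-second match timeout are assumptions made for illustration, not part of the original question. RegexOptions.IgnorePatternWhitespace lets the pattern keep its commented layout.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The greedy pattern wrapped in an atomic group, written in its commented form.
string pattern = @"
    \becua
    (?>
        \w*                       # rest of the word that starts with ecua
        [^e]*                     # everything that is not an e
        (?:
            \Be [^e]*             # an e that is not at the start of a word
          |
            e(?!mbajada\b) [^e]*  # an e that does not start the word embajada
        )*
    )
    embajada";

// The timeout is a safety net (assumed value); the atomic group is what avoids the backtracking.
var regex = new Regex(pattern, RegexOptions.IgnorePatternWhitespace, TimeSpan.FromSeconds(1));

// Hypothetical inputs: one that contains the keyword, one long text that does not.
string hit = "ecuador tiene una embajada en Madrid";
string miss = "ecuador " + string.Concat(Enumerable.Repeat("este texto ", 20000));

Match m = regex.Match(hit);
Console.WriteLine(m.Success ? m.Value : "(no match)");
Console.WriteLine(regex.IsMatch(miss));
```

On the failing input, the atomic group is walked once from the "ecua" word to the end of the text and then abandoned, because the engine is not allowed to give characters back.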


And now a non-greedy way (the same idea, to limit the impact of the lazy quantifier):

\becua(?>e*[^e]+)*?\bembajada\b
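
A small C# sketch of the lazy variant, run against a few made-up sentences (the samples and the one-second timeout are assumptions for illustration). It also shows the effect of the trailing \b: "embajadas" is not accepted as the keyword.

```csharp
using System;
using System.Text.RegularExpressions;

// Lazy variant: each iteration of the atomic group consumes one run of e's followed by non-e's.
var lazy = new Regex(@"\becua(?>e*[^e]+)*?\bembajada\b",
                     RegexOptions.None, TimeSpan.FromSeconds(1));

// Hypothetical sample inputs.
string[] samples =
{
    "ecuador tiene una embajada en Madrid", // contains the whole word
    "ecuador tiene varias embajadas",       // plural only, rejected by the trailing \b
    "en ecuador no hay nada",               // keyword missing
};

foreach (string s in samples)
{
    string result;
    try
    {
        Match match = lazy.Match(s);
        result = match.Success ? match.Value : "(no match)";
    }
    catch (RegexMatchTimeoutException)
    {
        // Only reached if matching exceeds the assumed one-second budget.
        result = "(timed out)";
    }
    Console.WriteLine(s + " -> " + result);
}
```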


Answer 2

Is this what you are looking for? \b(ecua\w+) .*? (embajada)
