Question

This is the code I am running:

Dim descriptionMatches As MatchCollection = Regex.Matches(pageJSON, "\[\[(([\w]+[\s]*)+)\]\], (([\w]+[\s]*)+)\\n")
Console.WriteLine(descriptionMatches.Count)

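Everything works fine until the last line. Evaluating MatchCollection.Count seems to take a very long time, so long that I have let the program run for more than 2 minutes without it finishing...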

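Here is some additional information.

When I cut the regex pattern down to "\[\[(([\w]+[\s]*)+)\]\]", I get 35 matches and it seems to be fast.
When I loop over the MatchCollection with a loop of the form For i = 0 To collection.Count, the loop body is never executed (as if the regex were still trying to analyze the input string). If I use a For Each loop instead (the difference being that the latter uses an iterator), I get as far as the 15th match before it freezes. Strange, isn't it?
Here is a link to the string I am trying to match; as you will see, it is not the longest string ever: Wikipedia API result for SRS
In the likely case that my pattern is the problem and you want to suggest a new one, what I want to match is the following: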

[[item name]], item description\n

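I have used regular expressions a lot in the past and this has never happened to me. If anyone knows what the problem is, could you tell me what it is and how to fix it?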

Answer 1

You want to match two [[ and then two ]]. Make it easy on yourself:

\[\[([^][]+)\]\], (.*?)\\n\*

See it at work at http://regex101.com/r/kK5rO4

Explanation:

\[\[       find two literal [[ in a row
([^][]+)   match at least one character that is not ] or [ (note - the order matters)
           and "save" that match (so you can pull it out later)
\]\]       all the fun stops when you hit two closing brackets
           (but since the match already said "no closing brackets" there is no backtracking)
,          match comma followed by space
(.*?)      match the least amount you can until you get to...
\\n\*      literal \n* (both the \ and the * need a backslash to escape them)

Traditional regular expressions need a g flag to match "all instances", but I assume the rest of your code handles that effectively.
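For reference, here is a minimal sketch of how the pattern above might be used from VB.NET. No g flag is needed there, since Regex.Matches already returns every match. The sample string is a hypothetical stand-in for pageJSON (the real input is the Wikipedia API result linked in the question), and the character class is written as [^\[\]] instead of [^][], which should be equivalent but does not depend on the order of the brackets.

```vb
Imports System
Imports System.Text.RegularExpressions

Module DescriptionMatchDemo
    Sub Main()
        ' Hypothetical stand-in for pageJSON. VB string literals have no
        ' backslash escapes, so "\n" here is a literal backslash followed by "n",
        ' just as it appears in raw JSON text.
        Dim pageJSON As String = "*[[Item one]], first description\n*[[Item two]], second description\n*"

        ' Group 1: the text between [[ and ]]. Group 2: the description,
        ' matched lazily up to the literal \n* that starts the next bullet.
        Dim pattern As String = "\[\[([^\[\]]+)\]\], (.*?)\\n\*"

        ' Regex.Matches finds all matches; For Each walks them one at a time.
        For Each m As Match In Regex.Matches(pageJSON, pattern)
            Console.WriteLine(m.Groups(1).Value & " => " & m.Groups(2).Value)
        Next
        ' Prints:
        ' Item one => first description
        ' Item two => second description
    End Sub
End Module
```

Because neither group contains a nested quantifier, a failed attempt at one position is abandoned after a roughly linear amount of work instead of being retried in exponentially many ways.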

Answer 2

Your regular expression causes "catastrophic backtracking" because it is more complicated than it needs to be.

Consider rewriting your regular expression to be possessive.
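As far as I know, the .NET regex engine does not accept possessive quantifier syntax such as \w++, but atomic groups, written (?>...), give the same effect. The sketch below is one possible rewrite of the original pattern under that assumption, not the only fix. The two sample inputs and the 2-second timeout are made up for illustration. The timeout overload of the Regex constructor requires .NET Framework 4.5 or later.

```vb
Imports System
Imports System.Text.RegularExpressions

Module AtomicGroupDemo
    Sub Main()
        ' Each repeated group from the question is wrapped in (?>...), so once
        ' it has consumed a run of words the engine will not go back and try
        ' to split that run in a different way.
        Dim pattern As String = "\[\[((?>(?:\w+\s*)+))\]\], ((?>(?:\w+\s*)+))\\n"

        ' Safety net: abort any single match attempt after an assumed 2-second budget.
        Dim rx As New Regex(pattern, RegexOptions.None, TimeSpan.FromSeconds(2))

        ' Hypothetical inputs. The second one has a "!" before the literal \n,
        ' which is the kind of text that sends the nested (\w+\s*)+ form into
        ' exponential backtracking.
        Dim good As String = "[[Item name]], item description\n"
        Dim bad As String = "[[Item name]], " & New String("a"c, 40) & "!\n"

        Try
            Console.WriteLine(rx.Matches(good).Count) ' prints 1
            Console.WriteLine(rx.Matches(bad).Count)  ' prints 0, and returns quickly
        Catch ex As RegexMatchTimeoutException
            Console.WriteLine("Regex timed out: " & ex.Message)
        End Try
    End Sub
End Module
```

The atomic version matches the same strings as the original here, because every shorter split the original would retry still ends just before a word or whitespace character, never before the ]] or \n that must come next.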

Does the regular expression take an unusually long time?