Question

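I have 4 regexes that I run in rotation. They work fine, but in some applications they are pegging the CPU. I know they aren't the best-looking regexes, but I'm not sure of a better way. Is there anything I can do to optimize them?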

public static Regex schemaOrg = new Regex(@"\s*itemtype\s*=\s*('|"")\s*http://schema.org/\s*", RegexOptions.Singleline | RegexOptions.IgnoreCase | RegexOptions.Compiled);
public static Regex dataVocabulary = new Regex(@"\s*itemtype\s*=\s*('|"")\s*http://data-vocabulary.org/\s*", RegexOptions.Singleline | RegexOptions.IgnoreCase | RegexOptions.Compiled);

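They are basically looking for:

itemtype="http://schema.org/"

itemtype="http://data-vocabulary.org/"

but allowing for any amount of whitespace, since that is still valid in HTML.

For example:

itemtype   ="http://schema.org/"

itemtype=   "http://schema.org/"

itemtype="   http://schema.org/   "

are all valid.

Update: this version still pegs the CPU: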

\s+itemtype\s*=\s*(?:'|"")\s*http://schema\.org/

Answer 1

So far I can only think of a few things.

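The trailing \s* (in schema.org/\s* and vocabulary.org/\s*) is not needed; remove it. I say this because you are not checking for the closing quote anyway.
The . (dot) has a special meaning in regex; escape it as \. in schema.org and data-vocabulary.org.
The leading \s* is pointless, because it also lets your pattern match someitemtype. Replace it with \s+, or try a word boundary \b at the start of the pattern.
If you are paranoid about it, you can also stop the regex from capturing a group by replacing ('|"") with (?:'|"").

Edit: You can also try lazy matching and see whether it helps. I can imagine situations where your regex would hog the CPU. Try the following sample regex:

\s+?itemtype\s*?=\s*?(?:'|"")\s*?http://schema\.org/

If this doesn't help, please post the code and a sample string along with the question.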
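
For illustration, the first four suggestions above could be combined roughly as follows. This is only a sketch: the class name MicrodataPatterns and the helper ForHost are made up for the example, and it picks the \b variant rather than \s+. RegexOptions.Singleline is left out because it only changes what an unescaped . matches, and the pattern no longer contains one.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MicrodataPatterns
{
    // Hypothetical helper: builds the tightened pattern for one vocabulary host.
    // Regex.Escape turns "schema.org" into "schema\.org", so the dot is literal.
    public static Regex ForHost(string host)
    {
        // \b        : word boundary, so "someitemtype" no longer matches
        // (?:'|"")  : non-capturing group for the opening quote
        // no trailing \s*, since the closing quote is never checked
        string pattern = @"\bitemtype\s*=\s*(?:'|"")\s*http://" + Regex.Escape(host) + "/";
        return new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
    }

    public static readonly Regex SchemaOrg = ForHost("schema.org");
    public static readonly Regex DataVocabulary = ForHost("data-vocabulary.org");
}
```

With this, MicrodataPatterns.SchemaOrg.IsMatch("<div itemtype = \" http://schema.org/Person\">") should return true, while an input containing someitemtype="http://schema.org/" or itemtype="http://schemaXorg/" should not match. Note that \b still accepts a name such as data-itemtype, because the hyphen is not a word character; use \s+ instead if that matters.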

Answer 2

A possible improvement is:

Replace all " " with "" before calling Regex.Match.

Then your regex doesn't need all those \s*.
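
A rough sketch of that idea, with assumptions flagged: the class and method names are hypothetical, and Replace(" ", "") strips only space characters, so tabs and newlines around the = would still need handling. Stripping spaces also glues the attribute to whatever precedes it (<div itemtype becomes <divitemtype), so a \b or \s+ prefix cannot be used here, and the call allocates a copy of the whole input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NormalizedMatcher
{
    // After stripping spaces there is nothing left for \s* to match,
    // so the pattern is literal apart from the quote alternation.
    private static readonly Regex SchemaOrgCompact = new Regex(
        @"itemtype=(?:'|"")http://schema\.org/",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Hypothetical helper: removes space characters, then runs the simpler regex.
    public static bool HasSchemaOrgItemType(string html)
    {
        string compact = html.Replace(" ", "");
        return SchemaOrgCompact.IsMatch(compact);
    }
}
```

Whether this is actually cheaper than the whitespace-tolerant regex depends on the size of the input, so it is worth measuring on real pages before adopting it.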
