Question

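I'm using a MatchCollection to parse HTML, but this approach takes a very long time and sometimes fails. I figured that setting a timeout on the MatchCollection would solve the problem. How do I set a timeout for a MatchCollection? (.NET Framework 4.0)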

anchorPattern[0] = "<div.*?class=\"news\">.*?<div.*?class=\".*?date.*?\">(?<date>.*?)?</div>.*?<a.*?href=\"(?<link>.*?)\".*?>(?<title>.*?)?</a>.*?<(span.*?class=\".*?desc.*?\">(?<spot>.*?)?</span>)?";
MatchCollection mIcerik = Regex.Matches(html, anchorPattern[i], RegexOptions.Compiled);
// Count forces every match to be found (MatchCollection is evaluated lazily)
if (mIcerik.Count > 0)
    ListDegree.Add(i, mIcerik.Count);

Answer 1

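Your regex contains far too many ".*?", and for some inputs the number of ways the engine can try to split the text among them grows so large that matching effectively never finishes. Try using an atomic group "(?>.*?)": it automatically discards all backtracking positions remembered by any token inside the group. That should at least put a bound on how long each match can take.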
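A minimal, standalone C# sketch (console program; the sample strings are made up for illustration) of how an atomic group changes backtracking in .NET. It also shows a caveat worth knowing before applying this advice literally: an atomic group that contains only a lazy ".*?", as in "(?>.*?)", commits to the empty match and never expands, so the group generally has to enclose the token that ends the lazy run as well, e.g. "(?>.*?class="news">)". Atomic groups can also change which text matches, so a rewritten pattern should be re-tested against real pages.

```csharp
using System;
using System.Text.RegularExpressions;

static class AtomicGroupDemo
{
    static void Main()
    {
        // Plain lazy quantifier: .*? grows one character at a time until 'c' can match.
        Console.WriteLine(Regex.Match("abc", ".*?c").Value);          // abc

        // Atomic group around the lazy quantifier alone: the group commits to the
        // empty match, and the engine may not re-enter it to take more characters.
        // 'c' fails at indexes 0 and 1, so the match is only found at index 2.
        Console.WriteLine(Regex.Match("abc", "(?>.*?)c").Value);      // c

        // Atomic group that also encloses the terminating token: backtracking inside
        // the group still works, and is discarded only once the group has matched.
        Console.WriteLine(Regex.Match("abc", "(?>.*?c)").Value);      // abc

        // Atomicity can change the result, not just the speed: the non-atomic
        // pattern may backtrack past the first 'c'; the atomic one cannot, so its
        // match only starts later, at index 3.
        Console.WriteLine(Regex.Match("abcxcd", ".*?cd").Value);      // abcxcd
        Console.WriteLine(Regex.Match("abcxcd", "(?>.*?c)d").Value);  // xcd
    }
}
```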

Answer 2

TimeSpan timeout = new TimeSpan(0, 1, 0); // one minute

anchorPattern[0] = "<div.*?class=\"news\">.*?<div.*?class=\".*?date.*?\">(?<date>.*?)?</div>.*?<a.*?href=\"(?<link>.*?)\".*?>(?<title>.*?)?</a>.*?<(span.*?class=\".*?desc.*?\">(?<spot>.*?)?</span>)?";

MatchCollection mIcerik = Regex.Matches(html, anchorPattern[i], RegexOptions.Compiled, timeout);

// Count runs the (lazy) matching; a RegexMatchTimeoutException is thrown here if a match times out
if (mIcerik.Count > 0)
    ListDegree.Add(i, mIcerik.Count);

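The TimeSpan parameter sets the time-out interval for the pattern-matching operation. Alternatively, you can pass Regex.InfiniteMatchTimeout to indicate that the method should not time out. Note that this overload was added in .NET Framework 4.5, so it is not available on Framework 4.0. (See MSDN: Regex.Matches().)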
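A minimal sketch of the same call with the time-out actually handled, assuming .NET Framework 4.5 or later (the matchTimeout overloads do not exist in 4.0). CountMatches and the sample HTML are hypothetical. Because MatchCollection is evaluated lazily, the RegexMatchTimeoutException surfaces when the collection is first enumerated (here via Count), so the try block has to cover that access, not only the Regex.Matches call. Per the MSDN documentation for Regex.MatchTimeout, the interval applies to a single matching operation rather than to the whole collection.

```csharp
using System;
using System.Text.RegularExpressions;

static class Program
{
    // Hypothetical helper: returns the number of matches, or -1 if a match attempt
    // exceeded `timeout`. Needs .NET Framework 4.5+ for the matchTimeout overload.
    public static int CountMatches(string input, string pattern, TimeSpan timeout)
    {
        try
        {
            MatchCollection matches = Regex.Matches(input, pattern, RegexOptions.None, timeout);

            // MatchCollection is lazy: the actual matching, and therefore any
            // RegexMatchTimeoutException, happens when Count forces every match.
            return matches.Count;
        }
        catch (RegexMatchTimeoutException ex)
        {
            Console.WriteLine("Timed out after {0} on pattern {1}", ex.MatchTimeout, ex.Pattern);
            return -1;
        }
    }

    static void Main()
    {
        string html = "<div class=\"news\">A</div><div class=\"news\">B</div>";
        TimeSpan timeout = TimeSpan.FromSeconds(2);
        Console.WriteLine(CountMatches(html, "<div class=\"news\">", timeout)); // 2
    }
}
```

Catching the exception per page lets a loop over many pages skip only the page that timed out instead of aborting the whole run.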

MatchCollection timeout
