HtmlAgilityPack uses more than 50% CPU when used from multiple threads

Asked: 2012-02-07 07:51:35

Tags: c# performance html-agility-pack

My application uses 10 threads to read a large number of HTML files. The code looks roughly like this:

for (int i = 0; i < 10; i++)
{
    new Thread(ParserHtmlWork)
    {
        IsBackground = true
    }.Start();
}

void ParserHtmlWork()
{
    while (true)
    {
        // Read the next file path from the queue.
        var filePath = Query.Dequeue();
        using (var stream = OpenFile(filePath))
        {
            // Redundant inside a using block, kept as in the original test.
            stream.Close();
        }
        // Pause 800 ms between files.
        System.Threading.Thread.Sleep(800);
    }
}

The code above runs fine, with average CPU usage at 2%-5%. I then changed the code to parse the HTML with the HtmlAgilityPack library.

private HtmlDocument CreateHtmlDocument(Stream stream, Encoding encoding)
{
    var doc = new HtmlDocument();
    // Defines whether the 'id' attribute must be specifically used.
    doc.OptionUseIdAttribute = false;
    // Defines whether the declared encoding must be read from the document.
    // The declared encoding is determined from the meta http-equiv="content-type" content="text/html;charset=XXXXX" HTML node.
    doc.OptionReadEncoding = false;
    doc.Load(stream, encoding);
    return doc;
}

I changed the ParserHtmlWork method to call CreateHtmlDocument:

 using (var stream = OpenFile(filePath))
 {
     CreateHtmlDocument(stream, Encoding.UTF8);
     stream.Close();
 }

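Running it again, average CPU usage climbs to 50-60% (the average file size is 80 KB). If I reduce the number of threads to 1, average CPU usage drops back to 2%-5%.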

I captured CPU samples with the Visual Studio profiler in my actual product (not the code above):

ApplicationEngine.Start()
Inclusive Samples: 398
Exclusive Samples: 0
Inclusive Samples %: 76
Exclusive Samples %: 0

ApplicationEngine.DoWork(class System.IO.Stream)
Inclusive Samples: 337
Exclusive Samples: 0
Inclusive Samples %: 64.44
Exclusive Samples %: 0.00

CreateHtmlDocument(class System.IO.Stream,class System.Text.Encoding)
Inclusive Samples: 298  
Exclusive Samples: 0
Inclusive Samples %: 56.98
Exclusive Samples %: 0.00

HtmlAgilityPack.HtmlDocument.Load(class System.IO.Stream,class System.Text.Encoding)
Inclusive Samples: 296
Exclusive Samples: 0
Inclusive Samples %: 56.60
Exclusive Samples %: 0.00

HtmlAgilityPack.HtmlDocument.Load(class System.IO.TextReader)
Inclusive Samples: 294
Exclusive Samples: 0
Inclusive Samples %: 56.21
Exclusive Samples %: 0.00

HtmlAgilityPack.HtmlDocument.Parse()
Inclusive Samples: 273
Exclusive Samples: 13
Inclusive Samples %: 52.20
Exclusive Samples %: 2.49

HtmlAgilityPack.HtmlDocument.PushNodeEnd(int32,bool)
Inclusive Samples: 135
Exclusive Samples: 2
Inclusive Samples %: 25.81
Exclusive Samples %: 0.38

[clr.dll]
Inclusive Samples: 130
Exclusive Samples: 106
Inclusive Samples %: 24.86
Exclusive Samples %: 20.27

System.String.ToLower()             
Inclusive Samples: 118
Exclusive Samples: 118
Inclusive Samples %: 22.56
Exclusive Samples %: 22.56

HtmlAgilityPack.HtmlNode.get_Name()             
Inclusive Samples: 81
Exclusive Samples: 3
Inclusive Samples %: 15.49
Exclusive Samples %: 0.57
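
As a reading aid for the figures above, here is a minimal sketch of how the percentage columns relate to the sample counts. The total of 523 samples is not stated in the profile; it is inferred by dividing a sample count by its percentage (for example 296 / 0.5660), and ProfileMath is a hypothetical helper name.

```csharp
using System;

static class ProfileMath
{
    // Share of all collected samples attributed to one function,
    // rounded to two decimals like the profiler report.
    // totalSamples is an assumption (inferred, not reported by the profiler).
    public static double Percent(int samples, int totalSamples)
    {
        if (totalSamples <= 0) throw new ArgumentOutOfRangeException(nameof(totalSamples));
        return Math.Round(100.0 * samples / totalSamples, 2);
    }
}
```

With the inferred total, ProfileMath.Percent(118, 523) gives 22.56 for String.ToLower() and ProfileMath.Percent(106, 523) gives 20.27 for the exclusive samples in clr.dll, so those two entries together account for roughly 43% of all samples.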

1 Answer:

Answer 0 (score: 2)

So what is your question?

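An HTML parser that uses CPU? What did you expect? Downloading does not use CPU, but HTML parsing does, and if you run many threads in parallel then yes, it will add up.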
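
To make the "it will add up" point concrete, here is a back-of-the-envelope sketch. It assumes each worker alternates between a burst of CPU-bound parsing and the 800 ms sleep from the question. The 200 ms parse time and the 4-core machine used in the note below are made-up illustrative values, not measurements, and CpuEstimate is a hypothetical helper.

```csharp
using System;

static class CpuEstimate
{
    // Rough expected CPU utilisation of the whole machine (0.0 to 1.0) for
    // `threads` workers that each burn `busyMs` of CPU and then sleep `sleepMs`.
    // All inputs are assumptions supplied by the caller.
    public static double Utilisation(int threads, double busyMs, double sleepMs, int cores)
    {
        if (cores <= 0) throw new ArgumentOutOfRangeException(nameof(cores));
        // Fraction of its own time a single worker spends on the CPU.
        double perThread = busyMs / (busyMs + sleepMs);
        // Workers add up linearly until every core is saturated.
        return Math.Min(1.0, threads * perThread / cores);
    }
}
```

Under those assumed values, CpuEstimate.Utilisation(1, 200, 800, 4) comes out at about 0.05 and CpuEstimate.Utilisation(10, 200, 800, 4) at about 0.5: ten times the threads means roughly ten times the CPU, which is the same shape as the 2%-5% versus 50-60% reported in the question.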

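There is not much you can do: run HtmlAgilityPack through a profiler and see whether there is a bottleneck in there. If not... well... get a faster processor or more servers, or optimize your own code.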
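
Before reaching for a full profiler, a coarse first measurement is the CPU time a single parse consumes. The following is a minimal sketch, not a replacement for a profiler: CpuTimer is a hypothetical helper, and because Process.TotalProcessorTime covers the whole process, the number is only meaningful when no other threads are busy during the measurement.

```csharp
using System;
using System.Diagnostics;

static class CpuTimer
{
    // Average CPU time, in milliseconds, that the current process consumed
    // per call of `work`. Counts CPU used by ALL threads in the process,
    // so run it with the other workers stopped.
    public static double AverageCpuMs(Action work, int iterations)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        Process process = Process.GetCurrentProcess();
        TimeSpan before = process.TotalProcessorTime;
        for (int i = 0; i < iterations; i++)
        {
            work();
        }
        process.Refresh(); // discard any cached process information
        TimeSpan after = process.TotalProcessorTime;
        return (after - before).TotalMilliseconds / iterations;
    }
}
```

The delegate passed in would wrap one open-and-parse cycle, for example the OpenFile and CreateHtmlDocument calls from the question. Multiplying the result by the number of files parsed per second gives a rough figure for how much CPU the parsing alone should be expected to use.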

Voting to close and -1: I don't see any real question here, other than "oh my god, my CPU gets used when it has to work".