My application uses 10 threads to read a large number of HTML files, with code like the following:
// Start 10 background worker threads.
for (int i = 0; i < 10; i++)
{
    new Thread(ParserHtmlWork)
    {
        IsBackground = true
    }.Start();
}

void ParserHtmlWork()
{
    while (true)
    {
        // Read the next file from the queue (Dequeue removes and returns it).
        var filePath = Query.Dequeue();
        using (var stream = OpenFile(filePath))
        {
            stream.Close();
        }
        System.Threading.Thread.Sleep(800);
    }
}
The code above runs fine, with average CPU usage of 2%-5%. I then changed my code to parse the HTML with the HtmlAgilityPack library.
private HtmlDocument CreateHtmlDocument(Stream stream, Encoding encoding)
{
    var doc = new HtmlDocument();
    // Defines if the 'id' attribute must be specifically used.
    doc.OptionUseIdAttribute = false;
    // Defines if declared encoding must be read from the document.
    // Declared encoding is determined using the meta http-equiv="content-type" content="text/html;charset=XXXXX" html node
    doc.OptionReadEncoding = false;
    doc.Load(stream, encoding);
    return doc;
}
I changed the ParserHtmlWork method to call CreateHtmlDocument:
using (var stream = OpenFile(filePath))
{
    CreateHtmlDocument(stream, Encoding.UTF8);
    stream.Close();
}
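Running it again, average CPU usage climbs to 50%-60% (the average file size is 80 KB). If I reduce the number of threads to 1, average CPU usage drops back to 2%-5%.
I captured a CPU sampling profile of my actual product (not the code above) with the Visual Studio profiler: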
ApplicationEngine.Start()
Inclusive Samples: 398
Exclusive Samples: 0
Inclusive Samples %: 76
Exclusive Samples %: 0
ApplicationEngine.DoWork(class System.IO.Stream)
Inclusive Samples: 337
Exclusive Samples: 0
Inclusive Samples %: 64.44
Exclusive Samples %: 0.00
CreateHtmlDocument(class System.IO.Stream,class System.Text.Encoding)
Inclusive Samples: 298
Exclusive Samples: 0
Inclusive Samples %: 56.98
Exclusive Samples %: 0.00
HtmlAgilityPack.HtmlDocument.Load(class System.IO.Stream,class System.Text.Encoding)
Inclusive Samples: 296
Exclusive Samples: 0
Inclusive Samples %: 56.60
Exclusive Samples %: 0.00
HtmlAgilityPack.HtmlDocument.Load(class System.IO.TextReader)
Inclusive Samples: 294
Exclusive Samples: 0
Inclusive Samples %: 56.21
Exclusive Samples %: 0.00
HtmlAgilityPack.HtmlDocument.Parse()
Inclusive Samples: 273
Exclusive Samples: 13
Inclusive Samples %: 52.20
Exclusive Samples %: 2.49
HtmlAgilityPack.HtmlDocument.PushNodeEnd(int32,bool)
Inclusive Samples: 135
Exclusive Samples: 2
Inclusive Samples %: 25.81
Exclusive Samples %: 0.38
[clr.dll]
Inclusive Samples: 130
Exclusive Samples: 106
Inclusive Samples %: 24.86
Exclusive Samples %: 20.27
System.String.ToLower()
Inclusive Samples: 118
Exclusive Samples: 118
Inclusive Samples %: 22.56
Exclusive Samples %: 22.56
HtmlAgilityPack.HtmlNode.get_Name()
Inclusive Samples: 81
Exclusive Samples: 3
Inclusive Samples %: 15.49
Exclusive Samples %: 0.57
Answer 0 (score: 2)
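So what is your question?
An HTML parser that uses CPU? What did you expect? Downloading doesn't use much CPU, but HTML parsing does, and if you run many parallel threads, then yes, that adds up.
There isn't much you can do: run HtmlAgilityPack through a profiler and see whether there is a bottleneck in there that can be optimized. If not... well... get a faster processor or more servers, or optimize your own code.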
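As one illustration of what such a profiler run can point at: the sample data in the question shows System.String.ToLower() with 22.56% exclusive samples. The sketch below is only a rough micro-benchmark, not HtmlAgilityPack's own code. It compares lower-casing a name before comparing it against an ordinal case-insensitive comparison. The class name, method names, and sample tag names are hypothetical, and the timings will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;

static class TagNameComparison
{
    // Culture-sensitive lower-casing; may allocate a new string on each call.
    public static bool MatchesWithToLower(string name, string expectedLower)
    {
        return name.ToLower() == expectedLower;
    }

    // Ordinal case-insensitive comparison; no intermediate string is created.
    public static bool MatchesOrdinalIgnoreCase(string name, string expected)
    {
        return string.Equals(name, expected, StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        // Hypothetical tag names standing in for what a parser might see.
        string[] names = { "DIV", "Span", "a", "TABLE", "td" };
        const int iterations = 1000000;

        int hits = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            foreach (string n in names)
                if (MatchesWithToLower(n, "div")) hits++;
        sw.Stop();
        Console.WriteLine("ToLower:           {0} ms ({1} matches)", sw.ElapsedMilliseconds, hits);

        hits = 0;
        sw.Restart();
        for (int i = 0; i < iterations; i++)
            foreach (string n in names)
                if (MatchesOrdinalIgnoreCase(n, "div")) hits++;
        sw.Stop();
        Console.WriteLine("OrdinalIgnoreCase: {0} ms ({1} matches)", sw.ElapsedMilliseconds, hits);
    }
}
```

Whether a change like this is possible depends on who owns the code that calls ToLower(). If the call sits inside the library, as the call stack in the question suggests, it would mean patching the library itself.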
Voting to close and -1: I don't see any relevant question here, other than "oh my god, my CPU is being used when it has to do work".