Trying to extract tags from an HTML page

Asked: 2013-09-08 10:31:42

Tags: c# xpath html-agility-pack

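I need to find all the nodes of an HTML page that have the structure <h2><span class="mw-headline" ...> ... </span></h2>. Each pair of consecutive <h2>...</h2> elements determines the beginning and the end of a node. I tried to find such nodes like this: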

string raw_code = doc.DocumentNode.SelectNodes("/")[0].WriteTo(); // can there be more than 1 node there?
string[] lines = raw_code.Split('\n'); 
foreach(HtmlNode hdr in doc.DocumentNode.SelectNodes("//span[@class = \"mw-headline\"]"))
{
  int line_number = hdr.Line;
  int line_position = hdr.LinePosition;
  string font_tag = lines[line_number].Substring(line_position - font_tag_length, line_position);
  MessageBox.Show(lines[line_number]); // returns div c
}

Frankly, MessageBox.Show() displays anything but the lines I am looking for, including <div class="thumb tright"> and <p>Mostly flat plains or gently rolling hills in north and west.</p>.
What am I doing wrong?
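
Two details in the snippet above are worth checking. String.Substring(int, int) takes a start index and a length, not an end index. HtmlNode.Line also appears to be 1-based, so lines[line_number] would point at the line after the heading, which would match the stray <div> and <p> output. A minimal sketch that avoids line arithmetic altogether is shown below. It assumes the headings and their section content are siblings under the same parent; HeadlineSections and Extract are hypothetical names introduced only for this illustration.

```csharp
using System.Collections.Generic;
using System.Text;
using HtmlAgilityPack;

static class HeadlineSections
{
    // Hypothetical helper: returns (heading text, section HTML) pairs.
    // Assumes each section's content follows its <h2> as sibling nodes.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<KeyValuePair<string, string>>();

        // Select every <h2> that has a <span class="mw-headline"> child.
        HtmlNodeCollection headings =
            doc.DocumentNode.SelectNodes("//h2[span[@class='mw-headline']]");
        if (headings == null)
        {
            // SelectNodes returns null, not an empty collection, on no match.
            return result;
        }

        foreach (HtmlNode h2 in headings)
        {
            var body = new StringBuilder();

            // Collect following siblings until the next <h2> or the end.
            for (HtmlNode n = h2.NextSibling;
                 n != null && n.Name != "h2";
                 n = n.NextSibling)
            {
                body.Append(n.OuterHtml);
            }

            result.Add(new KeyValuePair<string, string>(
                h2.InnerText.Trim(), body.ToString()));
        }

        return result;
    }
}
```

Walking NextSibling keeps the work inside the parsed tree, so the result does not depend on how WriteTo() serializes the document or on how its output is split into lines. If the line-based approach is kept, lines[line_number - 1] is the index to try first.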

0 answers:

No answers yet.