I have HTML code like this:
<div class="NewsItemContainer">
<div class="a1">
<img class="NewsThumbnail" src="1.jpg">
</div>
<div class="NewsLead">
<span>title</span>
</div>
</div>
I want to get all the child nodes of this node.
Here is my code:
HtmlDocument doc = new HtmlDocument();
doc.Load(@"c:\a.htm");
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[@class='NewsItemContainer' and @id]");
foreach (HtmlAgilityPack.HtmlNode node in nodes)
{
//HtmlNode h2Node = node.SelectSingleNode("NewsThumbnail");
foreach (HtmlNode div in node.SelectNodes("//img[@class='NewsThumbnail' and @id]"))
{
HtmlAttribute att = div.Attributes["src"];
img = att.Value;
}
foreach (HtmlNode div in node.SelectNodes("//span[@class='NewsLead' and @id]"))
{
//HtmlAttribute att = div.InnerText;
dsc = div.InnerText;
}
MessageBox.Show(img + "\n\r" + dsc);
}
What am I doing wrong?
Answer 0 (score: 0)
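Always make sure to anchor the expressions you pass to SelectNodes: "//" always searches from the root of the document, no matter which node you call SelectNodes on, while ".//" searches from the currently selected node.
Also, the span you are looking for doesn't have the class applied to it; the class is on its parent div.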
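To see both points in isolation, here is a small sketch that uses System.Xml's XmlDocument as a stand-in for HtmlAgilityPack (an assumption made so the example needs no NuGet package; both evaluate standard XPath relative to the node you call SelectNodes on). The two-container markup is a hypothetical, well-formed version of the HTML in the question, and CountFromSecondContainer and SecondContainer are illustrative helpers, not library APIs.

```csharp
using System;
using System.Xml;

// Sketch only: XmlDocument stands in for HtmlAgilityPack's HtmlDocument.
public static class XPathScopeDemo
{
    // Hypothetical sample with two containers; img tags are self-closed so it parses as XML.
    private const string Markup =
        "<body>" +
        "<div class='NewsItemContainer'><div class='a1'><img class='NewsThumbnail' src='1.jpg'/></div>" +
        "<div class='NewsLead'><span>title 1</span></div></div>" +
        "<div class='NewsItemContainer'><div class='a1'><img class='NewsThumbnail' src='2.jpg'/></div>" +
        "<div class='NewsLead'><span>title 2</span></div></div>" +
        "</body>";

    // Returns the second NewsItemContainer, used as the context node below.
    public static XmlNode SecondContainer()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Markup);
        return doc.SelectNodes("//div[@class='NewsItemContainer']")[1];
    }

    // Number of nodes the expression matches when evaluated from the second container.
    public static int CountFromSecondContainer(string xpath) =>
        SecondContainer().SelectNodes(xpath).Count;

    public static void Main()
    {
        // "//" restarts at the document root, so both thumbnails are found.
        Console.WriteLine(CountFromSecondContainer("//img[@class='NewsThumbnail']"));  // 2
        // ".//" stays below the context node, so only this container's thumbnail is found.
        Console.WriteLine(CountFromSecondContainer(".//img[@class='NewsThumbnail']")); // 1
        // The span has no class of its own, so filtering spans by class finds nothing...
        Console.WriteLine(CountFromSecondContainer(".//span[@class='NewsLead']"));     // 0
        // ...whereas selecting the classed parent div and stepping down to its span works.
        Console.WriteLine(SecondContainer().SelectSingleNode("./div[@class='NewsLead']/span").InnerText); // title 2
    }
}
```

The same expressions should behave the same way when passed to HtmlNode.SelectNodes, since the difference comes from XPath itself rather than from either library.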
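using System.Linq;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load(@"c:\a.htm");
// SelectNodes returns null (not an empty collection) when nothing matches, hence the ?? fallbacks.
// The sample markup gives the container no id attribute, so only the class is matched.
var nodes = doc.DocumentNode.SelectNodes("//div[@class='NewsItemContainer']")
            ?? Enumerable.Empty<HtmlNode>();
foreach (HtmlNode node in nodes)
{
    string img = null, dsc = null;
    // ".//" searches only below the current container, not the whole document.
    foreach (HtmlNode div in node.SelectNodes(".//img[@class='NewsThumbnail']") ?? Enumerable.Empty<HtmlNode>())
    {
        HtmlAttribute att = div.Attributes["src"];
        img = att?.Value;
    }
    // The class is on the parent div, so select that div and step down to its span.
    foreach (HtmlNode div in node.SelectNodes("./div[@class='NewsLead']/span") ?? Enumerable.Empty<HtmlNode>())
    {
        dsc = div.InnerText;
    }
    // Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument.
    System.Windows.Forms.MessageBox.Show(img + "\r\n" + dsc);
}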
Also, if you are specifically looking for an attribute value, you can end the XPath with @attributename, like this:
foreach (HtmlNode div in node.SelectNodes(".//img[@class='NewsThumbnail']/@src"))
That way you can get the attribute value directly with div.InnerText, without first looking up div.Attributes["src"].
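A minimal sketch of an attribute-terminated XPath, again using System.Xml rather than HtmlAgilityPack (an assumption for the sake of a self-contained example). In System.Xml, an expression ending in /@src yields XmlAttribute nodes, and XmlAttribute.InnerText returns the attribute's value. HtmlAgilityPack runs XPath through its own navigator, so it is worth checking that attribute steps behave the same way in the HAP version you use. ThumbnailSources is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Sketch with System.Xml (not HtmlAgilityPack).
public static class AttributeXPathDemo
{
    // Collects the src value of every NewsThumbnail image in well-formed markup.
    public static List<string> ThumbnailSources(string markup)
    {
        var doc = new XmlDocument();
        doc.LoadXml(markup);
        var sources = new List<string>();
        // The trailing /@src makes each result an XmlAttribute rather than the img element.
        foreach (XmlNode attr in doc.SelectNodes("//img[@class='NewsThumbnail']/@src"))
        {
            // For an attribute node, InnerText is the attribute value.
            sources.Add(attr.InnerText);
        }
        return sources;
    }

    public static void Main()
    {
        // Hypothetical well-formed version of the question's markup.
        const string markup =
            "<div class='NewsItemContainer'><div class='a1'>" +
            "<img class='NewsThumbnail' src='1.jpg'/></div>" +
            "<div class='NewsLead'><span>title</span></div></div>";
        Console.WriteLine(string.Join(", ", ThumbnailSources(markup))); // 1.jpg
    }
}
```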