XPath: getting values from an HTML page with HtmlAgilityPack

Date: 2014-03-24 17:20:07

Tags: html xpath html-agility-pack

I need to read two numeric values from a web page into two variables.

Excerpt from the page:

<b>Downloads (current version):</b> 123                  <br />
<b>Downloads (total):</b> 253</td>
<br />

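"Downloads (current version):" and "Downloads (total):" are unique strings on the page.

I need to get "123" and "253" into variables.

Edit: thanks to har07, I ended up with: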

var downloadscurrentversion = htmlDoc.DocumentNode.SelectSingleNode(@"//b[.='Downloads (current version):']/following-sibling::text()[1]");
var downloadsallversions = htmlDoc.DocumentNode.SelectSingleNode(@"//b[.='Downloads (total):']/following-sibling::text()[1]");

Console.WriteLine("Total: " + downloadsallversions.InnerText.Trim());
Console.WriteLine("Current: " + downloadscurrentversion.InnerText.Trim());
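
The snippet above yields strings, and SelectSingleNode returns null when the XPath matches nothing. A minimal sketch of a more defensive variant that produces integers might look like this. The helper name ReadCount and the inline sample markup are assumptions made for illustration and are not part of HtmlAgilityPack or of the original page.

```csharp
using System;
using HtmlAgilityPack;

public static class DownloadCounts
{
    // Hypothetical helper: returns the integer that follows the <b> label,
    // or null when the label is missing or the text is not a plain integer.
    public static int? ReadCount(HtmlDocument doc, string label)
    {
        // The label is pasted into the XPath as-is, so it must not contain a single quote.
        var node = doc.DocumentNode.SelectSingleNode(
            "//b[.='" + label + "']/following-sibling::text()[1]");
        if (node == null)
        {
            return null;
        }

        int value;
        return int.TryParse(node.InnerText.Trim(), out value) ? value : (int?)null;
    }

    public static void Main()
    {
        // Assumed sample markup, modeled on the excerpt from the question.
        var doc = new HtmlDocument();
        doc.LoadHtml(@"<table><tr><td>
<b>Downloads (current version):</b> 123                  <br />
<b>Downloads (total):</b> 253</td></tr></table>");

        int? current = ReadCount(doc, "Downloads (current version):");
        int? total = ReadCount(doc, "Downloads (total):");
        Console.WriteLine("Current: " + current);
        Console.WriteLine("Total: " + total);
    }
}
```

Returning int? keeps "label not found" distinct from a real count of zero, which seems safer if the page layout ever changes.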

1 answer:

Answer 0: (score: 1)

Consider this example:

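using System;
using HtmlAgilityPack;

// Sample markup taken from the question, wrapped in a <div>.
var html = @"<div>
<b>Downloads (current version):</b> 123                  <br />
<b>Downloads (total):</b> 253</td>
<br />
</div>";
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
// Select the non-empty text nodes that are direct children of the <div>.
var result = htmlDoc.DocumentNode.SelectNodes("/div/text()[normalize-space(.)]");
// SelectNodes returns null (not an empty collection) when nothing matches.
if (result != null)
{
    foreach (var r in result)
    {
        Console.WriteLine(r.InnerText.Trim());
    }
}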

This part of the XPath in the example above:

/div/text()

selects the text nodes that are direct children of the <div> element. The last part:

[normalize-space(.)]

filters out empty (whitespace-only) text nodes.
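
The filter works because normalize-space(.) returns a string, and an XPath 1.0 predicate treats a string as true only when it is non-empty. As a rough illustration, the sketch below counts the text nodes with and without the predicate. The CountMatches helper and the sample markup are assumptions for this example only.

```csharp
using System;
using HtmlAgilityPack;

public static class TextNodeFilterDemo
{
    // Hypothetical helper: counts the matches of an XPath expression,
    // treating HtmlAgilityPack's null result (no matches) as zero.
    public static int CountMatches(HtmlDocument doc, string xpath)
    {
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        // Assumed sample markup, modeled on the excerpt from the question.
        var doc = new HtmlDocument();
        doc.LoadHtml(@"<div>
<b>Downloads (current version):</b> 123                  <br />
<b>Downloads (total):</b> 253
<br />
</div>");

        // Without the predicate, the line breaks between elements also count as text nodes.
        Console.WriteLine("All text nodes: " + CountMatches(doc, "/div/text()"));
        // With the predicate, only text nodes containing non-whitespace characters remain.
        Console.WriteLine("Non-empty text nodes: "
            + CountMatches(doc, "/div/text()[normalize-space(.)]"));
    }
}
```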

Update:

In response to your comment, you can try this:

var result = 
        htmlDoc.DocumentNode
               .SelectNodes(@"/div/b[.='Downloads (current version):' 
                                        or 
                                     .='Downloads (total):']/following-sibling::text()[1]");

The XPath above selects, for each of the specified <b> elements, the first text node that follows it as a sibling.
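
Because that query returns both text nodes in one collection, it can be handy to pair each value with its label. One possible sketch walks back from each text node to the preceding <b> sibling. The class name LabeledValues and the sample markup are assumptions for illustration, and the XPath starts with //b instead of /div/b so that it does not depend on the wrapping element.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LabeledValues
{
    // Hypothetical helper: maps each matched <b> label to the text that follows it.
    public static Dictionary<string, string> Read(HtmlDocument doc)
    {
        var values = new Dictionary<string, string>();
        var nodes = doc.DocumentNode.SelectNodes(
            "//b[.='Downloads (current version):' or .='Downloads (total):']" +
            "/following-sibling::text()[1]");
        if (nodes == null)
        {
            return values; // SelectNodes returns null when nothing matches.
        }

        foreach (var textNode in nodes)
        {
            // Walk back to the nearest preceding <b> sibling to recover the label.
            var label = textNode.PreviousSibling;
            while (label != null && label.Name != "b")
            {
                label = label.PreviousSibling;
            }
            if (label != null)
            {
                values[label.InnerText.Trim()] = textNode.InnerText.Trim();
            }
        }
        return values;
    }

    public static void Main()
    {
        // Assumed sample markup, modeled on the excerpt from the question.
        var doc = new HtmlDocument();
        doc.LoadHtml(@"<div>
<b>Downloads (current version):</b> 123                  <br />
<b>Downloads (total):</b> 253
<br />
</div>");

        foreach (var pair in Read(doc))
        {
            Console.WriteLine(pair.Key + " " + pair.Value);
        }
    }
}
```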