Get the text of all p elements in a div with HtmlAgilityPack

Asked: 2012-06-15 05:42:07

Tags: .net winforms parsing html-agility-pack

I have a div that contains paragraph tags, like this:

<div class="div_5">
                <p>First Paragraph</p>
                <p>Second Paragraph</p>
                <p>Third Paragraph</p>
                <p>Fourth Paragraph</p>
 </div>
<div class="div_5">
                <p>First Paragraph</p>
                <p>Second Paragraph</p>
                <p>Third Paragraph</p>
                <p>Fourth Paragraph</p>
 </div>

I need to get the text of every paragraph using HtmlAgilityPack. I tried this:

Dim oPB As HAP.HtmlNodeCollection = doc.DocumentNode.SelectNodes("//div[@class='post-bodycopy clearfix']/child::text()/")
For Each item As HAP.HtmlNode In oPB
    Debug.Print(item.InnerText)
Next

The expected output for each div is:

First Paragraph
Second Paragraph
Third Paragraph
Fourth Paragraph

But the text I get back contains some HTML. Can someone help me correct this?

1 answer:

Answer 0 (score: 2)

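You have to actually select the inner text of the paragraphs. Your XPath is selecting something else entirely: it targets a div with a different class and asks for the div's own child text nodes rather than the text inside its p elements.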

Dim query = doc.DocumentNode.SelectNodes("//div[@class='div_5']/p/text()")
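
For context, here is a minimal sketch of how that query might be used end to end. It is not the answerer's code: the module name, the sample markup string, and the use of Console.WriteLine are assumptions made for illustration, and the HAP alias is assumed to stand for the HtmlAgilityPack namespace, as in the question. The sketch selects the p elements and reads InnerText instead of selecting text() nodes, so that text inside nested inline tags would also be included; either form should work for the markup shown in the question.

```vbnet
Imports System
Imports HAP = HtmlAgilityPack

Module ParagraphTextDemo
    Sub Main()
        ' Sample markup, assumed to match the structure shown in the question.
        Dim html As String =
            "<div class=""div_5"">" &
            "<p>First Paragraph</p><p>Second Paragraph</p>" &
            "</div>"

        Dim doc As New HAP.HtmlDocument()
        doc.LoadHtml(html)

        ' Select the <p> elements that are direct children of the div.
        Dim paragraphs As HAP.HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//div[@class='div_5']/p")

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        If paragraphs Is Nothing Then
            Console.WriteLine("No matching paragraphs found.")
            Return
        End If

        For Each p As HAP.HtmlNode In paragraphs
            ' InnerText drops the markup; DeEntitize converts entities such as &amp;.
            Console.WriteLine(HAP.HtmlEntity.DeEntitize(p.InnerText).Trim())
        Next
    End Sub
End Module
```

The Nothing check matters because looping over the result of a non-matching XPath would otherwise throw a NullReferenceException, which is what the original attempt would likely run into once its XPath is made syntactically valid but still targets the wrong class.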