HtmlAgilityPack: getting the <p> and <strong> text

Time: 2012-11-07 04:55:05

Tags: html vb.net html-parsing html-agility-pack

Hey all, I am looking for a way to get at the values in this HTML:

<DIV class=schedule_block>
<DIV class=channel_row><SPAN class=channel>
<DIV class=logo><IMG src='/images/channel_logos/WGNAMER.png'></DIV>
<P><STRONG>2</STRONG><BR>WGNAMER </P></SPAN>

using HtmlAgilityPack.

I have been trying this:

For Each channel In doc.DocumentNode.SelectNodes(".//div[@class='channel_row']")
   Dim info = New Dictionary(Of String, Object)()

   With channel
      info!Logo = .SelectSingleNode(".//img").Attributes("src").Value
      info!Channel = .SelectSingleNode(".//span[@class='channel']").ChildNodes(1).ChildNodes(0).InnerText
      info!Station = .SelectSingleNode(".//span[@class='channel']").ChildNodes(1).ChildNodes(2).InnerText
   End With
.......

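I can get the logo, but I get a blank string for the channel and station, or this error:

Index was out of range. Must be non-negative and less than the size of the collection.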

I have tried all kinds of combinations:

info!Station = .SelectSingleNode(".//span[@class='channel']").ChildNodes(1).ChildNodes(1).InnerText
info!Station = .SelectSingleNode(".//span[@class='channel']").ChildNodes(1).ChildNodes(3).InnerText
info!Station = .SelectSingleNode(".//span[@class='channel']").ChildNodes(0).ChildNodes(1).InnerText
info!Station = .SelectSingleNode(".//span[@class='channel']").ChildNodes(0).ChildNodes(2).InnerText
info!Station = .SelectSingleNode(".//span[@class='channel']").ChildNodes(0).ChildNodes(3).InnerText

What do I need to do to fix this?

1 answer:

Answer 0 (score: 1)

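If that whitespace is actually present in the markup, each run of it between tags is parsed as a text node and counts as a child node, which shifts the indexes. So: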

Dim channelSpan = .SelectSingleNode(".//span[@class='channel']")

info!Channel = channelSpan.ChildNodes(3).ChildNodes(0).InnerText
info!Station = channelSpan.ChildNodes(3).ChildNodes(2).InnerText
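
Because the index-based lookups above break as soon as the whitespace in the markup changes, it may be more robust to select the nodes by name instead of by position. The following is only a minimal sketch, not part of the original answer. It assumes the markup from the question (the closing </DIV> tags are added here just to complete the fragment), and the module name and the strongNode and textNode variables are made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports HtmlAgilityPack

Module ChannelDemo
    Sub Main()
        ' Markup copied from the question; the trailing </DIV> tags are an
        ' assumption, added only so the fragment is complete.
        Dim html As String =
            "<DIV class=schedule_block>" & vbLf &
            "<DIV class=channel_row><SPAN class=channel>" & vbLf &
            "<DIV class=logo><IMG src='/images/channel_logos/WGNAMER.png'></DIV>" & vbLf &
            "<P><STRONG>2</STRONG><BR>WGNAMER </P></SPAN>" & vbLf &
            "</DIV></DIV>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing when there is no match, so guard the loop.
        Dim rows = doc.DocumentNode.SelectNodes(".//div[@class='channel_row']")
        If rows Is Nothing Then Return

        For Each channel As HtmlNode In rows
            Dim info = New Dictionary(Of String, Object)()

            ' Find the <p> by name rather than by its index among the span's children.
            Dim para = channel.SelectSingleNode(".//span[@class='channel']//p")
            If para Is Nothing Then Continue For

            ' The channel number is the <strong> element inside the <p>.
            Dim strongNode = para.SelectSingleNode(".//strong")

            ' The station name is the first non-blank text node directly under the <p>.
            Dim textNode = para.ChildNodes.FirstOrDefault(
                Function(n) n.NodeType = HtmlNodeType.Text AndAlso n.InnerText.Trim() <> "")

            info!Logo = channel.SelectSingleNode(".//img").GetAttributeValue("src", "")
            info!Channel = If(strongNode Is Nothing, "", strongNode.InnerText.Trim())
            info!Station = If(textNode Is Nothing, "", textNode.InnerText.Trim())

            Console.WriteLine("{0} | {1} | {2}", info!Logo, info!Channel, info!Station)
        Next
    End Sub
End Module
```

The whitespace-only text nodes are still in ChildNodes, but nothing here depends on their position, so reformatting the HTML should not change the result.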