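VB 2012 using HTML Agility Pack. I have spent a few hours trying to figure this out, and the problem comes down to my ignorance of the input format. This is my input: a simple HTML table with two other tables embedded in it.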
<table cellpadding="0" cellspacing="0" border="0">
<tr>
<td width="100%">
<table cellpadding="0" cellspacing="0" border="0" class="plan">
<tr>
<td class="textBold" valign="bottom">XX <u>999</u></td>
<td class="centerText" valign="bottom">X1</td>
<td class="centerText" valign="bottom">X2</td>
<td class="centerText" valign="bottom">X3</td>
<td class="centerText" valign="bottom">X4</td>
<td class="centerText" valign="bottom">X5</td>
<td class="centerTextTotal" valign="bottom">TOTAL</td>
</tr>
<tr>
<td class="Text">PRIMARY</td>
<td class="centerText">4</td>
<td class="centerText">8</td>
<td class="centerText"> </td>
<td class="centerText">1</td>
<td class="centerText">3</td>
<td class="centerTextTotal">16</td>
</tr>
<tr>
<td class="TextColor">SECONDARY</td>
<td class="centerTextColor"> </td>
<td class="centerTextColor"> </td>
<td class="centerTextColor">2</td>
<td class="centerTextColor"> </td>
<td class="centerTextColor">2</td>
<td class="centerTextTotal">4</td>
</tr>
<tr>
<td class="TextTotal">TOTAL</td>
<td class="centerTextTotal">4</td>
<td class="centerTextTotal">8</td>
<td class="centerTextTotal">2</td>
<td class="centerTextTotal">1</td>
<td class="centerTextTotal">5</td>
<td class="centerTextTotal">20</td>
</tr>
</table>
</td>
</tr>
<tr>
<td width="100%">
<table cellpadding="0" cellspacing="0" border="0" width="100%">
<tr>
<td width="75%" class="" textcolorvalign="bottom">Number of fuelings:0</td>
<td width="25%" class="" textcolorvalign="bottom" align="right">Meals:2</td>
</tr>
</table>
</td>
</tr>
</table>
I only care about the data in the inner table with class "plan".
Dim html As HtmlAgilityPack.HtmlDocument = New HtmlAgilityPack.HtmlDocument
html.OptionOutputAsXml = False
html.LoadHtml(htmlTable)
Dim docNode As HtmlAgilityPack.HtmlNode = html.DocumentNode
'parse the plan table if it exists
If docNode IsNot Nothing Then
    Dim hTable As HtmlAgilityPack.HtmlNode = docNode.SelectSingleNode("//table[@class='plan']")
    If hTable IsNot Nothing Then
        For Each hRow As HtmlAgilityPack.HtmlNode In hTable.SelectNodes("//table[@class='plan']//tr") '"//tr"
            Debug.Print(" InnerText=>[{0}] InnerHtml=>[{1}]", hRow.InnerText, hRow.InnerHtml)
            For Each hCol As HtmlAgilityPack.HtmlNode In hRow.SelectNodes("//table[@class='plan']//tr//td") '"//td"
                Debug.Print(" InnerText=>[{0}] InnerHtml=>[{1}]", hCol.InnerText, hCol.InnerHtml)
            Next hCol
        Next hRow
    End If
End If
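In the trailing comments on the right are the strings I originally used, "//tr" and "//td". My logic was that, because I am calling SelectNodes on the nodes hTable and hRow, I would get only their respective child nodes. However, it appears that this fetches all rows and all columns from ALL tables. After testing, it seems I have to fully qualify each loop with //table[@class='plan']//tr and //table[@class='plan']//tr//td. Why is that? It makes no sense to me, since I am explicitly using the child node objects hTable and hRow.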
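
A possible explanation, offered as a sketch rather than a definitive answer: in XPath, an expression that begins with // is absolute, so it is evaluated from the document root no matter which node SelectNodes is called on, whereas a leading dot (.//tr, ./td) makes the expression relative to the current node. The example below assumes a cut-down, hypothetical input string and a console module named RelativeXPathSketch, neither of which is from the original post; it only uses the HtmlAgilityPack calls already shown above (LoadHtml, SelectSingleNode, SelectNodes, InnerText).

```vb
Imports System
Imports HtmlAgilityPack

Module RelativeXPathSketch
    Sub Main()
        ' Hypothetical, cut-down input: a "plan" table plus an unrelated sibling table.
        Dim htmlTable As String =
            "<table><tr><td>" &
            "<table class='plan'><tr><td>PRIMARY</td><td>4</td></tr>" &
            "<tr><td>SECONDARY</td><td>2</td></tr></table>" &
            "</td></tr><tr><td>" &
            "<table><tr><td>Meals:2</td></tr></table>" &
            "</td></tr></table>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(htmlTable)

        Dim hTable As HtmlNode = doc.DocumentNode.SelectSingleNode("//table[@class='plan']")
        If hTable Is Nothing Then Return

        ' "//tr" is an absolute path: it searches from the document root,
        ' even though it is called on hTable.
        Dim allRows As HtmlNodeCollection = hTable.SelectNodes("//tr")

        ' ".//tr" is relative: it searches only the descendants of hTable.
        Dim planRows As HtmlNodeCollection = hTable.SelectNodes(".//tr")

        Console.WriteLine("//tr matched {0} rows, .//tr matched {1} rows",
                          allRows.Count, planRows.Count)

        For Each hRow As HtmlNode In planRows
            ' "./td" selects only the td children of this particular row.
            Dim cells As HtmlNodeCollection = hRow.SelectNodes("./td")
            If cells Is Nothing Then Continue For ' SelectNodes returns Nothing when there is no match
            For Each hCol As HtmlNode In cells
                Console.WriteLine("  [{0}]", hCol.InnerText.Trim())
            Next
        Next
    End Sub
End Module
```

If this reading is right, the fully qualified path in the inner loop of the original code would return every cell of the plan table on each pass, not just the cells of the current row, so the relative form ./td is probably what was intended there.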