Getting the rows and columns of a table nested inside another table with HTML Agility Pack

Date: 2017-04-03 21:30:19

Tags: html vb.net html-table nodes html-agility-pack

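VB 2012, using HTML Agility Pack. I have spent hours trying to solve this, and it comes down to my ignorance of the input format. Here is my input: a simple HTML table with two other tables embedded in it.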

<table cellpadding="0" cellspacing="0" border="0">
    <tr>
        <td width="100%">
            <table cellpadding="0" cellspacing="0" border="0" class="plan">
                <tr>
                    <td class="textBold" valign="bottom">XX&nbsp;<u>999</u></td>
                    <td class="centerText" valign="bottom">X1</td>
                    <td class="centerText" valign="bottom">X2</td>
                    <td class="centerText" valign="bottom">X3</td>
                    <td class="centerText" valign="bottom">X4</td>
                    <td class="centerText" valign="bottom">X5</td>
                    <td class="centerTextTotal" valign="bottom">TOTAL</td>
                </tr>
                <tr>
                    <td class="Text">PRIMARY</td>
                    <td class="centerText">4</td>
                    <td class="centerText">8</td>
                    <td class="centerText">&nbsp;</td>
                    <td class="centerText">1</td>
                    <td class="centerText">3</td>
                    <td class="centerTextTotal">16</td>
                </tr>
                <tr>
                    <td class="TextColor">SECONDARY</td>
                    <td class="centerTextColor">&nbsp;</td>
                    <td class="centerTextColor">&nbsp;</td>
                    <td class="centerTextColor">2</td>
                    <td class="centerTextColor">&nbsp;</td>
                    <td class="centerTextColor">2</td>
                    <td class="centerTextTotal">4</td>
                </tr>
                <tr>
                    <td class="TextTotal">TOTAL</td>
                    <td class="centerTextTotal">4</td>
                    <td class="centerTextTotal">8</td>
                    <td class="centerTextTotal">2</td>
                    <td class="centerTextTotal">1</td>
                    <td class="centerTextTotal">5</td>
                    <td class="centerTextTotal">20</td>
                </tr>
            </table>
        </td>
    </tr>
    <tr>
        <td width="100%">
            <table cellpadding="0" cellspacing="0" border="0" width="100%">
                <tr>
                    <td width="75%" class="" textcolorvalign="bottom">Number of fuelings:0</td>
                    <td width="25%" class="" textcolorvalign="bottom" align="right">Meals:2</td>
                </tr>
            </table>
        </td>
    </tr>
</table>

I only care about the data in the inner table with the class "plan".

        Dim html As HtmlAgilityPack.HtmlDocument = New HtmlAgilityPack.HtmlDocument
        html.OptionOutputAsXml = False
        html.LoadHtml(htmlTable)

        Dim docNode As HtmlAgilityPack.HtmlNode = html.DocumentNode

        'parse the plan table if it exists
        If docNode IsNot Nothing Then
            Dim hTable As HtmlAgilityPack.HtmlNode = docNode.SelectSingleNode("//table[@class='plan']")
            If hTable IsNot Nothing Then
                For Each hRow As HtmlAgilityPack.HtmlNode In hTable.SelectNodes("//table[@class='plan']//tr") '"//tr"
                    Debug.Print("   InnerText=>[{0}] InnerHtml=>[{1}]", hRow.InnerText, hRow.InnerHtml)

                    For Each hCol As HtmlAgilityPack.HtmlNode In hRow.SelectNodes("//table[@class='plan']//tr//td") '"//td"
                        Debug.Print("      InnerText=>[{0}] InnerHtml=>[{1}]", hCol.InnerText, hCol.InnerHtml)
                    Next hCol
                Next hRow
            End If
        End If

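In the trailing comments on the right are the strings I originally used, //tr and //td. My logic was that, because I am calling SelectNodes on the nodes hTable and hRow, I would get only their descendants. Instead, this returns all rows and all columns from ALL tables. After testing, it seems I have to fully qualify the path in each loop, with //table[@class='plan']//tr and //table[@class='plan']//tr//td. Why is that? It makes no sense to me, since I am explicitly using the child node objects hTable and hRow.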

1 answer:

Answer 0 (score: 0)

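According to this, a leading // in XPath means "search from the document root", regardless of which node you call SelectNodes on. To search from the current context node you need .// instead. So try .//tr and .//td to search relative to the current element.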
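
Applied to the loop from the question, the suggestion might look like the sketch below. It is only a sketch: the module and procedure names (PlanTableSketch, PrintPlanTable) are made up for illustration, htmlTable is assumed to hold the HTML shown in the question, and the Nothing checks assume that SelectNodes returns Nothing rather than an empty collection when nothing matches, which is how HTML Agility Pack has behaved by default in the versions I know of.

```vbnet
Imports System.Diagnostics

Module PlanTableSketch
    ' Hypothetical helper: prints each row and cell of the table with class "plan".
    ' htmlTable is assumed to contain the HTML from the question.
    Sub PrintPlanTable(htmlTable As String)
        Dim html As New HtmlAgilityPack.HtmlDocument()
        html.LoadHtml(htmlTable)

        ' An absolute path is fine here because the search starts at the document node.
        Dim hTable As HtmlAgilityPack.HtmlNode =
            html.DocumentNode.SelectSingleNode("//table[@class='plan']")
        If hTable Is Nothing Then Return

        ' ".//tr" is relative: only rows below hTable are returned.
        Dim rows As HtmlAgilityPack.HtmlNodeCollection = hTable.SelectNodes(".//tr")
        If rows Is Nothing Then Return ' assumed: Nothing when there is no match

        For Each hRow As HtmlAgilityPack.HtmlNode In rows
            Debug.Print("   InnerText=>[{0}]", hRow.InnerText)

            ' ".//td" is relative: only cells below the current row are returned.
            Dim cells As HtmlAgilityPack.HtmlNodeCollection = hRow.SelectNodes(".//td")
            If cells Is Nothing Then Continue For

            For Each hCol As HtmlAgilityPack.HtmlNode In cells
                Debug.Print("      InnerText=>[{0}]", hCol.InnerText)
            Next hCol
        Next hRow
    End Sub
End Module
```

Note that .//tr still matches rows at any depth below hTable, so if the "plan" table itself contained further nested tables, their rows would be included as well.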