Question

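I have an HTML document and need to grab all the DOM elements inside the table at the fifth level of table nesting (not to be confused with the fifth child table). My problem is that these five deeply nested tables can be wrapped in any number of div elements, so I can't use an absolute path such as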

/html/body/table/tbody/tr/td/table/tbody/tr/td/table/tbody/tr/td/table/tbody/tr/td/table

For example:

<body>    
    <table>    
        <table>
            <table>
                <table>
                   <!--Grab this one -->
                   <table>
                   </table>
                </table>
            </table>
       </table>
    </table>
</body>

Or this:

 <body> 
    <div> <!--Could be wrapped more than just once though -->  
        <table>    
            <table>
                <table>
                    <table>
                       <!--Grab this one -->
                       <table>
                       </table>
                    </table>
                </table>
           </table>
        </table>
    </div>
</body>

Answer 1

I believe you want to use the // expression between each element, giving the full expression:

//table//table//table//table//table

This selects every table that has at least four other tables anywhere in its ancestor path.
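
Because // matches descendants at any depth, the expression also matches tables nested more than five levels deep. A minimal sketch to check this, assuming C# 9 or later (top-level statements), the XPathSelectElements extension method from System.Xml.XPath, and a hypothetical well-formed sample whose id attributes are added purely for illustration:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical sample: six nested tables inside a wrapper div.
// The id attributes exist only to make the matches visible.
string xml =
    "<body><div>" +
    "<table id='t1'><table id='t2'><table id='t3'><table id='t4'>" +
    "<table id='t5'><table id='t6'/></table>" +
    "</table></table></table></table>" +
    "</div></body>";

XDocument doc = XDocument.Parse(xml);

// Every table that has at least four table ancestors.
string[] ids = doc.XPathSelectElements("//table//table//table//table//table")
    .Select(t => (string)t.Attribute("id"))
    .ToArray();

Console.WriteLine(string.Join(",", ids));
```

With this sample, both the fifth-level and the sixth-level table are matched, so the expression is a good fit only if nothing is nested deeper than the table you want.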

Answer 2

Use:

(//table[count(ancestor::table) = 4])[1]

This selects the first table in the document (in document order) that has exactly four ancestors named table.
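
A short sketch of how this expression could be evaluated from C#, assuming C# 9 or later (top-level statements), the XPathSelectElement extension method from System.Xml.XPath, and input that is well-formed XML. The sample markup and its id attributes are hypothetical:

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical sample: six nested tables inside a wrapper div.
string xml =
    "<body><div>" +
    "<table id='t1'><table id='t2'><table id='t3'><table id='t4'>" +
    "<table id='t5'><table id='t6'/></table>" +
    "</table></table></table></table>" +
    "</div></body>";

XDocument doc = XDocument.Parse(xml);

// First table with exactly four table ancestors, regardless of wrapper divs.
var fifth = doc.XPathSelectElement("(//table[count(ancestor::table) = 4])[1]");
string fifthId = (string)fifth?.Attribute("id");

Console.WriteLine(fifthId);
```

Here only the fifth-level table qualifies; the sixth-level one has five table ancestors and is skipped by the predicate.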

Answer 3

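// Requires System.Linq and System.Xml.Linq; the input must be well-formed XML.
XElement doc = XElement.Parse(yourXml);
// Fifth <table> in document order (index 4); this is the fifth nesting level only if the tables nest linearly.
var requiredTable = doc.Descendants("table").ElementAt(4);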
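
Note that ElementAt(4) counts tables in document order rather than by nesting depth, so the two can disagree as soon as a sibling table appears. The sketch below illustrates the difference, assuming C# 9 or later (top-level statements); the sample markup, its id attributes, and the depth-based alternative are illustrative assumptions, not part of the original answer:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample: table 'b' is a sibling of 'c', not part of the nesting chain.
string xml =
    "<body>" +
    "<table id='a'><table id='b'/>" +
    "<table id='c'><table id='d'><table id='e'><table id='f'/>" +
    "</table></table></table></table>" +
    "</body>";

XElement root = XElement.Parse(xml);

// Fifth table in document order: a, b, c, d, e.
XElement byIndex = root.Descendants("table").ElementAt(4);

// First table with exactly four table ancestors: a > c > d > e > f.
XElement byDepth = root.Descendants("table")
    .First(t => t.Ancestors("table").Count() == 4);

string byIndexId = (string)byIndex.Attribute("id");
string byDepthId = (string)byDepth.Attribute("id");
Console.WriteLine(byIndexId + " vs " + byDepthId);
```

Filtering on Ancestors("table").Count() is the LINQ to XML counterpart of the count(ancestor::table) predicate, and it stays correct when sibling tables are present.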

Answer 4

For mshtml (since your question is tagged c# and html), the child nodes of an HTML element can be accessed as described here: How can I retrieve all the text nodes of a HTMLDocument in the fastest way in C#?

Maybe this helps!

Select the Nth child node in an HTML document
