Ignoring some TR nodes

Time: 2018-10-14 19:02:17

Tags: html-agility-pack

I have HTML like this:
<body>
<tr class="sysinfoTableCategoryHeader">
    <td colspan="4">Operating System</td>
</tr>

    <tr class="sysinfoTablePropertyEven">
        <td />
        <td />
        <td><span class="sysinfoTablePropertyKey">Operating System Name</span></td>
        <td><span class="sysinfoTablePropertyValue">Linux</span></td>
    </tr>

    <tr class="sysinfoTablePropertyOdd">
        <td />
        <td />
        <td><span class="sysinfoTablePropertyKey">Kernel Version</span></td>
        <td><span class="sysinfoTablePropertyValue">4.8.0-1-amd64</span></td>
    </tr>

<tr class="sysinfoTableCategoryHeader">
    <td colspan="4">Motherboard</td>
</tr>

    <tr class="sysinfoTablePropertyEven">
        <td />
        <td />
        <td><span class="sysinfoTablePropertyKey">Manufacturer</span></td>
        <td><span class="sysinfoTablePropertyValue">Acer</span></td>
    </tr>

    <tr class="sysinfoTablePropertyOdd">
        <td />
        <td />
        <td><span class="sysinfoTablePropertyKey">Product</span></td>
        <td><span class="sysinfoTablePropertyValue">Aspire E5-531</span></td>
    </tr>
</body>

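I am able to pick the values I need out of this HTML file, but there is one problem. Say I want to ignore the nodes that belong to the class="sysinfoTableCategoryHeader" row for Operating System.

Is this possible at all?

My output should look like this: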

<body>
<tr class="sysinfoTableCategoryHeader">
    <td colspan="4">Motherboard</td>
</tr>

    <tr class="sysinfoTablePropertyEven">
        <td />
        <td />
        <td><span class="sysinfoTablePropertyKey">Manufacturer</span></td>
        <td><span class="sysinfoTablePropertyValue">Acer</span></td>
    </tr>

    <tr class="sysinfoTablePropertyOdd">
        <td />
        <td />
        <td><span class="sysinfoTablePropertyKey">Product</span></td>
        <td><span class="sysinfoTablePropertyValue">Aspire E5-531</span></td>
    </tr>
</body>

How can I do this with HtmlAgilityPack?

2 answers:

Answer 0: (score: 1)

My English is limited. Code:

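    // html is a string holding the markup from the question
    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(html);
    // Select every <tr> under <body> whose class is not the category header
    HtmlNodeCollection htmlNodes = htmlDoc.DocumentNode.SelectNodes("//body/tr[@class!='sysinfoTableCategoryHeader']");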

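htmlNodes holds the rows you need. Alternatively, select the header rows and remove them from the document with Remove():

    HtmlNodeCollection htmlNodes = htmlDoc.DocumentNode.SelectNodes("//body/tr[@class='sysinfoTableCategoryHeader']");

    // SelectNodes returns null when nothing matches
    if (htmlNodes != null) {
        foreach (HtmlNode node in htmlNodes) {
            node.Remove(); // detach the header row from its parent
        }
    }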
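
Both snippets above only filter out the header rows themselves. The expected output in the question also drops the property rows under "Operating System", so here is a minimal sketch of one way to do that, assuming the rows are flat siblings as in the question: find the header whose text is "Operating System", then remove it together with every following sibling up to the next header. The shortened sample markup and the toRemove list are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Shortened sample based on the markup in the question (illustrative only)
        string html = @"<body>
<tr class=""sysinfoTableCategoryHeader""><td colspan=""4"">Operating System</td></tr>
<tr class=""sysinfoTablePropertyEven""><td><span class=""sysinfoTablePropertyValue"">Linux</span></td></tr>
<tr class=""sysinfoTableCategoryHeader""><td colspan=""4"">Motherboard</td></tr>
<tr class=""sysinfoTablePropertyEven""><td><span class=""sysinfoTablePropertyValue"">Acer</span></td></tr>
</body>";

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        // Locate the header row of the category to skip
        HtmlNode header = htmlDoc.DocumentNode.SelectSingleNode(
            "//tr[@class='sysinfoTableCategoryHeader'][normalize-space(td)='Operating System']");

        if (header != null)
        {
            // Collect first, remove afterwards, so the sibling walk is not disturbed
            var toRemove = new List<HtmlNode> { header };
            for (HtmlNode sibling = header.NextSibling; sibling != null; sibling = sibling.NextSibling)
            {
                bool isNextHeader = sibling.Name == "tr" &&
                    sibling.GetAttributeValue("class", "") == "sysinfoTableCategoryHeader";
                if (isNextHeader) break; // the next category starts here
                toRemove.Add(sibling);   // property row or whitespace text node
            }

            foreach (HtmlNode node in toRemove)
            {
                node.Remove();
            }
        }

        // Print what is left of the document
        Console.WriteLine(htmlDoc.DocumentNode.OuterHtml);
    }
}
```

If the rows were wrapped per category in their own container element, removing that container would be simpler; the sibling walk is only needed because the structure here is flat.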

Answer 1: (score: 0)

You need an XPath such as //tr[@class!='sysinfoTableCategoryHeader']. XPath supports comparison operators like !=.
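
One detail worth knowing about the != operator: under XPath 1.0 rules, @class!='x' only matches elements that actually have a class attribute with a different value, while not(@class='x') also matches elements with no class attribute at all. The sketch below compares the two; the third row without a class is a made-up addition to the question's markup, included only to show the difference.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // The last row has no class attribute (hypothetical, not in the question)
        string html = @"<body>
<tr class=""sysinfoTableCategoryHeader""><td>Motherboard</td></tr>
<tr class=""sysinfoTablePropertyEven""><td>Acer</td></tr>
<tr><td>row without a class attribute</td></tr>
</body>";

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        // Matches only rows that have a class attribute with a different value
        HtmlNodeCollection withNotEquals = htmlDoc.DocumentNode.SelectNodes(
            "//tr[@class!='sysinfoTableCategoryHeader']");

        // Also matches rows that have no class attribute
        HtmlNodeCollection withNot = htmlDoc.DocumentNode.SelectNodes(
            "//tr[not(@class='sysinfoTableCategoryHeader')]");

        // SelectNodes returns null when nothing matches, hence the null handling
        Console.WriteLine("!= matched:    " + (withNotEquals?.Count ?? 0));
        Console.WriteLine("not() matched: " + (withNot?.Count ?? 0));
    }
}
```

For the markup in the question every row carries a class, so both forms select the same rows there.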