Getting the correct XPath expression for a DIV class

Asked: 2012-02-13 19:17:19

Tags: c# xpath

I am trying to walk through an HTML document:

<body class="style_0">
        <div>
            <div class="style_1">Pending Test List</div>
            <table style=" width: 100%;" id="AUTOGENBOOKMARK_4365445353431356880">
                <col>
                <col>
                <tbody>
                    <tr>
                        <td style="vertical-align: baseline;">
                            <div class="style_4">Pending Test List</div>
                        </td>
                        <td style="vertical-align: baseline;">
                            <div class="style_5">SOME AGENCY Laboratories, Inc.</div>
                        </td>
                    </tr>
                </tbody>
            </table>
            <table class="style_6" style=" width: 4.531in;" id="AUTOGENBOOKMARK_5083738604442918131">
                <col style=" width: 1in;">
                <col class="style_7" style=" width: 0.75in;">
                <col class="style_8" style=" width: 0.6in;">
                <col style=" width: 0.75in;">
                <col style=" width: 2.375in;">
                <tbody>
                    <tr class="style_9" style=" height: 0.5in;">
                        <td style="vertical-align: middle;">
                            <div class="style_10">Report Range:</div>
                        </td>
                        <td style="vertical-align: middle;">
                            <div class="style_11">01/01/2012</div>
                        </td>
                        <td style="vertical-align: middle;">
                            <div class="style_12">through</div>
                        </td>
                        <td style="vertical-align: middle;">
                            <div class="style_13">01/31/2012</div>
                        </td>
                        <td style="vertical-align: middle;">
                            <div class="style_14">(by Date Entered)</div>
                        </td>
                    </tr>
                </tbody>
            </table>
            <table class="style_15" style=" width: 100%;" id="AUTOGENBOOKMARK_7602283385844673591" iid="/526(QuRs78576248:0)">
                <col style=" width: 0.75in;">
                <col style=" width: 1.25in;">
                <col style=" width: 1in;">
                <col style=" width: 1.5in;">
                <col style=" width: 1.5in;">
                <col style=" width: 1.5in;">
                <col>
                <thead>
                    <tr>
                        <td colspan="4" style="vertical-align: baseline;"></td>
                        <td style="vertical-align: baseline;"></td>
                        <td style="vertical-align: baseline;"></td>
                        <td style="vertical-align: baseline;"></td>
                    </tr>
                    <tr>
                        <td style="vertical-align: baseline;">
                            <div class="style_16">Entered</div>
                        </td>
                        <td style="vertical-align: baseline;">
                            <div class="style_16">Spec. ID</div>
                        </td>
                        <td style="vertical-align: baseline;">
                            <div class="style_16">Batch/Pos.</div>
                        </td>
                        <td style="vertical-align: baseline;">
                            <div class="style_16">Test</div>
                        </td>
                        <td style="vertical-align: baseline;">
                            <div class="style_16">Client ID</div>
                        </td>
                        <td style="vertical-align: baseline;">
                            <div class="style_16">Client Name</div>
                        </td>
                        <td style="vertical-align: baseline;">
                            <div class="style_16">Agency</div>
                        </td>
                    </tr>
                </thead>
                <tbody>
                    <tr>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_18">1/30/12</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_19">ZZ324sdf</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_18">51446 / 75</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">HOLD_DE</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">234234</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">smith, john</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">PPPM-6P - SOME AGENCY</div>
                        </td>
                    </tr>
                    <tr>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_18">1/31/12</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_19">SFD3434</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_18">51668 / 17</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">HOLD_DE</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">FOY, EL</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">FOY, ALEX</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">someagency &amp; Associates LLC</div>
                        </td>
                    </tr>
                    <tr>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_18">1/31/12</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_19">SFD3434</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_18">51668 / 25</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">HOLD_DE</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">JAMISON, PA</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">JAMISON, ROY</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">someagency &amp; Associates LLC</div>
                        </td>
                    </tr>
                    <tr>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_18">1/31/12</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_19">SFD3434</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_18">51669 / 34</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">HOLD_DE</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">NEWMAN, SO</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">NEWMAN, ALEX</div>
                        </td>
                        <td class="style_17" style="vertical-align: baseline;">
                            <div class="style_20">someagency &amp; Associates LLC</div>
                        </td>
                    </tr>
                </tbody>
                <tfoot>
                    <tr>
                        <td colspan="2" style="vertical-align: baseline;">
                            <div class="style_21">Total Tests:</div>
                        </td>
                        <td style="vertical-align: baseline;">
                            <div class="style_22">4</div>
                        </td>
                        <td style="vertical-align: baseline;"></td>
                        <td style="vertical-align: baseline;"></td>
                        <td style="vertical-align: baseline;"></td>
                        <td style="vertical-align: baseline;"></td>
                    </tr>
                </tfoot>
            </table>
            <table style=" width: 100%;" id="AUTOGENBOOKMARK_8507236727661888074">
                <col>
                <col>
                <col>
                <tbody>
                    <tr>
                        <td style="vertical-align: baseline;">
                            <div class="style_2">
                                <br>Feb 13, 2012 9:37 AM</div>
                        </td>
                        <td style="vertical-align: baseline;">
                            <div class="style_3">
                                <br>
                                <div style="text-align:center;">Page 1</div>
                            </div>
                        </td>
                        <td style="vertical-align: baseline;"></td>
                    </tr>
                </tbody>
            </table>
        </div>
    </body>

I want to extract this data:

[image]

So far I have this:

foreach (var row in htmlSnippet.DocumentNode.SelectNodes("//table[@class = 'style_15']/tbody/tr"))
{
    foreach (var cell in row.SelectNodes("div[@class='*']"))
    {
        textBox1.Text = cell.InnerHtml.ToString();
    }
}

But I am not getting anything back!

This expression works:

//table[@class = 'style_15']/tbody/tr

But this one returns nothing:

div[@class='*']

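Please let me know what I am doing wrong! I need help returning every piece of data shown in the image (everything except the field names).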

2 answers:

Answer 0: (score: 3)

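* is generally used to match any element or attribute name, not any value. Inside quotes, as in @class='*', it is just the literal one-character string "*". If you want to match all div elements that have a class attribute with any value, simply use @class. Note too that the div elements are children of td, not of tr, so the outer expression needs a /td step: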

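// The outer XPath ends in /td, so each selected node is a table cell.
foreach (var cell in htmlSnippet.DocumentNode.SelectNodes("//table[@class = 'style_15']/tbody/tr/td"))
{
    // div[@class] matches child div elements that have a class attribute, whatever its value.
    foreach (var div in cell.SelectNodes("div[@class]"))
    {
        // Append instead of assigning, so earlier values are not overwritten.
        textBox1.Text += div.InnerHtml + Environment.NewLine;
    }
}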
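To compare the two predicates side by side, here is a minimal, self-contained sketch. It assumes the HtmlAgilityPack NuGet package, which the htmlSnippet.DocumentNode call in the question suggests. The class name DivClassDemo, the helper Select, and the trimmed-down sample markup are made up for illustration. It also guards against SelectNodes returning null, which HtmlAgilityPack does when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package; assumed to be the parser behind htmlSnippet

public static class DivClassDemo
{
    // Trimmed-down stand-in for the report table in the question (hypothetical sample).
    public const string SampleHtml = @"
<table class=""style_15"">
  <tbody>
    <tr>
      <td class=""style_17""><div class=""style_18"">1/30/12</div></td>
      <td class=""style_17""><div class=""style_19"">ZZ324sdf</div></td>
    </tr>
  </tbody>
</table>";

    // Returns the trimmed text of every node matched by xpath, or an empty list.
    public static List<string> Select(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();
        var nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return result; // SelectNodes returns null, not an empty collection, on no match
        }

        foreach (var node in nodes)
        {
            result.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }
        return result;
    }

    public static void Main()
    {
        const string rows = "//table[@class='style_15']/tbody/tr";

        // Compares class to the literal string "*", and looks for div directly under tr.
        Console.WriteLine(Select(SampleHtml, rows + "/div[@class='*']").Count);

        // Steps through td and only requires that a class attribute exists.
        Console.WriteLine(Select(SampleHtml, rows + "/td/div[@class]").Count);
    }
}
```

The first expression fails for two independent reasons (the literal "*" comparison and the missing td step), so fixing only one of them would still return nothing.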

Answer 1: (score: 2)

You probably just want div[@class], that is, div elements that have a class attribute.
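If the goal is to keep the values grouped by table row, one option is to evaluate a relative path from each tr. The following is only a sketch under the assumption that HtmlAgilityPack is the parser in use; RowReader and ReadRows are hypothetical names, and the one-row sample is cut down from the question's markup.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package; assumed, as in the question

public static class RowReader
{
    // Returns one string array per body row of the style_15 table.
    public static List<string[]> ReadRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<string[]>();
        var trs = doc.DocumentNode.SelectNodes("//table[@class='style_15']/tbody/tr");
        if (trs == null)
        {
            return rows; // no matching rows
        }

        foreach (var tr in trs)
        {
            // Relative path: evaluated from this tr, through td, down to the div.
            var divs = tr.SelectNodes("td/div[@class]");
            if (divs == null)
            {
                continue; // row without any div[@class]
            }

            // DeEntitize turns &amp; back into & for display.
            rows.Add(divs.Select(d => HtmlEntity.DeEntitize(d.InnerText).Trim()).ToArray());
        }
        return rows;
    }

    public static void Main()
    {
        // Hypothetical one-row sample based on the question's table.
        const string html = @"<table class=""style_15""><tbody>
<tr><td><div class=""style_18"">1/31/12</div></td><td><div class=""style_20"">someagency &amp; Associates LLC</div></td></tr>
</tbody></table>";

        foreach (var row in ReadRows(html))
        {
            Console.WriteLine(string.Join(" | ", row));
        }
    }
}
```

Because the path starts at tbody, the header row in thead and the totals row in tfoot are skipped, which matches the request to leave out the field names.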

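Oh, it is also worth noting that the HTML/XML sample you provided is not well-formed. I had to remove all of the col elements and close the br elements. That may be a problem in C# as well. I know it generally is for XSL, but I am not sure about XPath.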

I don't have time to write a C# example, but here is a simple XSL:

<xsl:template match="/">
  <so>
    <xsl:apply-templates select="//table[@class = 'style_15']/tbody/tr"/>
  </so>
</xsl:template>
<xsl:template match="div[@class]">
  <xsl:copy-of select="."/>
</xsl:template>

I got this output:

<so>
  <div class="style_18">1/30/12</div>
  <div class="style_19">ZZ324sdf</div>
  <div class="style_18">51446 / 75</div>
  <div class="style_20">HOLD_DE</div>
  <div class="style_20">234234</div>
  <div class="style_20">smith, john</div>
  <div class="style_20">PPPM-6P - SOME AGENCY</div>
  <div class="style_18">1/31/12</div>
  <div class="style_19">SFD3434</div>
  <div class="style_18">51668 / 17</div>
  <div class="style_20">HOLD_DE</div>
  <div class="style_20">FOY, EL</div>
  <div class="style_20">FOY, ALEX</div>
  <div class="style_20">someagency &amp; Associates LLC</div>
  <div class="style_18">1/31/12</div>
  <div class="style_19">SFD3434</div>
  <div class="style_18">51668 / 25</div>
  <div class="style_20">HOLD_DE</div>
  <div class="style_20">JAMISON, PA</div>
  <div class="style_20">JAMISON, ROY</div>
  <div class="style_20">someagency &amp; Associates LLC</div>
  <div class="style_18">1/31/12</div>
  <div class="style_19">SFD3434</div>
  <div class="style_18">51669 / 34</div>
  <div class="style_20">HOLD_DE</div>
  <div class="style_20">NEWMAN, SO</div>
  <div class="style_20">NEWMAN, ALEX</div>
  <div class="style_20">someagency &amp; Associates LLC</div>
</so>

This is just intermediate output, showing that the XPath works.
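On the well-formedness point raised earlier in this answer, the outcome in C# depends on which parser is used. The sketch below is an illustration under assumptions, not a statement about the asker's actual setup. It feeds one fragment with an unclosed br to the strict System.Xml parser and to HtmlAgilityPack (assumed to be installed from NuGet). WellFormednessDemo and its two helpers are hypothetical names.

```csharp
using System;
using System.Xml;
using HtmlAgilityPack; // NuGet package; assumed, as in the question

public static class WellFormednessDemo
{
    // Fragment modeled on the footer cell in the question: <br> is never closed.
    public const string Unclosed = "<div class=\"style_2\"><br>Feb 13, 2012 9:37 AM</div>";

    // Same fragment with the br closed, as required for well-formed XML.
    public const string Closed = "<div class=\"style_2\"><br/>Feb 13, 2012 9:37 AM</div>";

    // True if the strict XML parser accepts the markup.
    public static bool ParsesAsXml(string markup)
    {
        try
        {
            new XmlDocument().LoadXml(markup);
            return true;
        }
        catch (XmlException)
        {
            return false; // e.g. the <br> start tag does not match the </div> end tag
        }
    }

    // Counts div elements with a class attribute using the lenient HTML parser.
    public static int CountClassDivs(string markup)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(markup);
        var nodes = doc.DocumentNode.SelectNodes("//div[@class]");
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        Console.WriteLine(ParsesAsXml(Unclosed));    // strict parser on the unclosed br
        Console.WriteLine(ParsesAsXml(Closed));      // strict parser on the closed br
        Console.WriteLine(CountClassDivs(Unclosed)); // lenient parser on the unclosed br
    }
}
```

So the cleanup described above matters for XML-based tooling such as XSL, while an HTML-tolerant parser can usually run the same XPath on the markup as posted.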

Hope this helps.