I am trying to traverse an HTML document:
<body class="style_0">
<div>
<div class="style_1">Pending Test List</div>
<table style=" width: 100%;" id="AUTOGENBOOKMARK_4365445353431356880">
<col>
<col>
<tbody>
<tr>
<td style="vertical-align: baseline;">
<div class="style_4">Pending Test List</div>
</td>
<td style="vertical-align: baseline;">
<div class="style_5">SOME AGENCY Laboratories, Inc.</div>
</td>
</tr>
</tbody>
</table>
<table class="style_6" style=" width: 4.531in;" id="AUTOGENBOOKMARK_5083738604442918131">
<col style=" width: 1in;">
<col class="style_7" style=" width: 0.75in;">
<col class="style_8" style=" width: 0.6in;">
<col style=" width: 0.75in;">
<col style=" width: 2.375in;">
<tbody>
<tr class="style_9" style=" height: 0.5in;">
<td style="vertical-align: middle;">
<div class="style_10">Report Range:</div>
</td>
<td style="vertical-align: middle;">
<div class="style_11">01/01/2012</div>
</td>
<td style="vertical-align: middle;">
<div class="style_12">through</div>
</td>
<td style="vertical-align: middle;">
<div class="style_13">01/31/2012</div>
</td>
<td style="vertical-align: middle;">
<div class="style_14">(by Date Entered)</div>
</td>
</tr>
</tbody>
</table>
<table class="style_15" style=" width: 100%;" id="AUTOGENBOOKMARK_7602283385844673591" iid="/526
(QuRs78576248:0)">
<col style=" width: 0.75in;">
<col style=" width: 1.25in;">
<col style=" width: 1in;">
<col style=" width: 1.5in;">
<col style=" width: 1.5in;">
<col style=" width: 1.5in;">
<col>
<thead>
<tr>
<td colspan="4" style="vertical-align: baseline;"></td>
<td style="vertical-align: baseline;"></td>
<td style="vertical-align: baseline;"></td>
<td style="vertical-align: baseline;"></td>
</tr>
<tr>
<td style="vertical-align: baseline;">
<div class="style_16">Entered</div>
</td>
<td style="vertical-align: baseline;">
<div class="style_16">Spec. ID</div>
</td>
<td style="vertical-align: baseline;">
<div class="style_16">Batch/Pos.</div>
</td>
<td style="vertical-align: baseline;">
<div class="style_16">Test</div>
</td>
<td style="vertical-align: baseline;">
<div class="style_16">Client ID</div>
</td>
<td style="vertical-align: baseline;">
<div class="style_16">Client Name</div>
</td>
<td style="vertical-align: baseline;">
<div class="style_16">Agency</div>
</td>
</tr>
</thead>
<tbody>
<tr>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_18">1/30/12</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_19">ZZ324sdf</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_18">51446 / 75</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_20">HOLD_DE</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_20">234234</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_20">smith, john</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_20">PPPM-6P - SOME AGENCY</div>
</td>
</tr>
<tr>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_18">1/31/12</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_19">SFD3434</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_18">51668 / 17</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_20">HOLD_DE</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_20">FOY, EL</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_20">FOY, ALEX</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_20">someagency & Associates LLC</div>
</td>
</tr>
<tr>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_18">1/31/12</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_19">SFD3434</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_18">51668 / 25</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_20">HOLD_DE</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_20">JAMISON, PA</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_20">JAMISON, ROY</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_20">someagency & Associates LLC</div>
</td>
</tr>
<tr>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_18">1/31/12</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_19">SFD3434</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_18">51669 / 34</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_20">HOLD_DE</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_20">NEWMAN, SO</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_20">NEWMAN, ALEX</div>
</td>
<td class="style_17" style="vertical-align: baseline;">
<div class="style_20">someagency & Associates LLC</div>
</td>
</tr>
</tbody>
<tfoot>
<tr>
<td colspan="2" style="vertical-align: baseline;">
<div class="style_21">Total Tests:</div>
</td>
<td style="vertical-align: baseline;">
<div class="style_22">4</div>
</td>
<td style="vertical-align: baseline;"></td>
<td style="vertical-align: baseline;"></td>
<td style="vertical-align: baseline;"></td>
<td style="vertical-align: baseline;"></td>
</tr>
</tfoot>
</table>
<table style=" width: 100%;" id="AUTOGENBOOKMARK_8507236727661888074">
<col>
<col>
<col>
<tbody>
<tr>
<td style="vertical-align: baseline;">
<div class="style_2">
<br>Feb 13, 2012 9:37 AM</div>
</td>
<td style="vertical-align: baseline;">
<div class="style_3">
<br>
<div style="text-align:center;">Page 1</div>
</div>
</td>
<td style="vertical-align: baseline;"></td>
</tr>
</tbody>
</table>
</div>
</body>
to get this data:
So far, I have this:
foreach (var row in htmlSnippet.DocumentNode.SelectNodes("//table[@class = 'style_15']/tbody/tr"))
{
foreach (var cell in row.SelectNodes("div[@class='*']"))
{
textBox1.Text = cell.InnerHtml.ToString();
}
}
But I am not getting anything back!
This line works:
//table[@class = 'style_15']/tbody/tr
But this does not return anything:
("div[@class='*']"))
Please let me know what I am doing wrong! I need help returning every piece of data shown in the image (except the field names).
Answer 0 (score: 3)
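* is typically used to match any element or attribute name, not any value. If you want to match all div elements that have a class attribute with any value, just use @class. Note also that the div elements are children of td, not tr, so the outer query here selects the td cells rather than the rows.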
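foreach (var cell in htmlSnippet.DocumentNode.SelectNodes("//table[@class = 'style_15']/tbody/tr/td"))
{
    // SelectNodes returns null (not an empty collection) when nothing matches.
    var divs = cell.SelectNodes("div[@class]");
    if (divs == null) continue;
    foreach (var div in divs)
    {
        // Append instead of assigning, so earlier values are not overwritten.
        textBox1.Text += div.InnerHtml + Environment.NewLine;
    }
}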
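If the goal is to keep the values grouped by row, the same idea can be sketched as a small helper. This is only a sketch under a few assumptions: the class name PendingTestTable and the method name ReadRows are made up for illustration, and the HTML is assumed to be available as a string. It relies on HtmlAgilityPack's HtmlDocument.LoadHtml, HtmlNode.SelectNodes and HtmlNode.InnerText.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

// Hypothetical helper, not part of the original question or answers.
public static class PendingTestTable
{
    // Returns one list of cell texts per <tr> in the body of the style_15 table.
    public static List<List<string>> ReadRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<List<string>>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var trNodes = doc.DocumentNode.SelectNodes("//table[@class='style_15']/tbody/tr");
        if (trNodes == null) return rows;

        foreach (var tr in trNodes)
        {
            // The path is relative to the current <tr>. Each <div> sits inside
            // a <td>, so the path has to step through td before reaching div.
            var divs = tr.SelectNodes("td/div[@class]");
            if (divs == null) continue;

            rows.Add(divs.Select(d => d.InnerText.Trim()).ToList());
        }
        return rows;
    }
}
```

Calling PendingTestTable.ReadRows(html) gives one inner list per table row, which can then be joined into textBox1.Text or bound to a grid, whichever suits the form.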
Answer 1 (score: 2)
You probably just want div[@class], that is, div elements that have a class attribute.
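Oh, it is also worth noting that the HTML/XML sample you provided is not well-formed. I had to remove all the col elements and close the br elements. Maybe that is a problem for C# as well... I know it usually is for XSL... not sure about XPath.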
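On the C# side, whether this matters depends on which parser is used. The sketch below contrasts a strict XML parser (XDocument.Parse, which throws XmlException on an unclosed br) with HtmlAgilityPack, which is designed to accept real-world HTML. The class and method names are hypothetical and exist only for this illustration.

```csharp
using System.Xml;
using System.Xml.Linq;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

// Hypothetical demo class, not part of the original thread.
public static class WellFormednessDemo
{
    // True if the markup parses as XML, i.e. it is well-formed.
    public static bool IsWellFormedXml(string markup)
    {
        try
        {
            XDocument.Parse(markup);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    // Counts div elements that carry a class attribute, using the lenient HTML parser.
    public static int CountClassedDivs(string markup)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(markup);
        var divs = doc.DocumentNode.SelectNodes("//div[@class]");
        return divs == null ? 0 : divs.Count;
    }
}
```

With a fragment such as <div class="style_2"><br>Feb 13, 2012 9:37 AM</div>, the XML check fails because br is never closed, while the HtmlAgilityPack query still finds the div. So the clean-up described above is needed for XSL or other XML tooling, but probably not for the code in the question.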
I don't have time to write C# sample code, but here is a simple XSL:
<xsl:template match="/">
<so>
<xsl:apply-templates select="//table[@class = 'style_15']/tbody/tr"/>
</so>
</xsl:template>
<xsl:template match="div[@class]">
<xsl:copy-of select="."/>
</xsl:template>
I got this output:
<so>
<div class="style_18">1/30/12</div>
<div class="style_19">ZZ324sdf</div>
<div class="style_18">51446 / 75</div>
<div class="style_20">HOLD_DE</div>
<div class="style_20">234234</div>
<div class="style_20">smith, john</div>
<div class="style_20">PPPM-6P - SOME AGENCY</div>
<div class="style_18">1/31/12</div>
<div class="style_19">SFD3434</div>
<div class="style_18">51668 / 17</div>
<div class="style_20">HOLD_DE</div>
<div class="style_20">FOY, EL</div>
<div class="style_20">FOY, ALEX</div>
<div class="style_20">someagency & Associates LLC</div>
<div class="style_18">1/31/12</div>
<div class="style_19">SFD3434</div>
<div class="style_18">51668 / 25</div>
<div class="style_20">HOLD_DE</div>
<div class="style_20">JAMISON, PA</div>
<div class="style_20">JAMISON, ROY</div>
<div class="style_20">someagency & Associates LLC</div>
<div class="style_18">1/31/12</div>
<div class="style_19">SFD3434</div>
<div class="style_18">51669 / 34</div>
<div class="style_20">HOLD_DE</div>
<div class="style_20">NEWMAN, SO</div>
<div class="style_20">NEWMAN, ALEX</div>
<div class="style_20">someagency & Associates LLC</div>
</so>
This is just intermediate output, showing that the XPath works correctly.
Hope this helps.
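For reference, a rough C# counterpart of the stylesheet might look like the sketch below. The stylesheet reaches the div elements because XSLT's built-in template rules recurse from each tr into its children. In plain XPath the descendant step (//) plays that role. The names FlatDivReader and ReadDivTexts are assumptions made for this example, and the code uses HtmlAgilityPack rather than an XSLT processor.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

// Hypothetical helper mirroring the flat <so> output of the stylesheet.
public static class FlatDivReader
{
    // Returns the text of every classed div below the rows of the style_15 table body.
    public static List<string> ReadDivTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var texts = new List<string>();

        // "tr//div[@class]" matches div descendants of tr at any depth,
        // so the intermediate td level does not have to be spelled out.
        var divs = doc.DocumentNode.SelectNodes("//table[@class='style_15']/tbody/tr//div[@class]");
        if (divs == null) return texts; // SelectNodes yields null when nothing matches

        foreach (var div in divs)
        {
            texts.Add(div.InnerText.Trim());
        }
        return texts;
    }
}
```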