I have a web page that I need to parse so I can store its values in a SQL Server database. I tried using the HTML Agility Pack:
HtmlDocument hdoc = new HtmlDocument();
hdoc.LoadHtml(HTML);
var cols = hdoc.DocumentNode.SelectNodes("//table[@id='results']//tr//th//td");
for (int i = 0; i < cols.Count; i = i + 2)
{
    DataRow dr = dt.NewRow();
    string name = cols[i].InnerText.Trim();
}
This is what my HTML looks like:
<table id="results">
<tr>
<th style="white-space: nowrap;">
ID
</th>
<th style="text-align: left;">
Entity Name /<br>
Type
</th>
<th style="white-space: nowrap;">
Registered<br>
Effective Date
</th>
<th>
Status /<br>
Status Date
</th>
</tr>
<tr class="exactMatch" valign="top">
<td class="entityID">
123456
</td>
<td class="nameAndTypeDescription">
<span class="name"><a href="test.aspx?entityID=123456&hash=2055339395&orgTypes=01%2c99">
NAME1 COMPANY </a></span>
<br />
<span class="typeDescription">55 - TRadeUnion Company </span>
</td>
<td class="registeredEffectiveDate">
01/12/1912
</td>
<td class="statusDescriptionAndStatusDate">
<span class="statusDescription">Exists Now </span>
<br>
<span class="statusDate">12/14/1943</span>
</td>
</tr>
<tr class="exactMatch" valign="top">
<td class="entityID">
A23456
</td>
<td class="nameAndTypeDescription">
<span class="name"><a href="test.aspx?entityID=A23456&hash=615278445&orgTypes=01%2c99">
TESTA, INC. </a></span>
<br />
<span class="typeDescription">09 - Domestic Corporation </span>
</td>
<td class="registeredEffectiveDate">
04/29/1926
</td>
<td class="statusDescriptionAndStatusDate">
<span class="statusDescription">Dissolved Company </span>
<br>
<span class="statusDate">06/16/1998</span>
</td>
</tr>
</table>
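I need to insert the entityID, name, hyperlink, type description, registered effective date, status description, and status date. Right now they are all printed on a single line, and I don't know how to parse them into separate values. Please help.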
Thanks, MR
Answer 0 (score: 1)
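The TD elements are not nested under the TH elements; both are direct children of TR, so the path //tr//th//td matches nothing.
Try this: SelectNodes("//table[@id='results']/tr/td");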
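
Building on the corrected XPath, here is a minimal sketch of one way to pull the seven requested fields out of each row. Selecting rows and then querying by class name within each row avoids relying on cell positions. The class name `ResultsParser`, the helper `Text`, and the DataTable column names are hypothetical, chosen only for illustration. The sketch assumes the markup matches the sample in the question, and it keeps dates as strings rather than guessing their format.

```csharp
using System.Data;
using HtmlAgilityPack;

static class ResultsParser
{
    // Trimmed, entity-decoded text of the first node matching xpath, or "" if none.
    static string Text(HtmlNode row, string xpath)
    {
        HtmlNode node = row.SelectSingleNode(xpath);
        return node == null ? "" : HtmlEntity.DeEntitize(node.InnerText).Trim();
    }

    public static DataTable Parse(string html)
    {
        // Column names are illustrative; match them to your own table.
        var dt = new DataTable();
        foreach (string col in new[] { "EntityID", "Name", "Link", "TypeDescription",
                                       "RegisteredEffectiveDate", "StatusDescription", "StatusDate" })
        {
            dt.Columns.Add(col, typeof(string));
        }

        var hdoc = new HtmlDocument();
        hdoc.LoadHtml(html);

        // tr[td] keeps only rows that contain td cells, which skips the th header row.
        HtmlNodeCollection rows = hdoc.DocumentNode.SelectNodes("//table[@id='results']/tr[td]");
        if (rows == null)
        {
            return dt; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode row in rows)
        {
            HtmlNode link = row.SelectSingleNode(".//span[@class='name']/a");

            DataRow dr = dt.NewRow();
            dr["EntityID"] = Text(row, "./td[@class='entityID']");
            dr["Name"] = Text(row, ".//span[@class='name']/a");
            dr["Link"] = link == null ? "" : link.GetAttributeValue("href", "");
            dr["TypeDescription"] = Text(row, ".//span[@class='typeDescription']");
            dr["RegisteredEffectiveDate"] = Text(row, "./td[@class='registeredEffectiveDate']");
            dr["StatusDescription"] = Text(row, ".//span[@class='statusDescription']");
            dr["StatusDate"] = Text(row, ".//span[@class='statusDate']");
            dt.Rows.Add(dr);
        }
        return dt;
    }
}
```

Calling `ResultsParser.Parse(HTML)` on the sample markup should produce one DataRow per `exactMatch` row, which can then be written to the database with whatever insert mechanism you already use.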