C# HtmlAgilityPack: getting table values

Date: 2013-12-18 17:58:44

Tags: c# sql-server html-agility-pack

I have a web page and need to parse values from it to store in a SQL Server database. I have tried using the HTML Agility Pack.

HtmlDocument hdoc = new HtmlDocument();
hdoc.LoadHtml(HTML);
var cols = hdoc.DocumentNode.SelectNodes("//table[@id='results']//tr//th//td");
for (int i = 0; i < cols.Count; i = i + 2)
{
    DataRow dr = dt.NewRow();
    string name = cols[i].InnerText.Trim();
}

This is what my HTML looks like:

<table id="results">
    <tr>
        <th style="white-space: nowrap;">
            ID
        </th>
        <th style="text-align: left;">
            Entity Name /<br>
            Type
        </th>
        <th style="white-space: nowrap;">
            Registered<br>
            Effective Date
        </th>
        <th>
            Status /<br>
            Status Date
        </th>
    </tr>
    <tr class="exactMatch" valign="top">
        <td class="entityID">
            123456
        </td>
        <td class="nameAndTypeDescription">
            <span class="name"><a href="test.aspx?entityID=123456&hash=2055339395&orgTypes=01%2c99">
                NAME1 COMPANY </a></span>
            <br />
            <span class="typeDescription">55 - TRadeUnion Company </span>
        </td>
        <td class="registeredEffectiveDate">
            01/12/1912
        </td>
        <td class="statusDescriptionAndStatusDate">
            <span class="statusDescription">Exists Now </span>
            <br>
            <span class="statusDate">12/14/1943</span>
        </td>
    </tr>
    <tr class="exactMatch" valign="top">
        <td class="entityID">
            A23456
        </td>
        <td class="nameAndTypeDescription">
            <span class="name"><a href="test.aspx?entityID=A23456&hash=615278445&orgTypes=01%2c99">
                TESTA, INC. </a></span>
            <br />
            <span class="typeDescription">09 - Domestic Corporation </span>
        </td>
        <td class="registeredEffectiveDate">
            04/29/1926
        </td>
        <td class="statusDescriptionAndStatusDate">
            <span class="statusDescription">Dissolved Company </span>
            <br>
            <span class="statusDate">06/16/1998</span>
        </td>
    </tr>
</table>

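I need to insert the entityID, name, hyperlink, type description, registered effective date, status description, and status date. Right now they are all printed on one line, and I don't know how to parse them. Please help.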

Thanks, MR

1 answer:

Answer 0 (score: 1)

The TD elements are not nested under the TH elements, so that XPath matches nothing.

Try this: SelectNodes("//table[@id='results']/tr/td");
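
To get the individual fields rather than one flat list of cells, one option is to select the data rows first and then query each row with relative XPath expressions keyed on the class names in the question's markup. The following is only a sketch under that assumption: `EntityRow`, `ResultsParser`, and the `Text` helper are hypothetical names introduced for illustration and are not part of HtmlAgilityPack, and the code assumes the class attributes appear exactly as shown in the sample HTML.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical container for one result row (illustrative, not from the question).
public class EntityRow
{
    public string EntityId { get; set; }
    public string Name { get; set; }
    public string Link { get; set; }
    public string TypeDescription { get; set; }
    public string RegisteredEffectiveDate { get; set; }
    public string StatusDescription { get; set; }
    public string StatusDate { get; set; }
}

public static class ResultsParser
{
    // Trimmed inner text of the first node matching the relative XPath, or "" if none.
    private static string Text(HtmlNode row, string xpath)
    {
        HtmlNode node = row.SelectSingleNode(xpath);
        return node == null ? "" : node.InnerText.Trim();
    }

    public static List<EntityRow> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<EntityRow>();

        // tr[td] keeps only rows that contain td cells, which skips the th header row.
        HtmlNodeCollection rows =
            doc.DocumentNode.SelectNodes("//table[@id='results']//tr[td]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (rows == null)
            return result;

        foreach (HtmlNode tr in rows)
        {
            HtmlNode anchor = tr.SelectSingleNode(".//span[@class='name']/a");

            result.Add(new EntityRow
            {
                EntityId = Text(tr, "td[@class='entityID']"),
                Name = anchor == null ? "" : anchor.InnerText.Trim(),
                Link = anchor == null ? "" : anchor.GetAttributeValue("href", ""),
                TypeDescription = Text(tr, ".//span[@class='typeDescription']"),
                RegisteredEffectiveDate = Text(tr, "td[@class='registeredEffectiveDate']"),
                StatusDescription = Text(tr, ".//span[@class='statusDescription']"),
                StatusDate = Text(tr, ".//span[@class='statusDate']")
            });
        }

        return result;
    }
}
```

Calling ResultsParser.Parse(HTML) on the markup from the question should give one EntityRow per data row, and each one can then be copied into a DataRow or passed to a parameterized INSERT. The null check matters because SelectNodes returns null when the XPath matches nothing, which is why the original loop over cols would throw rather than simply do nothing.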