Cannot get values from HTML code using html-agility-pack

Time: 2012-09-16 18:55:25

Tags: c# web-scraping html-agility-pack

I want to extract values from the HTML below, and I am using this C# code with HtmlAgilityPack to do it.

I only want the address and the phone number.

<div class="company-info">
    <div id="o-company" class="edit-overlay-section" style="padding-top:5px; width: 400px;">
        <a href="http://www.manta.com/c/mm23df2/us-cellular" class="company-name">
            <h1 class="profile-company_name" itemprop="name">US Cellular</h1>
        </a>
    </div>      
    <div class="addr addr-co-header-gamma" itemprop="address"itemscope=""itemtype="http://schema.org/PostalAddress">
        <em>United States Cellular Corporation</em>
        <div class="company-address">   
        <div itemprop="streetAddress">2401 12th Avenue NW # 104B</div>
            <span class="addressLocality" itemprop="addressLocality">Ardmore</span>,
            <span class="addressRegion" itemprop="addressRegion">OK</span>      
            <span class="addresspostalCode" itemprop="postalCode">73401-1471</span>
        </div>
        <dl class="phone_info"><dt>Phone:</dt>
        <dd class="tel" itemprop="telephone">(580) 490-3333</dd>
...

C# code:

private HtmlDocument ParseLink(string URL)
{ 
    HtmlDocument hDoc = new HtmlDocument();
    try
    {
        WebClient wClient = new WebClient();

        byte[] bData = wClient.DownloadData(pageurl);

        hDoc.LoadHtml(ASCIIEncoding.ASCII.GetString(bData));
        Response.Write("<table><tr><td>");

        foreach (HtmlNode hNode in hDoc.DocumentNode.SelectNodes("//div[@itemprop='company-address']"))
        {
            Response.Write(hNode.InnerText.ToString());
        }
        Response.Write("</tr></td><td>");

        foreach (HtmlNode hNode in hDoc.DocumentNode.SelectNodes("//span[@itemprop='addressLocality']"))
        {

            Response.Write(hNode.InnerText.ToString());
        }
        Response.Write("</tr></td><td>");   

        foreach (HtmlNode hNode in hDoc.DocumentNode.SelectNodes("//span[@itemprop='addressRegion']"))
        {
            Response.Write(hNode.InnerText.ToString());
        }

        Response.Write("</tr></td><td>");

        foreach (HtmlNode hNode in hDoc.DocumentNode.SelectNodes("//span[@itemprop='postalCode']"))
        {
            Response.Write(hNode.InnerText.ToString());
        }

        Response.Write("</tr></td><td>"); 

        foreach (HtmlNode hNode in hDoc.DocumentNode.SelectNodes("//dd[@itemprop='telephone']"))
        {
            Response.Write(hNode.InnerText.ToString());
        }
        Response.Write("</td>");
        Response.Write("</tr></table>");

    }
    catch (Exception ex)
    {
        Response.Write(ex.Message);
        hDoc.LoadHtml("");
    }

    return hDoc;
}

But when I run this code I get this error:

"Object reference not set to an instance of an object"

Can anyone help me? Thanks.

1 answer:

Answer 0 (score: 0)

You need to provide more information about the exception you are getting (for example, the line that throws it), but...

If no items match the XPath expression, the SelectNodes method returns null, which means you must check whether the returned value is null before iterating over the nodes.
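The answer's original snippet did not survive, so here is a minimal sketch of that null check. It is an illustration under assumptions, not the answerer's code: the class `ScrapeSketch` and the helper `TextOrEmpty` are hypothetical names, and the sample markup is a shortened copy of the HTML in the question. Note that in the question's markup `company-address` appears as a `class`, not an `itemprop`, so the first query in the question presumably matches nothing and is a likely source of the exception.

```csharp
using System;
using System.Text;
using HtmlAgilityPack;

public static class ScrapeSketch
{
    // Hypothetical helper: concatenates the trimmed text of every node that
    // matches the XPath expression, or returns "" when nothing matches.
    public static string TextOrEmpty(HtmlDocument doc, string xpath)
    {
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);

        // SelectNodes returns null (not an empty collection) when there is
        // no match, so a foreach over the result would throw.
        if (nodes == null)
        {
            return string.Empty;
        }

        var text = new StringBuilder();
        foreach (HtmlNode node in nodes)
        {
            text.Append(node.InnerText.Trim());
        }
        return text.ToString();
    }

    public static void Main()
    {
        // Shortened sample of the markup shown in the question.
        const string html =
            "<div class=\"company-address\">" +
            "<div itemprop=\"streetAddress\">2401 12th Avenue NW # 104B</div>" +
            "</div>" +
            "<dl class=\"phone_info\"><dt>Phone:</dt>" +
            "<dd class=\"tel\" itemprop=\"telephone\">(580) 490-3333</dd></dl>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        Console.WriteLine(TextOrEmpty(doc, "//div[@itemprop='streetAddress']"));
        Console.WriteLine(TextOrEmpty(doc, "//dd[@itemprop='telephone']"));

        // No element has itemprop='company-address', so this query has no
        // match and the helper falls back to the empty string.
        Console.WriteLine(TextOrEmpty(doc, "//div[@itemprop='company-address']").Length);
    }
}
```

Wrapping the check in one helper keeps each of the five lookups in the question to a single line. If only the first match is needed, `SelectSingleNode` behaves the same way and also returns null when nothing matches.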