Null values in a table with Html Agility Pack

Time: 2014-09-02 22:26:58

Tags: c# html web-scraping html-agility-pack

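I'm trying to learn some basic scraping, and thanks to this site I've been able to learn a lot of new things, but now I've run into this problem... This is the code I'm using: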

var web = new HtmlWeb();
var doc = web.Load("url");
var nodes = doc.DocumentNode.SelectNodes("//*[@id='hotellist_inner']/div");
StreamWriter output = new StreamWriter("out.txt");

if (nodes != null)
{
    foreach (HtmlNode item in nodes)
    {
        if (item != null && item.Attributes["data-recommended"] != null)
        {
            string line = "";
            var nome = item.SelectSingleNode(".//h3/a").InnerText;
            var rating = item.SelectSingleNode(".//span[@class='rating']").InnerText;
            var price = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/strong[1]");
            var discount = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/div[1]");
            line = line + nome + "," + rating + "," + price + "," + discount;
            Console.WriteLine(line);
            output.WriteLine(line);
        }
    }
}

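Everything works fine for the first two fields (name and rating), but for price and discount I get empty results. I analyzed the page with Chrome Scraper (here is the link), and it gets the results easily using the same XPath I used. I don't understand what I'm doing wrong. Any help would be appreciated! :D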
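For context, a minimal sketch of why the output shows empty fields rather than an error: `SelectSingleNode` returns null when the XPath matches nothing, and C# string concatenation treats a null operand as an empty string, so the `line = line + nome + ...` statement silently produces empty columns. The `BuildLine` method and the sample values are hypothetical stand-ins for that statement.

```csharp
using System;

public static class Program
{
    // Mirrors the question's concatenation; price and discount play the role
    // of the HtmlNode references returned by SelectSingleNode.
    public static string BuildLine(string name, string rating, object price, object discount)
    {
        // String concatenation treats null operands as empty strings,
        // so no NullReferenceException is thrown here.
        return name + "," + rating + "," + price + "," + discount;
    }

    public static void Main()
    {
        // Both "nodes" are null, as when the XPath matched nothing.
        Console.WriteLine(BuildLine("Hotel A", "8.5", null, null)); // prints "Hotel A,8.5,,"
    }
}
```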

1 Answer:

Answer 0 (score: 0)

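After a quick look at the page you're trying to scrape, not every item has price and discount information. You need to handle that case properly to avoid exceptions, for example by checking for null before accessing InnerText. Here is your code, slightly modified, which gets the price and discount information: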

if (item != null && item.Attributes["data-recommended"] != null)
{
    string line = "";
    var nome = item.SelectSingleNode(".//h3/a").InnerText;
    var rating = item.SelectSingleNode(".//span[@class='rating']").InnerText;
    var price = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/strong[1]");
    var discount = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/div[1]");
    //set priceString to empty string if price is null, else set it to price.InnerText
    var priceString = price == null ? "" : price.InnerText;
    //do similar step for discountString
    var discountString = discount == null ? "" : discount.InnerText;
    line = line + nome + "," + rating + "," + priceString + "," + discountString;
    Console.WriteLine(line);
    output.WriteLine(line);
}
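The same null check can be written more compactly, assuming C# 6 or later for the null-conditional operator `?.`. To keep this sketch self-contained without Html Agility Pack, a hypothetical `Node` class stands in for `HtmlNode` (only `InnerText` is mirrored); the helper names `TextOrEmpty` and `BuildLine` and the sample rows are illustrative. It also wraps the writer in a `using` block: neither snippet above closes the `StreamWriter`, and because it buffers its output, lines may never be flushed to out.txt.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for HtmlAgilityPack.HtmlNode so the sketch compiles
// without the library; only the InnerText property is mirrored.
public class Node
{
    public string InnerText { get; set; }
}

public static class Scraper
{
    // Returns the node's text, or "" when the node is null (the XPath matched nothing).
    // Same effect as the answer's ternary: node == null ? "" : node.InnerText
    public static string TextOrEmpty(Node node) => node?.InnerText ?? "";

    // Builds one comma-separated line; each missing field becomes an empty column.
    public static string BuildLine(Node name, Node rating, Node price, Node discount) =>
        string.Join(",", TextOrEmpty(name), TextOrEmpty(rating), TextOrEmpty(price), TextOrEmpty(discount));

    public static void Main()
    {
        // Sample rows: the second one has no price or discount node.
        var rows = new[]
        {
            new[] { new Node { InnerText = "Hotel A" }, new Node { InnerText = "8.5" }, new Node { InnerText = "120" }, new Node { InnerText = "-10%" } },
            new[] { new Node { InnerText = "Hotel B" }, new Node { InnerText = "7.9" }, null, null },
        };

        // Disposing the writer at the end of the using block flushes buffered lines to disk.
        using (var output = new StreamWriter("out.txt"))
        {
            foreach (var r in rows)
            {
                string line = BuildLine(r[0], r[1], r[2], r[3]);
                Console.WriteLine(line);
                output.WriteLine(line);
            }
        }
    }
}
```

With the real library, `Node` would be replaced by `HtmlNode` and the results of `item.SelectSingleNode(...)` passed to `TextOrEmpty` directly.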