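I'm trying to learn some basic scraping, and thanks to this site I've already learned a lot of new things, but now I've run into this problem... This is the code I'm using: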
var web = new HtmlWeb();
var doc = web.Load("url");
var nodes = doc.DocumentNode.SelectNodes("//*[@id='hotellist_inner']/div");
StreamWriter output = new StreamWriter("out.txt");
if (nodes != null)
{
    foreach (HtmlNode item in nodes)
    {
        if (item != null && item.Attributes["data-recommended"] != null)
        {
            string line = "";
            var nome = item.SelectSingleNode(".//h3/a").InnerText;
            var rating = item.SelectSingleNode(".//span[@class='rating']").InnerText;
            var price = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/strong[1]");
            var discount = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/div[1]");
            line = line + nome + "," + rating + "," + price + "," + discount;
            Console.WriteLine(line);
            output.WriteLine(line);
        }
    }
}
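The first two fields (name and rating) work fine, but for price and discount I get empty results. I analyzed the page with Chrome Scraper (here is the link), and it gets the results easily with the same XPath expressions I used. I don't understand what I'm doing wrong. Any help would be appreciated! :D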
Answer (score: 0)
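After a quick look at the page you are trying to scrape: not every item has price and discount information. You need to handle that case properly to avoid exceptions, for example by checking for null before reading InnerText. Here is your code, slightly modified, which gets the price and discount information: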
if (item != null && item.Attributes["data-recommended"] != null)
{
    string line = "";
    var nome = item.SelectSingleNode(".//h3/a").InnerText;
    var rating = item.SelectSingleNode(".//span[@class='rating']").InnerText;
    var price = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/strong[1]");
    var discount = item.SelectSingleNode("./div[2]/div[3]/div[2]/table/tbody/tr/td[4]/div/div[1]");
    // Set priceString to an empty string if price is null, otherwise to price.InnerText
    var priceString = price == null ? "" : price.InnerText;
    // Do the same for discountString
    var discountString = discount == null ? "" : discount.InnerText;
    line = line + nome + "," + rating + "," + priceString + "," + discountString;
    Console.WriteLine(line);
    output.WriteLine(line);
}
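A minimal sketch of the same null check, factored into a helper so every optional field is handled the same way. It uses System.Xml (XmlDocument and XmlNode.SelectSingleNode) instead of HtmlAgilityPack only so it runs without a NuGet package. The markup, the `item` class, and the names `NullSafeScrape`, `TextOrEmpty` and `ScrapeRows` are hypothetical and not taken from the real page. HtmlAgilityPack's HtmlNode.SelectSingleNode also returns null when nothing matches, so the helper should carry over directly. The price XPath uses `table//tr` rather than `table/tbody/tr`. Browsers insert an implicit `<tbody>` when they build the DOM, so an XPath copied from Chrome may contain `tbody` even when the raw HTML that HtmlAgilityPack parses has none. That may be worth ruling out if items that visibly show a price still come back empty.

```csharp
using System;
using System.Xml;

// Sketch only: hypothetical markup and names, System.Xml standing in for
// HtmlAgilityPack so the example runs with the base class library alone.
public static class NullSafeScrape
{
    // Returns the trimmed InnerText of the first node matching xpath,
    // or "" when nothing matches (the case where the original code got nothing).
    public static string TextOrEmpty(XmlNode context, string xpath)
    {
        XmlNode node = context.SelectSingleNode(xpath);
        return node == null ? "" : node.InnerText.Trim();
    }

    // Builds one "name,price" line per hypothetical <div class='item'>.
    public static string[] ScrapeRows(string markup)
    {
        var doc = new XmlDocument();
        doc.LoadXml(markup);
        XmlNodeList items = doc.SelectNodes("//div[@class='item']");
        var lines = new string[items.Count];
        for (int i = 0; i < items.Count; i++)
        {
            XmlNode item = items[i];
            string name = TextOrEmpty(item, ".//h3/a");
            // "table//tr" matches rows whether or not a <tbody> is present.
            string price = TextOrEmpty(item, ".//table//tr/td/strong");
            lines[i] = name + "," + price;
        }
        return lines;
    }

    public static void Main()
    {
        // Second item has no price table, mimicking a listing without a price.
        const string markup =
            "<root>" +
            "<div class='item'><h3><a>Hotel A</a></h3>" +
            "<table><tr><td><strong>120</strong></td></tr></table></div>" +
            "<div class='item'><h3><a>Hotel B</a></h3></div>" +
            "</root>";
        foreach (string line in ScrapeRows(markup))
            Console.WriteLine(line);
    }
}
```

Running Main prints `Hotel A,120` and then `Hotel B,`. The second item has no price table, so its price field is left empty instead of throwing. With HtmlAgilityPack and C# 6 or later, the same check can also be written inline as `price?.InnerText ?? ""`.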