Question

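I can't figure out how to parse the following:

- The example page I am trying to parse: http://www.aliexpress.com/item/-/255859073.html

- The information I want: "7 days". This is the processing time in the left column of the shipping table.

- The shipping table becomes visible after clicking the "Shipping and Payment" tab (further down the page).

So far I have tried selecting the node with different XPath values: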

 HtmlAgilityPack.HtmlDocument currentHTML = new HtmlAgilityPack.HtmlDocument();
 HtmlWeb webget = new HtmlWeb();
 currentHTML = webget.Load("http://www.aliexpress.com/item/-/255859073.html");

 string processingTime = currentHTML.DocumentNode.SelectSingleNode("/html/body/div[2]/div[4]/div/div/div[2]/div/div/div[3]/div/div/div/div[2]/table/tbody/tr/td[5]").InnerText;

And also:

 string processingTime = currentHTML.DocumentNode.SelectSingleNode("//*[contains(concat( \" \", @class, \" \" ), concat( \" \", \"processing\", \" \" ))]").InnerText;

But I get this error:

 System.NullReferenceException was unhandled
 Message=Object reference not set to an instance of an object.

I also tried their mobile site, but they don't show this information there.

Any idea why this is happening and what I need to do?

Answer 1

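It looks like your XPath expression is incorrect. In any case, the element you are trying to parse is easier to reach through its id attribute. I changed the XPath expression accordingly, and as a bonus I added a regular expression that lets you cleanly extract the number of days from the text.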

    //Match one or more digits so that multi-digit values such as "15 days)" parse correctly
    System.Text.RegularExpressions.Regex
        dayParseRegex = new System.Text.RegularExpressions.Regex(@"(?<days>\d+)( days\))$");
    HtmlAgilityPack.HtmlDocument currentHTML = new HtmlAgilityPack.HtmlDocument();
    HtmlWeb webget = new HtmlWeb();
    currentHTML = webget.Load("http://www.aliexpress.com/item/-/255859073.html");

    //Extract node
    var handlingTimeNode = currentHTML.DocumentNode.SelectSingleNode("//*[@id=\"product-info-shipping-sub\"]");

    //Run RegEx against text
    var match = dayParseRegex.Match(handlingTimeNode.InnerText);

    //Convert the days to an integer from the resultant group
    int shippingDays = Convert.ToInt32(match.Groups["days"].Value);

Now go scrape the hell out of that site!
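
As for the NullReferenceException in the question: SelectSingleNode returns null when the XPath matches nothing, so calling .InnerText on the result throws. XPaths copied from a browser often fail for this reason, for example because the browser inserts tbody elements that are not in the raw HTML, or because the content is added by script (HtmlAgilityPack does not run JavaScript). Below is a minimal defensive sketch, not a definitive implementation. The class name ProcessingTimeParser and the method TryGetProcessingDays are hypothetical, and the regex assumes the node text ends in something like "7 days)", optionally followed by whitespace.

```csharp
using System;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

// Hypothetical helper class, for illustration only.
public static class ProcessingTimeParser
{
    // Assumption: the node text ends with e.g. "7 days)" plus optional whitespace.
    private static readonly Regex DayParseRegex =
        new Regex(@"(?<days>\d+) days\)\s*$");

    // Returns null instead of throwing when the node or the pattern is missing.
    public static int? TryGetProcessingDays(HtmlDocument document)
    {
        // SelectSingleNode returns null when the XPath matches nothing.
        HtmlNode node = document.DocumentNode.SelectSingleNode(
            "//*[@id='product-info-shipping-sub']");
        if (node == null)
        {
            return null;
        }

        Match match = DayParseRegex.Match(node.InnerText);
        int days;
        if (match.Success && int.TryParse(match.Groups["days"].Value, out days))
        {
            return days;
        }
        return null;
    }
}
```

It could be called like this:

```csharp
HtmlDocument doc = new HtmlWeb().Load("http://www.aliexpress.com/item/-/255859073.html");
int? days = ProcessingTimeParser.TryGetProcessingDays(doc);
Console.WriteLine(days.HasValue ? days.Value + " days" : "processing time not found in the downloaded HTML");
```

If this prints the "not found" message, inspect the HTML that HtmlWeb actually downloaded rather than the page as rendered in the browser; the two can differ.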
