Selecting a div fails

Date: 2013-04-16 07:37:26

Tags: asp.net-mvc-4 html-agility-pack

I am trying to parse the information inside each div class="base shortstory":
 <div id="dle-content">
   <div class="base shortstory">
     <h3 class="btl"><a href="http://someurl.com/htc-jetstream.html">HTC Jetstream</a></h3>
   </div>
   <div class="base shortstory">
     <h3 class="btl"><a href="http://someurl.com/samsung.html">Samsung S4</a></h3>
   </div>
   <div class="base shortstory">
     <h3 class="btl"><a href="http://someurl.com/dell.html">Dell Streak</a></h3>
   </div>
 </div> 

Here is the code:

        const string url = "http://someurl.com/catalogue";
        const string rootUrl = "http://someurl.com";
        HtmlWeb hw = new HtmlWeb();
        HtmlDocument doc = hw.Load(url);
        int dealsCount = 0;
        HtmlNode root = doc.DocumentNode.SelectSingleNode("//div[@id='dle-content']");
        int i = 1;
        //this is for the default page
        while (i<=10)
        {
            try
            {
                string node= String.Format("//div[{0}]", i);
                var link =
                    doc.DocumentNode.SelectSingleNode(node);
                var href = link.SelectSingleNode("//div[@class='mlink']//span[@class='argmore']//a[@href]").Attributes["href"].Value;
                string title = link.SelectSingleNode("//h3[@class='btl']//a[@href]").InnerText.Trim();

                string description = link.SelectSingleNode("//div[@class='maincont']//div[1]").InnerText.Replace("\n", " ").Replace("\r", "").Replace("\t", "").Trim();
                description = RemoveHTMLComments(description);

                var imageURL = link.SelectSingleNode("//div[@class='maincont']//div[1]//a//img").Attributes["src"].Value;

                var price = link.SelectSingleNode("//div[@class='mlink']//span[3]//font").InnerText.Trim();
                price = Regex.Match(price, @"\d+").Value;

                var partnerdealID = href;

                //no information 

                var isActivesStr = link.SelectSingleNode("//div[@class='mlink']//span[2]/font").InnerText.Trim();
                bool isActive;
                if (isActivesStr.Contains("Нет в наличии"))
                {
                    isActive = false;
                }
                else
                {
                    isActive = true;
                }
                var dealUrl = href; //requires login - show the page itself

            }
            catch (Exception)
            {
            }
            i += 1;
        }

But after looping, the selected node is still the first node. What am I doing wrong?

1 answer:

Answer 0 (score: 2)

All of your XPath expressions start with '//', which means "start from the root of the document and search recursively". So when you do this:

link.SelectSingleNode("//div[@class='mlink']//span[@class='argmore']//a[@href]")

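you are not starting from link, but from the root of the document. That is why every iteration returns the first match in the whole page. You probably want to do this: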

link.SelectSingleNode("div[@class='mlink']...etc...")

which is equivalent to

link.SelectSingleNode("./div[@class='mlink']...etc...")

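'.' means the current node. '/' means search only the direct children, not recursively. If the target may be nested deeper inside link, use './/' to search recursively below the current node instead.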
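
To make the difference concrete, here is a minimal sketch of how the loop could be restructured with relative paths. It assumes the markup looks like the snippet in the question; the class name ShortStoryParser and the method ParseLinks are made up for illustration, and the HTML is loaded from a string rather than from the real site. It only extracts the title and link, since the other elements (mlink, maincont) are not shown in the question.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of Html Agility Pack.
public static class ShortStoryParser
{
    // Returns one (title, href) pair per div class="base shortstory".
    public static List<KeyValuePair<string, string>> ParseLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<KeyValuePair<string, string>>();

        // An absolute path is fine here: we want every item in the document.
        HtmlNodeCollection items = doc.DocumentNode.SelectNodes(
            "//div[@id='dle-content']/div[@class='base shortstory']");

        // SelectNodes may return null when nothing matches.
        if (items == null)
        {
            return results;
        }

        foreach (HtmlNode item in items)
        {
            // The leading '.' anchors the search at 'item' instead of the document root.
            HtmlNode anchor = item.SelectSingleNode(".//h3[@class='btl']/a[@href]");
            if (anchor == null)
            {
                continue; // skip items without a title link
            }

            string title = HtmlEntity.DeEntitize(anchor.InnerText).Trim();
            string href = anchor.GetAttributeValue("href", string.Empty);
            results.Add(new KeyValuePair<string, string>(title, href));
        }

        return results;
    }

    public static void Main()
    {
        // Sample markup modeled on the question (single quotes to keep the literal simple).
        const string html = @"
<div id='dle-content'>
  <div class='base shortstory'>
    <h3 class='btl'><a href='http://someurl.com/htc-jetstream.html'>HTC Jetstream</a></h3>
  </div>
  <div class='base shortstory'>
    <h3 class='btl'><a href='http://someurl.com/samsung.html'>Samsung S4</a></h3>
  </div>
</div>";

        foreach (var pair in ParseLinks(html))
        {
            Console.WriteLine(pair.Key + " -> " + pair.Value);
        }
    }
}
```

Iterating over the result of SelectNodes also avoids building "//div[{0}]" by index, and the explicit null checks replace the empty catch block, which would otherwise hide a NullReferenceException whenever a path matches nothing.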