Using XPath to parse HTML into the following categories

Time: 2014-04-16 14:24:41

Tags: c# html parsing xpath

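I have the following HTML structure, in which every tr tag is a separate sibling row rather than being nested under its category. When I parse it with XPath, each category should end up with only 2 sub-items, but my code below selects all 4 sub-items for every category, so each category has 4 sub-items instead of 2.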

<table class="available">
   <tbody>
      <tr>
         <td class="catname" colspan="2">
            <span>Category 1</span>
         </td>
      </tr>
      <tr>
         <td rowspan="2" colspan="1" class="itemdetail">
            <div class="subname">
               SubItem1-1
            </div>
         </td>
         <td class="precioseleccion desgloseth">
            <div class="preprice">
               <strong class="price">39.99 €</strong>
            </div>
         </td>
      </tr>
      <tr>
         <td rowspan="2" colspan="1" class="itemdetail">
            <div class="subname">
               SubItem1-2
            </div>
         </td>
         <td class="precioseleccion desgloseth">
            <div class="preprice">
               <strong class="price">49.99 €</strong>
            </div>
         </td>
      </tr>
      <tr>
         <td class="catname" colspan="2">
            <span>Category 2</span>
         </td>
      </tr>
      <tr>
         <td rowspan="2" colspan="1" class="itemdetail">
            <div class="subname">
               SubItem2-1
            </div>
         </td>
         <td class="precioseleccion desgloseth">
            <div class="preprice">
               <strong class="price">59.99 €</strong>
            </div>
         </td>
      </tr>
      <tr>
         <td rowspan="2" colspan="1" class="itemdetail">
            <div class="subname">
               SubItem2-2
            </div>
         </td>
         <td class="precioseleccion desgloseth">
            <div class="tooltip3">
               <strong class="price">69.99 €</strong>
            </div>
         </td>
      </tr>
    </tbody>    
</table>
var doc = new HtmlDocument(); // HTML Agility Pack
doc.LoadHtml(uricontent);

var rooms = doc.DocumentNode
    .SelectNodes("//table[@class='available']//td[@class='catname']")
    .Select(r => new
    {
        Type = r.InnerText.CleanInnerText(),
        SubTypes = r.SelectNodes("../..//tr//td[@class='itemdetail']//div[@class='subname']")
            .Select(s => new
            {
                SubType = s.InnerText.CleanInnerText(),
                Price = s.SelectSingleNode(".//parent::td/following-sibling::td[@class='allprice']//div[@class='preprice']//strong[@class='price']")
                    .InnerText.CleanInnerText()
            }).ToArray()
    }).ToArray();

1 answer:

Answer 0 (score: 0)

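If I understand your question correctly, select all the category rows you want with //tr[td[@class='catname']], and then, relative to each category row, select the sub-items you want with following-sibling::tr/td[div[@class='subname']]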
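
One caveat: on its own, following-sibling::tr also matches the rows that belong to later categories, so the first category would still pick up all 4 sub-items. Below is a minimal sketch of one way to stop at the next category row, assuming HTML Agility Pack as in the question. The CategoryParser class and its Parse method are hypothetical names used only for illustration. The price is looked up by the strong element's class alone, because one row in the sample wraps it in a tooltip3 div instead of preprice.

using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package, same library as in the question

public static class CategoryParser // hypothetical helper, not part of the library
{
    public static List<(string Category, List<(string Name, string Price)> Items)> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<(string Category, List<(string Name, string Price)> Items)>();

        // One <tr> per category header.
        var categoryRows = doc.DocumentNode.SelectNodes(
            "//table[@class='available']//tr[td[@class='catname']]");
        if (categoryRows == null) return result; // SelectNodes returns null when nothing matches

        foreach (var categoryRow in categoryRows)
        {
            var items = new List<(string Name, string Price)>();

            // All rows after this header, in document order.
            var laterRows = categoryRow.SelectNodes("following-sibling::tr");
            if (laterRows != null)
            {
                // Keep only the rows that come before the next category header.
                var ownRows = laterRows.TakeWhile(
                    row => row.SelectSingleNode("td[@class='catname']") == null);

                foreach (var row in ownRows)
                {
                    var name = row.SelectSingleNode("td[@class='itemdetail']/div[@class='subname']");
                    if (name == null) continue; // not a sub-item row

                    var price = row.SelectSingleNode("td//strong[@class='price']");
                    items.Add((name.InnerText.Trim(), price?.InnerText.Trim()));
                }
            }

            result.Add((categoryRow.InnerText.Trim(), items));
        }

        return result;
    }
}

For the table in the question, CategoryParser.Parse(uricontent) should group SubItem1-1 and SubItem1-2 under Category 1, and SubItem2-1 and SubItem2-2 under Category 2. Doing the cut-off with TakeWhile in C# keeps the XPath expressions short; a single XPath 1.0 expression that counts preceding category rows would also work but is harder to read.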