HtmlAgilityPack problem with HTML

Time: 2013-07-02 12:24:59

Tags: c# asp.net html-agility-pack

I want to get the title, the image src, and other details from the following HTML, but I am running into a problem:

<div class="thumb-container">
    <a class="featured" title="Spectacularly " href="http://www.site.com"></a>          
    <div rel="0" id="property_image_1181140" class="thumb">
        <a title="*Want this title*" href="*http://www.wanttogetthislink.com*">
           <img style="width: 190px; height: 127px; left: -11px; top: 0px;" alt="Spectacularly upgraded 5 bed Family Villa For Sale" src="http://c1369013.r13.cf3.rackcdn.com/1181140-1-mini.jpg">
        </a>
    </div>
<div class="description-listing">
   <div class="heading">
      <div class="type">
         <label>*5,900* sq.ft.,</label>
         <span>*Villa*</span>
         <p class="bedroom"><em>*5*</em></p>
         <p class="bathroom"><em>*6*</em></p>
      </div>
      <p class="amount">
         <label>AED</label>
         <strong>*5,120,000*</strong>
      </p>
   </div>

Here is my code:

 var allCarResults = rootNode.SelectNodes("//div[normalize-space(@class)='general-listing']");
 foreach (var carResult in allCarResults)
 {
     var dataNode = carResult.SelectSingleNode(".//div[@class='thumb']");
     var carNameNode = dataNode.SelectSingleNode(".//a");
 }

I want to get every value that is wrapped in asterisks (*...*) in the HTML above, but I don't know how to do it.

1 answer:

Answer 0: (score: 0)

The principle is basically the same for every item: write one XPath per item and evaluate it relative to a common anchor node:

HtmlNode thumbContainer = doc.DocumentNode.SelectSingleNode("//div[@class='thumb-container']");

HtmlNode link = thumbContainer.SelectSingleNode("./div[@class='thumb']/a");
string linkTitle = link.Attributes["title"].Value;
string linkHref = link.Attributes["href"].Value;

HtmlNode label = thumbContainer.SelectSingleNode("./div[@class='description-listing']/div[@class='heading']/div[@class='type']/label");
string labelText = label.InnerText;

// ... Similar for other items
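
The remaining items can be filled in along the same lines. The following is only a sketch under some assumptions: the sample markup is a cleaned-up copy of the question's snippet (asterisks removed, every div closed, a placeholder title and href), the class ListingExtractor and the helper TextOf are hypothetical names, and the description-listing div is assumed to sit somewhere inside thumb-container, which is why the XPaths use the descendant form ".//" rather than a fixed child path. GetAttributeValue is used instead of Attributes[...].Value so that a missing attribute yields a default instead of a NullReferenceException.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ListingExtractor
{
    // Sample markup modelled on the question (assumption: all tags are closed here).
    public const string SampleHtml =
        "<div class=\"thumb-container\">" +
        "<div rel=\"0\" id=\"property_image_1181140\" class=\"thumb\">" +
        "<a title=\"Family Villa\" href=\"http://www.example.com/villa\">" +
        "<img alt=\"Family Villa\" src=\"http://c1369013.r13.cf3.rackcdn.com/1181140-1-mini.jpg\">" +
        "</a></div>" +
        "<div class=\"description-listing\"><div class=\"heading\">" +
        "<div class=\"type\"><label>5,900 sq.ft.,</label><span>Villa</span>" +
        "<p class=\"bedroom\"><em>5</em></p><p class=\"bathroom\"><em>6</em></p></div>" +
        "<p class=\"amount\"><label>AED</label><strong>5,120,000</strong></p>" +
        "</div></div></div>";

    // Trimmed, entity-decoded text of the first match under anchor, or null if none.
    static string TextOf(HtmlNode anchor, string xpath)
    {
        HtmlNode node = anchor.SelectSingleNode(xpath);
        return node == null ? null : HtmlEntity.DeEntitize(node.InnerText).Trim();
    }

    public static Dictionary<string, string> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var result = new Dictionary<string, string>();

        // Common anchor: every other XPath is evaluated relative to this node.
        HtmlNode anchor = doc.DocumentNode.SelectSingleNode("//div[@class='thumb-container']");
        if (anchor == null) return result;

        HtmlNode link = anchor.SelectSingleNode(".//div[@class='thumb']/a");
        if (link != null)
        {
            result["title"] = link.GetAttributeValue("title", "");
            result["href"] = link.GetAttributeValue("href", "");
            HtmlNode img = link.SelectSingleNode("./img");
            if (img != null) result["imageSrc"] = img.GetAttributeValue("src", "");
        }

        result["area"] = TextOf(anchor, ".//div[@class='type']/label");
        result["propertyType"] = TextOf(anchor, ".//div[@class='type']/span");
        result["bedrooms"] = TextOf(anchor, ".//p[@class='bedroom']/em");
        result["bathrooms"] = TextOf(anchor, ".//p[@class='bathroom']/em");
        result["price"] = TextOf(anchor, ".//p[@class='amount']/strong");
        return result;
    }

    public static void Main()
    {
        foreach (KeyValuePair<string, string> pair in Extract(SampleHtml))
            Console.WriteLine(pair.Key + ": " + pair.Value);
    }
}
```

For a page with many listings, the same body could run inside a loop over SelectNodes("//div[@class='thumb-container']"), after checking that the returned collection is not null.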

Alternatively, you can iterate over each HtmlNode and its children, and match every node against the list of items you are after.
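
One possible shape for that traversal is sketched below, with several assumptions: the class ListingWalker, the lookup table Wanted, and its "tag|parent class" key format are hypothetical choices made for illustration, and the sample markup is a shortened, fully closed version of the question's snippet. Each element is identified by its tag name combined with its parent's class attribute, and the first match for each field is kept.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ListingWalker
{
    // Hypothetical lookup table: "tag|parent class" -> output field name.
    static readonly Dictionary<string, string> Wanted = new Dictionary<string, string>
    {
        { "label|type", "area" },
        { "span|type", "propertyType" },
        { "em|bedroom", "bedrooms" },
        { "em|bathroom", "bathrooms" },
        { "strong|amount", "price" }
    };

    public static Dictionary<string, string> Collect(HtmlNode anchor)
    {
        var found = new Dictionary<string, string>();
        // Descendants() visits every node below the anchor, at any depth.
        foreach (HtmlNode node in anchor.Descendants())
        {
            if (node.NodeType != HtmlNodeType.Element) continue;

            string parentClass = node.ParentNode.GetAttributeValue("class", "");
            string field;
            if (Wanted.TryGetValue(node.Name + "|" + parentClass, out field)
                && !found.ContainsKey(field))
            {
                found[field] = HtmlEntity.DeEntitize(node.InnerText).Trim();
            }
        }
        return found;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<div class=\"thumb-container\"><div class=\"type\">" +
            "<label>5,900 sq.ft.,</label><span>Villa</span>" +
            "<p class=\"bedroom\"><em>5</em></p><p class=\"bathroom\"><em>6</em></p></div>" +
            "<p class=\"amount\"><label>AED</label><strong>5,120,000</strong></p></div>");

        foreach (KeyValuePair<string, string> pair in Collect(doc.DocumentNode))
            Console.WriteLine(pair.Key + ": " + pair.Value);
    }
}
```

Compared with one XPath per item, this makes a single pass over the tree and keeps the list of wanted items in one table, at the cost of a looser match: any element with the same tag and parent class is accepted wherever it occurs under the anchor.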