Problem querying a web page parsed with HTML Agility Pack

Posted: 2011-09-02 13:19:20

Tags: c# web-scraping html-agility-pack

I have the following source snippet:

<div class="discount_tools_row">
  <div class="discount_tools">
    <ul>
      <li><a href="#" class="share-discount" rel="nofollow"></a></li>
      <li><a href="/deal/map/4243683"
             class="show-location"
             title="הראה מקום על מפה"
             data-address="רח&#39; האצ&quot;ל 39, ראשון לציון"></a></li>
    </ul>

    <link rel="prerender"
          href="http://www.bigdeal.co.il/?CampaignId=873&sId=10">
    <a class="tavo_button"
       data-provider="bigdeal"
       href="http://www.bigdeal.co.il/?CampaignId=873&sId=10"
       target="_blank"
       rel="nofollow">תבוא!</a>
    </div>
  </div>
</div>

Using HTML Agility Pack, I want to get pairs of <data-address value, link rel="prerender" href value>.

I tried the following, but got the wrong results:

var nodes = doc.DocumentNode.SelectNodes(
    "//div[@class=\"discount_tools\"]");
var geoNodes = nodes.Where(node => !string.IsNullOrEmpty(
    node.ChildAttributes("data-address").ToString()));
AnswerFormat ans = new AnswerFormat {
    Locations = geoNodes.Select(
        node => node.ChildAttributes("data-address").ToString()).ToList(),
    //Names = nodes.Select(node => node.Attributes["data-address"].Value).
    //ToList(),
    Details = geoNodes.Select(
        node => node.ChildAttributes("data-direct-url").ToString()).ToList()
};
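A likely reason every div passes the filter above: ChildAttributes(...) returns a sequence of attributes, and calling ToString() on a sequence gives back its type name rather than any attribute value, so string.IsNullOrEmpty is never true and Locations ends up holding type names. The sketch below reproduces that with plain LINQ (no HTML Agility Pack needed); NoMatches is a hypothetical stand-in for an attribute query that finds nothing.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ToStringPitfall
{
    // Hypothetical stand-in for ChildAttributes("data-address") on a node
    // that has no such attribute: a lazy LINQ sequence with zero elements.
    public static IEnumerable<string> NoMatches()
    {
        return new[] { "class", "href" }.Where(name => name == "data-address");
    }

    // What the question's filter effectively checks: ToString() on a sequence
    // returns its type name, which is never null or empty.
    public static bool LooksNonEmpty(IEnumerable<string> seq)
    {
        return !string.IsNullOrEmpty(seq.ToString());
    }

    // What the filter presumably meant: does the sequence contain anything?
    public static bool HasMatch(IEnumerable<string> seq)
    {
        return seq.Any();
    }
}
```

LooksNonEmpty(NoMatches()) returns true while HasMatch(NoMatches()) returns false. With HTML Agility Pack the equivalent fix is to test the sequence with Any(), or to read the value with GetAttributeValue(name, null) and check that result for null.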

For every <div class="discount_tools">, I am trying to get the data-address attribute from one child node and the href attribute of the <a class="tavo_button" data-provider="bigdeal"> element from another child node.

How can I improve the query?

2 answers:

Answer 0 (score: 0)

Here is my solution:

var nodes = doc.DocumentNode.SelectNodes("//div[@class=\"discount_tools\"]");
// One collection of <a> descendants per discount_tools div
var linksCollections = nodes.Select(node => node.Descendants("a"));

List<string> Locations = new List<string>();
List<string> Categories = new List<string>();
List<string> Hrefs = new List<string>();

foreach (var col in linksCollections)
{
    string location, category, href;
    // GetAtt is the answerer's own helper; its body is not shown in this answer
    location = GetAtt("data-address", col);
    if (!string.IsNullOrEmpty(location))
    {
        // Note: the snippet in the question has no data-kind attribute
        category = GetAtt("data-kind", col);
        if (!string.IsNullOrEmpty(category))
        {
            // Presumably: href of the <a> that carries data-provider (the tavo_button link)
            href = GetAtt("data-provider", "href", col);
            if (!string.IsNullOrEmpty(href))
            {
                Locations.Add(location);
                Categories.Add(category);
                Hrefs.Add(href);
            }
        }
    }
}
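The answer above calls a GetAtt helper without showing it. What follows is a hypothetical reconstruction, not the answerer's code: it assumes the two-argument form returns the value of an attribute from the first node in the collection that has it, and the three-argument form finds the first node carrying a marker attribute (here data-provider) and returns another of its attributes (here href). Both return null when nothing matches, which the string.IsNullOrEmpty checks above would treat as "skip this div".

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class AttributeHelpers
{
    // Hypothetical GetAtt(name, nodes): value of attName on the first node
    // that carries it, or null if no node does.
    public static string GetAtt(string attName, IEnumerable<HtmlNode> nodes)
    {
        return nodes
            .Select(n => n.GetAttributeValue(attName, null))
            .FirstOrDefault(v => v != null);
    }

    // Hypothetical GetAtt(marker, name, nodes): finds the first node that has
    // markerAtt (e.g. "data-provider") and returns its attName (e.g. "href"),
    // or null if no node has the marker or the marker node lacks attName.
    public static string GetAtt(string markerAtt, string attName, IEnumerable<HtmlNode> nodes)
    {
        HtmlNode match = nodes.FirstOrDefault(n => n.Attributes[markerAtt] != null);
        return match?.GetAttributeValue(attName, null);
    }
}
```

Since the snippet in the question has no data-kind attribute, GetAtt("data-kind", col) would return null for it under these assumptions, so the answer's loop would add nothing for that HTML unless the category check is dropped.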

Answer 1 (score: 0)

String dataAddressValue = doc.DocumentNode.SelectSingleNode("//div[@class='discount_tools']/ul/li/a[@class='show-location']").Attributes["data-address"].Value;

String LinkHrefValue = doc.DocumentNode.SelectSingleNode("//div[@class='discount_tools']/link[@rel='prerender']").Attributes["href"].Value;
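SelectSingleNode only returns the first match on the page, and throws a NullReferenceException at .Attributes when nothing matches. A sketch that walks every <div class="discount_tools"> and pairs the two values per div, assuming the page structure from the question; DealScraper and GetAddressAndLink are illustrative names. Relative XPath expressions (starting with ".") keep each lookup inside the current div, and divs missing either element are skipped.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class DealScraper
{
    // For every <div class="discount_tools">, pairs the data-address of the
    // link that carries it with the href of the <link rel="prerender">.
    public static List<KeyValuePair<string, string>> GetAddressAndLink(HtmlDocument doc)
    {
        var result = new List<KeyValuePair<string, string>>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection divs = doc.DocumentNode.SelectNodes("//div[@class='discount_tools']");
        if (divs == null) return result;

        foreach (HtmlNode div in divs)
        {
            // The leading "." restricts the search to this div's subtree.
            HtmlNode location = div.SelectSingleNode(".//a[@data-address]");
            HtmlNode prerender = div.SelectSingleNode(".//link[@rel='prerender']");
            if (location == null || prerender == null) continue; // incomplete div

            // Attribute values may still contain entities such as &quot;,
            // so the address is decoded before it is stored.
            string address = HtmlEntity.DeEntitize(location.GetAttributeValue("data-address", ""));
            string href = prerender.GetAttributeValue("href", "");
            result.Add(new KeyValuePair<string, string>(address, href));
        }
        return result;
    }
}
```

Returning pairs keeps each address tied to the link from the same div, rather than relying on parallel lists staying in step.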