How do I get data row by row with HtmlAgilityPack?

Time: 2015-11-27 11:48:28

Tags: c# .net html-parsing html-agility-pack

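This is one sample of my data; there are 15 items like it. How can I get 4 (data-class), 8,2 (data-score), and Parkk (the name)?
I want to extract the data one item at a time. I think I have to use foreach. Can anyone help me?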

<div class="sr_item sr_item_new      sr_item_default                sr_property_block  sr_flex_layout         card-bigger-price      sr_item--with-value-deal    "
        data-hotelid="10274"
        data-class="4"
        data-score="8,2"
        data-recommended="1"
        data-row-number="1">
        <a class="hotel_name_link url"
        href="/hotel/nl/parkhoteladam.tr.html?aid=309654;label=booking-be-tr-JKGYPlETyQ8zXLSF_YGpswS70199808652%3Apl%3Ata%3Ap1%3Ap2%3Aac%3Aap1t1%3Aneg%3Afi%3Atikwd-21085524309%3Alp1012783%3Ali%3Adec%3Adm;sid=8b79e4c094eb1d07801d638dbebd5d45;dcid=4;checkin=2015-11-28;checkout=2015-11-29;ucfs=1;room1=A,A;srfid=ddba57556d198f7f351dfd7936afdee5e7b5d96fX1;highlight_room="
         target="_blank" 
        data-component="track" data-track="mouseenter" data-stage="1" data-hash="HMDCcKPRNHcXJEbSaTfRe"
        >
        Parkk
        </a>

2 answers:

Answer 0: (score: 0)

This works as expected:

var div = doc.DocumentNode.SelectSingleNode("//div[contains(@class, 'sr_item sr_item_new') and contains(@class, 'sr_item_default') and contains(@class, 'sr_property_block') and contains(@class, 'sr_flex_layout') and contains(@class, 'card-bigger-price') and contains(@class, 'sr_item--with-value-deal')]");
string hotelID = div.GetAttributeValue("data-hotelid", "");
string dataClass = div.GetAttributeValue("data-class", "");
string dataScore = div.GetAttributeValue("data-score", "");
string dataRecommended = div.GetAttributeValue("data-recommended", "");
string name = div.InnerText.Trim();

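Answer 1: (score: 0)

This builds on the existing answer. The XPath used to get the div starts with "//div", so when it is passed to SelectNodes (rather than SelectSingleNode) it finds all divs that match the criteria. The foreach loop then extracts the data from each div. You still have to put that data into a list of class instances.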

    public static void Main(string[] args)
    {
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

        // HtmlDocument.Load reads a local file or stream, so `link` must be a file path here.
        // To fetch a page by URL instead: doc = new HtmlAgilityPack.HtmlWeb().Load(url);
        doc.Load(link);

        HtmlAgilityPack.HtmlNodeCollection divs = doc.DocumentNode.SelectNodes("//div[contains(@class, 'sr_item sr_item_new') and contains(@class, 'sr_item_default') and contains(@class, 'sr_property_block') and contains(@class, 'sr_flex_layout') and contains(@class, 'card-bigger-price') and contains(@class, 'sr_item--with-value-deal')]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (divs == null) return;

        foreach (HtmlAgilityPack.HtmlNode n in divs)
        {
            string hotelID = n.GetAttributeValue("data-hotelid", "");
            string dataClass = n.GetAttributeValue("data-class", "");
            string dataScore = n.GetAttributeValue("data-score", "");
            string dataRecommended = n.GetAttributeValue("data-recommended", "");
            // Trim the whitespace and newlines that surround the link text.
            string name = n.SelectSingleNode("a").InnerText.Trim();
        }
    }
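
The remaining step mentioned above, putting the extracted values into a list of class instances, could look roughly like the sketch below. The `Hotel` class, the `HotelParser.Parse` helper, and the shorter XPath `//div[@data-hotelid]` are assumptions made for illustration and do not come from the original answers; the long `contains(@class, ...)` expression can be substituted if the page has other divs carrying a `data-hotelid` attribute.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical container type; the name and properties are illustrative assumptions.
public class Hotel
{
    public string HotelId { get; set; }
    public string DataClass { get; set; }
    public string DataScore { get; set; }
    public string Name { get; set; }
}

public static class HotelParser
{
    // Parses the given HTML and returns one Hotel per matching result div.
    public static List<Hotel> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hotels = new List<Hotel>();

        // Assumption: every result div, and nothing else, has a data-hotelid attribute.
        HtmlNodeCollection divs = doc.DocumentNode.SelectNodes("//div[@data-hotelid]");
        if (divs == null)
        {
            return hotels; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode n in divs)
        {
            // Look for the name link anywhere below the current div.
            HtmlNode link = n.SelectSingleNode(".//a[contains(@class, 'hotel_name_link')]");

            hotels.Add(new Hotel
            {
                HotelId = n.GetAttributeValue("data-hotelid", ""),
                DataClass = n.GetAttributeValue("data-class", ""),
                DataScore = n.GetAttributeValue("data-score", ""),
                Name = link == null ? "" : link.InnerText.Trim()
            });
        }

        return hotels;
    }
}
```

Calling `HotelParser.Parse(html)` on the page source then gives a list that can be walked with an ordinary foreach, reading `DataClass`, `DataScore`, and `Name` from each item. The values are kept as strings because the score uses a comma as the decimal separator ("8,2"), so any numeric conversion would need a culture-aware parse.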