Scraping a website with HTML Agility Pack: finding elements by class

Date: 2013-04-14 21:17:22

Tags: c# html-agility-pack scrape

I want to get some data from an HTML string using HTML Agility Pack.

Each row I am trying to read data from returns this InnerHtml:

<td class="street">Riksdagen</td>
<td class="number">&nbsp;</td>
<td class="number">&nbsp;</td>
<td class="postalcode">100 12</td>
<td class="locality">Stockholm</td>
<td class="region_code">018001</td>
<td class="county">Stockholm</td>
<td class="namnkommun">Stockholm</td>

How do I assign the cell with each class to the matching property of addressDataModel?

var row = doc.DocumentNode.SelectNodes("//*[@id='thetable']/tr");

foreach (var rowItem in row)
{
    var addressDataModel = new AddressDataModel
    {
        street = rowItem.FirstChild.InnerText,
        zipCodeFrom = // Next item,
        zipCodeTo = // Next item,
        zipCode = // Next item,
        locality = // Next item,
        regionCode = // Next item,
        state = // Next item,
        county = // Next item
    };
}

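2 answers:

Answer 0: (score: 0)

You can write something like this (make sure the node exists before reading its InnerText property, because SelectSingleNode returns null when nothing matches):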

var addressDataModel = new AddressDataModel
    {
        street = rowItem.SelectSingleNode("./td[@class='street']").InnerText,
        zipCodeFrom = // Next item,
        zipCodeTo = // Next item,
        zipCode = // Next item,
        locality = // Next item,
        regionCode = // Next item,
        state = // Next item,
        county = rowItem.SelectSingleNode("./td[@class='county']").InnerText
    };
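
The null check mentioned above could be wrapped in a small helper. This is only a sketch: the CellReader class, the CellText method, and the inline sample HTML are hypothetical names and data made up for illustration, while SelectNodes, SelectSingleNode, and HtmlEntity.DeEntitize are HTML Agility Pack calls.

```csharp
using System;
using HtmlAgilityPack;

static class CellReader
{
    // Hypothetical helper: returns the entity-decoded, trimmed text of the
    // first <td> with the given class in the row, or "" if there is none.
    public static string CellText(HtmlNode row, string cssClass)
    {
        HtmlNode cell = row.SelectSingleNode("./td[@class='" + cssClass + "']");
        return cell == null
            ? string.Empty
            : HtmlEntity.DeEntitize(cell.InnerText).Trim();
    }
}

class XPathExample
{
    static void Main()
    {
        // Sample markup assumed for the sketch, modelled on the question's row.
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<table id='thetable'><tr>" +
            "<td class='street'>Riksdagen</td>" +
            "<td class='postalcode'>100 12</td>" +
            "</tr></table>");

        var rows = doc.DocumentNode.SelectNodes("//*[@id='thetable']/tr");
        if (rows == null)
        {
            return; // SelectNodes returns null when no node matches
        }

        foreach (HtmlNode row in rows)
        {
            Console.WriteLine(CellReader.CellText(row, "street"));
            Console.WriteLine(CellReader.CellText(row, "postalcode"));
            // A class that is absent from the row yields "" instead of throwing.
            Console.WriteLine(CellReader.CellText(row, "locality").Length);
        }
    }
}
```

The row in the question has two cells that share the class number. A positional predicate such as "./td[@class='number'][2]" should select the second one; which model property each of those cells belongs to is not stated in the question.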

Reference: http://www.w3schools.com/xpath/xpath_syntax.asp

Answer 1: (score: 0)

If you would rather not use XPath, you can also walk the nodes and check the class attribute in code:

HtmlAgilityPack.HtmlDocument htmlContent = new HtmlAgilityPack.HtmlDocument();

htmlContent.LoadHtml(htmlCode);

if (htmlContent.DocumentNode != null)
{
    // Visit every <div>; use "td" instead to match the cells in the question.
    foreach (HtmlNode n in htmlContent.DocumentNode.Descendants("div"))
    {
        if (n.HasAttributes && n.Attributes["class"] != null)
        {
            if (n.Attributes["class"].Value == "className")
            {
                // Do something
            }
        }
    }
}
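
The same non-XPath idea can be written more compactly with LINQ and applied to the td cells from the question. This is a sketch under assumptions: the LinqExample class and the sample HTML are invented for illustration, and GetAttributeValue is used in place of the explicit attribute null checks.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class LinqExample
{
    static void Main()
    {
        // Sample markup assumed for the sketch, modelled on the question's row.
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<table id='thetable'><tr>" +
            "<td class='street'>Riksdagen</td>" +
            "<td class='county'>Stockholm</td>" +
            "</tr></table>");

        foreach (HtmlNode row in doc.DocumentNode.Descendants("tr"))
        {
            // GetAttributeValue returns the supplied default ("") when the
            // attribute is missing, so no separate null check is needed.
            HtmlNode street = row.Elements("td")
                .FirstOrDefault(td => td.GetAttributeValue("class", "") == "street");

            if (street != null)
            {
                Console.WriteLine(street.InnerText);
            }
        }
    }
}
```

Comparing the whole attribute value only matches cells whose class attribute is exactly "street"; a cell with several classes would need the value split on whitespace first.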