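Selecting values from elements with a specific class name

Time: 2014-03-03 09:55:41

Tags: c# linq html-agility-pack

I get an object reference error (a NullReferenceException) when parsing an external HTML file. I think this is because not all of the selected elements have a class attribute. Here is my code: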

foreach (HtmlNode link in doc.DocumentNode.Descendants("li").Where(i => i.Attributes["class"].Value == "name"))
{
    string result = link.InnerText.Trim().Replace(" ", "");
    Console.WriteLine(result);
}

How can I select only the values whose class name is "name"?

This is the HTML I am trying to parse:

<li>
    <span class="name">
        <a href="/players/joe-bloggs.html">Joe,&nbsp;Bloggs</a>
    </span>

    <span class="country">
        <img src="/img/flags/15x15/USA.gif" alt="USA"/>
        United States
    </span>
</li>
<li>
    <span class="name">
        <a href="/players/joe-bloggs.html">Joe,&nbsp;Bloggs</a>
    </span>

    <span class="country">
        <img src="/img/flags/15x15/USA.gif" alt="USA"/>
        United States
    </span>
</li>
<li>
    <span class="name">
        <a href="/players/joe-bloggs.html">Joe,&nbsp;Bloggs</a>
    </span>

    <span class="country">
        <img src="/img/flags/15x15/RSA.gif" alt="RSA"/>
        South Africa
    </span>
</li>

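1 answer:

Answer 0: (score: 3)

You should select the a elements rather than the li elements. In your HTML it is the span elements, not the li elements, that carry the class attribute. I suggest using XPath with a predicate: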

var links = doc.DocumentNode.SelectNodes("//li/span[@class='name']/a");

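This XPath selects every span element that is a child of an li and whose class attribute equals name, and then selects the a elements inside it.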

foreach (var a in links)
    Console.WriteLine(a.InnerText);

For your sample HTML, the output is:

Joe,&nbsp;Bloggs
Joe,&nbsp;Bloggs
Joe,&nbsp;Bloggs

Side note: you can use HttpUtility.HtmlDecode(a.InnerText) to get the decoded text (this decodes all HTML entities, not only &nbsp;).
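
If you would rather stay with LINQ, as in the question, the exception can be avoided by not dereferencing a missing attribute: Attributes["class"] returns null for an element that has no class attribute, so reading .Value on it throws. The following is only a minimal sketch of that idea. The NameExtractor class and GetNames method are hypothetical names made up for illustration, and the HTML is assumed to be available as a string.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of Html Agility Pack.
public static class NameExtractor
{
    // Returns the trimmed inner text of every <a> that is a direct child
    // of a <span class="name"> element.
    public static List<string> GetNames(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("span")
            // GetAttributeValue falls back to "" when the attribute is absent,
            // so elements without a class attribute are skipped, not dereferenced.
            .Where(s => s.GetAttributeValue("class", "") == "name")
            .SelectMany(s => s.Elements("a"))
            .Select(a => a.InnerText.Trim())
            .ToList();
    }
}
```

Calling NameExtractor.GetNames(html) on the sample HTML should give the same three entries as the XPath version. Note that the comparison is an exact match, so an element with several classes (for example class="name highlighted") would not be selected by either approach.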


Update: parsing players

var players = from p in doc.DocumentNode.SelectNodes("//li")
              let name = p.SelectSingleNode("span[@class='name']/a")
              let country = p.SelectSingleNode("span[@class='country']")
              select new
              {
                  Name = (name == null) ? null : 
                         HttpUtility.HtmlDecode(name.InnerText.Trim()),
                  Country = (country == null) ? null :
                         HttpUtility.HtmlDecode(country.InnerText.Trim())
              };

Result:

[
  {
    Name: "Joe, Bloggs",
    Country: "United States"
  },
  {
    Name: "Joe, Bloggs",
    Country: "United States"
  },
  {
    Name: "Joe, Bloggs",
    Country: "South Africa"
  }
]
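
One thing to watch for in the query above: SelectNodes returns null, not an empty collection, when the XPath matches nothing, so a page without any li elements would throw before the query runs. Below is a hedged sketch of the same parsing with that case guarded. Player and PlayerParser are hypothetical names introduced for illustration, and WebUtility.HtmlDecode (from System.Net) is swapped in for HttpUtility.HtmlDecode on the assumption that System.Web may not be referenced in your project.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

// Hypothetical result type, used here in place of the anonymous type.
public sealed class Player
{
    public string Name { get; set; }
    public string Country { get; set; }
}

// Hypothetical helper, not part of Html Agility Pack.
public static class PlayerParser
{
    public static List<Player> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes yields null when no node matches the expression.
        var items = doc.DocumentNode.SelectNodes("//li");
        if (items == null)
            return new List<Player>();

        return items.Select(li => new Player
        {
            // Relative XPath: evaluated against the current <li> only.
            Name = Clean(li.SelectSingleNode("span[@class='name']/a")),
            Country = Clean(li.SelectSingleNode("span[@class='country']"))
        }).ToList();
    }

    // Decodes HTML entities and trims whitespace; null stays null.
    private static string Clean(HtmlNode node)
    {
        return node == null ? null : WebUtility.HtmlDecode(node.InnerText).Trim();
    }
}
```

A named class is used only so the list can be returned from a method; the anonymous type in the query above is just as good when the result is consumed in place.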