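I'm getting an "object reference not set" error (a NullReferenceException) when parsing an external HTML file. I think it happens because not all of the selected elements have a class attribute. Here is my code: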
foreach (HtmlNode link in doc.DocumentNode.Descendants("li").Where(i => i.Attributes["class"].Value == "name"))
{
    string result = link.InnerText.Trim().Replace(" ", "");
    Console.WriteLine(result);
}
How can I select only the values whose class is "name"?
This is the HTML I'm trying to parse:
<li>
    <span class="name">
        <a href="/players/joe-bloggs.html">Joe, Bloggs</a>
    </span>
    <span class="country">
        <img src="/img/flags/15x15/USA.gif" alt="USA"/>
        United States
    </span>
</li>
<li>
    <span class="name">
        <a href="/players/joe-bloggs.html">Joe, Bloggs</a>
    </span>
    <span class="country">
        <img src="/img/flags/15x15/USA.gif" alt="USA"/>
        United States
    </span>
</li>
<li>
    <span class="name">
        <a href="/players/joe-bloggs.html">Joe, Bloggs</a>
    </span>
    <span class="country">
        <img src="/img/flags/15x15/RSA.gif" alt="RSA"/>
        South Africa
    </span>
</li>
Answer 0 (score: 3)
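You are filtering the li elements, but they have no class attribute; it is the span elements that carry it, and the text you want sits in the a element inside each span. Reading Attributes["class"].Value on an element without that attribute is what throws the exception. I suggest selecting the a elements with an XPath predicate instead: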
var links = doc.DocumentNode.SelectNodes("//li/span[@class='name']/a");
This XPath selects every span element whose class attribute equals name, and then selects the a elements inside those spans.
foreach (var a in links)
    Console.WriteLine(a.InnerText);
For your sample HTML the output is:
Joe, Bloggs
Joe, Bloggs
Joe, Bloggs
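If you would rather keep the LINQ style from the question, a null-safe filter is another option. The sketch below is only an illustration: the NameExtractor class, the ExtractNames helper, and the inline sample markup are hypothetical names made up for this example. It relies on HtmlNode.GetAttributeValue(name, default), which returns the supplied default instead of throwing when the attribute is missing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class NameExtractor
{
    // Hypothetical helper, not part of the original answer.
    public static List<string> ExtractNames(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("span")
            // Spans without a class attribute yield "" here instead of a null reference.
            .Where(s => s.GetAttributeValue("class", "") == "name")
            .Select(s => s.InnerText.Trim())
            .ToList();
    }

    public static void Main()
    {
        // Assumed sample markup: one named span and one span with no class at all.
        const string html =
            "<ul>" +
            "<li><span class=\"name\"><a href=\"/players/joe-bloggs.html\">Joe, Bloggs</a></span></li>" +
            "<li><span>no class attribute</span></li>" +
            "</ul>";

        foreach (var name in ExtractNames(html))
            Console.WriteLine(name);
    }
}
```

On the XPath route, as far as I know SelectNodes returns null rather than an empty collection when nothing matches, so a null check before the foreach is worth adding if the page may contain no players.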
Side note: you can use HttpUtility.HtmlDecode(a.InnerText) to get the decoded text (it decodes all HTML entities, not only &nbsp;).
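To make the decoding point concrete, here is a small sketch. It uses WebUtility.HtmlDecode from System.Net, which is a swapped-in alternative to HttpUtility.HtmlDecode that does not need a reference to System.Web; the DecodeDemo class and the Clean helper are hypothetical names used only for this example.

```csharp
using System;
using System.Net;

public static class DecodeDemo
{
    // Hypothetical helper: decode HTML entities, then trim surrounding whitespace.
    public static string Clean(string innerText)
    {
        return WebUtility.HtmlDecode(innerText).Trim();
    }

    public static void Main()
    {
        // "&amp;" is decoded to "&"; a manual Replace for one entity would miss it.
        Console.WriteLine(Clean("  Tom &amp; Jerry  "));
    }
}
```

Note that a decoded &nbsp; becomes the non-breaking space character (U+00A0), not an ordinary space, so comparisons against a plain " " may still need care.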
Update: parsing the players
var players = from p in doc.DocumentNode.SelectNodes("//li")
              let name = p.SelectSingleNode("span[@class='name']/a")
              let country = p.SelectSingleNode("span[@class='country']")
              select new
              {
                  Name = (name == null) ? null :
                      HttpUtility.HtmlDecode(name.InnerText.Trim()),
                  Country = (country == null) ? null :
                      HttpUtility.HtmlDecode(country.InnerText.Trim())
              };
Result:
[
{
Name: "Joe, Bloggs",
Country: "United States"
},
{
Name: "Joe, Bloggs",
Country: "United States"
},
{
Name: "Joe, Bloggs",
Country: "South Africa"
}
]