I want to extract only the text of one class, not the whole web page. I want the text from td class="result-neutral". I don't know what is wrong with this code:
<td class="result-neutral" xseid="xz1nBfht"><a href="/hockey/russia/khl/ska-st-petersburg-metallurg-magnitogorsk-xz1nBfht/">3 - 2 </a></td>
This is the C# code:
HtmlAgilityPack.HtmlDocument doc = new HtmlDocument();
HtmlWeb hw = new HtmlWeb();
doc = hw.Load("htt
var scoreNodes = doc.DocumentNode.Descendants("td")
    .Where(d => d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("result-neutral"));
foreach (var item in scoreNodes)
{
    result += item.OuterHtml + Environment.NewLine;
}
Info.Text = result;
}
Answer 0 (score: 0)
OuterHtml returns the HTML including the element's start and end tags. Don't you want InnerHtml or InnerText?
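To make the difference concrete, here is a minimal sketch that prints all three properties for the same cell. It assumes the HtmlAgilityPack package is referenced and that the code runs as C# top-level statements; the one-cell table and the `/match/` link are made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical one-cell table, modeled on the markup in the question.
const string html =
    "<table><tr><td class='result-neutral'><a href='/match/'>3 - 2</a></td></tr></table>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// Grab the single <td> in the sample document.
HtmlNode cell = doc.DocumentNode.SelectSingleNode("//td");

Console.WriteLine(cell.OuterHtml); // the <td> element itself, start and end tags included
Console.WriteLine(cell.InnerHtml); // only the markup inside the <td>, i.e. the <a> element
Console.WriteLine(cell.InnerText); // only the text content: 3 - 2
```

If only the score is needed, InnerText is the property to read.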
Edit: This code works for me:
using System;
using System.Linq;
using HtmlAgilityPack;

const string html = @"<html><body><table><tr><td class='result-neutral' xseid='xz1nBfht'><a href='/hockey/russia/khl/ska-st-petersburg-metallurg-magnitogorsk-xz1nBfht/'>3 - 2</a></td></tr></table></body></html>";
var doc = new HtmlDocument();
doc.LoadHtml(html);
// Select every <td> whose class attribute contains "result-neutral".
var scoreNodes = doc.DocumentNode.Descendants("td")
    .Where(d => d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("result-neutral"));
string result = "";
foreach (var item in scoreNodes) {
    // InnerText drops the tags and keeps only the text.
    result += item.InnerText + Environment.NewLine;
}
result = result.TrimEnd(); // the result is "3 - 2"
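
The same lookup can also be written with an XPath query instead of LINQ. This is only a sketch of an alternative, not what the code above does; it assumes HtmlAgilityPack is referenced, runs as top-level statements, and uses a shortened, made-up link in the sample HTML.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical sample markup; note the trailing space after the score, as in the question.
const string html =
    "<table><tr><td class='result-neutral'><a href='/match/'>3 - 2 </a></td></tr></table>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// SelectNodes returns null (not an empty collection) when nothing matches,
// so the result has to be checked before it is enumerated.
HtmlNodeCollection cells =
    doc.DocumentNode.SelectNodes("//td[contains(@class, 'result-neutral')]");

string result = cells == null
    ? ""
    : string.Join(
        Environment.NewLine,
        // DeEntitize turns entities such as &amp; back into characters; Trim drops stray spaces.
        cells.Select(c => HtmlEntity.DeEntitize(c.InnerText).Trim()));

Console.WriteLine(result); // 3 - 2
```

Like the `Contains("result-neutral")` check above, `contains(@class, ...)` is a substring match, so it would also match a class such as `result-neutral-home`.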