How to get data from a specific class with C# and HTML Agility Pack

Date: 2017-06-19 10:31:09

Tags: c# html-agility-pack

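I don't want to extract the whole web page, only the text of a single class: the text inside td class="result-neutral". I don't know what is wrong with this code: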

<td class="result-neutral" xseid="xz1nBfht"><a href="/hockey/russia/khl/ska-st-petersburg-metallurg-magnitogorsk-xz1nBfht/">3 - 2 </a></td>

Here is the C# code:

        HtmlAgilityPack.HtmlDocument doc = new HtmlDocument();
        HtmlWeb hw = new HtmlWeb();
        doc = hw.Load("htt
        var scoreNodes = doc.DocumentNode.Descendants("td")
            .Where(d => d.Attributes.Contains("class")
                     && d.Attributes["class"].Value.Contains("result-neutral"));

        foreach (var item in scoreNodes)
        {
            result += item.OuterHtml + Environment.NewLine;
        }
        Info.Text = result;

    }

1 answer:

Answer 0: (score: 0)

OuterHtml returns the HTML including the element's own start and end tags. Don't you want InnerHtml or InnerText instead?
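
To make the difference concrete, here is a minimal sketch that prints all three properties for one cell. The markup is a shortened, hypothetical fragment modeled on the question's td (the href is a placeholder), and the code assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical fragment modeled on the question's markup; the href is a placeholder.
const string html =
    "<table><tr><td class='result-neutral'><a href='/match/'>3 - 2</a></td></tr></table>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// Take the first (and only) td in the fragment.
HtmlNode td = doc.DocumentNode.Descendants("td").First();

Console.WriteLine(td.OuterHtml); // the td element itself, start and end tags included
Console.WriteLine(td.InnerHtml); // only the markup inside the td, here the a element
Console.WriteLine(td.InnerText); // text with all tags removed: 3 - 2
```

For a score display, InnerText is probably the one to use, since it drops both the td tags and the link markup.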

Edit: This code works for me:

using System;
using System.Linq;
using HtmlAgilityPack;

const string html = @"<html><body><table><tr><td class='result-neutral' xseid='xz1nBfht'><a href='/hockey/russia/khl/ska-st-petersburg-metallurg-magnitogorsk-xz1nBfht/'>3 - 2</a></td></tr></table></body></html>";
var doc = new HtmlDocument();
doc.LoadHtml(html);

// Select every td whose class attribute contains "result-neutral"
var scoreNodes = doc.DocumentNode.Descendants("td").Where(d => d.Attributes.Contains("class") && d.Attributes["class"].Value.Contains("result-neutral"));

string result = "";
foreach (var item in scoreNodes) {
    // InnerText drops the td and a tags and keeps only the text
    result += item.InnerText + Environment.NewLine;
}
result = result.TrimEnd(); // the result is "3 - 2"
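
One caveat with the filter above: Value.Contains("result-neutral") is a substring test, so it would also match a class such as "result-neutral-old". If that matters, a possible refinement is to compare whole class names. The sketch below assumes the same HtmlAgilityPack package; HasClassToken is a hypothetical helper written for this example, not part of the library, and the two-cell markup is made up for illustration.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Made-up markup: the second cell's class only contains the target as a substring.
const string html = @"<table><tr>
<td class='result-neutral'><a href='/a/'>3 - 2</a></td>
<td class='result-neutral-old odd'><a href='/b/'>1 - 0</a></td>
</tr></table>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

// Hypothetical helper: true only if the class attribute lists `name` as a whole token.
static bool HasClassToken(HtmlNode node, string name) =>
    node.GetAttributeValue("class", "")
        .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
        .Contains(name);

var scores = doc.DocumentNode.Descendants("td")
    .Where(td => HasClassToken(td, "result-neutral"))
    .Select(td => td.InnerText.Trim())
    .ToList();

Console.WriteLine(string.Join(Environment.NewLine, scores)); // prints "3 - 2" only
```

GetAttributeValue with a default of "" also covers cells that have no class attribute at all, so the separate Attributes.Contains("class") check is not needed here.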