How to get the inner text of a single node with HtmlAgilityPack

Time: 2016-07-03 10:37:19

Tags: c# html-agility-pack

My HTML looks like this:

        <div id="footer">
            <div id="footertext">
                <p> 
                    Copyright &copy; FUCHS Online Ltd, 2013. All Rights Reserved.
                </p>
             </div>
        </div>

I want to get this text out of the markup and store it as a string in my C# code: "Copyright © FUCHS Online Ltd, 2013. All Rights Reserved."

What I tried returns an object of type "HtmlAgilityPack.HtmlNodeCollection". How do I get the text value?

3 Answers:

Answer 0 (score: 2)

You need the value of a single node, so it is better to use the SelectSingleNode method.

HtmlWeb web = new HtmlWeb();
var doc = web.Load("http://www.fuchsonline.com");
var link = doc.DocumentNode.SelectSingleNode("//div[@id='footertext']/p");

string rawText = link.InnerText.Trim();
string decodedText = HttpUtility.HtmlDecode(rawText); // or WebUtility.HtmlDecode

return decodedText;

Also, you may need to decode the HTML entity &copy;, which is what the HtmlDecode call above does.
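To illustrate the decoding step on its own, here is a minimal sketch that uses only System.Net.WebUtility from the base class library. The class name EntityDemo and the sample string are purely illustrative and not part of the original answer.

```csharp
using System;
using System.Net;

public static class EntityDemo
{
    // Trims the raw text and decodes HTML entities such as &copy;
    // into the characters they stand for.
    public static string Decode(string rawText)
    {
        return WebUtility.HtmlDecode(rawText.Trim());
    }

    public static void Main()
    {
        string raw = "  Copyright &copy; FUCHS Online Ltd, 2013. All Rights Reserved.  ";
        // After decoding, the &copy; entity becomes the © character.
        Console.WriteLine(Decode(raw));
    }
}
```

WebUtility is available without a reference to System.Web, which makes it a convenient choice outside ASP.NET projects.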

Answer 1 (score: 0)

Here is what you can do:

string html = @"
    <div id='footer'>
        <div id='footertext'>
            <p>
                Copyright &copy; FUCHS Online Ltd, 2013. All Rights Reserved.
            </p>
         </div>
    </div>";

// This example does not use HtmlWeb because it works with the piece of HTML you provided. You would continue to use HtmlWeb and load the URL.
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

var texts = htmlDoc.DocumentNode.SelectNodes("//*[@id='footertext']").Select(n => n.InnerText.Trim());

foreach (var text in texts)
{
    Console.WriteLine(text);
}

Output: the footer's copyright line, trimmed of surrounding whitespace.

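Answer 2 (score: 0)

public string getvalue()
{
    HtmlWeb web = new HtmlWeb();
    // HtmlWeb.Load needs an absolute URL, including the scheme.
    HtmlAgilityPack.HtmlDocument doc = web.Load("http://www.fuchsonline.com");
    // SelectSingleNode returns one HtmlNode; SelectNodes returns a collection, which has no InnerText.
    var link = doc.DocumentNode.SelectSingleNode("//div[@id='footertext']");
    return link.InnerText.Trim();
}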
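
SelectNodes returns an HtmlNodeCollection (or null when nothing matches), so InnerText has to be read from an individual node. The following is a minimal sketch of a null-safe variant, assuming the HtmlAgilityPack package is referenced. FooterReader and ReadFooterText are hypothetical names chosen for illustration; HtmlEntity.DeEntitize is the library's own entity decoder.

```csharp
using System;
using HtmlAgilityPack;

public static class FooterReader
{
    // Returns the decoded footer text, or null if the paragraph is not found.
    public static string ReadFooterText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns the first match, or null when there is none.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@id='footertext']/p");
        if (node == null)
        {
            return null;
        }

        // InnerText may still contain entities such as &copy;, so decode them.
        return HtmlEntity.DeEntitize(node.InnerText).Trim();
    }

    public static void Main()
    {
        string html = "<div id='footertext'><p> Copyright &copy; FUCHS Online Ltd, 2013. </p></div>";
        Console.WriteLine(ReadFooterText(html));
    }
}
```

Checking for null before reading InnerText avoids a NullReferenceException when the page layout changes and the XPath no longer matches.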