Question

My HTML looks like this:

        <div id="footer">
            <div id="footertext">
                <p>
                    Copyright &copy; FUCHS Online Ltd, 2013. All Rights Reserved.
                </p>
            </div>
        </div>

What I tried returns an object of type "HtmlAgilityPack.HtmlNodeCollection". How do I get the text value instead?

Answer 1

You need the value of a single node, so it is better to use the SelectSingleNode method.

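// Needs: using HtmlAgilityPack; using System.Web; (or System.Net for WebUtility)
HtmlWeb web = new HtmlWeb();
var doc = web.Load("http://www.fuchsonline.com");

// SelectSingleNode returns the first node matching the XPath, not a collection.
var link = doc.DocumentNode.SelectSingleNode("//div[@id='footertext']/p");

string rawText = link.InnerText.Trim();
// Decode entities such as &copy;. WebUtility.HtmlDecode works as well.
string decodedText = HttpUtility.HtmlDecode(rawText);

return decodedText;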

Also, you may need to decode HTML entities such as &copy;, as the HtmlDecode call above does.
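
To show the decoding step on its own, here is a minimal sketch that needs only the .NET base library. The class name FooterText and the method name Decode are hypothetical, made up for this example; WebUtility.HtmlDecode is the System.Net alternative to HttpUtility.HtmlDecode that the comment in the code above mentions.

```csharp
using System.Net;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
public static class FooterText
{
    // Trims surrounding whitespace, then converts HTML entities
    // such as &copy; into the characters they stand for.
    public static string Decode(string rawInnerText)
    {
        if (rawInnerText == null)
        {
            return string.Empty;
        }
        return WebUtility.HtmlDecode(rawInnerText.Trim());
    }
}
```

Passing the InnerText of the selected node to FooterText.Decode should give the copyright line with a real © character in place of &copy;.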

Answer 2

Here is what you can do:

using System;
using System.Linq;
using HtmlAgilityPack;

string html = @"
    <div id='footer'>
        <div id='footertext'>
            <p>
                Copyright &copy; FUCHS Online Ltd, 2013. All Rights Reserved.
            </p>
        </div>
    </div>";

// This example does not use HtmlWeb because it works directly on the piece of HTML you provided.
// In your own code, keep using HtmlWeb to load the URL.
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);

// Select every element with id 'footertext' and take its trimmed inner text.
var texts = htmlDoc.DocumentNode.SelectNodes("//*[@id='footertext']").Select(n => n.InnerText.Trim());

foreach (var text in texts)
{
    Console.WriteLine(text);
}

Output:

Copyright &copy; FUCHS Online Ltd, 2013. All Rights Reserved.

Answer 3

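public string getvalue()
{
    HtmlWeb web = new HtmlWeb();
    // HtmlWeb.Load needs an absolute URL, including the scheme.
    HtmlAgilityPack.HtmlDocument doc = web.Load("http://www.fuchsonline.com");
    // SelectNodes returns an HtmlNodeCollection, which has no InnerText;
    // SelectSingleNode returns the single matching HtmlNode, or null if nothing matches.
    var node = doc.DocumentNode.SelectSingleNode("//div[@id='footertext']");
    return node == null ? string.Empty : node.InnerText.Trim();
}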

How to get the inner text of a single node using HtmlAgilityPack
