How to get tags and their text using HtmlAgilityPack

Time: 2017-07-06 06:58:45

Tags: c# html-agility-pack

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
string html =
    "<body> " +
        "<p class=\"hang12\">“What is Lorem Ipsum?” <i>Lorem Ipsum is simply dummy text</i> Lorem Ipsum has been the</p>" +
        "<p class=\"hang12\">when an unknown printer took a galley of type <i>It has survived not only five centuries,</i>.</p>" +
        "<p class=\"hang12\">but also the  <i>remaining essentially </i> </p>" +
        "<p class=\"hang12\">with the release of Letraset sheets containing Lorem Ipsum passages, <i>and more recently with desktop</i>. 1944.</p>" +
    "</body>";

doc.LoadHtml(html);
// chNodes recurses through child nodes itself, so a single call on the root
// visits every leaf once (looping over Descendants() as well would repeat them).
chNodes(doc.DocumentNode);

public void chNodes(HtmlAgilityPack.HtmlNode node)
{
    try
    {
        if (node.HasChildNodes)
        {
            // Recurse until a node without children is reached.
            foreach (var item in node.ChildNodes)
            {
                chNodes(item);
            }
        }
        else
        {
            // Line and LinePosition report where this node starts in the source.
            Console.WriteLine("************");
            Console.WriteLine(node.Line);
            Console.WriteLine(node.LinePosition);
            Console.WriteLine("************");
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.StackTrace);
        throw; // rethrow without resetting the stack trace
    }
}

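The code above gets the position where each opening tag it finds begins, but I can't get the position of the closing tag. How can I solve this? I need these values to highlight text in a WebBrowser control. Thanks.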

1 answer:

Answer 0: (score: 0)

You can try the following code:

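foreach (var item in doc.DocumentNode.SelectNodes("//p[@class='hang12']"))
{
    // InnerText is the node's text without markup; InnerHtml is the markup between its tags.
    Console.WriteLine(item.InnerText);
    Console.WriteLine(item.InnerHtml);
}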
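
The snippet above returns the text and markup of each paragraph but not where the element ends, which is what the question asks for. One possible approach, offered only as a rough sketch, is to take the node's StreamPosition as the start offset and add the length of its OuterHtml. This assumes the document has not been modified after loading, so that OuterHtml still matches the original source text; the shortened HTML string and the variable names start and end are made up for illustration, and the offsets should be checked against your own input before using them for highlighting.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        var doc = new HtmlDocument();
        // Shortened sample input, for illustration only.
        doc.LoadHtml("<body><p class=\"hang12\">but also the <i>remaining essentially</i></p></body>");

        var nodes = doc.DocumentNode.SelectNodes("//p[@class='hang12']");
        if (nodes == null)
        {
            return; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode p in nodes)
        {
            // StreamPosition: offset of the node's start within the loaded text.
            int start = p.StreamPosition;
            // Assumption: for an unmodified node, OuterHtml has the same length
            // as the original markup, so start + length lands just past the closing tag.
            int end = start + p.OuterHtml.Length;
            Console.WriteLine($"<{p.Name}> from {start} to {end}: {p.InnerText}");
        }
    }
}
```

The same calculation can be applied to the nested i elements if only the italic runs need to be highlighted.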