How to extract the text from HTML in C#

Date: 2012-09-20 10:11:07

Tags: c# html xml parsing

I have an HTML string like this:

 "This is <h4>Some</h4> Text" + Environment.NewLine +
 "This is some more <h5>text</h5>";

I only want to extract the text, so the result should be:

"This is Some Text" + Environment.NewLine +
 "This is some more text"

How can I do this?

2 answers:

Answer 0 (score: 8)

Use HtmlAgilityPack:

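using System;
using HtmlAgilityPack; // from the HtmlAgilityPack NuGet package

string html = "This is <h4>Some</h4> Text" + Environment.NewLine +
              "This is some more <h5>text</h5>";

// Parse the fragment into a DOM, then read the text of all nodes with the tags dropped.
var doc = new HtmlDocument();
doc.LoadHtml(html);
string text = doc.DocumentNode.InnerText;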

Answer 1 (score: 1)

For simple input, a regular expression is enough (requires using System.Text.RegularExpressions;):

Regex.Replace(source, "<.*?>", string.Empty);
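Below is a slightly more careful sketch of the regex approach, assuming only the .NET base class library. It swaps the pattern <.*?> for <[^>]*>: in .NET the . metacharacter does not match a newline unless RegexOptions.Singleline is set, so a tag split across lines survives the original pattern, while the character class [^>] does match newlines. It also decodes entities such as &amp; with WebUtility.HtmlDecode. The names HtmlText and StripTags are hypothetical, chosen for illustration. This is still not an HTML parser: a > inside an attribute value, HTML comments, <script> bodies, or an unescaped < in the text can all produce wrong output, and in those cases HtmlAgilityPack is the safer choice.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class illustrating regex-based tag stripping.
public static class HtmlText
{
    // Matches '<' followed by any run of non-'>' characters (newlines included), then '>'.
    private static readonly Regex TagPattern = new Regex("<[^>]*>", RegexOptions.Compiled);

    public static string StripTags(string html)
    {
        // Remove every tag-like span.
        string withoutTags = TagPattern.Replace(html, string.Empty);
        // Turn entities such as &amp; or &lt; back into the characters they stand for.
        return WebUtility.HtmlDecode(withoutTags);
    }

    public static void Main()
    {
        string html = "This is <h4>Some</h4> Text" + Environment.NewLine +
                      "This is some more <h5>text</h5>";
        Console.WriteLine(StripTags(html));
    }
}
```

Run against the string from the question, Main prints "This is Some Text" and "This is some more text" on two lines.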