How do I remove any and all HTML tags from a string?

Time: 2015-07-17 23:17:33

Tags: c# html

I have a string defined like this:

private const String REFER_TO_BUSINESS = "<pre> (Refer to business office for guidance and explain below the circumstances for exception to policy or attach a copy of request)</pre>";

...as you can see, it has "pre" tags to preserve the space prepended to the text. However, I want to reference this string without the "pre" tags. It would be easy enough to search for "<pre>" and "</pre>" and remove them, but doing that for every HTML tag type would quickly get tedious.

How can I remove all tags from a string in C#, whether they are "<pre>", "<h1>", "<span>", "<side>", or anything else?

3 answers:

Answer 0 (score: 2)

Try a regular-expression replace. This pattern matches the HTML tags in a string. From here

Answer 1 (score: 0)

This should do what you need:

 // stripMeOfHTML is assumed to be an already-declared string holding the markup
 stripMeOfHTML = Regex.Replace(stripMeOfHTML, @"<[^>]+>", "").Trim();

Answer 2 (score: -1)

This works:

// For strings that have embedded HTML tags for presentation on the form (such as "<pre>" and such), but need to be rendered free of these (such as on the PDF)
private String RemoveHTMLTags(String stringContainingHTMLTags)
{
    String regexified = Regex.Replace(stringContainingHTMLTags, "<.*?>", string.Empty);
    return regexified;
}
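
For anyone who wants to try this outside the form class, here is a minimal, self-contained sketch that applies the same lazy pattern to the string from the question. The class names HtmlTagStripper and Program are hypothetical, chosen only for illustration; the pattern and the Regex.Replace call are the ones used in the method above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper class; not part of the original answer.
public static class HtmlTagStripper
{
    // Removes anything that looks like a tag: "<", the shortest possible
    // run of characters, then ">". Text between tags is left untouched.
    public static string RemoveHtmlTags(string stringContainingHtmlTags)
    {
        return Regex.Replace(stringContainingHtmlTags, "<.*?>", string.Empty);
    }
}

public static class Program
{
    // The constant from the question.
    private const string REFER_TO_BUSINESS = "<pre> (Refer to business office for guidance and explain below the circumstances for exception to policy or attach a copy of request)</pre>";

    public static void Main()
    {
        // Prints the sentence without the <pre> tags; the leading space
        // inside the tags is kept because only the tags are removed.
        Console.WriteLine(HtmlTagStripper.RemoveHtmlTags(REFER_TO_BUSINESS));
    }
}
```

Note that the pattern is purely textual: it does not parse HTML, so a literal "<" followed later by ">" in ordinary text (for example "a < b and c > d") is treated as a tag and removed, and entities such as &amp;lt; are left as they are.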