Question

I have a string defined like this:

private const String REFER_TO_BUSINESS = "<pre> (Refer to business office for guidance and explain below the circumstances for exception to policy or attach a copy of request)</pre>";

...As you can see, it uses the "pre" tag to preserve the space that precedes the text. However, I also want to reference this string without the "pre" tags. It would be easy enough to search for "<pre>" and "</pre>" and remove them, but doing that for every kind of HTML tag would quickly get tedious.

How can I, in C#, strip all tags from a string, whether they are "<pre>", "<h1>", "<span>", "<side>", or anything else?

Answer 1

Try a regex replace. This pattern matches the HTML tags in a string.

Answer 2

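This should do what you need:

 // stripMeOfHTML is the input string; declare a separate variable for the result,
 // since a local cannot be read inside its own initializer.
 string stripped = Regex.Replace(stripMeOfHTML, @"<[^>]+>", "").Trim();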
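
For reference, here is a minimal, self-contained sketch of the same replacement applied to the string from the question. The class name `StripTagsDemo` and the helper name `StripTags` are illustrative only and are not part of the original answer. Note that `Trim()` also removes the leading space that the "pre" tag was preserving.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StripTagsDemo
{
    // Hypothetical helper: removes every "<...>" run, then trims outer whitespace.
    public static string StripTags(string input)
    {
        return Regex.Replace(input, @"<[^>]+>", "").Trim();
    }

    public static void Main()
    {
        const string referToBusiness = "<pre> (Refer to business office for guidance and explain below the circumstances for exception to policy or attach a copy of request)</pre>";

        // Prints the sentence with the <pre> and </pre> tags removed.
        Console.WriteLine(StripTags(referToBusiness));
    }
}
```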

Answer 3

This works:

// For strings that have embedded HTML tags for presentation on the form (such as "<pre>" and such), but need to be rendered free of these (such as on the PDF)
private String RemoveHTMLTags(String stringContainingHTMLTags)
{
    String regexified = Regex.Replace(stringContainingHTMLTags, "<.*?>", string.Empty);
    return regexified;
}
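
Two caveats, sketched below under my own assumptions rather than taken from the answer above. By default `.` does not match a newline, so `<.*?>` skips a tag that spans several lines unless `RegexOptions.Singleline` is passed. Character entities such as `&amp;` are also left in place and can be decoded with `WebUtility.HtmlDecode`. The names `HtmlTextDemo` and `RemoveTagsAndDecode` are hypothetical. A regex like this is a rough filter, not an HTML parser.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlTextDemo
{
    // Hypothetical helper: strips tags (including multi-line ones), then decodes entities.
    public static string RemoveTagsAndDecode(string html)
    {
        // Singleline lets '.' match '\n', so a tag broken across lines is still removed.
        string withoutTags = Regex.Replace(html, "<.*?>", string.Empty, RegexOptions.Singleline);
        return WebUtility.HtmlDecode(withoutTags);
    }

    public static void Main()
    {
        string html = "<span\n class=\"note\">Fish &amp; chips</span>";
        Console.WriteLine(RemoveTagsAndDecode(html)); // prints "Fish & chips"
    }
}
```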

How can I remove any and all HTML tags from a string?

3 answers: