How to remove all comments from an HTML string

Asked: 2015-09-07 12:51:31

Tags: java android regex string

I have a String that contains comments such as <!-- --> and <!--comment1-->. I want to remove all of them.

What would the regular expression be?

I tried:

replaceAll("\\<!.*?\\-\\-\\>", "");

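But it did not work. I also tried looping and replacing, which works, but I am looking for a regex.

I have already tried Html.fromHtml, as mentioned in that link, and it does not work. I have asked a separate question about that here.

For example,

the following string: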
<style> <!-- /* Font Definitions */ @font-face  {font-family:"Cambria Math";    panose-1:2 4 5 3 5 4 6 3 2 4;} @font-face       {font-family:Calibri;   panose-1:2 15 5 2 2 2 4 3 2 4;} @font-face      {font-family:Tahoma;    panose-1:2 11 6 4 3 5 4 4 2 4;} @font-face      {font-family:Webdings;  panose-1:5 3 1 2 1 5 9 6 7 3;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal         {margin:0in;    margin-bottom:.0001pt;  font-size:11.0pt;       font-family:"Calibri",sans-serif;} a:link, span.MsoHyperlink    {mso-style-priority:99;         color:#0563C1;  text-decoration:underline;} a:visited, span.MsoHyperlinkFollowed        {mso-style-priority:99;         color:#954F72;  text-decoration:underline;} span.EmailStyle17   {mso-style-type:personal-compose;       font-family:"Calibri",sans-serif;       color:windowtext;} .MsoChpDefault       {mso-style-type:export-only;    font-family:"Calibri",sans-serif;} @page WordSection1   {size:8.5in 11.0in;     margin:1.0in 1.0in 1.0in 1.0in;} div.WordSection1       {page:WordSection1;} --></style>

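2 answers:

Answer 0 (score: 2)

Make sure you capture the return value of String.replaceAll. Java strings are immutable, so replaceAll returns a new String and leaves the original unchanged.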

String html = "abcde<!--comment1-->\n"
        + "<p>abcdefghi<!--comment2-->jkl</p>\n"
        + "<p>abcdefghi<span>jklm<!--comment3-->nopq</span>rs</p>\n";

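// .*? is reluctant, so each match ends at the first "-->" that follows.
String commentsRemoved = html.replaceAll("<!--.*?-->", "");

System.out.println(html);             // original string, comments still present
System.out.println(commentsRemoved);  // new string with the comments removed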
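
One case the example above does not cover is a comment that spans several lines. By default `.` does not match line terminators in Java regexes, so such a comment would be left in place. A minimal sketch, assuming multi-line comments can occur in the input (the class name `CommentStripper` and the method `stripComments` are hypothetical names used only for illustration):

```java
import java.util.regex.Pattern;

class CommentStripper {
    // (?s) turns on DOTALL so '.' also matches line terminators;
    // .*? is reluctant, so each match stops at the first "-->".
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");

    // Hypothetical helper: returns a copy of html without comments.
    static String stripComments(String html) {
        return COMMENT.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p>a<!-- one -->b</p>\n<!-- spans\ntwo lines -->\n<p>c</p>";

        html.replaceAll("(?s)<!--.*?-->", "");  // result discarded: html is unchanged
        String cleaned = stripComments(html);   // result captured in a new variable

        System.out.println(cleaned);
    }
}
```

Compiling the pattern once is only a convenience when the method is called repeatedly; `html.replaceAll("(?s)<!--.*?-->", "")` with the result assigned behaves the same way. Note that in the `<style>` example from the question, the whole stylesheet sits inside the comment, so it would be removed along with it.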

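Answer 1 (score: 0)

Try this (C#, ASP.NET). Note that this pattern removes every tag, not only comments:

    // Requires: using System.Text.RegularExpressions;
    string ss = "<b><i>The tag is about to be removed</i></b>";
    // Matches '<', then any characters other than '>', then '>'.
    Regex regex = new Regex("\\<[^\\>]*\\>");
    Response.Write(String.Format("<b>Before:</b>{0}", ss)); // HTML text
    Response.Write("<br/>");
    ss = regex.Replace(ss, String.Empty);
    Response.Write(String.Format("<b>After:</b>{0}", ss)); // plain text output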
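
Since the question is tagged java, here is a rough Java rendering of the same idea, offered as a sketch rather than a recommended solution. The class name `TagStripper` and the method `stripTags` are hypothetical. The pattern matches anything between `<` and the next `>`, so comments disappear, but so does every other tag:

```java
import java.util.regex.Pattern;

class TagStripper {
    // '<', then any run of characters other than '>', then '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Hypothetical helper: returns a copy of html with all tags removed.
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String ss = "<b><i>The tag is about to be removed</i></b>";
        System.out.println("Before: " + ss);
        System.out.println("After: " + stripTags(ss));

        // The comment is removed here, but the <p> markup is lost as well.
        System.out.println(stripTags("<p>keep<!--comment1--> this</p>"));
    }
}
```

If only comments should go and the rest of the markup should stay, a pattern that targets `<!--` ... `-->` specifically is the closer fit. This tag pattern also stops at the first `>` it sees, so a comment whose body contains `>` would be cut short.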