Regex pattern to remove Word formatting

Time: 2012-11-15 15:13:59

Tags: c# html regex

I'm having trouble using regex patterns to replace items in HTML markup. The values I want to replace are:

<!--[if mso 9]-->
<style>
 p.MsoNormal
 {margin-left:18.75pt;}
</style>
<!--[endif]-->

class="MsoNormal"

I'm not the best with Regex, and the patterns I've come up with so far are:

  1. <!--(.*?)-->
  2. class=\"msonormal\"
  3. class=\"MsoNormal\"
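Item 1 removes anything that starts with <!-- and ends with -->, but it only works some of the time and does not remove every instance.

I don't think items 2 and 3 work at all.

I found some information about patterns here:

is there a Way to strip all Unnecessary MS Word Formatting from FCKEditor

The full text I want to remove is as follows: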

        <!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
    o\:* {behavior:url(#default#VML);}
    w\:* {behavior:url(#default#VML);}
    .shape {behavior:url(#default#VML);}
    </style><![endif]-->
    <title>Blank</title>
    <style>
        <!--
    /* Font Definitions */
    @font-face
        {font-family:Helvetica;
        panose-1:2 11 6 4 2 2 2 2 2 4;}
    @font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
    @font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
    @font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
    @font-face
        {font-family:"Arial Black";
        panose-1:2 11 10 4 2 1 2 2 2 4;}
    @font-face
        {font-family:"Palatino Linotype";
        panose-1:2 4 5 2 5 5 5 3 3 4;}
    @font-face
        {font-family:"Trebuchet MS";
        panose-1:2 11 6 3 2 2 2 2 2 4;}
    @font-face
        {font-family:"Matura MT Script Capitals";
        panose-1:3 2 8 2 6 6 2 7 2 2;}
    /* Style Definitions */
    p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin-top:0in;
        margin-right:0in;
        mso-margin-bottom-alt:auto;
        margin-left:0in;
        font-size:10.0pt;
        font-family:"Helvetica","sans-serif";
        color:#FFFFCC;
        mso-believe-normal-left:yes;}
    a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
    a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
    p
        {mso-style-priority:99;
        mso-margin-top-alt:auto;
        margin-right:0in;
        mso-margin-bottom-alt:auto;
        margin-left:0in;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
    p.MsoAcetate, li.MsoAcetate, div.MsoAcetate
        {mso-style-priority:99;
        mso-style-link:"Balloon Text Char";
        margin-top:0in;
        margin-right:0in;
        mso-margin-bottom-alt:auto;
        margin-left:0in;
        font-size:8.0pt;
        font-family:"Tahoma","sans-serif";
        color:#FFFFCC;}
    p.MsoNoSpacing, li.MsoNoSpacing, div.MsoNoSpacing
        {mso-style-priority:1;
        margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
    span.BalloonTextChar
        {mso-style-name:"Balloon Text Char";
        mso-style-priority:99;
        mso-style-link:"Balloon Text";
        font-family:"Tahoma","sans-serif";
        color:#FFFFCC;}
    span.EmailStyle21
        {mso-style-type:personal;
        font-family:"Arial","sans-serif";
        color:black;}
    span.EmailStyle22
        {mso-style-type:personal;
        font-family:"Arial","sans-serif";
        color:#0F243E;}
    span.EmailStyle23
        {mso-style-type:personal;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
    span.EmailStyle24
        {mso-style-type:personal;
        font-family:"Arial","sans-serif";
        color:#0F243E;}
    span.EmailStyle25
        {mso-style-type:personal;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
    span.EmailStyle26
        {mso-style-type:personal;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
    span.EmailStyle27
        {mso-style-type:personal;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
    span.EmailStyle28
        {mso-style-type:personal;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
    span.EmailStyle29
        {mso-style-type:personal;
        font-family:"Calibri","sans-serif";
        color:windowtext;}
    span.EmailStyle30
        {mso-style-type:personal;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
    span.EmailStyle31
        {mso-style-type:personal;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
    span.EmailStyle33
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
    .MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
    @page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
    div.WordSection1
        {page:WordSection1;}
    -->
    </style>
    <!--[if mso 9]-->
    <style>
        p.MsoNormal
        {margin-left:18.75pt;}
    </style>
    <!--[endif]--><!--[if gte mso 9]>
    <o:shapedefaults v:ext="edit" spidmax="1026" />
    <![endif]--><!--[if gte mso 9]>
    
    
    <p class="MsoNormal"><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: #1f497d;"><o:p>&nbsp;</o:p></span></p>
    <p class="MsoNormal"><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: #1f497d;"><o:p>&nbsp;</o:p></span></p>
    <p class="MsoNormal"><o:p>&nbsp;</o:p></p>
    

2 Answers:

Answer 0 (score: 1)

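Typically, when you want a regex to act on every possible match, you add the "g" (global) flag. .NET has no such flag: in C# you typically call Regex.Matches() to find every match and then operate on them, and Regex.Replace() already replaces every match by default.

As for class="MsoNormal", I can't find any instance of it in your text. Are you sure you're looking for the right pattern?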
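
A minimal sketch of this in C#, assuming the goal is simply to delete every HTML comment. `Regex.Replace` handles all matches in one call, and `RegexOptions.Singleline` lets `.` cross line breaks, which is a likely reason your pattern 1 skipped comments that span several lines. The names `CommentStripper` and `StripComments` are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentStripper
{
    // Removes every HTML comment, including comments that span several lines.
    public static string StripComments(string html)
    {
        // Without Singleline, '.' stops at '\n' and multi-line comments never match.
        return Regex.Replace(html, "<!--(.*?)-->", string.Empty, RegexOptions.Singleline);
    }

    public static void Main()
    {
        string html = "<p>a</p><!--[if mso 9]-->\n<!--\n p.MsoNormal {margin:0}\n--><p>b</p>";
        Console.WriteLine(StripComments(html));
    }
}
```

Note that `<!--[if mso 9]-->` and `<!--[endif]-->` are two separate comments, so a `<style>` block sitting between them is left in place and needs its own pattern.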

Answer 1 (score: 0)

Try this pattern to capture the first group of items:

\<\!\-\-\[i[\w\s\p{P}\p{S}]+if\]\-\-\>

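The key is to avoid .* wherever possible, because it can consume the rest of the input string, so the terminating condition does not match where you expect.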
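
To illustrate the point about `.*`, here is a small sketch comparing the greedy `.*` with the lazy `.*?` on a made-up input that holds two comments. The class and method names (`GreedyVsLazy`, `StripGreedy`, `StripLazy`) and the sample string are assumptions for the demo, not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Greedy: '.*' runs to the LAST "-->" on the line, swallowing content in between.
    public static string StripGreedy(string html)
    {
        return Regex.Replace(html, "<!--.*-->", string.Empty);
    }

    // Lazy: '.*?' stops at the FIRST "-->" after each "<!--".
    public static string StripLazy(string html)
    {
        return Regex.Replace(html, "<!--.*?-->", string.Empty);
    }

    public static void Main()
    {
        string html = "<!--a--><p>keep</p><!--b-->";
        Console.WriteLine("greedy: [" + StripGreedy(html) + "]");
        Console.WriteLine("lazy:   [" + StripLazy(html) + "]");
    }
}
```

The greedy version deletes the paragraph along with both comments, while the lazy version removes only the two comments.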

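For items 2 and 3, you just need to escape the = sign, as in \=. Otherwise they are fine, but try the following pattern to capture the other Mso* classes as well: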

class\=\"Mso[^\"]+\"
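
A hedged sketch of how a pattern like this could be applied in C#. Two things here are my own additions rather than part of the pattern above: a leading `\s*` so the space before the attribute is removed too, and `RegexOptions.IgnoreCase` so that `msonormal` and `MsoNormal` are both caught by one pattern. The names `MsoClassStripper` and `StripMsoClasses` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MsoClassStripper
{
    // Removes class="Mso..." attributes (any casing) plus the whitespace before them.
    public static string StripMsoClasses(string html)
    {
        // In a verbatim string, "" stands for a single literal double quote.
        return Regex.Replace(html, @"\s*class=""Mso[^""]+""", string.Empty, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string html = "<p class=\"MsoNormal\"><span class=\"msohyperlink\">x</span></p>";
        Console.WriteLine(StripMsoClasses(html));
    }
}
```

Attributes whose class name does not start with `Mso` are left untouched, so ordinary site styles survive the cleanup.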