Regular expression for replacing extra characters between markers

Time: 2014-11-11 12:05:40

Tags: c# .net regex regex-lookarounds

Suppose I have some sample text like the following:

;&nbsp; </span>&lt;year&gt;<o:p></o:p>
</span>&lt;</span><span style=3D'font-size:9.0pt;mso-bidi-font-family:Arial'>manufacturer&gt;</span><span                  style=3D'mso-bidi-font-family:Arial'>
</span>&lt;model&gt;<o:p>
</span>&lt;<span class=3DSpellE>serial_number</span>&gt;<o:p>
</span>&lt;<span class=3DSpellE>accessories_value</span>&gt;<o:p></o:p></span>
</span>&lt;<span class=3DSpellE>accessories_list</span>&gt;
p;&nbsp; </span>&lt;<span class=3DSpellE>worldwide_yn</span>&gt;
</span>&lt;</b><span class=3DSpellE><span style=3D'mso-no-proof:yes'>pet_name</span></span><span style=3D'mso-   no-proof:yes'>&gt;</span><o:p></o:p></p>

I want to find and replace every occurrence of the following form:

&lt; any_html_tags markers_text any_html_tags &gt; 

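where:

html_tags: optional; these may be opening or closing tags, zero or more of them, and any HTML tag may appear here.

markers_text: takes one of two forms, either xxxxx (any number of characters) or xxxx_xxxxxx (text of any length).

For example, I want to be able to find the following text in the sample file: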

1) &lt;year&gt;
2) &lt;</span><span style=3D'font-size:9.0pt;mso-bidi-font-family:Arial'>manufacturer&gt;
3) &lt;model&gt;
4) &lt;<span class=3DSpellE>serial_number</span>&gt;
5) &lt;<span class=3DSpellE>accessories_value</span>&gt;
6) &lt;<span class=3DSpellE>accessories_list</span>&gt;
7) &lt;<span class=3DSpellE>worldwide_yn</span>&gt;
8) &lt;</b><span class=3DSpellE><span style=3D'mso-no-proof:yes'>pet_name</span></span><span style=3D'mso-no-proof:yes'>&gt;

and replace each with the corresponding item, for example:

1) &lt;year&gt;
2) </span><span style=3D'font-size:9.0pt;mso-bidi-font-family:Arial'>&lt;manufacturer&gt;
3) &lt;model&gt;
4) <span class=3DSpellE></span>&lt;serial_number&gt;
5) <span class=3DSpellE></span>&lt;accessories_value&gt;
6) <span class=3DSpellE></span>&lt;accessories_list&gt;
7) <span class=3DSpellE></span>&lt;worldwide_yn&gt;
8) </b><span class=3DSpellE><span style=3D'mso-no-proof:yes'></span></span><span style=3D'mso-no-proof:yes'>&lt;pet_name&gt;

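So basically, I want every tag that sits between &lt; and &gt;, other than the marker text itself, to be removed from there and placed before the &lt;. I am using the C# Regex methods.

Could you suggest the right regular expression to achieve this?

The final result for the sample should look like this: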

;&nbsp; </span>&lt;year&gt;<o:p></o:p>
</span></span><span style=3D'font-size:9.0pt;mso-bidi-font-family:Arial'>&lt;manufacturer&gt;</span><span     style=3D'mso-bidi-font-family:Arial'>
 </span>&lt;model&gt;<o:p>
 </span><span class=3DSpellE></span>&lt;serial_number&gt;<o:p>
 </span><span class=3DSpellE></span>&lt;accessories_value&gt;<o:p></o:p></span>
  </span><span class=3DSpellE></span>&lt;accessories_list&gt;
 p;&nbsp; </span><span class=3DSpellE></span>&lt;worldwide_yn&gt;
</b><span class=3DSpellE><span style=3D'mso-no-proof:yes'></span></span><span style=3D'mso-no-  proof:yes'>&lt;pet_name&gt;

1 Answer:

Answer 0: (score: 1)

This search/replace may be what you are looking for:

Pattern:

&lt;((?:</?span[^>]*>)*)(\w+)((?:</?span[^>]*>)*)&gt;

Replacement:

$1&lt;$2&gt;$3

online demo (see the "Context" tab)
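
Since the question mentions the C# Regex methods, here is a minimal sketch of how the pattern and replacement above could be applied with `Regex.Replace`. The class and method names (`MarkerCleaner`, `ApplyAnswer`, `MoveTagsBefore`) are made up for illustration. Two things are worth noting. First, the pattern as written only allows `span` tags, so case 8 in the question, which starts with `</b>`, would be left untouched. Second, the replacement `$1&lt;$2&gt;$3` keeps the trailing tags after `&gt;`, whereas the question asks for all tags to be placed before `&lt;`. The second method is a hypothetical variant, not part of the answer, that assumes any tag name is acceptable and uses `$1$3&lt;$2&gt;` to put both groups of tags in front.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MarkerCleaner
{
    // Pattern from the answer: only <span ...> and </span> may surround the marker text.
    private static readonly Regex SpanOnly = new Regex(
        @"&lt;((?:</?span[^>]*>)*)(\w+)((?:</?span[^>]*>)*)&gt;");

    // Assumed generalisation (not from the answer): accept any tag name, e.g. </b>.
    private static readonly Regex AnyTag = new Regex(
        @"&lt;((?:</?[A-Za-z][^>]*>)*)(\w+)((?:</?[A-Za-z][^>]*>)*)&gt;");

    // Replacement from the answer: leading tags move before &lt;, trailing tags move after &gt;.
    public static string ApplyAnswer(string html)
    {
        return SpanOnly.Replace(html, "$1&lt;$2&gt;$3");
    }

    // Variant replacement: both leading and trailing tags are placed before &lt;.
    public static string MoveTagsBefore(string html)
    {
        return AnyTag.Replace(html, "$1$3&lt;$2&gt;");
    }

    public static void Main()
    {
        string sample = "&lt;<span class=3DSpellE>serial_number</span>&gt;";

        // prints: <span class=3DSpellE>&lt;serial_number&gt;</span>
        Console.WriteLine(ApplyAnswer(sample));

        // prints: <span class=3DSpellE></span>&lt;serial_number&gt;
        Console.WriteLine(MoveTagsBefore(sample));
    }
}
```

Both patterns rely on `[^>]*` to skip over attributes, so they assume no attribute value contains a literal `>`. For input that does not meet that assumption, an HTML parser would be a safer choice than a regex.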