Optimizing a regular expression in C# to search for multi-line expressions in text

Asked: 2014-11-04 08:55:06

Tags: c# regex winforms

I am trying to find expressions of the following type in a text file:

&lt;[some text][newline][some text]&gt;

The problem is that there may be many newlines before the closing marker &gt; is found.

I tried the following regular expression:

&lt;(.*?\n.*?)&gt;

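It finds expressions split by a single line break perfectly, but I also need to find expressions split across several lines.

I also tried the following expression: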

&lt;(.*?\n.*?)*&gt;

But searching with it causes a timeout. Can anyone help?

Sample text to search:

<p class=3DMsoNormal style=3D'margin-top:12.0pt;margin-right:0cm;margin-bot=
tom:
0cm;margin-left:148.85pt;margin-bottom:.0001pt;text-indent:-148.85pt;
tab-stops:148.85pt right 16.0cm'><b style=3D'mso-bidi-font-weight:normal'><=
span
style=3D'font-family:"Calibri","sans-serif"'>RISK DETAILS<span style=3D'mso=
-tab-count:
1'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nb=
sp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span></b><span
style=3D'font-family:"Calibri","sans-serif"'>Your home is described as
&lt;q_1&gt;<o:p></o:p></span></p>

<p class=3DMsoNormal style=3D'margin-top:0cm;margin-right:0cm;margin-bottom=
:0cm;
margin-left:148.85pt;margin-bottom:.0001pt'><span style=3D'font-family:"Cal=
ibri","sans-serif"'>The
construction of your home is &lt;q_2&gt;<o:p></o:p></span></p>

<p class=3DMsoNormal style=3D'margin-top:0cm;margin-right:0cm;margin-bottom=
:0cm;
margin-left:148.85pt;margin-bottom:.0001pt'><span style=3D'font-family:"Cal=
ibri","sans-serif"'>The
main roof material is &lt;q_3&gt;<o:p></o:p></span></p>

<p class=3DMsoNormal style=3D'margin-top:0cm;margin-right:0cm;margin-bottom=
:0cm;
margin-left:148.85pt;margin-bottom:.0001pt'><span style=3D'font-family:"Cal=
ibri","sans-serif"'>Your
home was built in &lt;q_4&gt;<o:p></o:p></span></p>

<p class=3DMsoNormal style=3D'margin-top:0cm;margin-right:0cm;margin-bottom=
:0cm;
margin-left:148.85pt;margin-bottom:.0001pt'><span style=3D'font-family:"Cal=
ibri","sans-serif"'>Your
<span class=3DGramE>home &lt;q_5&gt; double</span> keyed deadlocks to all
external doors<o:p></o:p></span></p>

<p class=3DMsoNormal style=3D'margin-top:0cm;margin-right:0cm;margin-bottom=
:0cm;
margin-left:148.85pt;margin-bottom:.0001pt'><span style=3D'font-family:"Cal=
ibri","sans-serif"'>Your
home &lt;q_6&gt; keyed locks or grilles on all windows<o:p></o:p></span></p>

<p class=3DMsoNormal style=3D'margin-top:0cm;margin-right:0cm;margin-bottom=
:0cm;
margin-left:148.85pt;margin-bottom:.0001pt'><span style=3D'font-family:"Cal=
ibri","sans-serif"'>Your
home has &lt;q_7&gt; alarm installed<o:p></o:p></span></p>

<p class=3DMsoNormal style=3D'margin-top:0cm;margin-right:0cm;margin-bottom=
:0cm;
margin-left:148.85pt;margin-bottom:.0001pt'><span style=3D'font-family:"Cal=
ibri","sans-serif"'>Your
home &lt;q_8&gt; connected to mains water supply<o:p></o:p></span></p>

Some examples. Example 1, text to search:

 <span
      style=3D'color:blue'><o:p></o:p></span></span></p>
      </td>
      <td width=3D103 valign=3Dtop style=3D'width:77.5pt;padding:0cm 5.4pt 0cm =
    0cm'>
      <p class=3DMsoNormal align=3Dright style=3D'margin-top:3.0pt;margin-right=
    :0cm;
      margin-bottom:0cm;margin-left:0cm;margin-bottom:.0001pt;text-align:right;
      tab-stops:155.95pt'><span style=3D'font-family:"Calibri","sans-serif"'>&lt;=
    <span
      class=3DSpellE>spec_contents_value</span>&gt;<span style=3D'color:blue'><=
    o:p></o:p></span></span></p>
      </td>
     </tr>
    </table>

    <p class=3DMsoNormal style=3D'margin-top:0cm;margin-right:0cm;margin-bottom=
    :0cm;
    margin-left:148.85pt;margin-bottom:.0001pt;text-indent:-148.85pt;tab-stops:
    148.85pt right 453.55pt'><span style=3D'font-family:"Calibri","sans-serif"'=
    ><o:p>&nbsp;</o:p></span></p>

    <p class=3DMsoNormal style=3D'margin-top:0cm;margin-right:0cm;margin-bottom=
    :0cm;
    margin-left:148.85pt;margin-bottom:.0001pt;text-indent:-148.85pt;tab-stops:
    148.85pt right 453.55pt'><span style=3D'font-family:"Calibri","sans-serif"'=
    >Unspecified
    Valuables<b style=3D'mso-bidi-font-weight:normal'><span style=3D'mso-tab-co=
    unt:
    1'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </=
    span></b>&lt;valuables&gt;<o:p></o:p></span></p>

    <p class=3DMsoNormal style=3D'margin-top:0cm;margin-right:0cm;margin-bottom=
    :0cm;
    margin-left:148.85pt;margin-bottom:.0001pt;text-indent:-148.85pt;tab-stops:
    148.85pt right 453.55pt'><span style=3D'font-family:"Calibri","sans-serif"'=
    >Specified
    Valuables<b style=3D'mso-bidi-font-weight:normal'><span style=3D'mso-tab-co=
    unt:
    1'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=
    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nb=
    sp;&nbsp;&nbsp;&nbsp;&nbsp; </span></b>&lt;<spanclass=3DSpellE>spec_valuables_ni</span>&gt;=
    <o:p></o:p></span></p>

I want my Regex.Match pattern to find:

&lt;=
<span
  class=3DSpellE>spec_contents_value</span>&gt;

or any &lt; ... &gt; pattern that spans multiple lines, but not those that appear on a single line.

2 answers:

Answer 0 (score: 1)

Use the DOTALL modifier, (?s), so that the dot matches every character, including line breaks (\n, \r).

(?s)&lt;(?:(?!&[gl]t;).)*?\n(?:(?!&[gl]t;).)*?&gt;
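
A minimal C# sketch of how this pattern might be applied. The class name MultiLineTagFinder, the Find helper, the one-second timeout, and the sample string are assumptions made for illustration and are not part of the answer. The tempered dot (?:(?!&[gl]t;).) refuses to step over another &lt; or &gt;, and the mandatory \n in the middle is what excludes single-line placeholders such as &lt;q_1&gt;.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MultiLineTagFinder
{
    // Pattern from the answer: (?s) lets '.' match newlines, and the
    // negative lookahead stops each '.' from consuming "&lt;" or "&gt;".
    private const string Pattern =
        @"(?s)&lt;(?:(?!&[gl]t;).)*?\n(?:(?!&[gl]t;).)*?&gt;";

    public static List<string> Find(string text)
    {
        // Assumed safety net: abort any single match attempt after one second.
        var regex = new Regex(Pattern, RegexOptions.None, TimeSpan.FromSeconds(1));
        var results = new List<string>();
        foreach (Match m in regex.Matches(text))
        {
            results.Add(m.Value);
        }
        return results;
    }

    public static void Main()
    {
        // Shortened, hypothetical excerpt in the style of the question's sample.
        string sample =
            "home is &lt;q_1&gt;<o:p></o:p>\n" +
            "'>&lt;=\n<span\n  class=3DSpellE>spec_contents_value</span>&gt;<span>";

        // Only the placeholder that crosses a line break is reported.
        foreach (string hit in Find(sample))
        {
            Console.WriteLine(hit);
        }
    }
}
```

If a match attempt exceeds the timeout, .NET throws RegexMatchTimeoutException, so a caller working with untrusted input would probably want to catch it.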

Answer 1 (score: 1)

How about this regular expression:

 &lt;[^&]*&gt;

For example: http://regex101.com/r/iV9lS4/3

  • &lt; matches &lt;

  • [^&]* matches anything other than &, including newlines

  • &gt; matches &gt;

You can also use (?s) to enable DOTALL, which makes the . operator match any character, including newlines.

Input
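
In .NET, the inline (?s) flag corresponds to RegexOptions.Singleline. The sketch below shows the option form with a lazy dot; the pattern &lt;.*?&gt;, the class name SinglelineDemo, the FirstTag helper, and the input string are illustrative assumptions, not taken from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Returns the first &lt;...&gt; span, or an empty string if none is found.
    public static string FirstTag(string text)
    {
        // RegexOptions.Singleline makes '.' match '\n' as well;
        // the lazy quantifier stops at the first "&gt;".
        Match m = Regex.Match(text, @"&lt;.*?&gt;", RegexOptions.Singleline);
        return m.Success ? m.Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(FirstTag("a &lt;=\n<span>x</span>&gt; b"));
    }
}
```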

&lt;=
<span
  class=3DSpellE>spec_contents_value</span>&gt;

is matched, as shown at http://regex101.com/r/iV9lS4/4
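
Note that &lt;[^&]*&gt; also matches placeholders that sit on a single line, such as &lt;q_1&gt;. One possible way to keep only the multi-line ones, sketched here as an assumption rather than as part of the answer, is to filter the matches for a newline afterwards. The names NegatedClassDemo and FindSpanningLines are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    public static string[] FindSpanningLines(string text)
    {
        // [^&]* cannot run past the "&" of "&gt;", so each match is scanned
        // once, without the nested quantifiers that caused the timeout.
        return Regex.Matches(text, @"&lt;[^&]*&gt;")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .Where(v => v.Contains("\n")) // drop single-line matches
                    .ToArray();
    }

    public static void Main()
    {
        // Shortened, hypothetical excerpt in the style of the question's sample.
        string sample =
            "home is &lt;q_1&gt;<o:p></o:p>\n" +
            "&lt;=\n<span\n  class=3DSpellE>spec_contents_value</span>&gt;";

        foreach (string hit in FindSpanningLines(sample))
        {
            Console.WriteLine(hit);
        }
    }
}
```

Because the character class excludes every &, this approach would miss a placeholder whose body contains an entity such as &nbsp;.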