C# regex to match and wrap multiple lines

Asked: 2017-03-29 07:23:47

Tags: c# regex

I need to turn text like this, for example

Founded in 2008, Stack Overflow sees 40 million visitors each month

|| <b>ID</b> || <b>Column1</b> || <b>Column2</b> ||
| | | |

Stack Overflow Documentation, the largest content expansion since Q&A, launches in July

|| <b>Name</b> || <u>Surname</u> || <u>DoB</u> ||
| | | |

The Developer Story launches in October, giving developers a better way to present their skills

into this:

Founded in 2008, Stack Overflow sees 40 million visitors each month

<span>|| <b>ID</b> || <b>Column1</b> || <b>Column2</b> ||
| | | |</span>

Stack Overflow Documentation, the largest content expansion since Q&A, launches in July

<span>|| <b>Name</b> || <u>Surname</u> || <u>DoB</u> ||
| | | |
| | | |
</span>

The Developer Story launches in October, giving developers a better way to present their skills

I tried a regex like this:

(
 (
   (^|\r\n|)+(\|{1,2})
  )
  (
    [\s\S]*
  )
  (\|{1,2}
   ($|\r\n|)+
  )
)

But it is not what I need: it selects the wrong region, as you can see here: https://regex101.com/r/0h7gVV/2

Another attempt looked like this:

((^|\r\n{2,}|)+(\|{1,2}))(.*)(\|{1,2}(\r\n{2,}|$|)+)

but it ends up selecting every line; see the example here: https://regex101.com/r/qpwdwj/2

How should I change the regex to make it work?

UPD

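Wiktor Stribiżew (thanks to him) suggested in the comments that I try his example. It works well on the example above, but not in every possible case (e.g. https://regex101.com/r/PvPsxF/3).

The so-called tables can look like this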

|| A | B |
|| c | d |

or like this

| a | b | c |
| d | e | f |

UPD2

This one works: https://regex101.com/r/PvPsxF/7, but the match includes empty space

UPD3

This one is close (https://regex101.com/r/PvPsxF/8), but for this test text

Stack Overflow Documentation, the largest content expansion since Q&A, launches in July

|| <b>Name</b> || <u>Surname</u> || <u>DoB</u> ||
| | | |


||


| a | b | c | u |

The Developer Story launches in October, giving developers a better way to present their skills

| a | b | c |
| d | e | f |

the result looks like this

Stack Overflow Documentation, the largest content expansion since Q&A, launches in July
<span>
|| <b>Name</b> || <u>Surname</u> || <u>DoB</u> ||
| | | |

<!-- not supposed to be wrapped -->
</span><span>||


| a | b | c | u |

</span>The Developer Story launches in October, giving developers a better way to present their skills
<span>
| a | b | c |
| d | e | f |</span>

whereas I do not want a single || on a line of its own to be wrapped (in this case it should be ignored).

(Screenshot with more examples)

P.S.

That is to say, the markup below

|| <b>ID</b> || <b>Column1</b> || <b>Column2</b> ||
| | | |

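will be parsed into HTML that looks like a table, where || Cell || stands for a header cell and | cell | stands for a regular cell.

So, after parsing, it will look like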

<table>
  <tr>
    <th>ID</th>
    <th>Column1</th>
    <th>Column2</th>
  </tr>
  <tr>
    <td>&nbsp;</td>
    <td>&nbsp;</td>
    <td>&nbsp;</td>
  </tr>
</table>
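
To make that mapping concrete, here is a minimal C# sketch of it. It is an illustration only, not the parser actually used here: the `PipeTable` class and its `ToHtml` method are hypothetical names, and it assumes that a row starting with `||` is a header row and that every run of one or two pipes separates cells. Cell content is kept verbatim, so inline tags such as `<b>` are not stripped the way they are in the expected output above.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class PipeTable
{
    // Hypothetical helper: converts one block of pipe markup into an HTML table.
    // Assumption: a row that starts with "||" is a header row (<th> cells),
    // any other row is a regular row (<td> cells).
    public static string ToHtml(string block)
    {
        var html = new StringBuilder("<table>\n");
        var lines = block.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string rawLine in lines)
        {
            string line = rawLine.Trim();
            if (line.Length == 0) continue;

            string tag = line.StartsWith("||", StringComparison.Ordinal) ? "th" : "td";

            // Split on one or two pipes; the first and last parts are whatever
            // lies outside the outer pipes, so they are skipped.
            string[] parts = Regex.Split(line, @"\|\|?");

            html.Append("  <tr>\n");
            for (int i = 1; i < parts.Length - 1; i++)
            {
                string cell = parts[i].Trim();
                if (cell.Length == 0) cell = "&nbsp;"; // empty cell placeholder
                html.Append("    <" + tag + ">" + cell + "</" + tag + ">\n");
            }
            html.Append("  </tr>\n");
        }
        html.Append("</table>");
        return html.ToString();
    }
}
```

For the input `"|| A || B ||\n| | |"` this produces one `<tr>` with `<th>A</th>` and `<th>B</th>`, followed by one `<tr>` with two `<td>&nbsp;</td>` cells.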

1 answer:

Answer 0 (score: 1)

The regex is

(\|\|?([^|\n\r]+\|\|?)+($|[\r\n]+))+

The match you want is group $0 (demo).

It works as follows:

(
  \|\|?         # the line starts with one or two pipes
  (
    [^|\n\r]+   # followed by at least one non-pipe character
    \|\|?       # and the cell ends with one or two pipes
  )+            # at least one cell, otherwise even the line "||" would be matched
  (
    $           # the text ends (you are NOT in multiline mode)
  |
    [\r\n]+     # or one or more [\r\n] characters (at least one, otherwise even "||A|B" would match), so that a possible following line can be matched too
  )
)+              # at least one line
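
As a minimal sketch of how this could be applied in C#, the pattern can be passed to `Regex.Replace` with `$0` in the replacement string, which stands for the whole match. The `TableWrapper` class and its `Wrap` method are hypothetical names chosen for this illustration; only the pattern itself comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TableWrapper
{
    // The pattern from the answer, unchanged. RegexOptions.Multiline is NOT set,
    // so `$` does not match at the end of every line.
    private static readonly Regex TableBlock = new Regex(
        @"(\|\|?([^|\n\r]+\|\|?)+($|[\r\n]+))+");

    // Hypothetical helper: wraps every matched block in <span>...</span>.
    // "$0" in the replacement string is substituted with the whole match.
    public static string Wrap(string text)
    {
        return TableBlock.Replace(text, "<span>$0</span>");
    }
}
```

For example, `TableWrapper.Wrap("x\n|| A | B |\n| c | d |\ny")` returns `"x\n<span>|| A | B |\n| c | d |\n</span>y"`: the line break after the last row ends up inside the span, because the pattern consumes the trailing `[\r\n]+`. A line containing only `||` is left untouched.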

If you do not want to match the whitespace/newlines after the "table", just use a slightly more complex regex (demo):

\|\|?([^|\n\r]+\|\|?)+$([\r\n]+\|\|?([^|\n\r]+\|\|?)+$)*

With this last regex, remember to use the m flag (RegexOptions.Multiline in C#).
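
A minimal C# sketch of this second variant follows. One caveat worth checking against your input: in .NET, `$` with `RegexOptions.Multiline` matches only before `\n`, so rows that end in `\r\n` will not match the pattern as written. The `TableBlockCrLf` pattern below is my own adaptation, not part of the answer above: it replaces `$` with the lookahead `(?=\r?$)` so that an optional `\r` is tolerated without being included in the match. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultilineTableWrapper
{
    // The second pattern from the answer, unchanged, with the m flag.
    private static readonly Regex TableBlock = new Regex(
        @"\|\|?([^|\n\r]+\|\|?)+$([\r\n]+\|\|?([^|\n\r]+\|\|?)+$)*",
        RegexOptions.Multiline);

    // Assumption, not from the answer: `$` replaced by (?=\r?$) so that
    // rows terminated by \r\n are accepted too.
    private static readonly Regex TableBlockCrLf = new Regex(
        @"\|\|?([^|\n\r]+\|\|?)+(?=\r?$)([\r\n]+\|\|?([^|\n\r]+\|\|?)+(?=\r?$))*",
        RegexOptions.Multiline);

    // Wraps each table block; intended for text with \n line endings.
    public static string Wrap(string text)
    {
        return TableBlock.Replace(text, "<span>$0</span>");
    }

    // Same idea for text that may use \r\n line endings.
    public static string WrapCrLf(string text)
    {
        return TableBlockCrLf.Replace(text, "<span>$0</span>");
    }
}
```

With `\n` line endings, `MultilineTableWrapper.Wrap("x\n|| A | B |\n| c | d |\ny")` returns `"x\n<span>|| A | B |\n| c | d |</span>\ny"`, so the closing tag now sits directly after the last row. With `\r\n` line endings, `WrapCrLf("x\r\n|| A | B |\r\n| c | d |\r\ny")` returns `"x\r\n<span>|| A | B |\r\n| c | d |</span>\r\ny"`.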