Matching sections of text with a regular expression

Date: 2018-02-05 07:45:32

Tags: regex regex-lookarounds backreference

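I have a header at which I can start matching text. To find where a section ends, I use a backreference to the header text, so that a section stops where the next header begins:

Sample data: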

Section 1
sub-header here:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam sed interdum erat. Donec sed felis sit amet sem mattis aliquet non in turpis. 

sub-section with one newline above
option A
option B


sub-section 2 with two newline above
setting1: value of setting1
setting2: value of setting2


Section 2
sub-header here:
Nulla maximus mollis urna, in lobortis est auctor a. Ut erat enim, volutpat id tortor eget, elementum fermentum nisi.

sub-section with one newline above
option A
option B


sub-section 2 with two newline above
setting1: value of setting1
setting2: value of setting2


Section 3
sub-header here:
Sed suscipit eleifend arcu fringilla pulvinar. Maecenas ullamcorper efficitur fringilla.

sub-section with one newline above
option A
option B


sub-section 2 with two newline above
setting1: value of setting1
setting2: value of setting2


My regex is as follows:

(?:^|\n)((Section\s*)(\d+))$([\s\S]*?)(?=\2)

This matches the first two sections, but not the last one.
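The behaviour can be reproduced with a short Python sketch. It assumes the pattern is run in multiline mode (the `$` right after the header only matches at a line end in that mode) and uses an abbreviated, made-up version of the sample data. The lookahead `(?=\2)` requires the captured text `Section ` to appear again after the section body, and nothing follows the last section, so that section never matches.

```python
import re

# Abbreviated stand-in for the sample data (an assumption, not the full text).
text = (
    "Section 1\n"
    "sub-header here:\n"
    "Lorem ipsum\n"
    "\n"
    "\n"
    "Section 2\n"
    "sub-header here:\n"
    "Nulla maximus\n"
    "\n"
    "\n"
    "Section 3\n"
    "sub-header here:\n"
    "Sed suscipit\n"
)

# The regex from the question, compiled in multiline mode (assumed).
pattern = re.compile(
    r"(?:^|\n)((Section\s*)(\d+))$([\s\S]*?)(?=\2)",
    re.MULTILINE,
)

# Group 1 holds the full header of each matched section.
headers = [m.group(1) for m in pattern.finditer(text)]
print(headers)  # ['Section 1', 'Section 2']
```

"Section 3" is missing from the result because no further `Section ` follows it for the lookahead to find.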

1 Answer:

Answer 0 (score: 1)

Try this regex:

(Section\s*\d+)([\s\S]*?)(?=\s*Section\s*\d+|$)


Explanation

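  • (Section\s*\d+) - matches the text Section, followed by 0+ whitespace characters, followed by 1+ digits, and captures the whole thing in group 1
  • ([\s\S]*?) - lazily matches 0+ occurrences of any character (newlines included), as few as possible, and captures them in group 2
  • (?=\s*Section\s*\d+|$) - positive lookahead ensuring that what was matched above is followed either by 0+ whitespace characters, then Section, then 0+ whitespace characters, then 1+ digits, or by the end of the string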
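A minimal Python sketch of this pattern in use, on an abbreviated, made-up version of the sample data. Note that it is compiled without `re.MULTILINE`: in multiline mode `$` would also match at the end of every line, so the lazy group would stop right after each header and capture nothing.

```python
import re

# Abbreviated stand-in for the sample data (an assumption, not the full text).
text = (
    "Section 1\n"
    "sub-header here:\n"
    "Lorem ipsum\n"
    "\n"
    "\n"
    "Section 2\n"
    "sub-header here:\n"
    "Nulla maximus\n"
    "\n"
    "\n"
    "Section 3\n"
    "sub-header here:\n"
    "Sed suscipit\n"
)

# Pattern from the answer; no MULTILINE flag, so `$` means end of string
# (or just before a single trailing newline).
pattern = re.compile(r"(Section\s*\d+)([\s\S]*?)(?=\s*Section\s*\d+|$)")

# Pair each header (group 1) with its body (group 2), trimming the
# surrounding newlines from the body.
sections = [(m.group(1), m.group(2).strip()) for m in pattern.finditer(text)]

for header, body in sections:
    print(header, "->", repr(body))
```

All three sections are returned here, including the last one, because the `|$` alternative lets the lookahead succeed at the end of the input.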