Question

I have the following text.

^0001   HeadOne


@@
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.

^0002   HeadTwo


@@
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.


^0003   HeadThree


@@
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.

^0004   HeadFour


@@
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.

Here is the regex I use to find them:

@@([\n\r\s]*)(.*)([\n\r\s]+)\^

But this only matches ^0001 and ^0003, because they have just one paragraph. In my text, some blocks contain multiple paragraphs.
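To see why the pattern stops at single-paragraph blocks, here is a minimal sketch using Python's `re` module as a stand-in engine. The sample is an abbreviated, hypothetical version of the text above, with the third heading assumed to be `^0003` as the question describes. Because `.` does not match line breaks, `(.*)` captures only one line, so a match succeeds only when that single line is followed by whitespace and then `^`.

```python
import re

# Abbreviated, hypothetical stand-in for the sample text (LF line endings).
TEXT = (
    "^0001   HeadOne\n\n\n@@\nOne.\n\n"
    "^0002   HeadTwo\n\n\n@@\nTwo A.\n\nTwo B.\n\n\n"
    "^0003   HeadThree\n\n\n@@\nThree.\n\n"
    "^0004   HeadFour\n\n\n@@\nFour A.\n\nFour B."
)

# The regex from the question; group 2 holds the captured text.
QUESTION_RE = re.compile(r"@@([\n\r\s]*)(.*)([\n\r\s]+)\^")

# (.*) cannot cross a line break, so blocks with a second paragraph
# (or with no ^ heading after them) never reach the final \^.
bodies = [m.group(2) for m in QUESTION_RE.finditer(TEXT)]
print(bodies)
```

Only the one-paragraph blocks (`^0001` and `^0003` in this sample) produce a match.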

I use VS Code. Can someone tell me how to capture such multi-paragraph strings with a regex in VS Code or NPP?

Thanks

Answer 1

One odd thing about VSCode regex is that \s does not match all line break chars. You need to use [\s\r] to match all of them.

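With that in mind, you want to match all substrings that start with @@ and extend up to a line that starts with ^, or up to the end of the string.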

I suggest:

@@.*(?:[\n\r]+(?!\s*\^).*)*

See the regex demo.
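A quick way to check what the suggested pattern captures is to run it through Python's `re` module, used here only as a stand-in engine that supports the same lookahead (no PCRE2 switch is involved in Python). The sample is abbreviated and hypothetical, with LF line endings. Each match runs from `@@` up to, but not including, the line break before the next `^` heading, or to the end of the string:

```python
import re

# Abbreviated, hypothetical sample (LF line endings).
TEXT = (
    "^0001   HeadOne\n\n\n@@\nOne.\n\n"
    "^0002   HeadTwo\n\n\n@@\nTwo A.\n\nTwo B.\n\n\n"
    "^0004   HeadFour\n\n\n@@\nFour A.\n\nFour B."
)

# @@ and the rest of its line, then any number of "line breaks + next line"
# steps, as long as the next line does not start (after optional
# whitespace) with a ^ char.
BLOCK_RE = re.compile(r"@@.*(?:[\n\r]+(?!\s*\^).*)*")

blocks = BLOCK_RE.findall(TEXT)
for block in blocks:
    print(repr(block))
```

Multi-paragraph blocks now come back whole, blank lines between paragraphs included.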

Note: to match @@ only at the start of a line, add ^ at the start of the pattern: ^@@.*(?:[\s\r]+(?!\s*\^).*)*
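In Python's `re` (again only a stand-in engine), `^` matches at every line start only when `re.MULTILINE` is set, so a sketch of the anchored variant needs that flag. The hypothetical sample below puts a stray `@@` in the middle of a line to show the difference:

```python
import re

# Hypothetical sample: the first line contains @@ mid-line.
SAMPLE = (
    "note: a @@ inside a line\n"
    "^0001   HeadOne\n\n@@\nBody one.\n\nBody two.\n"
    "^0002   HeadTwo"
)

UNANCHORED = re.compile(r"@@.*(?:[\s\r]+(?!\s*\^).*)*")
# re.MULTILINE lets ^ match right after every \n, not only at the string start.
ANCHORED = re.compile(r"^@@.*(?:[\s\r]+(?!\s*\^).*)*", re.MULTILINE)

unanchored = UNANCHORED.findall(SAMPLE)
anchored = ANCHORED.findall(SAMPLE)
print(unanchored)
print(anchored)
```

The unanchored pattern also picks up the mid-line `@@`; the anchored one returns only the real block.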

Note 2: as of VSCode 1.29, you need to enable the search.usePCRE2 option to use lookaheads in regex patterns.

Details

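^ - start of the line
@@ - literal @@
.* - the rest of the line (0+ chars other than line break chars)
(?:[\n\r]+(?!\s*\^).*)* - 0 or more consecutive occurrences of:
  - [\n\r]+(?!\s*\^) - one or more line break chars not followed by 0+ whitespace chars and then a ^ char
  - .* - the rest of the line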
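The breakdown above maps one-to-one onto a commented pattern. This sketch assumes Python's `re.VERBOSE` flag (not something the answer itself uses), which ignores unescaped whitespace and `#` comments outside character classes, so each piece can carry its own note:

```python
import re

# The anchored pattern from the breakdown, one commented piece per line.
# re.MULTILINE makes ^ match at the start of every line.
BLOCK_RE = re.compile(
    r"""
    ^@@          # literal @@ at the start of a line
    .*           # rest of that line (0+ chars other than line breaks)
    (?:          # 0 or more consecutive occurrences of:
      [\n\r]+    #   one or more line break chars...
      (?!\s*\^)  #   ...not followed by 0+ whitespace chars and a ^ char
      .*         #   the rest of the line
    )*
    """,
    re.MULTILINE | re.VERBOSE,
)

# Hypothetical two-block sample (LF line endings).
TEXT = "^0001   HeadOne\n\n\n@@\nOne.\n\n^0002   HeadTwo\n\n\n@@\nTwo A.\n\nTwo B."
blocks = BLOCK_RE.findall(TEXT)
print(blocks)
```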

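In Notepad++, use ^@@.*(?:\R(?!\h*\^).*)* where \R matches a line break and \h* matches 0 or more horizontal whitespace chars (remove \h* if ^ is always the first char on a delimiter line).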
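Python's `re` has no `\R` or `\h`, so a rough port of the Notepad++ pattern has to spell them out. This sketch assumes three substitutions: `(?:\r\n|\n|\r(?!\n))` for `\R` (the `(?!\n)` stops backtracking from splitting a CRLF pair, which `\R` never does), `[ \t]` for `\h` (narrower, since `\h` also covers other horizontal whitespace), and `[^\r\n]` for `.`, because Python's `.` does match `\r`. The hypothetical sample uses Windows (CRLF) line endings:

```python
import re

# Hypothetical sample with Windows (CRLF) line endings.
SAMPLE = (
    "^0001   HeadOne\r\n\r\n\r\n@@\r\nOne.\r\n\r\n"
    "^0002   HeadTwo\r\n\r\n\r\n@@\r\nTwo A.\r\n\r\nTwo B."
)

LINE_BREAK = r"(?:\r\n|\n|\r(?!\n))"  # stand-in for \R; never splits \r\n
HSPACE = r"[ \t]"                       # narrower stand-in for \h
NOT_EOL = r"[^\r\n]"                    # Python's . would also match \r

NPP_PORT = re.compile(
    rf"^@@{NOT_EOL}*(?:{LINE_BREAK}(?!{HSPACE}*\^){NOT_EOL}*)*",
    re.MULTILINE,
)

# Blank lines before the next heading stay inside a match, so trim them.
blocks = [m.rstrip("\r\n") for m in NPP_PORT.findall(SAMPLE)]
print(blocks)
```

Unlike the `[\n\r]+` version, each `\R` step consumes exactly one line break, so a blank line just before the next `^` heading ends up inside the match; the `rstrip` removes it.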

Answer 2

I put your input data into /tmp/test and ran the following, using Perl syntax:

grep -Pzo "@@(?:\s*\n)+((?:.*\s*\n)+)(?:\^.*)*\n+" /tmp/test

This should put the paragraphs that do not start with ^ into $1. You may need to add \r\n back in for it to match fully.
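For comparison with the `$1` idea, here is a Python sketch that captures only the paragraph text in group 1. It uses a different technique from the grep command: a lazy `.*?` under `re.DOTALL`, stopped by a lookahead for the next `^` line or the end of the input. The sample is hypothetical, with LF line endings:

```python
import re

# Hypothetical sample (LF line endings).
TEXT = (
    "^0001   HeadOne\n\n\n@@\nOne.\n\n"
    "^0002   HeadTwo\n\n\n@@\nTwo A.\n\nTwo B.\n\n\n"
    "^0004   HeadFour\n\n\n@@\nFour A.\n\nFour B."
)

BODY_RE = re.compile(
    r"^@@[^\n]*\n"      # the @@ line itself (not captured)
    r"(.*?)"            # group 1: the paragraphs, as few chars as possible
    r"(?=\n\s*\^|\Z)",  # stop before the next ^ heading or at end of input
    re.MULTILINE | re.DOTALL,
)

# With exactly one group, findall returns the group 1 strings.
bodies = BODY_RE.findall(TEXT)
print(bodies)
```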
