Question

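I want to delete all text between two strings, except lines that start with certain strings. Using the example below, I want to delete the text between BEGIN and END, except the lines starting with BREAK1 or BREAK2: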

keep keep keep
BEGIN
remove remove remove
remove remove remove
BREAK1 keep keep keep
remove remove remove
BREAK2 keep keep keep
remove remove remove
END
keep keep keep

Does anyone know how to do this with a regex?

Answer 1

perl -ne 'print if !(/^BEGIN/ .. /^END/) or /^BREAK/' file

Output:

keep keep keep
BREAK1 keep keep keep
BREAK2 keep keep keep
keep keep keep

In scalar context, .. is Perl's flip-flop operator, so /^BEGIN/ .. /^END/ evaluates to true for every line from BEGIN through END, inclusive.
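For readers without Perl at hand, awk has a comparable range pattern, pat1, pat2, which is true from a line matching pat1 through the next line matching pat2, both lines included. The sketch below is an awk port of the one-liner (a swapped-in tool, not part of the answer); the function name keep_break_lines is hypothetical, and a POSIX awk is assumed.

```shell
#!/bin/sh
# keep_break_lines: hypothetical name for an awk port of the Perl one-liner.
# Reads the files given as arguments, or stdin if there are none.
keep_break_lines() {
  awk '
    # Range pattern: true from a line matching ^BEGIN through the next line
    # matching ^END, both included, much like the scalar .. operator in Perl.
    # Inside the range, only lines starting with BREAK are printed.
    /^BEGIN/, /^END/ { if (/^BREAK/) print; next }
    # Lines outside any BEGIN..END block are printed unchanged.
    { print }
  ' "$@"
}
```

Usage: keep_break_lines file prints the same four lines as the Perl version on the question's sample.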

Answer 2

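Well, you could read the file (or split it) into @lines, then loop over each line, tracking your state (inside or outside a BEGIN..END block). If outside, keep the line and pass it through. If inside, discard the line if $line =~ m/^BREAK\d+\s*(.*)$/ returns false; otherwise $1 contains the text to keep. I'll leave working out whether you are inside a BEGIN block as an exercise for the student.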
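A minimal sketch of that state-tracking loop, written in awk rather than Perl and streaming line by line instead of reading everything into @lines. The function name strip_blocks is hypothetical, and two details the answer leaves open are assumptions here: the BEGIN and END lines themselves are dropped, and for BREAKn lines only the text after the marker is printed, mirroring what $1 would capture.

```shell
#!/bin/sh
# strip_blocks: hypothetical name for the state-tracking approach.
# Assumptions: BEGIN/END lines are dropped, and BREAKn lines are printed
# without the marker, like $1 in the answer's regex.
strip_blocks() {
  awk '
    /^BEGIN/ { inside = 1; next }           # enter a block; drop the BEGIN line
    inside && /^END/ { inside = 0; next }   # leave the block; drop the END line
    !inside { print; next }                 # outside a block: pass the line through
    match($0, /^BREAK[0-9]+[ \t]*/) {       # inside: keep only BREAKn lines,
      print substr($0, RSTART + RLENGTH)    # minus the marker and following blanks
    }                                       # any other line inside is discarded
  ' "$@"
}
```

On the question's sample this prints "keep keep keep" four times: the two outer lines plus the text after BREAK1 and BREAK2.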

Answer 3

You can use this pattern:

s/(?:^BEGIN\R|\G(?<!\A)(?:(?:BREAK1|BREAK2).*\R|END(?=\R|$)))\K|\G(?<!\A).*\R//gm

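The idea is to first match everything that must be kept and then reset the match result with \K. The \G anchor ensures that the different parts of the match are contiguous. However, the current pattern doesn't check that the "END" marker is present: if it is missing, the replacement continues to the end of the string (the same behavior as with HTML tags). To avoid this, you can add a lookahead at the end: (?=(?s).*?\REND(?:\R|$))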

Pattern details:

(?:                       # non-capturing group for everything that must be kept
    ^BEGIN\R              # the word "BEGIN" at the start of a line, followed
                          # by a newline
  |                       # OR
    \G                    # contiguous to a previous match, or at the start of
                          # the string
    (?<!\A)               # lookbehind: not preceded by the start of the string
    (?:                   # non-capturing group: everything that must be contiguous
        (?:BREAK1|BREAK2) # one of these two words
        .*\R              # everything up to and including the newline
      |                   # OR
        END               # 
        (?=\R|$)          # lookahead to check if END is followed by a newline
                          # or the end of the string. Since it is a zero-width 
                          # assertion and doesn't match anything, it is used to
                          # contiguous matches.
    )                     # close the 2nd non-capturing group
)                         # close the 1st non-capturing group
\K                        # drop everything matched so far from the match result
|                         # OR
\G(?<!\A).*\R             # everything contiguous to a previous match, up to
                          # and including the newline

Answer 4

OK, this is a Perl question, but I can't resist posting a sed(1) solution:

sed '/^BEGIN/,/^END/{ /^BREAK[12]/!d; }' file
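To try the one-liner on the sample input, it can be wrapped in a shell function; sed_strip is a hypothetical name. The sed_strip_keep_delims variant is an assumption about a likely follow-up need (keeping the BEGIN and END lines themselves) and relies on sed -E, which both GNU and BSD sed accept.

```shell
#!/bin/sh
# Hypothetical wrapper names around the sed one-liner; each reads the files
# given as arguments, or stdin if there are none.

# Inside BEGIN..END, delete every line not starting with BREAK1 or BREAK2.
# BEGIN and END are deleted too, since they do not match ^BREAK[12].
# The ";" before "}" is required by some non-GNU seds (e.g. BSD/macOS).
sed_strip() {
  sed '/^BEGIN/,/^END/{ /^BREAK[12]/!d; }' "$@"
}

# Assumed variant: same filter, but the BEGIN and END lines are kept.
sed_strip_keep_delims() {
  sed -E '/^BEGIN/,/^END/{ /^(BEGIN|END|BREAK[12])/!d; }' "$@"
}
```

The only difference between the two is the set of prefixes exempted from the d command inside the range.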

Answer 5

On a Linux box, you can run the egrep command:

egrep -v '^BREAK' file
