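I want to delete all text between two strings, except for lines that start with certain strings. Using the example below, I want to delete the text between BEGIN and END, except for lines that start with BREAK1 or BREAK2: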
keep keep keep
BEGIN
remove remove remove
remove remove remove
BREAK1 keep keep keep
remove remove remove
BREAK2 keep keep keep
remove remove remove
END
keep keep keep
Does anyone know how to do this with a regular expression?
Answer 0 (score: 8)
perl -ne 'print if !(/^BEGIN/ .. /^END/) or /^BREAK/' file
Output:
keep keep keep
BREAK1 keep keep keep
BREAK2 keep keep keep
keep keep keep
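In scalar context, .. is Perl's flip-flop operator: /^BEGIN/ .. /^END/ evaluates to true for every line from BEGIN through END, including the BEGIN and END lines themselves, which is why those two lines are missing from the output.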
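To make the flip-flop's hidden state explicit, here is a minimal Python port of the one-liner above. It is a sketch, not part of the original answer: the function name perl_filter and the hard-coded BEGIN/END/BREAK prefixes are illustrative assumptions, and re.match stands in for Perl's ^ anchor because each line is tested on its own.

```python
import re

def perl_filter(lines):
    """Hypothetical Python port of:
    perl -ne 'print if !(/^BEGIN/ .. /^END/) or /^BREAK/' file
    """
    active = False  # state of the /^BEGIN/ .. /^END/ flip-flop
    kept = []
    for line in lines:
        # The range switches on at the line that matches BEGIN...
        if not active and re.match(r"BEGIN", line):
            active = True
        in_range = active
        # ...and switches off only AFTER the line that matches END, so both
        # marker lines count as "inside". As with Perl's two-dot form, END
        # is also tested on the same line that switched the range on.
        if active and re.match(r"END", line):
            active = False
        # print if !(range) or /^BREAK/
        if not in_range or re.match(r"BREAK", line):
            kept.append(line)
    return kept

sample = """keep keep keep
BEGIN
remove remove remove
remove remove remove
BREAK1 keep keep keep
remove remove remove
BREAK2 keep keep keep
remove remove remove
END
keep keep keep"""

print("\n".join(perl_filter(sample.splitlines())))
```

Run on the question's sample, it prints the same four lines as the Perl one-liner.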
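Answer 1 (score: 1)
Well, you could read the file (or split it) into @lines, then iterate over each line while tracking your state (inside or outside a BEGIN..END block). If you are outside, keep the line and pass it through. If you are inside, discard the line if $line =~ m/^BREAK\d+\s*(.*)$/
returns false; otherwise $1 contains the text to keep (everything after the BREAKn marker). I'll leave working out whether you are inside a BEGIN block as an exercise for the student.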
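The state-tracking approach described above (including the part left as an exercise) could be sketched in Python roughly as follows. This is an illustration, not the answerer's code: the name strip_blocks is hypothetical, and dropping the BEGIN and END marker lines themselves is an assumption, since the answer does not say what happens to them. As in the answer, only the captured text after the BREAKn marker (Perl's $1, here m.group(1)) is kept.

```python
import re

# Same pattern as in the answer: the text after "BREAKn" is captured.
BREAK_RE = re.compile(r"^BREAK\d+\s*(.*)$")

def strip_blocks(lines):
    """Hypothetical sketch: drop BEGIN..END contents except the BREAK text."""
    inside = False  # are we currently inside a BEGIN..END block?
    out = []
    for line in lines:
        if not inside and line.startswith("BEGIN"):
            inside = True  # assumption: the BEGIN line itself is dropped
            continue
        if inside and line.startswith("END"):
            inside = False  # assumption: the END line itself is dropped
            continue
        if not inside:
            out.append(line)  # outside a block: pass the line through
            continue
        m = BREAK_RE.match(line)
        if m:
            out.append(m.group(1))  # like Perl's $1: text after the marker
        # otherwise the match "returned false", so the line is discarded
    return out

lines = ["keep", "BEGIN", "remove", "BREAK1 kept 1", "remove", "BREAK2 kept 2", "END", "tail"]
print(strip_blocks(lines))  # prints ['keep', 'kept 1', 'kept 2', 'tail']
```

Note that, unlike the flip-flop one-liner, this keeps only the text after the marker, so "BREAK1 kept 1" becomes "kept 1".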
Answer 2 (score: 1)
You can use this pattern:
s/(?:^BEGIN\R|\G(?<!\A)(?:(?:BREAK1|BREAK2).*\R|END(?=\R|$)))\K|\G(?<!\A).*\R//gm
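The idea is to first match everything that must be preserved and then drop it from the match result with \K. The \G anchor ensures that the different parts of the match are contiguous. However, the current pattern does not check that the END marker is present: if it is missing, the replacement continues to the end of the string (the same behavior as with HTML tags). To avoid this, you can add a lookahead at the end: (?=(?s).*?\REND(?:\R|$))
Pattern details: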
(?:                    # non-capturing group for everything that must be preserved
    ^BEGIN\R           # the word "BEGIN" at the start of a line, followed
                       # by a newline
  |                    # OR
    \G                 # contiguous with the previous match, or at the start
                       # of the string
    (?<!\A)            # lookbehind: not preceded by the start of the string
    (?:                # non-capturing group: everything that must be contiguous
        (?:BREAK1|BREAK2)  # one of these two words
        .*\R           # everything up to and including the newline
      |                # OR
        END            # the word "END"
        (?=\R|$)       # lookahead: END must be followed by a newline
                       # or the end of the string. Since it is a zero-width
                       # assertion, the newline itself is not consumed.
    )                  # close the 2nd non-capturing group
)                      # close the 1st non-capturing group
\K                     # drop everything matched so far from the match result,
                       # so the preserved text is not replaced
|                      # OR
\G(?<!\A).*\R          # a whole line (newline included) contiguous with the
                       # previous match: this is the text that gets removed
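Python's built-in re module supports neither \G nor \K, so the pattern above cannot be ported to it directly. A minimal sketch of the same result with a swapped-in technique: match each whole BEGIN...END block and rebuild it in a replacement callback. The names BLOCK_RE, KEEP_RE, prune_block and prune are hypothetical, and the sketch assumes \n line endings (Perl's \R also accepts \r\n).

```python
import re

# (?m): ^ and $ match at line boundaries; (?s): "." also matches newlines.
# The lazy .*? stops at the first END line, so each block is one match.
BLOCK_RE = re.compile(r"(?ms)^BEGIN\n(.*?)^END$")

# Lines inside a block that must survive (no DOTALL, so ".*" stops at "\n").
KEEP_RE = re.compile(r"(?m)^(?:BREAK1|BREAK2).*\n")

def prune_block(m):
    # Keep the BEGIN/END markers and only the BREAK1/BREAK2 lines between them.
    kept = KEEP_RE.findall(m.group(1))
    return "BEGIN\n" + "".join(kept) + "END"

def prune(text):
    return BLOCK_RE.sub(prune_block, text)

sample = (
    "keep keep keep\nBEGIN\nremove remove remove\nremove remove remove\n"
    "BREAK1 keep keep keep\nremove remove remove\nBREAK2 keep keep keep\n"
    "remove remove remove\nEND\nkeep keep keep\n"
)
print(prune(sample), end="")
```

Because the block pattern requires a closing END line, an unterminated BEGIN block is left unchanged, which is the behavior the extra lookahead suggested above aims for.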
Answer 3 (score: 0)
OK, this is a Perl question, but I couldn't resist posting a sed(1)
solution:
sed '/^BEGIN/,/^END/{ /^BREAK[12]/!d; }' file
Answer 4 (score: -1)
On a Linux machine, you can run the egrep command:
egrep -v ^BREAK