Question

Given a large log file, what is the best way to grep a block of text?

text to be ignored
more text to be ignored
---                                 <---- start capture here
lots of 
text with separators like "---"
---
spanning 
multiple lines
---                                 <---- end capture here
text to be ignored
more text to be ignored

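What is known?

The maximum number of characters in a line (55, but possibly fewer)
The number of lines in the block
The separator (which may repeat)

What regular expression matches this block? Desired output: a list of text blocks.

Please assume a Linux command-line environment.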

Answer 1

A few years ago, I used this to split a patch into several pieces:

sed -e '$ {x;q}' -e '/@@/ !{H;d}' -e '/@@/ x' # note - i know sed better now

Replace /@@/ with /---/.

To remove everything before the first '---' and after the last '---', add -e '1,/---/d' and remove the whole -e '$ {x;q}'.

The result looks like this:

sed -e '1,/---/d' -e '/---/ !{H;d}' -e x

I just tested it, and it works for the given example.
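For anyone who wants to reproduce this, here is a minimal sketch. The file name sample.log is a placeholder of my own, and the input is the question's example without the arrow annotations. Two things to expect from how sed works: H appends to an initially empty hold space, so the first chunk printed starts with an empty line, and /---/ is unanchored, so it also matches the line that only contains "---" in quotes (use /^---/ if that matters to you).

```shell
# sample.log is a hypothetical file name; the content is the question's example.
cat > sample.log <<'EOF'
text to be ignored
more text to be ignored
---
lots of
text with separators like "---"
---
spanning
multiple lines
---
text to be ignored
more text to be ignored
EOF

# 1,/---/d       drop everything up to and including the first delimiter
# /---/ !{H;d}   lines without the pattern: append to the hold space, print nothing
# x              lines with the pattern: swap, so the collected chunk is printed
#                and the matching line starts the next chunk in the hold space
sed -e '1,/---/d' -e '/---/ !{H;d}' -e x sample.log
```

Whatever follows the last delimiter stays in the hold space and is never printed, which is why the trailing lines disappear.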

Answer 2

Keep it simple:

$ awk 'NR==FNR {if (/^---/) { if (!start) start=NR; end=NR } next} FNR>=start && FNR<=end' file file
---                                 <---- start capture here
lots of
text with separators like "---"
---
spanning
multiple lines
---                                 <---- end capture here

$ awk 'NR==FNR {if (/^---/) { if (!start) start=NR; end=NR } next} FNR>start && FNR<end' file file
lots of
text with separators like "---"
---
spanning
multiple lines
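The two-pass idea behind these commands can be spelled out as a commented sketch. The names sample.log and extract_block are placeholders I made up for illustration; the awk program itself is the first command above, only reformatted. The file is passed twice: the first read records the line numbers of the first and last delimiter, the second read prints the lines in that range.

```shell
# sample.log is a hypothetical file name; the content is the question's example.
cat > sample.log <<'EOF'
text to be ignored
more text to be ignored
---
lots of
text with separators like "---"
---
spanning
multiple lines
---
text to be ignored
more text to be ignored
EOF

# extract_block is a hypothetical helper wrapping the two-pass awk command.
extract_block() {
    awk '
        # First pass: NR == FNR holds only while reading the first copy.
        # Remember the line numbers of the first and last delimiter.
        NR == FNR {
            if (/^---/) {
                if (!start) start = NR
                end = NR
            }
            next
        }
        # Second pass: print the recorded range, delimiters included.
        FNR >= start && FNR <= end
    ' "$1" "$1"
}

extract_block sample.log
```

Because the pattern is anchored with ^, the line that only mentions "---" in quotes is not treated as a delimiter. The cost is that the file is read twice, but nothing beyond two line numbers is kept in memory.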

Answer 3

If you have enough memory, you can use the following one-liner. But be aware that it reads the whole log file into memory!

perl -0777 -lnE 'm{ ^--- .+ ^--- }xms and say $&' logfile
