Question

I have a huge text file containing several identical iterations recorded at different times. The basic structure is:

Header (5 lines)
Data (thousands of lines)
Header (5 lines)
Data (thousands of lines)
Header (5 lines)
Data (thousands of lines)

This pattern repeats for some time.

I want to thin this file out by deleting every other Header + Data set. I thought I would use sed, but I can't figure out how.

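It may help that each "cycle" starts with the same line (for the purposes of this example, assume it reads Program X output), and that this exact line appears only once, at the start of each "cycle".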

Thanks

Answer 1

Sounds like all you need is:

awk '/Program X output/ && c++{exit} 1' file

e.g.:

$ seq 50 | awk '/2/ && c++{exit} 1'
1
2
3
4
5
6
7
8
9
10
11

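If that's not all you need, then edit your question to clarify your requirements and show us concise, testable sample input and the expected output.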
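To see what this command does on input shaped like the question's file, here is a minimal sketch. The sample() and first_cycle() function names and the sample lines are hypothetical. Note that it keeps only the first Header + Data set and stops reading at the second marker line; it does not drop every other set.

```shell
# Hypothetical sample input shaped like the question's file: each cycle
# starts with the marker line "Program X output", followed by data lines.
sample() {
  printf '%s\n' 'Program X output' a b 'Program X output' c d 'Program X output' e
}

# Print lines until the marker appears a second time, then stop reading.
# c++ evaluates to 0 (false) on the first marker and 1 (true) on the second.
first_cycle() {
  awk '/Program X output/ && c++ {exit} 1'
}

sample | first_cycle
```

This prints Program X output, a and b, then exits without reading the rest of the input.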

Answer 2

Keep track of how many times you have seen the keyword, and print only while that count is odd:

awk '/Program X output/ {n++} n%2 == 1' <<END
Program X output
a
b
c
Program X output
d
e
Program X output
f
g
h
i
j
Program X output
m
n
o
END

Program X output
a
b
c
Program X output
f
g
h
i
j
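The same counter generalizes to keeping one cycle out of every N. This is a minimal sketch assuming the same marker line; the keep_every() and cycles() names and the sample lines are hypothetical. The n > 0 guard drops any lines before the first marker (as n%2 == 1 does), and (n - 1) % keep == 0 selects cycles 1, keep+1, 2*keep+1, and so on.

```shell
# Hypothetical sample input: four cycles, one data line each.
cycles() {
  printf '%s\n' 'Program X output' a 'Program X output' b \
    'Program X output' c 'Program X output' d
}

# Keep cycles 1, keep+1, 2*keep+1, ...; n counts the marker lines seen so far.
# Lines before the first marker (n == 0) are not printed.
keep_every() {
  awk -v keep="$1" '/Program X output/ {n++} n > 0 && (n - 1) % keep == 0'
}

cycles | keep_every 3
```

With keep_every 3 this prints the first and fourth cycles; with keep_every 2 it selects the same lines as the n%2 == 1 command above.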

Answer 3

This might work for you (GNU sed):

sed -r '/Program X output/{x;s/^/x/;x};G;/\n(x{2})*$/!P;d' file

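When a header line is encountered, increment a counter kept in the hold space (HS) by appending an x to it. Append the HS to every line (G), and print only the first line of the pattern space (PS), i.e. the original input line, unless the counter is a multiple of the required number (here 2). The pattern space is then deleted, so nothing else is printed.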
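The 2 in x{2} is the cycle count, so changing it drops a different fraction of the cycles. A minimal sketch, assuming GNU sed as in the answer (-E is used here instead of -r; GNU sed accepts both); the drop_every() and cycles() names and the sample lines are hypothetical. With a count of 3 it drops the 3rd, 6th, ... cycles and keeps the rest.

```shell
# Hypothetical sample input: four cycles, one data line each.
cycles() {
  printf '%s\n' 'Program X output' a 'Program X output' b \
    'Program X output' c 'Program X output' d
}

# GNU sed. The hold space holds one "x" per marker line seen so far.
# G appends "\n" plus that counter to every line; P prints only the original
# line (up to the first newline) unless the counter is a multiple of $1.
# The count is spliced in between single-quoted parts so that "!" is never
# inside double quotes (avoids interactive bash history expansion).
drop_every() {
  sed -E '/Program X output/{x;s/^/x/;x};G;/\n(x{'"$1"'})*$/!P;d'
}

cycles | drop_every 3
```

With drop_every 2 this behaves like the original command and prints only the first and third cycles.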