Using awk to trim the parts of a text file outside two patterns

Date: 2015-02-20 11:10:26

Tags: linux bash awk

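I would like an elegant awk solution for editing the lines of a file. So far I have only managed the task with 2 sed commands and 1 awk command.

Each file consists of a header of indeterminate length, followed by the data I want to capture, then a footer that always starts with the same string (WATER). The data consists of several 3-line chunks that I want to join into single lines; each 3-line chunk starts with the same string (GROUPS).

Whenever GROUPS is found, join the following lines onto it until the next occurrence of GROUPS, and repeat until WATER is found; then delete the WATER line and every line after it up to the end of the file.

Input: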

header stuff
more header stuff
even more header stuff
GROUPS data data data data
mo data mo data mo data
even more even more
GROUPS data data data data
mo data mo data mo data
even more even more
GROUPS data data data data
mo data mo data mo data
even more even more
.......
last line of data
WATER footer stuff footer stuff
footer stuff
more footer stuff
even more footer stuff

Output:

GROUPS data data data data mo data mo data mo data even more even more
GROUPS data data data data mo data mo data mo data even more even more
GROUPS data data data data mo data mo data mo data even more even more
........
GROUPS data data data data mo data mo data even more last line of data

Any help is greatly appreciated!

Edit:

Here is my (probably flaky) solution!

1: Trim the header

sed -n '/^GROUPS/,$p' originalfile > outputfile1

2: Trim the footer

sed '/^WATER/,$d' outputfile1 > outputfile2

3: Join the lines

awk 'NF&&$1=RS$1' RS="GROUPS" outputfile2 > finaloutputfile
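
Steps 1 and 2 can probably be collapsed into a single sed range, which removes one intermediate file. The following is a minimal sketch, assuming GROUPS and WATER always appear at the start of a line; the inline `sample` text and the variable name `trimmed` are made up for illustration and stand in for originalfile and outputfile2.

```shell
# Hypothetical sample standing in for originalfile.
sample='header stuff
GROUPS a
b
c
GROUPS d
e
WATER footer
more footer'

# Select the range from the first GROUPS line through the WATER line,
# and print every line in that range except the WATER line itself.
trimmed=$(printf '%s\n' "$sample" | sed -n '/^GROUPS/,/^WATER/{/^WATER/!p;}')
printf '%s\n' "$trimmed"
```

The result still has one input line per output line, so the joining step (step 3) is needed afterwards.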

2 answers:

Answer 0: (score: 2)

This is for GNU awk (GNU-specific because the record separator is a multi-character regex and RT is a gawk extension):

awk -v RS="GROUPS|WATER" -F"\n" 'p=="WATER"{exit} {$1=p $1}NR>1; {p=RT}' file
GROUPS data data data data mo data mo data mo data even more even more
GROUPS data data data data mo data mo data mo data even more even more
GROUPS data data data data mo data mo data mo data even more even more ....... last line of data

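With RS set to GROUPS|WATER, each record is the text between two separators. Assigning $1=p $1 puts the separator back in front and makes awk rebuild the record with spaces instead of newlines (the fields are lines because of -F"\n"), so everything ends up on one line.
p is set to the previous RT (the separator text that actually matched). Once p is WATER, the script exits, so nothing from WATER onward is printed.

Answer 1: (score: 1)

Let's do it the hard way: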

awk '/^GROUPS/ {if (string) print string; f=1; string=$0; next}
     /^WATER/ {print string; f=0}
     f {string=string" "$0}' file

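This starts "recording" lines in the variable string when GROUPS is found, and stops doing so when WATER is found. When GROUPS is seen, it also prints the stored string (if there is one) and resets it for the next iteration.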
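
The same flag-based idea can be tried end to end on inline data. This is only a sketch under a few assumptions: the names `sample`, `joined` and `buf` are hypothetical, the input is invented, and the script differs from the answer's in two ways: it calls exit at WATER so the footer is never read, and it flushes the last block in an END rule so a file without a WATER line still prints its final block.

```shell
# Hypothetical sample input; "sample" and "joined" are illustrative names.
sample='header
GROUPS a
b
c
GROUPS d
e
last line
WATER footer
GROUPS in footer'

joined=$(printf '%s\n' "$sample" | awk '
  /^WATER/  { exit }                     # stop reading; the END rule still runs
  /^GROUPS/ { if (buf != "") print buf   # flush the previous block, if any
              buf = $0; f = 1; next }    # start recording a new block
  f         { buf = buf " " $0 }         # append continuation lines to the block
  END       { if (buf != "") print buf } # flush the final block
')
printf '%s\n' "$joined"
```

Because the script exits at WATER, a line in the footer that happens to start with GROUPS cannot restart the recording. It uses only POSIX awk features, so it should not depend on GNU awk.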

Test

$ awk '/^GROUPS/ {if (string) print string; f=1; string=$0; next} /^WATER/ {print string; f=0} f {string=string" "$0}' a
GROUPS data data data data mo data mo data mo data even more even more
GROUPS data data data data mo data mo data mo data even more even more
GROUPS data data data data mo data mo data mo data even more even more ....... last line of data