Filtering records from a flat file (awk, sed, etc.)

Time: 2016-01-05 17:18:45

Tags: awk sed full-text-search record flat-file

I want to extract specific records from a file. I figured the easiest way would be to use awk (or sed, etc.) like this:

for i in aaa bbb ccc; do awk '$i,/Record Closing String/' filename.txt >> output_file.txt; done

This would be run against a file like the following:

 aaa this is just scrap text

 There is more scrap text.
 And even more scrap text with the identifier again:  aaa.

 And even more scrap text.
 Record Closing String

 xaa this is just scrap text

 There is different scrap text.
 And even more scrap text with the identifier again:  xaa.

 And even more scrap text.
 Record Closing String

 bbb this is just scrap text

 There is more slightly different scrap text.
 And even more different scrap text with the identifier again:  bbb.

 And even more scrap text.
 Record Closing String

 ddd this is just scrap text

 There is different scrap text.
 And even more different scrap text with the identifier again:  ddd.

 And even more scrap text.
 Record Closing String

 eee this is just scrap text

 There is different scrap text.
 And even more different scrap text with the identifier again:  eee.

 And even more scrap text.
 Record Closing String

 ccc this is just scrap text

 There is different scrap text.
 And even more different scrap text with the identifier again:  ccc.

 And even more scrap text.
 Record Closing String

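However, my result set is larger than my original file (it seems to contain small parts of the original file many, many times over). Is there a command I can run to get a single copy of each of my records, from the first instance of the string I am matching up to the next Record Closing String? I basically want to go from the first match to the next record closing string (see below):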

 aaa this is just scrap text

 There is more scrap text.
 And even more scrap text with the identifier again:  aaa.

 And even more scrap text.
 Record Closing String

 bbb this is just scrap text

 There is more slightly different scrap text.
 And even more different scrap text with the identifier again:  bbb.

 And even more scrap text.
 Record Closing String

 ccc this is just scrap text

 There is different scrap text.
 And even more different scrap text with the identifier again:  ccc.

 And even more scrap text.
 Record Closing String

1 Answer:

Answer 0 (score: 0)

If you can use gawk (GNU awk), you can use a regexp record separator, which makes this very simple:

gawk -v RS='Record Closing String' '/aaa|bbb|ccc/' filename
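One thing to be aware of with the command above: gawk treats the text matched by RS as a separator and drops it, so the matching records are printed without their Record Closing String line. Below is a minimal sketch of one way to keep it, assuming gawk is installed. RT is a gawk variable that holds the text RS matched for the current record. The sample data and the shell variable names (sample, kept) are made up for illustration.

```shell
# Hypothetical sample data in the same shape as the question's file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
 aaa this is just scrap text
 And even more scrap text.
 Record Closing String

 xaa this is just scrap text
 Record Closing String

 bbb this is just scrap text
 Record Closing String
EOF

# RS includes the trailing newline so each record ends cleanly.
# Printing $0 followed by RT puts the closing string back on the output.
kept=$(gawk -v RS='Record Closing String\n' \
  '/aaa|bbb|ccc/ { printf "%s%s", $0, RT }' "$sample")

printf '%s\n' "$kept"
rm -f "$sample"
```

To write to a file instead, redirect the gawk command itself (for example with > output_file.txt) rather than capturing it in a variable.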

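Note that the pattern /aaa|bbb|ccc/ finds an identifier anywhere in the whole record, which may or may not be what you want. If necessary, you can use a more specific regular expression (depending on what the actual data looks like).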
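As one example of a more specific match, here is a sketch that only keeps a record when the identifier is the first word of the record's first line. It sticks to plain awk features, so it should also work where gawk is not available. It assumes every record ends with a line containing Record Closing String and that records are separated by blank lines, as in the question's sample. The variable names (ids, start, keep, newrec, filtered) are my own. As far as I can tell, the loop in the question misbehaves because $i sits inside single quotes: the shell never expands it, awk reads it as $0 (its own variable i is unset), and that is true for every non-empty line, so each pass prints nearly the whole file. Passing the identifiers in with -v avoids that.

```shell
# Hypothetical sample data in the same shape as the question's file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
 aaa this is just scrap text
 More text that mentions bbb in passing.
 Record Closing String

 xaa this record mentions aaa in passing
 Record Closing String

 bbb this is just scrap text
 Record Closing String
EOF

filtered=$(awk -v ids='aaa|bbb|ccc' '
  BEGIN { start = "^(" ids ")$"; newrec = 1 }   # anchored: whole first word must match
  newrec && NF == 0 { next }                     # skip blank lines between records
  newrec { keep = ($1 ~ start); newrec = 0 }     # decide once, on the first line of a record
  keep { print }                                 # copy every line of a kept record
  /Record Closing String/ {                      # end of record: reset for the next one
    if (keep) print ""
    keep = 0; newrec = 1
  }
' "$sample")

printf '%s\n' "$filtered"
rm -f "$sample"
```

Because the decision is made only on the first line of each record, the xaa record is skipped even though its text mentions aaa, and the whole file is read once instead of once per identifier.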