awk: keep adjacent rows based on a duplicate field

Date: 2017-01-30 04:21:38

Tags: awk duplicates

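I'm working with some output from tshark fields. Some processing has already been done, and there are now adjacent rows in which the last field is duplicated. These duplicates are missing their partner row with the matching sequence number. The task is to keep only adjacent pairs of rows whose sequence column matches. The last field takes the values 0 and 130, and each pair of rows starts with 130. Sequence numbers run from 0 to 15. The data stream contains many rows. The fields are: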

date       time                 src-int dst-int seq     function
24/01/2017 16:57:27.307400000   10      1000    11      130
24/01/2017 16:57:27.418675000   1000    10      11      0
24/01/2017 16:58:53.603604000   1000    10      12      0
24/01/2017 16:58:54.121603000   10      1000    13      130
24/01/2017 16:58:54.677752000   10      1000    14      130
24/01/2017 16:58:54.681079000   1000    10      14      0
24/01/2017 17:09:12.974979000   10      1000    1       130
24/01/2017 17:09:12.981149000   1000    10      1       0
24/01/2017 17:09:13.477211000   1000    10      2       0
24/01/2017 17:09:14.026279000   1000    10      3       0

The desired output keeps pairs of rows in function order 130 then 0, with matching sequence numbers:

24/01/2017 16:57:27.307400000   10      1000    11      130
24/01/2017 16:57:27.418675000   1000    10      11      0
24/01/2017 16:58:54.677752000   10      1000    14      130
24/01/2017 16:58:54.681079000   1000    10      14      0
24/01/2017 17:09:12.974979000   10      1000    1       130
24/01/2017 17:09:12.981149000   1000    10      1       0

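I have a half-baked solution. It matches \t130$, fetches the next line, and prints both if the sequence numbers match. It returns good data, but it doesn't handle repeated 130 values: in the sample data it drops sequence 14. The number of adjacent duplicate rows is arbitrary, so nesting another test seems silly.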

awk "/\t130$/ {seq=$5; prev=$0; getline;} $5==seq {print prev; print;}"

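What is the best way to handle all the duplicates in the start condition?

BTW, this is GNU awk on Windows 7. FWIW, the two rows will eventually be joined with print prev,$0; that is not shown here for clarity.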
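The failure described above can be reproduced on a few of the sample rows. This is only a sketch: it assumes a POSIX shell (hence single quotes rather than the double quotes needed on Windows) and tab-separated fields, and the variable name out is made up for the illustration.

```shell
# Five sample rows: an unmatched 130 (seq 13), the seq 14 pair, the seq 1 pair.
# %b expands the \t escapes so the fields are tab-separated.
out=$(printf '%b\n' \
  '24/01/2017 16:58:54.121603000\t10\t1000\t13\t130' \
  '24/01/2017 16:58:54.677752000\t10\t1000\t14\t130' \
  '24/01/2017 16:58:54.681079000\t1000\t10\t14\t0' \
  '24/01/2017 17:09:12.974979000\t10\t1000\t1\t130' \
  '24/01/2017 17:09:12.981149000\t1000\t10\t1\t0' |
  awk '/\t130$/ {seq=$5; prev=$0; getline;} $5==seq {print prev; print;}')

# Only the seq 1 pair survives. getline consumed the seq 14 row with
# function 130 while seq still held 13, so that row was never remembered.
printf '%s\n' "$out"
```

Because getline swallows the second 130 row before it can be tested as a new start condition, the pair for sequence 14 never gets a chance to match.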

2 answers:

Answer 0: (score: 1)

awk 'NR==1 { print; next } 
    $6 == 0 && $5 == seq && c == 0 { print row; print; c++ }
    $6 == 130 { seq=$5; row=$0; c=0 }
' file
date       time                 src-int dst-int seq     function
24/01/2017 16:57:27.307400000   10      1000    11      130
24/01/2017 16:57:27.418675000   1000    10      11      0
24/01/2017 16:58:54.677752000   10      1000    14      130
24/01/2017 16:58:54.681079000   1000    10      14      0
24/01/2017 17:09:12.974979000   10      1000    1       130
24/01/2017 17:09:12.981149000   1000    10      1       0
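For readers who want to see why this handles runs of 130 rows, here is a commented sketch of the same idea run against the question's sample rows. It assumes a POSIX shell and omits the header row (so there is no NR==1 rule). The names keep_pairs, data and pairs are invented for the example, and the BEGIN rule is an addition that is not in the answer: it keeps a leading 0-row with sequence 0 from comparing equal to the still-unset seq.

```shell
# Sample rows from the question, without the header line.
data=$(mktemp)
cat > "$data" <<'EOF'
24/01/2017 16:57:27.307400000   10      1000    11      130
24/01/2017 16:57:27.418675000   1000    10      11      0
24/01/2017 16:58:53.603604000   1000    10      12      0
24/01/2017 16:58:54.121603000   10      1000    13      130
24/01/2017 16:58:54.677752000   10      1000    14      130
24/01/2017 16:58:54.681079000   1000    10      14      0
24/01/2017 17:09:12.974979000   10      1000    1       130
24/01/2017 17:09:12.981149000   1000    10      1       0
24/01/2017 17:09:13.477211000   1000    10      2       0
24/01/2017 17:09:14.026279000   1000    10      3       0
EOF

keep_pairs() {
  awk '
    # Addition (assumption): start with c set so nothing matches before a 130-row is seen.
    BEGIN { c = 1 }
    # A 0-row whose seq equals the remembered 130-row completes a pair;
    # c makes sure each remembered 130-row is printed at most once.
    $6 == 0 && $5 == seq && c == 0 { print row; print; c++ }
    # Every 130-row replaces the remembered one, so a run of 130s keeps only the last.
    $6 == 130 { seq = $5; row = $0; c = 0 }
  ' "$1"
}

pairs=$(keep_pairs "$data")
rm -f "$data"
printf '%s\n' "$pairs"
```

One design point worth knowing: the remembered 130 row stays valid until the next 130 row arrives, so this form does not strictly require the 0 row to be on the very next line.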

Answer 1: (score: 0)

$ awk 'p==$5 && q==130 && $6==0 {print b $0} {p=$5; q=$6; b=$0 "\n"}' file
24/01/2017 16:57:27.307400000   10      1000    11      130
24/01/2017 16:57:27.418675000   1000    10      11      0
24/01/2017 16:58:54.677752000   10      1000    14      130
24/01/2017 16:58:54.681079000   1000    10      14      0
24/01/2017 17:09:12.974979000   10      1000    1       130
24/01/2017 17:09:12.981149000   1000    10      1       0
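  • p is the previous $5, which should equal the current $5
  • q is the previous $6, which should be 130
  • b buffers the previous $0, with \n appended for pretty printing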
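Since the question mentions that the two rows will eventually be joined with print prev,$0, the same previous-row bookkeeping can be adapted to emit each pair on one line. This is a hedged sketch rather than part of the answer: it assumes a POSIX shell, the function name join_pairs and the variable joined are invented, and the rows are joined with awk's default OFS (a single space).

```shell
join_pairs() {
  awk '
    # Current row is a 0-row directly after a 130-row with the same seq:
    # print the buffered row and this row on one line, separated by OFS.
    p == $5 && q == 130 && $6 == 0 { print b, $0 }
    # Remember seq, function and full text of this row for the next row.
    { p = $5; q = $6; b = $0 }
  '
}

# Four sample rows: an unmatched 130, the seq 14 pair, and an unmatched 0.
joined=$(printf '%s\n' \
  '24/01/2017 16:58:54.121603000 10 1000 13 130' \
  '24/01/2017 16:58:54.677752000 10 1000 14 130' \
  '24/01/2017 16:58:54.681079000 1000 10 14 0' \
  '24/01/2017 17:09:13.477211000 1000 10 2 0' | join_pairs)
printf '%s\n' "$joined"
```

Because p, q and b are overwritten on every row, only strictly adjacent rows can pair up; setting OFS (for example with -v OFS='\t') would change the separator between the two halves.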