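I'm processing some output from tshark fields. Some processing has already been done, and there are now adjacent rows in which the last field is repeated. These duplicates are missing the row with the matching sequence number. The task is to keep only adjacent pairs of rows whose sequence column matches. The last field takes the values 0 and 130, and each pair starts with 130. Sequence numbers run from 0 to 15. The data stream contains many rows. The fields are: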
date time src-int dst-int seq function
24/01/2017 16:57:27.307400000 10 1000 11 130
24/01/2017 16:57:27.418675000 1000 10 11 0
24/01/2017 16:58:53.603604000 1000 10 12 0
24/01/2017 16:58:54.121603000 10 1000 13 130
24/01/2017 16:58:54.677752000 10 1000 14 130
24/01/2017 16:58:54.681079000 1000 10 14 0
24/01/2017 17:09:12.974979000 10 1000 1 130
24/01/2017 17:09:12.981149000 1000 10 1 0
24/01/2017 17:09:13.477211000 1000 10 2 0
24/01/2017 17:09:14.026279000 1000 10 3 0
The desired output keeps pairs of rows in function order, 130 then 0, with matching sequence numbers:
24/01/2017 16:57:27.307400000 10 1000 11 130
24/01/2017 16:57:27.418675000 1000 10 11 0
24/01/2017 16:58:54.677752000 10 1000 14 130
24/01/2017 16:58:54.681079000 1000 10 14 0
24/01/2017 17:09:12.974979000 10 1000 1 130
24/01/2017 17:09:12.981149000 1000 10 1 0
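I have a half-baked solution. It matches \t130$ and fetches the next line, printing both if the sequence matches. It returns good data, but it doesn't handle repeated 130 values: in the sample data it omits sequence 14. The number of adjacent duplicate rows is arbitrary, so nesting another test seems silly.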
awk "/\t130$/ {seq=$5; prev=$0; getline;} $5==seq {print prev; print;}"
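What is the best way to handle all the duplicates in the start condition?
BTW, this is GNU awk on Windows 7.
FWIW, the two lines will eventually be joined with print prev,$0 (not shown here, for clarity).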
Answer 0 (score: 1)
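awk 'NR==1 { print; next }   # print the header row as-is
# a 0 row matching the saved 130 row: print the pair once (c guards against repeats)
$6 == 0 && $5 == seq && c == 0 { print row; print; c++ }
# remember the most recent 130 row; a later 130 row overwrites an earlier one
$6 == 130 { seq=$5; row=$0; c=0 }
' file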
date time src-int dst-int seq function
24/01/2017 16:57:27.307400000 10 1000 11 130
24/01/2017 16:57:27.418675000 1000 10 11 0
24/01/2017 16:58:54.677752000 10 1000 14 130
24/01/2017 16:58:54.681079000 1000 10 14 0
24/01/2017 17:09:12.974979000 10 1000 1 130
24/01/2017 17:09:12.981149000 1000 10 1 0
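Note that the script above pairs a 0 row with the most recent 130 row even if rows with a different seq sit between them (for example 5/130, 6/0, 5/0 would print the 5/130 and 5/0 rows). If the pair has to be strictly adjacent, a minimal sketch along the same lines could look like this. The function name strict_pairs and the variable have are made up for illustration, and the header handling assumes the first row is a header, as in the sample:

```shell
# Hypothetical helper: keep only a 130 row that is immediately
# followed by a 0 row with the same seq.
strict_pairs() {
  awk '
    NR == 1 { print; next }   # pass the header row through
    # a 0 row directly after a 130 row with the same seq completes a pair
    have && $6 == 0 && $5 == seq { print row; print }
    # only a 130 row can open a pair; any other row clears the pending one
    { have = ($6 == 130); seq = $5; row = $0 }
  ' "$@"
}
```

Usage would be strict_pairs file, or piping the data into strict_pairs.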
Answer 1 (score: 0)
$ awk 'p==$5 && q==130 && $6==0 {print b $0} {p=$5; q=$6; b=$0 "\n"}' file
24/01/2017 16:57:27.307400000 10 1000 11 130
24/01/2017 16:57:27.418675000 1000 10 11 0
24/01/2017 16:58:54.677752000 10 1000 14 130
24/01/2017 16:58:54.681079000 1000 10 14 0
24/01/2017 17:09:12.974979000 10 1000 1 130
24/01/2017 17:09:12.981149000 1000 10 1 0
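p is the previous $5, which should equal the current $5.
q is the previous $6, which should be 130.
b buffers the previous $0, with \n appended for pretty printing.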
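The question mentions that the two rows of each pair will eventually be joined on one line. A minimal sketch of that, adapting the one-liner above: the buffer is kept without the trailing \n, and print b, $0 joins the two rows with OFS (a single space by default). The function name join_pairs is made up for illustration, and the sample rows are taken from the question.

```shell
# Hypothetical helper: print each 130/0 pair on a single line.
join_pairs() {
  awk '
    # previous row was function 130 with the same seq, current row is function 0
    p == $5 && q == 130 && $6 == 0 { print b, $0 }
    # remember seq, function and the whole row for the next record
    { p = $5; q = $6; b = $0 }
  ' "$@"
}

# A few rows from the question data, including the repeated 130 case.
printf '%s\n' \
  '24/01/2017 16:58:54.121603000 10 1000 13 130' \
  '24/01/2017 16:58:54.677752000 10 1000 14 130' \
  '24/01/2017 16:58:54.681079000 1000 10 14 0' \
  '24/01/2017 17:09:13.477211000 1000 10 2 0' |
  join_pairs
```

Since cmd.exe on Windows does not treat single quotes as quoting characters, it may be easier there to save the awk program to a file (say pairs.awk, a made-up name) and run awk -f pairs.awk file.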