AWK: delete all but the last occurrence of log file lines, based on time

Time: 2018-08-01 21:59:58

Tags: perl awk sed

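I have a file that collects option 82 data from DHCP servers. It contains lines that are similar in every respect except for the timestamp and the server they came from. I need to delete all "related" lines except the last occurrence of each set of similar lines, based on time.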

My original file looks like this:

 Aug  1 16:23:05 serverA dhcpd: Service A OPTION-82 | IP =192.168.1.100 | MAC=70:73:cb:b3:3c:58 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
 Aug  1 16:24:55 serverB dhcpd: Service B OPTION-82 | IP =192.168.1.100 | MAC=38:71:de:4b:f2:46 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
 Jul 27 16:37:46 serverA dhcpd: Service A OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
 Jul 31 13:20:11 serverB dhcpd: Service B OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f 
 Jul 27 16:37:46 serverB dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
 Jul 31 13:20:11 serverA dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f

After text processing, I need to end up with the following:

  Aug  1 16:24:55 serverB dhcpd: Service B OPTION-82 | IP =192.168.1.100 | MAC=38:71:de:4b:f2:46 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
  Jul 31 13:20:11 serverB dhcpd: Service B OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
  Jul 31 13:20:11 serverA dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f

So far I have tried a few approaches, but they seem to delete every instance of some lines, and the finished file is missing data we need.

 /bin/awk '!_[$9]++' rawfile
 /bin/awk 'NR == FNR {if (z[$9]) y[z[$9]]; z[$9] = FNR; next} !(FNR in y)' rawfile rawfile
 tac rawfile | awk '!seen[$9]++' | tac > finished_file

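I am by no means an awk expert. I found these commands by searching Google and tried them, so any help would be greatly appreciated. I am also open to text-processing tools other than awk.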

1 answer:

Answer 0 (score: 3)

Based on the discussion in the comments, the input file is actually sorted by timestamp in ascending order, and you want to match on the IP.

$ cat input.txt 
 Aug  1 16:23:05 serverA dhcpd: Service A OPTION-82 | IP =192.168.1.100 | MAC=70:73:cb:b3:3c:58 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
 Aug  1 16:24:55 serverB dhcpd: Service B OPTION-82 | IP =192.168.1.100 | MAC=38:71:de:4b:f2:46 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
 Jul 27 16:37:46 serverA dhcpd: Service A OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
 Jul 27 16:37:46 serverB dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
 Jul 31 13:20:11 serverB dhcpd: Service B OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f 
 Jul 31 13:20:11 serverA dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
$ perl -ne '/\bIP\s*=\s*([\d.]+)\b/||next;$x{$1}=$_}{print $x{$_} for sort keys %x' input.txt 
 Aug  1 16:24:55 serverB dhcpd: Service B OPTION-82 | IP =192.168.1.100 | MAC=38:71:de:4b:f2:46 | CIRCUIT-ID=0a:00:3e:bb:7d:fe | REMOTE-ID=0a:00:3e:bb:73:4a
 Jul 31 13:20:11 serverB dhcpd: Service B OPTION-82 | IP =192.168.2.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f 
 Jul 31 13:20:11 serverA dhcpd: Service A OPTION-82 | IP =192.168.3.100 | MAC=3c:90:66:64:c7:20 | CIRCUIT-ID=0a:00:3e:bb:a2:37 | REMOTE-ID=0a:00:3e:bb:c1:3f
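
For comparison, the same "last line per IP wins" idea can be sketched in awk. One likely reason the attempts in the question kept too little: with awk's default whitespace splitting, `$9` on these sample lines is the literal `|` separator, which is identical on every line, while the IP sits in `$11` with a leading `=`. The sketch below therefore extracts the IP with `match()` instead of relying on a field number. The function name `last_per_ip` is hypothetical, and the regex assumes the `IP =<dotted address>` layout shown in the sample.

```shell
# Hypothetical helper: keep only the last line seen for each IP value.
# Assumes each relevant line contains "IP =<address>" as in the sample data.
last_per_ip() {
    awk '
        # Find "IP =<address>"; lines without it are skipped.
        match($0, /IP *= *[0-9.]+/) {
            ip = substr($0, RSTART, RLENGTH)
            sub(/^IP *= */, "", ip)              # strip the "IP =" prefix
            if (!(ip in last)) order[++n] = ip   # remember first-seen order of each IP
            last[ip] = $0                        # later lines overwrite earlier ones
        }
        # Print one line per IP, in the order the IPs first appeared.
        END { for (i = 1; i <= n; i++) print last[order[i]] }
    ' "$@"
}
```

It could be called as `last_per_ip input.txt > finished_file`. Because the input is in ascending timestamp order, the last line stored for an IP is also its most recent one.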

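Note: sort keys %x is not perfect, because it sorts the output alphabetically by IP (as strings, not numerically), so the lines may not come out in their original order. If you need the same order as in the original file, please say so and, as I said in the comments, show a more representative sample of input (and output) data. See also Minimal, Complete, and Verifiable Example.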
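
If the original file order does matter, one possible approach is a two-pass awk script: the first pass records the line number of the last occurrence of each IP, and the second pass prints only those lines, so the survivors appear exactly where they were in the file. This is a sketch under the same assumptions as the sample data (an `IP =<address>` token on each relevant line). The names `last_per_ip_in_file_order` and `ip_of` are hypothetical, and the file is read twice, so it has to be a regular file rather than a pipe.

```shell
# Hypothetical helper: print the last line for each IP, preserving file order.
# Usage: last_per_ip_in_file_order FILE   (FILE is read twice)
last_per_ip_in_file_order() {
    awk '
        # Return the address following "IP =", or "" if the line has none.
        function ip_of(line) {
            if (!match(line, /IP *= *[0-9.]+/)) return ""
            line = substr(line, RSTART, RLENGTH)
            sub(/^IP *= */, "", line)
            return line
        }
        # Pass 1: remember the line number of the last occurrence of each IP.
        NR == FNR { ip = ip_of($0); if (ip != "") last[ip] = FNR; next }
        # Pass 2: print a line only if it is the last occurrence for its IP.
        { ip = ip_of($0) }
        ip != "" && last[ip] == FNR
    ' "$1" "$1"
}
```

It could be called as `last_per_ip_in_file_order input.txt > finished_file`.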