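How can I determine whether a recently appeared string in a file is new (unique) using Perl?

Time: 2015-04-02 06:57:13

Tags: string perl unix netflow

Suppose I have a file like this containing Internet traffic information (the file holds an unbounded number of lines):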

startTime                     sourceIP    destinationIP 
2015-03-31 08:47:27.671      10.0.26.48     10.0.26.255 
2015-03-31 08:47:28.108     10.50.26.180     10.90.26.255 
2015-03-31 08:47:35.015      10.0.26.74 255.255.255.255 
                         ...
2015-03-31 16:18:25.365      196.0.26.13     224.0.0.252 
2015-03-31 16:18:32.718      10.46.26.13     224.0.0.252 
2015-03-31 16:18:46.941      188.0.26.98     177.0.26.255 
2015-03-31 16:18:58.336      10.0.26.57     10.0.26.255
2015-03-31 15:53:37.451      50.0.26.13     224.0.0.252 
2015-03-31 15:53:55.086      10.0.26.13     40.30.0.252 
2015-03-31 15:53:55.097      128.0.26.13     224.0.0.252
                         ...
2015-04-01 22:38:43.500   192.168.0.109   78.57.218.154 
2015-04-01 22:38:43.500  213.159.38.184   192.168.0.109 
2015-04-01 22:38:46.359   178.250.32.43   192.168.0.109
2015-04-01 22:38:53.269  213.159.38.184   192.168.0.109 
2015-04-01 22:38:53.269   192.168.0.109  213.159.38.184 
2015-04-01 22:39:14.995    54.83.28.184   192.168.0.109

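What I want to do is determine whether a newly appearing IP address is listed anywhere earlier in the file, so that I can tag the ones that are not as new and store them somewhere else. I treat an address as new even if it first appeared at some point during the last few days.

What is the best way to do this programmatically in Perl?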

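1 Answer:

Answer 0: (score: 0)

A hash is the usual tool for this kind of task. We also have to decide the point in time from which an IP counts as new; the script below takes that timestamp as its first argument.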

use strict;
use warnings;

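# Split "YYYY-MM-DD hh:mm:ss[.fff]" into its numeric fields.
# Works on the argument if one is given, otherwise on $_.
sub parse_time {
    local $_ = shift if @_;
    split /[-\s:]+/;
}

# Compare the array behind $ref with the remaining arguments field by field.
# Returns -1, 0 or 1, like <=>.
sub cmp_array {
    my $ref = shift;
    for my $i ( 0 .. $#$ref ) {
        my $cmp = $ref->[$i] <=> $_[$i];
        return $cmp if $cmp;
    }
    return 0;    # all fields equal
}

die "Not enough parameters" unless @ARGV;

my $since = [ parse_time(shift) ];
my %seen;
while (<>) {
    my ( $date, $time, @ips ) = split;
    next unless @ips > 1;  # expect at least two IPs, otherwise the line is malformed
    # Scalar ".." is the flip-flop operator: it turns true on the first line
    # stamped after $since and, because the constant right-hand side 0 is
    # compared with $. (never 0 here), stays true for the rest of the input.
    if ( cmp_array( $since, parse_time("$date $time") ) < 0 .. 0 ) {
        exists $seen{$_} or print "$date $time $_\n" for @ips;
    }
    @seen{@ips} = ();      # remember every IP, including those before $since
}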

Example output

$ perl code.pl '2015-04-01 22:38:43' file.txt
2015-04-01 22:38:43.500 192.168.0.109
2015-04-01 22:38:43.500 78.57.218.154
2015-04-01 22:38:43.500 213.159.38.184
2015-04-01 22:38:46.359 178.250.32.43
2015-04-01 22:39:14.995 54.83.28.184
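
The same "seen" hash idea can be sketched without the field-by-field comparison. This is only a sketch under an assumption: the timestamps are zero-padded "YYYY-MM-DD hh:mm:ss", so plain string comparison orders them correctly. The helper name first_seen_since and the sample lines are made up for illustration. Unlike the script above, it tests every line against the cutoff on its own instead of switching on once.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): returns one
# "timestamp ip" string for every IP whose first appearance in @$lines
# is stamped later than $since.
sub first_seen_since {
    my ( $since, $lines ) = @_;
    my ( %seen, @new );
    for my $line (@$lines) {
        my ( $date, $time, @ips ) = split ' ', $line;
        next unless @ips > 1;    # skips the header and "..." lines
        my $stamp = "$date $time";
        for my $ip (@ips) {
            next if $seen{$ip}++;    # already met earlier in the input
            # Assumption: zero-padded stamps, so "gt" orders them by time.
            push @new, "$stamp $ip" if $stamp gt $since;
        }
    }
    return @new;
}

my @sample = (
    'startTime                     sourceIP    destinationIP',
    '2015-03-31 08:47:27.671      10.0.26.48     10.0.26.255',
    '2015-04-01 22:38:43.500   192.168.0.109   78.57.218.154',
    '2015-04-01 22:38:46.359      10.0.26.48   192.168.0.109',
);
print "$_\n" for first_seen_since( '2015-04-01 22:38:43', \@sample );
```

With these four sample lines it prints the two addresses from the 22:38:43.500 line; the last line adds nothing because both of its addresses were already seen.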

If you want to know whether a particular IP appeared for the first time within the last two days, you can use

perl code.pl "$(date --date='2 days ago' '+%Y-%m-%d %H:%M:%S')" file.txt | grep -wF 192.168.0.109

For example

$ perl code.pl "$(date --date='9 days ago' '+%Y-%m-%d %H:%M:%S')" file.txt | grep -qwF 192.168.0.109 && echo NEWBIE || echo OLD DOG
NEWBIE
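
The --date option used above belongs to GNU date. Where it is not available, the cutoff can be computed in Perl itself with POSIX::strftime. A minimal sketch; the helper name cutoff_days_ago and the file name cutoff.pl are hypothetical, and it counts 24-hour days, so around a daylight-saving change the result can differ by an hour from a calendar-based "N days ago".

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: local-time stamp for $days days before $now
# (defaults to the current time), in the format code.pl expects.
sub cutoff_days_ago {
    my ( $days, $now ) = @_;
    $now = time unless defined $now;
    return strftime( '%Y-%m-%d %H:%M:%S',
        localtime( $now - $days * 24 * 60 * 60 ) );
}

print cutoff_days_ago(2), "\n";
```

Saved as cutoff.pl, it would be used as perl code.pl "$(perl cutoff.pl)" file.txt.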

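However, in that case you do not need Perl at all

# Append a marker line stamped with the cutoff, sort everything by timestamp,
# keep only the IP columns, drop the marker and all later lines, then check
# whether the IP occurs in what is left (that is, before the cutoff).
( cat file.txt; echo $(date --date='2 days ago' '+%Y-%m-%d %H:%M:%S') MY_SUPER_DELIMITER ) |
sed 's/\s\+/\t/g' | sort | cut -f 3,4 | sed '/^MY_SUPER_DELIMITER$/,$d' |
grep -qwF 192.168.0.109 && echo OLD DOG || echo NEWBIE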
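
For comparison, the single-IP check can also be written as one pass in Perl with no sort step. This is a sketch, not a drop-in replacement: classify_ip and the sample lines are hypothetical, and it again assumes zero-padded timestamps so that string comparison orders them. Like the pipeline, it answers NEWBIE when the IP does not occur at all.

```perl
use strict;
use warnings;

# Hypothetical helper: "OLD DOG" if $ip occurs on any line stamped earlier
# than $cutoff, "NEWBIE" otherwise (including when $ip never occurs).
sub classify_ip {
    my ( $ip, $cutoff, $lines ) = @_;
    for my $line (@$lines) {
        my ( $date, $time, @ips ) = split ' ', $line;
        next unless @ips > 1;                    # header or malformed line
        next unless "$date $time" lt $cutoff;    # only lines before the cutoff
        return 'OLD DOG' if grep { $_ eq $ip } @ips;    # exact match, no regex
    }
    return 'NEWBIE';
}

my @log = (
    '2015-03-31 08:47:27.671      10.0.26.48     10.0.26.255',
    '2015-04-01 22:38:43.500   192.168.0.109   78.57.218.154',
);
print classify_ip( '192.168.0.109', '2015-04-01 00:00:00', \@log ), "\n";    # NEWBIE
print classify_ip( '10.0.26.48',    '2015-04-01 00:00:00', \@log ), "\n";    # OLD DOG
```

Because every line is tested against the cutoff individually, the input does not have to be sorted by time.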