Question

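Every week I have to parse daily logs (on a Red Hat system) and collect some statistics on the IP address column. Each daily log contains data in the form

<device>,<ip>,<city>,<packets>

like this:

Sample data, the first 5 lines of one of the logs: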

gw1,25.0.10.61,houston,50
gw1,25.0.20.61,dallas,30
gw1,25.0.30.60,ftworth,80
gw1,25.0.10.61,houston,40
gw1,25.0.10.62,houston,40

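I would like to go through all seven logs and determine the total number of packets for each IP address.

The desired output is

<ip>,<packet_count>

sorted by packet count across all seven logs, like this: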

25.0.10.61,480
25.0.10.62,400
25.0.30.60,220

and so on.

I'm not really sure whether a hash is the best way to do this and, if it is, how to go about it.

Answer 1

You can use a hash for the data.

Code:

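use strict;
use warnings;

my $filename = "log.txt";    # provide your filename here
open my $fh, '<', $filename or die "Cannot open $filename: $!\n";

my %myhash;
while (my $line = <$fh>) {
    chomp $line;
    my @arr = split /,/, $line;
    $myhash{ $arr[1] } += $arr[3];    # add packets (4th field) to the total for this IP (2nd field)
}
close $fh;

# access the hash
foreach my $ip (keys %myhash) {
    print "$ip\t$myhash{$ip}\n";
}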

Answer 2

As a one-liner:

perl -F, -lane '
    $count{$F[1]} += $F[3]
  } {
    while (($ip,$n) = each %count) {print "$ip,$n"}
' file*.log | sort -t, -k2,2nr

The sorting could be done in perl, but it would be longer than this.
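For comparison, here is a minimal sketch of doing the sort inside Perl instead of piping to sort. The sub names sum_packets and sorted_report are made up for this illustration, and the demo runs on the five sample lines from the question rather than on real log files. Ties are broken by comparing the IP strings, which is an assumption of mine and not something the question asks for.

```perl
use strict;
use warnings;

# Sum packets per IP from lines of the form <device>,<ip>,<city>,<packets>.
# (Hypothetical helper name, not part of the answer above.)
sub sum_packets {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        chomp(my $copy = $line);
        my ($ip, $packets) = (split /,/, $copy)[1, 3];
        next unless defined $ip and defined $packets;    # skip blank or short lines
        $count{$ip} += $packets;
    }
    return %count;
}

# Return "ip,total" strings, highest total first; equal totals are ordered by IP string.
sub sorted_report {
    my %count = @_;
    return map  { sprintf("%s,%d", $_, $count{$_}) }
           sort { $count{$b} <=> $count{$a} or $a cmp $b }
           keys %count;
}

# Demo on the sample data from the question.
my @sample = (
    "gw1,25.0.10.61,houston,50\n",
    "gw1,25.0.20.61,dallas,30\n",
    "gw1,25.0.30.60,ftworth,80\n",
    "gw1,25.0.10.61,houston,40\n",
    "gw1,25.0.10.62,houston,40\n",
);
print "$_\n" for sorted_report(sum_packets(@sample));
```

The numeric comparison with $b before $a is what gives descending order; swapping them would sort ascending.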

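The } { trick separates the code that should run for every line from the code that should run only once, at the end of the input. The one-liner translates to: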

BEGIN { $/ = "\n"; $\ = "\n"; }
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    our @F = split(/,/, $_, 0);
    $count{$F[1]} += $F[3];
}
{
    while (($ip, $n) = each %count) {
        print "$ip,$n";
    }
}

Answer 3

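You don't say how you get the names of the log files. This solution just uses glob to find the names of all files in the current directory that end with .log. Assigning that list to @ARGV lets us read all of the files directly without opening them explicitly, just as if their names had been entered on the command line.

I keep a hash %data whose keys are the IP addresses and whose values are the accumulated packet totals. I also maintain a width value $w, which is the length of the longest IP address encountered so far. It is used in the printf to keep the columns neatly aligned.

The output is sorted with a simple sort in the final for loop.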

use strict;
use warnings 'all';

@ARGV = glob "*.log";

my %data;
my $w;

# <device>,<ip>,<city>,<packets>

while ( <> ) {
    s/\s+\z//;

    my ($ip, $count) = (split /,/)[1,3];

    $data{$ip} += $count;

    my $len = length $ip;
    $w = $len unless $w and $w >= $len;
}

for my $ip ( sort { $data{$b} <=> $data{$a} } keys %data ) {
    printf "%*s %d\n", $w, $ip, $data{$ip};
}

Output

25.0.10.61 90
25.0.30.60 80
25.0.10.62 40
25.0.20.61 30
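The sample IPs above all have the same length, so the effect of the width value is not visible in that output. As a rough sketch of what the %*s padding does, the following uses a hypothetical helper format_table and a made-up shorter address (8.8.8.8, not in the question's data). It computes the width with List::Util's max instead of tracking it in the read loop, and it left-aligns with %-*s, which is a variation on the answer's right-aligned %*s.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Format ip => total pairs as aligned rows, highest total first.
# (Hypothetical helper for illustration only.)
sub format_table {
    my %data = @_;
    my $w = max(map { length $_ } keys %data) // 0;    # width of the longest IP
    return map  { sprintf("%-*s %d", $w, $_, $data{$_}) }
           sort { $data{$b} <=> $data{$a} or $a cmp $b }
           keys %data;
}

# 8.8.8.8 is an invented entry, included only to show the padding.
my %totals = ('25.0.10.61' => 90, '25.0.30.60' => 80, '8.8.8.8' => 40);
print "$_\n" for format_table(%totals);
```

The * in the format takes its width from the argument list, so the same format string works whatever the longest address turns out to be.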

Answer 4

Here is how I would do it:

use strict;
use warnings;

my $packetCountByIP = {};
for (my $i = 1; $i <= 7; ++$i) {
    my $fileName = 'activity'.$i.'.log';
    my $fh;
    if (!open($fh,'<',$fileName)) { die("$fileName: $!"); }
    while (my $line = <$fh>) {
        my $fields = [split(',',$line)];
        my $ip = $fields->[1];
        my $packetCount = $fields->[3]+0;
        $packetCountByIP->{$ip} += $packetCount;
    } ## end while (log file lines)
    close($fh);
} ## end for (log files)

my $ips = [sort({$packetCountByIP->{$b}<=>$packetCountByIP->{$a}} keys(%$packetCountByIP))];
foreach my $ip (@$ips) {
    print("$ip,$packetCountByIP->{$ip}\n");
} ## end foreach (ips)
