Need a Perl script to count the images found in a web log

Time: 2012-10-31 04:47:44

Tags: perl web counter

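I have a web log file and I'm having a lot of trouble with it, since I'm new to Perl. I just need a script that counts each kind of image it finds. I can list the matching lines, but I'm not sure how to get the counts, so that I can say something like "there were x jpgs and x gifs viewed".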

My code so far looks like this:

use warnings;
open FILE, "jan28.log";
while ($line = <FILE>) { 

    if ($line =~ /.jpg/) {

        print $line;
    } 
    elsif ($line =~ /.gif/) {

        print $line;
    }
    elsif ($line =~ /tiff/) {

        print $line; 
    }
}

The web log looks like this:

24.131.83.162 - - [28/Jan/2007:00:00:00 -0500] "GET /~taler/images/index_09.jpg   HTTP/1.1" 200 1563
207.46.98.53 - - [28/Jan/2007:00:00:04 -0500] "GET /%7Edist/programs/PhD/PhDGuide/guideA.htm HTTP/1.0" 200 19090
74.6.74.184 - - [28/Jan/2007:00:00:12 -0500] "GET /%7Embsclass/hall_of_fame/myicon.ico HTTP/1.0" 200 760
58.68.24.3 - - [28/Jan/2007:00:00:16 -0500] "GET /~dtipper/tipper.html HTTP/1.1" 200 5896
58.68.24.3 - - [28/Jan/2007:00:00:16 -0500] "GET /~dtipper/gifs/head.jpg HTTP/1.1" 200 18318

3 answers:

Answer 0 (score: 2)

use strict;
use warnings;
use feature qw( say );
use URI qw( );

my $jpegs = 0;
my $gifs  = 0;
while (<>) {
   chomp;
   my ($req, $code) = /^(?:\S+\s+){3}\[[^\]]*\] "([^"]*)"\s*(\S+)/
      or next;

   $code >= 200 && $code < 300
      or next;

   my ($meth, $url) = split(' ', $req);
   $url = URI->new($url, 'http');

   my $path = $url->path;
   if    ($path =~ /\.jpe?g\z/i) { ++$jpegs; }
   elsif ($path =~ /\.gif\z/i  ) { ++$gifs; }
}

say "There were $jpegs jpgs and $gifs gifs viewed";
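The script above keeps one counter per image type. A minimal sketch of the same idea with a single hash keyed by extension is shown below. It rests on a few assumptions: count_images is a hypothetical helper that is not part of the answer, the tiff pattern is borrowed from the question, and a plain regex stands in for the URI module, so percent-encoded paths are not decoded. The sample lines are the ones from the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer above): tally successful image
# requests per extension from a list of access-log lines.
sub count_images {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        # Pull the quoted request and the three-digit status code.
        my ($req, $code) = $line =~ /"([^"]*)"\s+(\d{3})\b/
            or next;
        next unless $code >= 200 && $code < 300;

        # The request is "METHOD URL PROTOCOL"; keep only the URL.
        my (undef, $url) = split ' ', $req;
        next unless defined $url;

        # Drop any query string or fragment, then test the extension.
        $url =~ s/[?#].*//s;
        my ($ext) = $url =~ /\.(jpe?g|gif|tiff?)\z/i
            or next;

        $ext = lc $ext;
        $ext = 'jpg' if $ext eq 'jpeg';    # fold .jpeg into .jpg
        $count{$ext}++;
    }
    return %count;
}

# Sample lines copied from the question.
my @sample = split /\n/, <<'END_LOG';
24.131.83.162 - - [28/Jan/2007:00:00:00 -0500] "GET /~taler/images/index_09.jpg   HTTP/1.1" 200 1563
207.46.98.53 - - [28/Jan/2007:00:00:04 -0500] "GET /%7Edist/programs/PhD/PhDGuide/guideA.htm HTTP/1.0" 200 19090
74.6.74.184 - - [28/Jan/2007:00:00:12 -0500] "GET /%7Embsclass/hall_of_fame/myicon.ico HTTP/1.0" 200 760
58.68.24.3 - - [28/Jan/2007:00:00:16 -0500] "GET /~dtipper/tipper.html HTTP/1.1" 200 5896
58.68.24.3 - - [28/Jan/2007:00:00:16 -0500] "GET /~dtipper/gifs/head.jpg HTTP/1.1" 200 18318
END_LOG

my %count = count_images(@sample);
print "$_: $count{$_}\n" for sort keys %count;
```

For the five sample lines this prints "jpg: 2". To process a real log, the sample list could be replaced by lines read from the file, for example with the diamond operator as in the answer above.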

Answer 1 (score: 0)

Try this (in a shell):

perl -wane '
    END{
        print "there\047s was $hash{$_} items for $_\n" for sort keys %hash;
    }

    $key = $1 if m!.*\.(jpe?g|gif|ico)\b!i;
    $hash{$key}++
' filename.txt

If you want a real script with the same logic, the Deparse module (B::Deparse) will help:

$ perl -MO=Deparse -wane '
END{
    print "there\047s was $hash{$_} items for $_\n" for sort keys %hash;
}

$key = $1 if m!.*\.(jpe?g|gif|ico)\b!i;
$hash{$key}++
' filename.txt

The resulting "deparsed" script:

BEGIN { $^W = 1; }
LINE: while (defined($_ = <ARGV>)) {
    our(@F) = split(' ', $_, 0);
    sub END {
        print "there's was $hash{$_} items for $_\n" foreach (sort keys %hash);
    }
    $key = $1 if /.*\.(jpe?g|gif|ico)\b/i;
    ++$hash{$key};
}
-e syntax OK
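One thing to watch in the script above: $key keeps its previous value when a line does not match, so every non-image line is counted toward the last extension seen (and toward an undefined key before the first match). A minimal sketch that counts only matching lines is given below. The helper name image_ext is hypothetical, the extension list is the one from the answer, and the sample lines come from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lower-cased image extension found in a
# log line, or undef when the line does not mention one.
sub image_ext {
    my ($line) = @_;
    return $line =~ /\.(jpe?g|gif|ico)\b/i ? lc $1 : undef;
}

# Sample lines copied from the question.
my @sample = split /\n/, <<'END_LOG';
24.131.83.162 - - [28/Jan/2007:00:00:00 -0500] "GET /~taler/images/index_09.jpg   HTTP/1.1" 200 1563
207.46.98.53 - - [28/Jan/2007:00:00:04 -0500] "GET /%7Edist/programs/PhD/PhDGuide/guideA.htm HTTP/1.0" 200 19090
74.6.74.184 - - [28/Jan/2007:00:00:12 -0500] "GET /%7Embsclass/hall_of_fame/myicon.ico HTTP/1.0" 200 760
58.68.24.3 - - [28/Jan/2007:00:00:16 -0500] "GET /~dtipper/tipper.html HTTP/1.1" 200 5896
58.68.24.3 - - [28/Jan/2007:00:00:16 -0500] "GET /~dtipper/gifs/head.jpg HTTP/1.1" 200 18318
END_LOG

my %hash;
for my $line (@sample) {
    my $ext = image_ext($line);
    $hash{$ext}++ if defined $ext;    # count only lines that matched
}

print "there were $hash{$_} items for $_\n" for sort keys %hash;
```

For the sample lines this reports 1 item for ico and 2 for jpg. The original one-liner would report 2 and 3 on the same input, because the two HTML requests are added to whichever extension was seen last.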

Answer 2 (score: 0)

This is a basic example, but there are probably some log parser modules on CPAN.

use strict;
use warnings;
use File::Open::OOP qw(oopen);
use Data::Dump qw(dump);

my $fh = oopen 'log';
my %hash;
while ( my $row = $fh->readline ) {
  # Capture the extension of the requested path; skip rows that have none.
  my ($ext) = $row =~ /"GET \/.*\.(\w+) / or next;
  $hash{$ext} += 1;
}
dump(%hash);

Output for the sample:

$ perl script.pl

("html", 1, "ico", 1, "jpg", 2, "htm", 1)

$