Perl: how to skip lines already read by a previous run

Time: 2018-08-01 14:50:13

Tags: perl skip log-rotation

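I have a problem with the following Perl program, which reorganizes the trace of accesses made to an application.

I implemented the solution below with a line-skipping feature because in the future I may have 10 or more rotated files of 50 MB each.

I want to skip the lines that were already read by a previous run (as long as the file's inode has not changed), so that only the delta is processed.

I hope this code can help other users.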

#!/usr/bin/perl

use strict;
use warnings 'all';

use File::Path qw<mkpath>;
use File::Spec;
use File::Copy;
use POSIX qw<strftime>;
use English;

# Dynamic Variables
my %older_count;
my %older_inode;
my @newer_filelist;
my @events;

my $OLD_IN_FILE = "";

# Static Variables
# Directories
my $IN_DIR               = "/tmp/appo/log";    # Input Directories
my $OUTPUT_LOG_DIRECTORY = "/tmp/appo/A14";    # Output directory

# Files
my $SPLITTED_OUTFILE = "parse_log.csv";           # Output file, split by month
my $R_STATS          = ".rotation_statistics";    # Rotation Statistic file

## MAIN

# Loading old statistics
if (-e $R_STATS) { 
   open (STAT_FILE, $R_STATS) or die $!;

    while ( <STAT_FILE> ) {
       my @lines = split /\n/;
       my ( $file, $inode, $nrows ) = $lines[0] =~ /\A(.\w.*);(\d.*);(\d.*)/;    # Encapsulate values

       push @{ $older_count{$file} }, $nrows;
       push @{ $older_inode{$file} }, $inode;
   }

   close( STAT_FILE );
}

# Loading new events from log
foreach my $INPUT ( glob( "$IN_DIR/logrotate_*.log" ) ) {

    my $inode        = ( stat( $INPUT ) )[1];
    my $currentinode = $older_inode{$INPUT}[0];

    my $jumprow = 0;
    $jumprow = $older_count{$INPUT}[0] if $currentinode == $inode; 

# Get current file statistics
   if ( $INPUT ne $OLD_IN_FILE ) {
       my $count = ( split /\s+/, `wc -l $INPUT` )[0];
       push @newer_filelist, {
             filename => $INPUT,
             inode    => $inode,
             count    => $count
       };
    }

    # Log opening
    open my $fh, '<', $INPUT or die "can't read open '$INPUT': $OS_ERROR";

    $/ = "\n\n";    # record separator

    while ( <$fh> ) {

        # next unless $. > $jumprow; # This instruction doesn't work

        # Log processing
        my @lines = split /\n/;
        my $i     = 0;

        foreach my $lines ( @lines ) {

            # Take only Authentication rows and skip others
            if ( $lines[$i] =~ m/\A#\d.\d.+#\d{4}\s\d{2}\s\d{2}\s\d{2}:\d{2}:\d{2}:\d{3}#\+\d+#\w+#\/\w+\/\w+\/Authentication/ ) {

                # Keep only LOGIN/LOGOUT access types and exclude GUEST users
                # (parentheses needed: && binds more tightly than ||)
                if ( ( $lines[ $i + 2 ] =~ m/Login/ || $lines[ $i + 2 ] =~ m/Logout/ ) && $lines[ $i + 3 ] !~ m/Guest/ ) {

                    my ( $y, $m, $d, $time ) = $lines[$i] =~ /\A#\d.\d.+#(\d{4})\s(\d{2})\s(\d{2})\s(\d{2}:\d{2}:\d{2}:\d{3})/;

                    my ( $action ) = $lines[ $i + 2 ] =~ /(\w+)/;
                    my ( $user )   = $lines[ $i + 3 ] =~ /\w+:\s(.+)/;

                    push @events, {
                        date   => "$y/$m/$d",
                        time   => $time,
                        action => $action,
                        user   => $user
                    };  # Array loader
                }
            }
            else {
                next;
            }

            $i++;
        }

        $OLD_IN_FILE = $INPUT;
    }
    close( $fh );
}

# Print log statistics for further processing
open( STAT_FILE, '>', $R_STATS ) or die $!;

foreach my $my_filelist ( @newer_filelist ) {
    print STAT_FILE join ';', $my_filelist->{filename}, $my_filelist->{inode}, "$my_filelist->{count}\n";
}

close( STAT_FILE );

my @by_user = sort { $a->{user} cmp $b->{user} } @events;    # Sorting by users

foreach my $my_list ( @by_user ) {

    my ( $y, $m ) = $my_list->{date} =~ /(\d{4})\/(\d{2})/;

    # Generate Directory YYYY-Month - #2009-January
    my $directory = File::Spec->catfile( $OUTPUT_LOG_DIRECTORY, "$m-$y" );

    unless ( -e $directory ) {
        mkpath( $directory, { verbose => 1 } );
    }

    my $log_file_path = File::Spec->catfile( $directory, $SPLITTED_OUTFILE );

    open( OUTPUT, '>>', $log_file_path ) or die $!;
    print OUTPUT join ';', $my_list->{date}, $my_list->{time}, $my_list->{action}, "$my_list->{user}\n";
}

close( OUTPUT );

My log files are:

logrotate_1.0.log

#2.0^H#2018 05 29 10:09:45:969#+0200#Info#/Sys/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103EC9E50000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER1#5##C47731E44D00000bae##0#Thread[HTTP Worker [@1473726842],5,Dedicated_Application_Thread]#Plain##
Login
User: USER4
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 29 11:51:06:541#+0200#Info#/Sy/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103EC9F50000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER4#6##A40B81404D03c0bae##0#Thread[HTTP Worker [@1264376989],5,Dedicated_Application_Thread]#Plain##
Login
User: USER1
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 30 11:54:03:906#+0200#Info#/Sy/Sec/Informtion#
#BC-JAS-SEC#security#C0000A7103EC9F50000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER4#6##A40B81404D03c0bae##0#Thread[HTTP Worker [@1264376989],5,Dedicated_Application_Thread]#Plain##
Login
User: USER4
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 30 11:59:59:156#+0200#Info#/Sys/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103ECA0C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER3#7##9ACF7Ec0bae##0#Thread[HTTP Worker [@124054179],5,Dedicated_Application_Thread]#Plain##
Logout
User: USER3
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 30 08:32:11:348#+0200#Warn#/Sys/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103ECA20E0000508C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER2#03c0bae##0#Thread[HTTP Worker [@2033389552],5,Dedicated_Application_Thread]#Plain##
Login
User: USER4
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 30 11:09:54:978#+0200#Info#/Sys/Sec/Information#
#BC-JAS-SEC#security#C0000A7103ECA20E0000508C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER2#03c0bae##0#Thread[HTTP Worker [@2033389552],5,Dedicated_Application_Thread]#Plain##
Login
User: USER2
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 06 01 08:11:30:008#+0200#Warn#/Sys/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103ECA20050000508C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER2#0##E0E##0#Thread[HTTP Worker [@2033389552],5,Dedicated_Application_Thread]#Plain##
Logout
User: USER2
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 06 01 11:11:29:658#+0200#Info#/Sys/Sec/Information#
#BC-JAS-SEC#security#C0000A7103ECA20050000508C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER2#0##E0E##0#Thread[HTTP Worker [@2033389552],5,Dedicated_Application_Thread]#Plain##
Logout
User: USER1
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 06 02 12:00:00:254#+0200#Info#/Sys/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103ECA20050000508C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER2#0##E0E##0#Thread[HTTP Worker [@2033389552],5,Dedicated_Application_Thread]#Plain##
Logout
User: Guest
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 06 02 12:05:00:465#+0200#Warn#/Sys/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103ECA20050000508C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER2#0##E0E##0#Thread[HTTP Worker [@2033389552],5,Dedicated_Application_Thread]#Plain##
Logout
User: USER9
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 06 02 12:50:00:065#+0200#Warn#/Sys/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103ECA20050000508C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER2#0##E0E##0#Thread[HTTP Worker [@2033389552],5,Dedicated_Application_Thread]#Plain##
Login
User: USER9
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 24 10:43:38:683#+0200#Info#/Sys/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103EC9E50000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER1#5##C47731E44D00000bae##0#Thread[HTTP Worker [@1473726842],5,Dedicated_Application_Thread]#Plain##
Login
User: USER1
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

logrotate_0.0.log

#2.0^H#2018 05 24 11:05:04:011#+0200#Info#/Sy/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103EC9F50000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER4#6##A40B81404D03c0bae##0#Thread[HTTP Worker [@1264376989],5,Dedicated_Application_Thread]#Plain##
Login
User: USER4
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 24 11:04:59:410#+0200#Info#/Sy/Sec/Informtion#
#BC-JAS-SEC#security#C0000A7103EC9F50000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER4#6##A40B81404D03c0bae##0#Thread[HTTP Worker [@1264376989],5,Dedicated_Application_Thread]#Plain##
Login
User: USER4
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 24 11:05:07:100#+0200#Info#/Sys/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103ECA0C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER3#7##9ACF7Ec0bae##0#Thread[HTTP Worker [@124054179],5,Dedicated_Application_Thread]#Plain##
Logout
User: USER3
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 24 11:07:21:314#+0200#Warn#/Sys/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103ECA20E0000508C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER2#03c0bae##0#Thread[HTTP Worker [@2033389552],5,Dedicated_Application_Thread]#Plain##
Login
User: USER2
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 24 11:07:21:314#+0200#Info#/Sys/Sec/Information#
#BC-JAS-SEC#security#C0000A7103ECA20E0000508C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER2#03c0bae##0#Thread[HTTP Worker [@2033389552],5,Dedicated_Application_Thread]#Plain##
Login
User: USER2
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 26 10:48:02:458#+0200#Warn#/Sys/Sec/Authentication#
#BC-JAS-SEC#security#C0000A7103ECA20050000508C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER2#0##E0E##0#Thread[HTTP Worker [@2033389552],5,Dedicated_Application_Thread]#Plain##
Logout
User: USER2
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

#2.0^H#2018 05 28 10:00:25:000#+0200#Info#/Sys/Sec/Information#
#BC-JAS-SEC#security#C0000A7103ECA20050000508C#3935150000000004#common.com/irj#com.common.services.security.authentication.logincontext.table#USER2#0##E0E##0#Thread[HTTP Worker [@2033389552],5,Dedicated_Application_Thread]#Plain##
Logout
User: USER0
IP Address: 127.0.0.1
Authentication Stack: ticket
Authentication Stack Properties:

I am having trouble with the statement on line 54:

#next unless $. > $jumprow;

I think it does not work because I use the record separator below, but I do not understand which separator I should use to solve the problem:

$/ = "\n\n";  # record separator

To debug the code, I inserted the following statement:

print "next unless $. > $jumprow\n";

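As far as I can see, the value of $. differs from the file's line number (the cause is the record separator being set to a double newline: $/ = "\n\n";).

If I remove the double-newline separator, the script does not work.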
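The mismatch described above can be reproduced in isolation. The following is a minimal sketch, assuming a tiny in-memory log with two blank-line-separated records; the helper name count_reads and the sample text are made up for illustration. It shows that $. counts reads of whatever $/ delimits, so it equals the line count only when $/ is "\n".

```perl
use strict;
use warnings;

# Hypothetical sample: two blank-line-separated records, seven physical lines.
my $log = "head A\nLogin\nUser: U1\n\nhead B\nLogout\nUser: U2\n";

# Hypothetical helper: read the whole sample with the given separator
# and report how many reads it took.
sub count_reads {
    my ($separator) = @_;
    local $/ = $separator;    # restored automatically when the sub returns
    open my $fh, '<', \$log or die "cannot open in-memory log: $!";
    1 while <$fh>;            # read up to end of file
    my $reads = $.;           # number of reads on the last-used handle
    close $fh;
    return $reads;
}

my $lines   = count_reads("\n");      # one read per physical line
my $records = count_reads("\n\n");    # one read per blank-line-separated block

print "lines=$lines records=$records\n";
```

Under these assumptions the two counts differ, which is why a line count such as the one taken with wc -l cannot be compared directly with $. while $/ is "\n\n".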

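Details of my script:

(1) Step one: I read STAT_FILE to see which lines were read in the previous run.

(2) Step two: I store the date, time, action (login or logout) and user (unless it is a guest) in an array (@events). I sort the array by user (by default it is not sorted by date).

(3) Step three: I print the information about the log files that were read into STAT_FILE.

(4) Step four: I write the sorted @events array to a parse_log.csv file in a directory named MM-YYYY (depending on the date of the event).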
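Steps (1) and (3) amount to a round trip through the "file;inode;count" lines of the statistics file. A minimal sketch of that round trip follows; the helper names format_stat_line and parse_stat_line, as well as the sample inode and count values, are hypothetical, and the sketch assumes that file names never contain a semicolon.

```perl
use strict;
use warnings;

# Hypothetical helper: build one "file;inode;count" line (step 3).
sub format_stat_line {
    my ($entry) = @_;
    return join( ';', @{$entry}{qw(filename inode count)} ) . "\n";
}

# Hypothetical helper: parse one line back into a hash reference (step 1).
# Returns undef for lines that do not have the expected shape.
sub parse_stat_line {
    my ($line) = @_;
    chomp $line;
    my ( $file, $inode, $count ) = split /;/, $line, 3;
    return unless defined $count && $inode =~ /\A\d+\z/ && $count =~ /\A\d+\z/;
    return { filename => $file, inode => $inode, count => $count };
}

# Made-up sample values, for illustration only.
my $line = format_stat_line(
    { filename => '/tmp/appo/log/logrotate_0.0.log', inode => 1234, count => 56 }
);
my $entry = parse_stat_line($line);

print "$entry->{filename} inode=$entry->{inode} count=$entry->{count}\n";
```

Splitting on the separator and validating the numeric fields is one possible alternative to a single capturing regex; it makes malformed lines easy to detect and skip.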

Can you help me find a solution for my script?

1 answer:

Answer 0 (score: 2)

I thought we had covered this yesterday.

if ( $currentinode == $inode ) {
    # Get rows to jump for this $INPUT
    my $jumprow = $older_count{$INPUT}[0];
}
else {
    # If file has been changed
    my $jumprow = 0;
}

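Each block declares a new $jumprow variable. Each of these variables ceases to exist as soon as you leave the block in which it is declared (that is, on the very next line).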
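This scoping behaviour can be checked in isolation. Here is a minimal sketch with made-up variable names ($outer, $after_shadow, $after_assign): a my inside a block creates a separate variable that disappears at the closing brace, while a plain assignment reaches the variable of the enclosing scope.

```perl
use strict;
use warnings;

my $outer = 'unset';    # declared in the enclosing scope

if (1) {
    my $outer = 'inner value';    # a NEW variable that shadows the first one
}

# The inner variable is gone; the enclosing one was never touched.
my $after_shadow = $outer;

if (1) {
    $outer = 'assigned';    # no "my": assigns to the enclosing variable
}

my $after_assign = $outer;

print "after shadowing block: $after_shadow\n";
print "after assigning block: $after_assign\n";
```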

If you want to access the variable outside the if/else blocks, you need to declare it at a higher level.

my $jumprow;
if ( $currentinode == $inode ) {
    # Get rows to jump for this $INPUT
    $jumprow = $older_count{$INPUT}[0];
}
else {
    # If file has been changed
    $jumprow = 0;
}

Or (more simply):

my $jumprow = 0;
$jumprow = $older_count{$INPUT}[0] if $currentinode == $inode;

Or, with the conditional operator:

my $jumprow = $currentinode == $inode ? $older_count{$INPUT}[0] : 0;
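Once $jumprow is visible in the right scope, the comparison with $. still mixes units in the script from the question: the stored count comes from wc -l (physical lines), whereas $. counts records while $/ is "\n\n". One possible way around this, sketched below under that assumption, is to store the number of records instead and skip on that. The helper name read_new_records and the sample log are hypothetical and not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: return the records that follow the first $skip ones,
# plus the total number of records seen (the value to store for the next run).
sub read_new_records {
    my ( $fh, $skip ) = @_;
    local $/ = "\n\n";    # same record separator as the script
    my @fresh;
    my $total = 0;
    while ( my $record = <$fh> ) {
        $total = $.;
        next unless $. > $skip;    # both sides now count records
        push @fresh, $record;
    }
    return ( \@fresh, $total );
}

# Made-up sample with three records.
my $log = "head A\nLogin\nUser: U1\n\n"
        . "head B\nLogout\nUser: U2\n\n"
        . "head C\nLogin\nUser: U3\n";

open my $fh, '<', \$log or die "cannot open in-memory log: $!";
my ( $fresh, $total ) = read_new_records( $fh, 2 );    # pretend 2 records were handled before
close $fh;

print "total=$total new=", scalar @$fresh, "\n";
```

Using local keeps the change to $/ confined to the helper, so other reads in the program (such as the line-by-line read of the statistics file) keep the default separator.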