How do I search a file for certain lines and then extract only certain parts of those lines in Perl?

Time: 2015-06-25 14:03:10

Tags: perl file search

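I have a text file, a server report containing about 1000 lines of information. I'm trying to write a script that searches the report for only the specific pieces of information I'm looking for. For example: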

  

Server 1 Health Check
Date - Error Count
06/25/15 14
6/24/15 21
6/23/15 17
6/24/15 33

Server 2 Health Check
Date - Error Count
06/25/15 4
6/24/15 13
6/23/15 21
6/24/15 33

Errors caused by X
Server 1:
32
Server 2:
24

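The three sections are "Server 1 Health Check", "Server 2 Health Check", and "Errors caused by X". The data I need to extract from each section is in bold. Does anyone know how I could do this?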

1 Answer:

Answer 0: (score: 0)

Here is a Perl script:

#!/usr/bin/env perl                                                                     

use warnings;
use strict;

use constant SECTION_HEALTH_CHECK => 'Health Check';
use constant SECTION_ERRORS       => 'Errors caused by X';
use constant RE_HEALTH_CHECK_DATA => qr|^\s*(\d+/\d+/\d+)\s+(?::\s+)?(\d+)|; # Date, Count (":" separator optional)
use constant RE_ERRORS_DATA       => qr|^\s*(\d+)|;                          # Count
use constant FMT_DATE             => '%02d-%02d-%02d';

open my $fh, '<', 'foo.txt'
    or die "Unable to open 'foo.txt': $!";  # CHANGEME

my $section = '';  # Current section; empty until the first section header is seen
my (%errors_by_date, %errors_caused_by_server);
my $server = 0;

# Read through the file, line by line                                                   
while (<$fh>) {
    next unless m|\S|; # Skip empty / whitespace only lines                             
    $section =
        m|${\SECTION_HEALTH_CHECK}| ? SECTION_HEALTH_CHECK
      : m|${\SECTION_ERRORS}|       ? SECTION_ERRORS
      :                               $section;

    if (m|Server (\d+)|) {
        $server = $1;
    }

    if (SECTION_HEALTH_CHECK eq $section
        and $_ =~ RE_HEALTH_CHECK_DATA) {
        my ($date, $count) = ($1, $2);
        $errors_by_date{ $server }->{ date_to_yymmdd($date) } += $count;
    }

    if (SECTION_ERRORS eq $section
        and $_ =~ RE_ERRORS_DATA) {
        my ($count) = $1;
        $errors_caused_by_server{ $server } += $count;
    }
}

for my $server_id (sort {$a <=> $b} keys %errors_by_date) {
    print "\n--- Server $server_id ---\n\n";
    for my $date (sort keys %{ $errors_by_date{$server_id} }) {
        my $count = $errors_by_date{$server_id}->{$date};
        my $normal_date = yymmdd_to_date($date);
        print "On $normal_date there were $count errors!\n";
    }
    my $errors_count = $errors_caused_by_server{$server_id} // 0;
    next unless $errors_count;
    print "\nThere were $errors_count errors caused by this server!\n";
}

sub date_to_yymmdd {
    my ($date) = @_;
    my ($mm,$dd,$yy) = split '/', $date;
    return sprintf(FMT_DATE,$yy,$mm,$dd);
}

sub yymmdd_to_date {
    my ($date) = @_;
    my ($yy,$mm,$dd) = split '-', $date;
    return sprintf(FMT_DATE,$mm,$dd,$yy);
}

1;

It produces the following output:

--- Server 1 ---

On 06-23-15 there were 17 errors!
On 06-24-15 there were 54 errors!
On 06-25-15 there were 14 errors!

There were 32 errors caused by this server!

--- Server 2 ---

On 06-23-15 there were 21 errors!
On 06-24-15 there were 46 errors!
On 06-25-15 there were 4 errors!

There were 24 errors caused by this server!
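
The core idea in the script above is to remember which section and server the current line belongs to, then use a regex with capture groups to pull out only the wanted parts of each data line. Below is a minimal sketch of that idea for the health-check sections only. It assumes the report uses the English headers shown in the question; `parse_health_checks` is a hypothetical helper name and is not part of the answer's script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer's script): takes report lines
# and returns { server_id => { date => total_count } } for the
# health-check sections only.
sub parse_health_checks {
    my @lines = @_;
    my (%counts, $server);
    for my $line (@lines) {
        if ($line =~ /Server\s+(\d+)\s+Health Check/) {
            $server = $1;     # rows that follow belong to this server
        }
        elsif ($line =~ /Errors caused by/) {
            undef $server;    # we have left the health-check sections
        }
        elsif (defined $server
            and $line =~ m{^\s*(\d+/\d+/\d+)\s+(?::\s+)?(\d+)\s*$}) {
            # $1 is the date and $2 is the count; counts for a repeated
            # date are summed, as in the answer's script
            $counts{$server}{$1} += $2;
        }
    }
    return \%counts;
}

# Sample lines taken from the question's report (assumed layout)
my @report = (
    'Server 1 Health Check',
    'Date - Error Count',
    '06/25/15 14',
    '6/24/15 21',
    '6/24/15 33',
    'Errors caused by X',
    'Server 1:',
    '32',
);

my $counts = parse_health_checks(@report);
for my $date (sort keys %{ $counts->{1} }) {
    print "$date: $counts->{1}{$date}\n";
}
```

Keeping the parsing in a function that takes a list of lines makes it easy to try on a few lines before pointing it at the real 1000-line report, for example by passing it the result of reading a filehandle in list context.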