Count rows by header and date

Time: 2013-11-06 16:33:44

Tags: perl

I have a tab-delimited file in the following format:

Business System Name:  OK_CR                      

Serial Numbr  Service Name          Program Name          Epoch Start Time     
------------  --------------------  --------------------  -------------------  
GI1001TAA266  PPV 10 (50106)        We Bought A Zoo       Aug 14 2012  4:15AM  
GI1002TB3596  PPV 5 (50101)         Help, The (2011)      Aug 14 2012  6:30PM  
GI1002TDH825  PPV 2 (50098)         Safe House            Sep  7 2012  2:15AM  

Business System Name:  OK_SV                      

Serial Numbr  Service Name          Program Name          Epoch Start Time     
------------  --------------------  --------------------  -------------------  
GI1001TAA266  PPV 10 (50106)        We Bought A Zoo       Aug 14 2012  4:15AM  
GI1002TB3596  PPV 5 (50101)         Help, The (2011)      Aug 14 2012  6:30PM  
GI1002TDH825  PPV 2 (50098)         Safe House            Sep  7 2012  2:15AM  

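I want to count the rows per date, grouped under each Business System header. In other words, the script's output should look like this: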

Business System Name:  OK_CR
Aug 14: 2
Sep 7: 1

Business System Name:  OK_SV
Aug 14: 2
Sep 7: 1

So far I have built a hash, but I am stuck on how to count each date and reset the counter after each Business System header. Here is my script:

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

open my $fh, '<', 'ppv.txt' or die $!;

my %data;
my $sect;
while (<$fh>) {
  next if /^\s+/;
  if (/^Business System Name:\s+(\w+)/) {
    $sect = $1;
    next;
  }
  #print "$sect\n";
  if (defined $sect) {
    next if /^Serial Numbr/;
    next if /^------------/;
    push @{ $data{$sect} }, $_;
  }
}
print Dumper \%data;

This is the script's output:

$VAR1 = {
          'OK_CR' => [
                       'GI1001TAA266  PPV 10 (50106)        We Bought A Zoo       Aug 14 2012  4:15AM
',
                       'GI1002TB3596  PPV 5 (50101)         Help, The (2011)      Aug 14 2012  6:30PM
',
                       'GI1002TDH825  PPV 2 (50098)         Safe House            Sep  7 2012  2:15AM
'
                     ],
          'OK_SV' => [
                       'GI1001TAA266  PPV 10 (50106)        We Bought A Zoo       Aug 14 2012  4:15AM
',
                       'GI1002TB3596  PPV 5 (50101)         Help, The (2011)      Aug 14 2012  6:30PM
',
                       'GI1002TDH825  PPV 2 (50098)         Safe House            Sep  7 2012  2:15AM
'
                     ]
        };

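Any ideas on how to proceed from here?

3 Answers:

Answer 0: (score: 1)

Using unpack, as suggested in the comments, you only need to keep a count for each date within each section: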

use strict;
use warnings;
use Data::Dumper;

open my $fh, '<', 'ppv.txt' or die $!;

my %data;
my $sect;
while (<$fh>) {
  next if /^\s+/;
  if (/^Business System Name:\s+(\w+)/) {
    $sect = $1;
    next;
  }
  #print "$sect\n";
  if (defined $sect) {
    next if /^Serial Numbr/;
    next if /^------------/;
    my $format = 'A57 A13 A*';
    my($prefixes, $date, $suffixes) = unpack($format, $_);
    $data{$sect}{$date}++;
  }
}
print Dumper \%data;

__END__

$VAR1 = {
          'OK_CR' => {
                       ' Aug 14 2012' => 2,
                       ' Sep  7 2012' => 1
                     },
          'OK_SV' => {
                       ' Aug 14 2012' => 2,
                       ' Sep  7 2012' => 1
                     }
        };
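
The Dumper output above is not yet in the report format the question asks for. The following is a minimal sketch of one way to print it, assuming the nested hash has exactly the shape shown above. The helper name format_report is hypothetical and is not part of the original answer. Dropping the year from the label follows the asker's example and assumes each section covers a single year.

```perl
use strict;
use warnings;

# Counts in the shape produced by the unpack-based script above
# (the keys keep the leading space that the 'A57 A13 A*' template yields).
my %data = (
    OK_CR => { ' Aug 14 2012' => 2, ' Sep  7 2012' => 1 },
    OK_SV => { ' Aug 14 2012' => 2, ' Sep  7 2012' => 1 },
);

# format_report is a hypothetical helper used for illustration only.
sub format_report {
    my ($counts) = @_;
    my $out = '';
    for my $sect (sort keys %$counts) {
        $out .= "Business System Name:  $sect\n";
        # Plain string sort: fine for this sample, but not chronological in general.
        for my $date (sort keys %{ $counts->{$sect} }) {
            # split ' ' skips leading blanks and splits on runs of whitespace.
            my ($mon, $day) = split ' ', $date;
            $out .= "$mon $day: $counts->{$sect}{$date}\n";
        }
        $out .= "\n";
    }
    return $out;
}

print format_report(\%data);
```

Because the counters live under $data{$sect}, nothing has to be reset when a new header is seen; each section simply gets its own inner hash.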

Answer 1: (score: 1)

This should work:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my %hash =();
open(FILE, '<', 'test.txt') or die "Cannot open test.txt: $!";
while(<FILE>)
{
    if(/(Business System Name:\s+OK_\S+)\s+/)
    {
        if(%hash)
        {
            print Dumper \%hash;
            %hash=();
            $hash{header}=$1;
        }
        else
        {
            $hash{header}=$1;
        }
    }
    elsif(/((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d+\s+\d\d\d\d)/)
    {
        if(defined $hash{$1}){$hash{$1}++;}
        else{$hash{$1}=1;}
    }
}
close(FILE);
if(%hash)
{
    print Dumper \%hash;
}

Output:

$VAR1 = {
          'Aug 14 2012' => 2,
          'Sep  7 2012' => 1,
          'header' => 'Business System Name:  OK_CR'
        };
$VAR1 = {
          'Aug 14 2012' => 2,
          'Sep  7 2012' => 1,
          'header' => 'Business System Name:  OK_SV'
        };
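
Hash keys come back in no particular order, so the dates in a dump like the one above are not guaranteed to appear chronologically. The following is a small sketch of one way to print a section's dates in calendar order, assuming the keys look like 'Sep  7 2012'. The names %month_index and date_sort_key are hypothetical and do not appear in the answer.

```perl
use strict;
use warnings;

# Map month abbreviations to their numbers so dates can be compared.
my %month_index;
@month_index{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (1 .. 12);

# date_sort_key is a hypothetical helper: 'Sep  7 2012' becomes '20120907'.
sub date_sort_key {
    my ($date) = @_;
    my ($mon, $day, $year) = split ' ', $date;
    return sprintf '%04d%02d%02d', $year, $month_index{$mon}, $day;
}

# One section in the shape printed by the script above.
my %hash = (
    'Sep  7 2012' => 1,
    'Aug 14 2012' => 2,
    'header'      => 'Business System Name:  OK_CR',
);

print "$hash{header}\n";
for my $date (sort { date_sort_key($a) cmp date_sort_key($b) }
              grep { $_ ne 'header' } keys %hash) {
    print "$date: $hash{$date}\n";
}
```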

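Answer 2: (score: 1)

Here is another option that sets Perl's record separator ($/) to 'Business System Name:' so that your file is read in those chunks as records. It also splits the date lines on \t, since your file contains tab-delimited data: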

use strict;
use warnings;
use Data::Dumper;

local $/ = 'Business System Name:';
my %data;

while (<>) {
    my ($sect) = /\s+(.+)/;
    my @timeLines = grep /:\d\d(?:A|P)M$/, split /\n/;
    for (@timeLines) {
        ( split /\t/ )[-1] =~ /(.+?)\s+\d+:/;
        $data{$sect}{$1}++;
    }
}

print Dumper \%data

Usage: perl script.pl inFile [>outFile]

The last, optional parameter directs the output to a file.

Output on your dataset:

$VAR1 = {
          'OK_SV                      ' => {
                                             'Aug 14 2012' => 2,
                                             'Sep  7 2012' => 1
                                           },
          'OK_CR                      ' => {
                                             'Aug 14 2012' => 2,
                                             'Sep  7 2012' => 1
                                           }
        };

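After a record is read, the section name is captured. Next, the record is split on newlines, and grep keeps only the lines that contain time data. Finally, the for loop splits each of those lines on the tab character to get the last field, captures the date portion, and increments the hash entry keyed by the sect and date values.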
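
The section keys in the output above keep their trailing spaces. The following is a sketch of the same record-separator idea that captures only the non-blank part of the section name. It reads from an in-memory string so it can be run without a data file. The sample input, the count_dates helper, and the tab-separated layout are assumptions made for illustration.

```perl
use strict;
use warnings;

# Assumed sample input: columns separated by single tabs.
my $input = join '',
    "Business System Name:  OK_CR   \n\n",
    "Serial Numbr\tService Name\tProgram Name\tEpoch Start Time\n",
    "GI1001TAA266\tPPV 10 (50106)\tWe Bought A Zoo\tAug 14 2012  4:15AM\n",
    "GI1002TB3596\tPPV 5 (50101)\tHelp, The (2011)\tAug 14 2012  6:30PM\n",
    "GI1002TDH825\tPPV 2 (50098)\tSafe House\tSep  7 2012  2:15AM\n",
    "\n",
    "Business System Name:  OK_SV   \n\n",
    "Serial Numbr\tService Name\tProgram Name\tEpoch Start Time\n",
    "GI1001TAA266\tPPV 10 (50106)\tWe Bought A Zoo\tAug 14 2012  4:15AM\n";

# count_dates is a hypothetical helper; it returns { section => { date => count } }.
sub count_dates {
    my ($text) = @_;
    my %data;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = 'Business System Name:';
    while (my $record = <$fh>) {
        chomp $record;    # strips the trailing separator when it is present
        # The first record is empty after chomp, so it is skipped here.
        my ($sect) = $record =~ /^\s*(\S+)/ or next;
        for my $line (split /\n/, $record) {
            next unless $line =~ /:\d\d[AP]M\s*$/;    # keep only rows ending in a time
            my $last = (split /\t/, $line)[-1];
            my ($date) = $last =~ /(.+?)\s+\d+:/ or next;
            $data{$sect}{$date}++;
        }
    }
    close $fh;
    return \%data;
}

my $counts = count_dates($input);
for my $sect (sort keys %$counts) {
    print "$sect\n";
    print "  $_: $counts->{$sect}{$_}\n" for sort keys %{ $counts->{$sect} };
}
```

Capturing \S+ instead of .+ is what keeps the padding out of the key, at the cost of assuming section names never contain spaces.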

Hope this helps!