I have a tab-delimited file in the following format:
Business System Name: OK_CR
Serial Numbr Service Name Program Name Epoch Start Time
------------ -------------------- -------------------- -------------------
GI1001TAA266 PPV 10 (50106) We Bought A Zoo Aug 14 2012 4:15AM
GI1002TB3596 PPV 5 (50101) Help, The (2011) Aug 14 2012 6:30PM
GI1002TDH825 PPV 2 (50098) Safe House Sep 7 2012 2:15AM
Business System Name: OK_SV
Serial Numbr Service Name Program Name Epoch Start Time
------------ -------------------- -------------------- -------------------
GI1001TAA266 PPV 10 (50106) We Bought A Zoo Aug 14 2012 4:15AM
GI1002TB3596 PPV 5 (50101) Help, The (2011) Aug 14 2012 6:30PM
GI1002TDH825 PPV 2 (50098) Safe House Sep 7 2012 2:15AM
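I want to count the number of lines for each date, grouped under each Business System Name header. In other words, the script's output should look like this: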
Business System Name: OK_CR
Aug 14: 2
Sep 7: 1
Business System Name: OK_SV
Aug 14: 2
Sep 7: 1
So far I have built a hash, but I'm stumped on how to count each date and reset the counter after each Business System Name header. Here is my script:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
open my $fh, '<', 'ppv.txt' or die $!;
my %data;
my $sect;
while (<$fh>) {
next if /^\s+/;
if (/^Business System Name:\s+(\w+)/) {
$sect = $1;
next;
}
#print "$sect\n";
if (defined $sect) {
next if /^Serial Numbr/;
next if /^------------/;
push @{ $data{$sect} }, $_;
}
}
print Dumper \%data;
This is the script's output:
$VAR1 = {
'OK_CR' => [
'GI1001TAA266 PPV 10 (50106) We Bought A Zoo Aug 14 2012 4:15AM
',
'GI1002TB3596 PPV 5 (50101) Help, The (2011) Aug 14 2012 6:30PM
',
'GI1002TDH825 PPV 2 (50098) Safe House Sep 7 2012 2:15AM
'
],
'OK_SV' => [
'GI1001TAA266 PPV 10 (50106) We Bought A Zoo Aug 14 2012 4:15AM
',
'GI1002TB3596 PPV 5 (50101) Help, The (2011) Aug 14 2012 6:30PM
',
'GI1002TDH825 PPV 2 (50098) Safe House Sep 7 2012 2:15AM
'
]
};
Any ideas on how to proceed from here?
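If you want to keep the first pass above unchanged, one way forward is a second pass over %data that pulls the month and day out of each stored line and counts them per section. Below is a minimal sketch, assuming every data line contains a date of the form 'Mon D YYYY' followed by a time such as '4:15AM'. The sample lines are hypothetical test data copied from the question, and count_dates is a hypothetical helper name. Dates are reported in the order they first appear in each section.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data shaped like the question's %data:
# section name => list of raw data lines.
my @lines = (
    "GI1001TAA266\tPPV 10 (50106)\tWe Bought A Zoo\tAug 14 2012 4:15AM\n",
    "GI1002TB3596\tPPV 5 (50101)\tHelp, The (2011)\tAug 14 2012 6:30PM\n",
    "GI1002TDH825\tPPV 2 (50098)\tSafe House\tSep 7 2012 2:15AM\n",
);
my %data = (OK_CR => [@lines], OK_SV => [@lines]);

# Hypothetical helper: count "Mon D" dates in a list of lines and return
# [date, count] pairs in the order each date first appears.
sub count_dates {
    my (%count, @order);
    for my $line (@_) {
        # Month abbreviation, day, 4-digit year, then a time such as 4:15AM
        next unless $line =~ /\b([A-Z][a-z]{2})\s+(\d{1,2})\s+\d{4}\s+\d{1,2}:\d\d[AP]M/;
        my $date = "$1 $2";    # also normalizes "Sep  7" to "Sep 7"
        push @order, $date unless $count{$date}++;
    }
    return map { [ $_, $count{$_} ] } @order;
}

my @report;
for my $sect (sort keys %data) {
    push @report, "Business System Name: $sect";
    push @report, "$_->[0]: $_->[1]" for count_dates(@{ $data{$sect} });
}
print "$_\n" for @report;
```

Keeping the counts in a fresh %count per call is what "resets the counter" for each section; no explicit reset step is needed.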
Answer 0 (score: 1)
Using unpack, as suggested in the comments, you only need to keep track of the count for each date:
use strict;
use warnings;
use Data::Dumper;
open my $fh, '<', 'ppv.txt' or die $!;
my %data;
my $sect;
while (<$fh>) {
next if /^\s+/;
if (/^Business System Name:\s+(\w+)/) {
$sect = $1;
next;
}
#print "$sect\n";
if (defined $sect) {
next if /^Serial Numbr/;
next if /^------------/;
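# Assumes space-padded fixed-width columns (not real tabs): the first 57
# characters are the serial/service/program fields, the next 13 hold the
# date (A strips trailing spaces), and the rest is the time.
my $format = 'A57 A13 A*';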
my($prefixes, $date, $suffixes) = unpack($format, $_);
$data{$sect}{$date}++;
}
}
print Dumper \%data;
__END__
$VAR1 = {
'OK_CR' => {
' Aug 14 2012' => 2,
' Sep 7 2012' => 1
},
'OK_SV' => {
' Aug 14 2012' => 2,
' Sep 7 2012' => 1
}
};
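The hash of hashes above still has to be printed in the format the question asks for. Below is a minimal sketch of one way to do that, assuming keys shaped like ' Aug 14 2012' (leading space included, as in the Dumper output). A plain string sort would put 'Aug 14' before 'Aug 7', so the sketch sorts on a YYYYMMDD key built from a month-name lookup. The sample %data, including the extra ' Aug 7 2012' entry added to show the ordering, is hypothetical, and sort_key is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample shaped like Answer 0's %data (note the leading spaces).
my %data = (
    OK_CR => { ' Aug 14 2012' => 2, ' Sep 7 2012' => 1, ' Aug 7 2012' => 3 },
    OK_SV => { ' Aug 14 2012' => 2, ' Sep 7 2012' => 1 },
);

# Month abbreviation => month number, for chronological sorting.
my %month_num;
@month_num{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (1 .. 12);

# Hypothetical helper: turn " Aug 14 2012" into the sortable string "20120814".
sub sort_key {
    my ($date) = @_;
    my ($mon, $day, $year) = split ' ', $date;    # split ' ' ignores leading spaces
    return sprintf '%04d%02d%02d', $year, $month_num{$mon}, $day;
}

my @report;
for my $sect (sort keys %data) {
    push @report, "Business System Name: $sect";
    my $dates = $data{$sect};
    for my $date (sort { sort_key($a) cmp sort_key($b) } keys %$dates) {
        my ($mon, $day) = split ' ', $date;    # drop the year for display
        push @report, "$mon $day: $dates->{$date}";
    }
}
print "$_\n" for @report;
```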
Answer 1 (score: 1)
This should work:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
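my %hash = ();
open(my $fh, '<', 'test.txt') or die "Cannot open test.txt: $!";
while (<$fh>)
{
    if (/(Business System Name:\s+OK_\S+)\s+/)
    {
        # New header: dump the previous section's counts, then start a fresh hash
        my $header = $1;
        print Dumper \%hash if %hash;
        %hash = ();
        $hash{header} = $header;
    }
    elsif (/((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d+\s+\d\d\d\d)/)
    {
        # Data line: count its "Mon D YYYY" date (starts at 1 on first sight)
        $hash{$1}++;
    }
}
close($fh);
# Dump the last section, which no following header will flush
print Dumper \%hash if %hash;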
Output:
$VAR1 = {
'Aug 14 2012' => 2,
'Sep 7 2012' => 1,
'header' => 'Business System Name: OK_CR'
};
$VAR1 = {
'Aug 14 2012' => 2,
'Sep 7 2012' => 1,
'header' => 'Business System Name: OK_SV'
};
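The same flush-on-header idea can print the requested 'Aug 14: 2' lines directly instead of dumping each hash. Below is a minimal sketch, assuming the 'Mon D YYYY' dates from the question. It reads from a hypothetical in-memory copy of the sample data (through a scalar-reference filehandle) so it runs on its own, and flush_section is a hypothetical helper name. The final call after the loop matters: no later header exists to flush the last section.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the input file, so the sketch runs on its own.
my $input = <<'END';
Business System Name: OK_CR
Serial Numbr Service Name Program Name Epoch Start Time
------------ -------------------- -------------------- -------------------
GI1001TAA266 PPV 10 (50106) We Bought A Zoo Aug 14 2012 4:15AM
GI1002TB3596 PPV 5 (50101) Help, The (2011) Aug 14 2012 6:30PM
GI1002TDH825 PPV 2 (50098) Safe House Sep 7 2012 2:15AM
Business System Name: OK_SV
GI1001TAA266 PPV 10 (50106) We Bought A Zoo Aug 14 2012 4:15AM
GI1002TDH825 PPV 2 (50098) Safe House Sep 7 2012 2:15AM
END

my @out;                       # collected output lines
my ($header, %count, @order);  # state for the current section

# Hypothetical helper: emit the current section's counts, then reset them.
sub flush_section {
    return unless defined $header;
    push @out, $header, map { "$_: $count{$_}" } @order;
    %count = ();
    @order = ();
}

open my $fh, '<', \$input or die "Cannot open input: $!";
while (my $line = <$fh>) {
    if ($line =~ /^(Business System Name:\s+\S+)/) {
        my $new_header = $1;
        flush_section();       # a new header ends the previous section
        $header = $new_header;
    }
    elsif ($line =~ /\b((?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+\d+)\s+\d{4}\b/) {
        push @order, $1 unless $count{$1}++;
    }
}
close $fh;
flush_section();               # the last section has no following header
print "$_\n" for @out;
```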
Answer 2 (score: 1)
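Here is another option: set Perl's record separator ($/) to 'Business System Name:' so that your file is read in chunks, one section per record. It also splits the date lines on \t, since your file contains tab-delimited data: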
use strict;
use warnings;
use Data::Dumper;
local $/ = 'Business System Name:';
my %data;
while (<>) {
my ($sect) = /\s+(.+)/;
my @timeLines = grep /:\d\d(?:A|P)M$/, split /\n/;
for (@timeLines) {
next unless ( split /\t/ )[-1] =~ /(.+?)\s+\d+:/;    # skip lines without a date/time field
$data{$sect}{$1}++;
}
}
print Dumper \%data;
Usage: perl script.pl inFile [>outFile]
The last, optional parameter directs the output to a file.
Output on your dataset:
$VAR1 = {
'OK_SV ' => {
'Aug 14 2012' => 2,
'Sep 7 2012' => 1
},
'OK_CR ' => {
'Aug 14 2012' => 2,
'Sep 7 2012' => 1
}
};
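After a record is read, the section name is captured. Next, the record is split on newlines, and grep keeps only the lines that contain time data. The final for loop splits each of those lines on the tab character to get the last field, captures the date part, and increments the hash entry keyed by section and date.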
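To see what setting $/ actually does, it can help to look at the raw records. Below is a minimal sketch on a hypothetical two-section string. Each read returns text up to and including the separator, so the first record is only the separator itself (it holds no dates, so it adds nothing to the counts), and each section's name sits at the start of the following record. chomp is used here because it removes whatever string $/ currently holds, not just a newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input with two sections.
my $input = "Business System Name: OK_CR\nAug 14 2012 4:15AM\n"
          . "Business System Name: OK_SV\nSep 7 2012 2:15AM\n";

my @records;
{
    # Read "lines" that end in the header text instead of "\n".
    local $/ = 'Business System Name:';
    open my $fh, '<', \$input or die "Cannot open input: $!";
    while (my $rec = <$fh>) {
        chomp $rec;    # strips the current $/, i.e. the header text
        push @records, $rec;
    }
    close $fh;
}

# Show each record with newlines made visible.
for my $i (0 .. $#records) {
    (my $shown = $records[$i]) =~ s/\n/\\n/g;
    print "record $i: [$shown]\n";
}
```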
Hope this helps!