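I have an array named @mytitles that contains many titles, for example title1, title2, and so on. I have a file named "Superdataset" that contains information related to each title. However, the information for title1 may be 6 lines long while the information for title2 may be 30 lines (the length is random). Each block of information for a title (titlex) starts with "Reading titlex" and ends with "Done reading titlex".
From these lines of information for each title, I need to extract some data. Luckily, I think, the data I need is always in the 2 lines just before each "Done reading titlex".
So my "Superdataset" looks like this: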
Reading title1
random info line1
random info line2
random info line3
random info line4
random info line5
my earnings are 6000
my expenses are 1000
Done reading title1
Reading title2
random info line6
random info line7
random info line8
random info line9
random info line10
random info line11
random info line12
random info line13
random info line14
my earnings are 11000
my expenses are 9000
Done reading title2
I need the total expenses and the total earnings. Any suggestions?
PS: the array has complex names, not simple ones like titlex.
Answer 0 (score: 0)
This is a first step that gets the data into a usable form.
use warnings;
use strict;
use autodie;
my $input_filename = 'example';
open my $input, '<', $input_filename;
my %data;
{
    my $current_title;
    while(<$input>){
        chomp;
        if( /^Reading (.*?)\s*$/ ){ # start of section
            $current_title = $1;
        }elsif( not defined $current_title ){ # outside of any section
            # invalid data
        }elsif( /^Done reading (.*?)\s*$/ ){ # end of section
            # trailing whitespace is stripped here too, so it cannot cause a false mismatch
            die "mismatched end marker '$1' at line $.\n" if $1 ne $current_title;
            $current_title = undef;
        }else{ # add an element of section to array
            push @{ $data{$current_title} }, $_;
        }
    }
}
close $input;
Use the resulting data structure to determine the total earnings and expenses.
my( $earnings, $expenses ) = ( 0, 0 ); # start at 0 so the prints never see undef
for my $list( values %data ){
    for( @$list ){
        if( /earnings are (\d+)/ ){
            $earnings += $1;
        }elsif( /expenses are (\d+)/ ){
            $expenses += $1;
        }
    }
}
print "earnings $earnings\n";
print "expenses $expenses\n";
Or, instead, print it out in a form that is more useful to a computer.
use YAML 'Dump';
print Dump \%data;
---
title1:
  - ' random info line1'
  - ' random info line2'
  - ' random info line3'
  - ' random info line4'
  - ' random info line5'
  - ' my earnings are 6000'
  - ' my expenses are 1000'
title2:
  - ' random info line6'
  - ' random info line7'
  - ' random info line8'
  - ' random info line9'
  - ' random info line10'
  - ' random info line11'
  - ' random info line12'
  - ' random info line13'
  - ' random info line14'
  - ' my earnings are 11000'
  - ' my expenses are 9000'
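Since the question says the wanted figures are always the two lines right before "Done reading ...", the totals could also be read by position from the end of each list instead of by keyword. The following is only a minimal sketch under a few assumptions: the %data hash is hard-coded here in the same shape as the structure built above, each of those two lines contains exactly one number, and the sub name sum_last_two is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: sum the numbers found on the last two lines of
# every section. Assumes the second-to-last line holds the earnings and
# the last line holds the expenses, as described in the question.
sub sum_last_two {
    my ($data) = @_;
    my ( $earnings, $expenses ) = ( 0, 0 );
    for my $title ( sort keys %$data ) {
        my $lines = $data->{$title};
        next if @$lines < 2;    # skip sections too short to hold both lines
        my ($e) = $lines->[-2] =~ /(\d+)/;
        my ($x) = $lines->[-1] =~ /(\d+)/;
        $earnings += $e // 0;   # treat a line without a number as 0
        $expenses += $x // 0;
    }
    return ( $earnings, $expenses );
}

# Sample structure in the same shape as %data (shortened for illustration).
my %data = (
    title1 => [ 'random info line1', 'my earnings are 6000',  'my expenses are 1000' ],
    title2 => [ 'random info line6', 'my earnings are 11000', 'my expenses are 9000' ],
);

my ( $earnings, $expenses ) = sum_last_two( \%data );
print "earnings $earnings\n";
print "expenses $expenses\n";
```

Reading by position ignores any earlier line that happens to mention earnings or expenses, but it relies entirely on the "last two lines" layout holding for every section.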
Answer 1 (score: 0)
You can do this with the range operator:
#!/usr/bin/env perl
use strict;
use warnings;
my $begin_stanza = qr/^Reading/i;
my $endof_stanza = qr/^Done reading/i;
my ( $title, @lines );
my ( $value, $total_earnings, $total_expenses ) = ( undef, 0, 0 );
while (<DATA>) {
    chomp;
    # true from a "Reading" line through the matching "Done reading" line
    if ( m{$begin_stanza} .. m{$endof_stanza} ) {
        if ( m{$begin_stanza\s+(.+)} ) {
            $title = $1;
            @lines = ();
            next;
        }
        if ( m{$endof_stanza} ) {
            # @lines now holds the two lines just before the end marker
            ($value) = ( $lines[0] =~ m{(\d+)} );
            $total_earnings += $value;
            ($value) = ( $lines[1] =~ m{(\d+)} );
            $total_expenses += $value;
            print join "\n", $title, @lines, "\n";
            next;
        }
        shift @lines if @lines == 2;    # keep only the two most recent lines
        push @lines, $_;
    }
}
printf "Total Earnings = %7d\n", $total_earnings;
printf "Total Expenses = %7d\n", $total_expenses;
__DATA__
Reading title1
random info line1
random info line2
random info line3
random info line4
random info line5
my earnings are 6000
my expenses are 1000
Done reading title1
Reading title2
random info line6
random info line7
random info line8
random info line9
random info line10
random info line11
random info line12
random info line13
random info line14
my earnings are 11000
my expenses are 9000
Done reading title2
...which produces:
title1
my earnings are 6000
my expenses are 1000

title2
my earnings are 11000
my expenses are 9000

Total Earnings =   17000
Total Expenses =   10000
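The question also mentions an @mytitles array whose names are more complicated than titlex. If only those titles should be counted, one option is to compare the captured title as a plain string against a lookup hash, which avoids having to escape regex metacharacters in the names. This is a rough sketch under those assumptions, not a drop-in solution: the sub name totals_for_titles and the sample titles are invented for illustration, and the input is read from an in-memory string rather than the real Superdataset file.

```perl
use strict;
use warnings;

# Hypothetical helper: total the two lines before each "Done reading"
# marker, but only for sections whose title is in the wanted list.
sub totals_for_titles {
    my ( $fh, @wanted ) = @_;
    my %wanted = map { $_ => 1 } @wanted;    # exact string lookup, no regex
    my ( $earnings, $expenses ) = ( 0, 0 );
    my ( $title, @last_two );

    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^Reading\s+(.+?)\s*$/ ) {
            ( $title, @last_two ) = ($1);    # new section: reset the window
        }
        elsif ( $line =~ /^Done reading\b/ ) {
            if ( defined $title && $wanted{$title} && @last_two == 2 ) {
                $earnings += $1 if $last_two[0] =~ /(\d+)/;
                $expenses += $1 if $last_two[1] =~ /(\d+)/;
            }
            undef $title;
        }
        elsif ( defined $title ) {
            shift @last_two if @last_two == 2;    # keep only the newest two lines
            push @last_two, $line;
        }
    }
    return ( $earnings, $expenses );
}

# Made-up sample input with titles that contain regex metacharacters.
my $sample = <<'END';
Reading foo.bar[1]
random info
my earnings are 6000
my expenses are 1000
Done reading foo.bar[1]
Reading other+title
random info
my earnings are 11000
my expenses are 9000
Done reading other+title
END

my @mytitles = ('foo.bar[1]');
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my ( $earnings, $expenses ) = totals_for_titles( $fh, @mytitles );
close $fh;
print "Total earnings: $earnings\n";    # only foo.bar[1] is counted
print "Total expenses: $expenses\n";
```

To run it against the real file, the in-memory open would be replaced with an ordinary open on the file's path; the helper itself only needs a readable filehandle.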
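Answer 2 (score: 0)
Unless you can predict the line that comes before the relevant lines, the flip-flop operator will not buy you much in the way of optimization. I think it is easier to use a buffer array and simply match the line that follows the earnings and expenses.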
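#!/usr/bin/perl
use strict;
use warnings;
my @buffer;
my ($earnings, $expenses) = (0, 0);
# read line by line instead of slurping all of DATA into a list first
while (my $line = <DATA>) {
    shift @buffer if @buffer > 2;    # buffer holds at most three lines after the push
    push @buffer, $line;
    next if $line !~ /^Done reading/;
    # buffer is now (earnings line, expenses line, "Done reading" line)
    $earnings += $1 if $buffer[0] =~ /(\d+)$/;
    $expenses += $1 if $buffer[1] =~ /(\d+)$/;
}
print "Total earnings: $earnings\n";
print "Total expenses: $expenses\n";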
__DATA__
Reading title1
random info line1
random info line2
random info line3
random info line4
random info line5
my earnings are 6000
my expenses are 1000
Done reading title1
Reading title2
random info line6
random info line7
random info line8
random info line9
random info line10
random info line11
random info line12
random info line13
random info line14
my earnings are 11000
my expenses are 9000
Done reading title2
Output:
Total earnings: 17000
Total expenses: 10000