How to merge columns from two different files in Perl

Asked: 2012-12-15 10:31:20

Tags: perl

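I wrote the Perl code below to read a text file (a1.txt) and average the timestamps for each test case. I would like to read two files (a1.txt and a2.txt) in the same run and merge the columns from both files.

The code below can only read one file at a time. Please help me modify it so that it produces output in the format shown below.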

a1.txt

PERFORMANCE TESTING


-------------------------------------------------------------------
PERF_SMK_OCUS_50    Version P-20-17
-------------------------------------------------------------------
300_wireframe_view_redraws_(GR) 00:01:56

80_wireframe_view_redraws_with_DATUMS_on_(GR) 00:00:51

3_hidden_view_redraws_(GR) 00:01:35

6_Fast_HLR_activations_(CP) 00:01:10

120_hidden_view_redraws_with_Fast_HLR_(GR) 00:00:42

2_shaded_mouse_spins_(GR) 00:00:21

270_shaded_view_redraws_(GR) 00:01:39
-------------------------------------------------------------------

****************************************************
****************************************************
-------------------------------------------------------------------
PERF_SMK_OCUS_50    Version P-20-17
-------------------------------------------------------------------
300_wireframe_view_redraws_(GR) 00:01:56

80_wireframe_view_redraws_with_DATUMS_on_(GR) 00:00:51

3_hidden_view_redraws_(GR) 00:01:35

6_Fast_HLR_activations_(CP) 00:01:09

120_hidden_view_redraws_with_Fast_HLR_(GR) 00:00:42

2_shaded_mouse_spins_(GR) 00:00:20

270_shaded_view_redraws_(GR) 00:01:39
-------------------------------------------------------------------

****************************************************
****************************************************
-------------------------------------------------------------------
PERF_SMK_OCUS_50    Version P-20-17
-------------------------------------------------------------------
300_wireframe_view_redraws_(GR) 00:01:55

80_wireframe_view_redraws_with_DATUMS_on_(GR) 00:00:50

3_hidden_view_redraws_(GR) 00:01:34

6_Fast_HLR_activations_(CP) 00:01:09

120_hidden_view_redraws_with_Fast_HLR_(GR) 00:00:40

2_shaded_mouse_spins_(GR) 00:00:21

270_shaded_view_redraws_(GR) 00:01:35
-------------------------------------------------------------------

****************************************************
****************************************************

a2.txt

PERFORMANCE TESTING

-------------------------------------------------------------------
PERF_SMK_OCUS_50    Version P-20-17
-------------------------------------------------------------------
80_wireframe_view_redraws_with_DATUMS_on_(GR) 00:00:50

3_hidden_view_redraws_(GR) 00:01:37

6_Fast_HLR_activations_(CP) 00:01:12

120_hidden_view_redraws_with_Fast_HLR_(GR) 00:00:43

2_shaded_mouse_spins_(GR) 00:00:21

270_shaded_view_redraws_(GR) 00:01:35

240_realtime_rendered_redraws_(GR)_1 00:13:16
-------------------------------------------------------------------

****************************************************
****************************************************
-------------------------------------------------------------------
PERF_SMK_OCUS_50    Version P-20-17
-------------------------------------------------------------------
80_wireframe_view_redraws_with_DATUMS_on_(GR) 00:00:50

3_hidden_view_redraws_(GR) 00:01:37

6_Fast_HLR_activations_(CP) 00:01:12

120_hidden_view_redraws_with_Fast_HLR_(GR) 00:00:42

2_shaded_mouse_spins_(GR) 00:00:20

270_shaded_view_redraws_(GR) 00:01:40

240_realtime_rendered_redraws_(GR)_1 00:13:14
-------------------------------------------------------------------

****************************************************
****************************************************
-------------------------------------------------------------------
PERF_SMK_OCUS_50    Version P-20-17
-------------------------------------------------------------------
80_wireframe_view_redraws_with_DATUMS_on_(GR) 00:00:50

3_hidden_view_redraws_(GR) 00:01:37

6_Fast_HLR_activations_(CP) 00:01:12

120_hidden_view_redraws_with_Fast_HLR_(GR) 00:00:44

2_shaded_mouse_spins_(GR) 00:00:20

270_shaded_view_redraws_(GR) 00:01:40

240_realtime_rendered_redraws_(GR)_1 00:13:24
-------------------------------------------------------------------

****************************************************
****************************************************

Expected output:

> Test Cases                                  a1.txt timestamp (hh:mm:ss)      a2.txt(hh:mm:ss)      delta (a1 -a2)(hh:mm:ss)
>----------------------------------------------------------------------------------------------------------------
>240_realtime_rendered_redraws_(GR)_1           N/A                            00:13:18             N/A

> 3_hidden_view_redraws_(GR)                     00:01:34                       00:01:37           -00:00:03

> 270_shaded_view_redraws_(GR)                   00:01:37                       00:01:38           -00:00:01

> 120_hidden_view_redraws_with_Fast_HLR_(GR)     00:00:41                       00:00:43           -00:00:02

> 300_wireframe_view_redraws_(GR)                00:01:55                        N/A                 N/A 

> 2_shaded_mouse_spins_(GR)                      00:00:20                       00:00:20            00:00:00

> 6_Fast_HLR_activations_(CP)                    00:01:09                       00:01:12           -00:00:03 

> 80_wireframe_view_redraws_with_DATUMS_on_(GR)  00:00:50                       00:00:50            00:00:00

My code:

my %retrieve;
my $count = 0;

my $file1 = 'a1.txt';

open (R, $file1) or  die ("Could not open $file1!");

while (<R>) {

    next unless /^*Retrieve_generic_/ ||
                /^*Retrieve_assembly_1_/ ||
                /^*Retrieve_assembly_2_/ ||
                /^*300_wireframe_view_/ || 
                /^*80_wireframe_view_/ ||
                /^*3_hidden_view_/ || 
                /^*Fast_HLR_/ || 
                /^*120_hidden_view_/ ||
                /^*shaded_view_/ ||
                /^*shaded_mouse_/ || 
                /^*realtime_rendered_/;
    $count++;
    my ( $retrieve, $time ) = split;
    my ( $h, $m, $s ) = split ':', $time;
    $retrieve{$retrieve} += $h * 3600 + $m * 60 + $s;

}
close(R);

for my $retrieve ( keys %retrieve ) {

    my $hms = secondsToHMS($retrieve{$retrieve} / ( 3));
    print "$retrieve\t$hms\n" if defined $hms;
}

# For seconds < 86400, else undef returned

sub secondsToHMS {
    my $seconds = $_[0]; 
    return undef if $seconds >= 86400;

    my $h = int $seconds / 3600;
    my $m = int( $seconds - $h * 3600 ) / 60;
    my $s = $seconds % 60;

    return sprintf( '%02d:%02d:%02d', $h, $m, $s );
}

2 Answers:

Answer 0 (score: 2)

Here is how I would do it.

#!/usr/bin/perl -Tw

use strict;
use warnings;
use English qw( -no_match_vars $OS_ERROR );

die 'expecting two filenames as arguments'
    if @ARGV != 2;

my @ids;

my %time_for;

for my $filename (@ARGV) {

    my $id;

    if ( $filename =~ m{\A ( .+? / )?( [^/.]+? )( [.] \w+ ) \z}xms ) {
        my $path = $1 || "";
        my $name = $2;
        my $ext  = $3 || "";
        $id       = $name;
        $filename = "$path$name$ext";
        push @ids, $id;
    }

    die "can't parse file ID from $filename"
        if !$id;

    die "can't find $filename"
        if !stat $filename;

    open my $fh, '<', "$filename"
        or die "open $filename: $OS_ERROR";

    while ( my $line = <$fh> ) {

        if ( $line =~ m{\A ( \w+ \( \w+ \) \w* ) \s+ ( \d+:\d+:\d+ ) }xms ) {

            my ( $subject, $hms ) = ( $1, $2 );

            my $seconds = hms_to_sec( $hms );

            $time_for{$subject}->{$id} ||= $seconds;

            $time_for{$subject}->{$id}
                = ( $seconds + $time_for{$subject}->{$id} ) / 2;
        }
    }

    close $fh
        or die "close $filename: $OS_ERROR";
}

print <<"HEAD";
> Test Cases                                     $ids[0] timestamp (hh:mm:ss)          $ids[1] (hh:mm:ss)         delta ($ids[0]-$ids[1])(hh:mm:ss)
> ------------------------------------------------------------------------------------------------------------------------------
HEAD

for my $subject (sort keys %time_for) {

    my ( $a1, $a2 ) = @{ $time_for{$subject} }{@ids};

    my $delta = defined $a1 && defined $a2 ? $a1 - $a2 : undef;

    printf "> % -46s % -32s % -21s %s\n\n",
        $subject,
        sec_to_hms( $a1 ),
        sec_to_hms( $a2 ),
        sec_to_hms( $delta );
}

sub hms_to_sec {
    my ( $h, $m, $s ) = map { int $_ } map { $_ ? $_ : 0 } split /:/, $_[0];
    return $h * 3_600 + $m * 60 + $s;
}

sub sec_to_hms {
    my ( $s ) = @_;

    return 'N/A'
        if !defined $s || $s > 86_400;

    my $sign = ' ';

    if ( $s < 0 ) {
        $sign = '-';
        $s *= -1;
    }

    my $h = int $s / 3_600;
    my $m = int ( $s - $h * 3_600 ) / 60;

    return sprintf '%s%02d:%02d:%02d', $sign, $h, $m, $s % 60;
}
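
Note that the read loop above folds each new sample into the stored value by averaging the two, so later samples weigh more than earlier ones: the three a2.txt samples 00:13:16, 00:13:14 and 00:13:24 come out as 799.5 seconds rather than the arithmetic mean of 798. If a plain mean is wanted, one option is to sum the samples and divide by their count. The following is only a minimal sketch under that assumption; `hms_to_seconds` and `mean_seconds` are hypothetical helper names, not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert "hh:mm:ss" to a number of seconds.
sub hms_to_seconds {
    my ($hms) = @_;
    my ( $h, $m, $s ) = split /:/, $hms;
    return $h * 3600 + $m * 60 + $s;
}

# Hypothetical helper: arithmetic mean of a list of "hh:mm:ss" strings,
# returned in whole seconds (any fraction is truncated).
sub mean_seconds {
    my @samples = @_;
    return if !@samples;
    my $sum = 0;
    $sum += hms_to_seconds($_) for @samples;
    return int( $sum / @samples );
}

# The three a2.txt samples for 240_realtime_rendered_redraws_(GR)_1.
my $mean = mean_seconds( '00:13:16', '00:13:14', '00:13:24' );

# Prints 00:13:18, the value shown in the question's expected output.
printf "%02d:%02d:%02d\n", int( $mean / 3600 ), int( $mean / 60 ) % 60, $mean % 60;
```

In the script above this would mean keeping a sum and a count per subject and file while reading, and dividing only when printing.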

The output looks like this.

> Test Cases                                     a1.txt timestamp (hh:mm:ss)      a2.txt(hh:mm:ss)      delta (a1 -a2)(hh:mm:ss)
> ------------------------------------------------------------------------------------------------------------------------------
> 120_hidden_view_redraws_with_Fast_HLR_(GR)      00:00:41                         00:00:43             -00:00:02

> 240_realtime_rendered_redraws_(GR)_1           N/A                               00:13:19             -00:13:19

> 270_shaded_view_redraws_(GR)                    00:01:37                         00:01:38             -00:00:01

> 2_shaded_mouse_spins_(GR)                       00:00:20                         00:00:20              00:00:00

> 300_wireframe_view_redraws_(GR)                 00:01:55                        N/A                    00:01:55

> 3_hidden_view_redraws_(GR)                      00:01:34                         00:01:37             -00:00:02

> 6_Fast_HLR_activations_(CP)                     00:01:09                         00:01:12             -00:00:02

> 80_wireframe_view_redraws_with_DATUMS_on_(GR)   00:00:50                         00:00:50              00:00:00

This assumes that file names use / as the path separator. (A properly portable implementation is probably a topic for another question.)
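
For what it is worth, one possible way to avoid hard-coding the separator is the core module File::Basename, whose `fileparse` function splits a path according to the conventions of the current operating system. This is only a sketch of that idea, not a replacement tested against the script above; `file_id` is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw( fileparse );

# Hypothetical helper: derive the column ID from a path, that is, the
# file name without its directory part and without its extension.
sub file_id {
    my ($filename) = @_;

    # The second argument is a suffix pattern; fileparse returns
    # (name, directories, suffix).
    my ( $name, $dirs, $suffix ) = fileparse( $filename, qr/\.[^.]*/ );
    return $name;
}

print file_id('/some/path/a1.txt'), "\n";    # prints a1
```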

You can call it like this:

./merge_columns.pl /some/path/a1.txt /another/path/a2.txt

I hope this helps.

Answer 1 (score: 0)

Try this...

#!/usr/bin/perl -w

use strict;

sub t2i {
    my @v=split(":",$_[0]);
    return $v[0]*3600+$v[1]*60+$v[2];
};
sub i2t {
    return sprintf "%02d:%02d:%02d", $_[0]/3600,$_[0]/60%60,$_[0]%60;
};
my %hash;

foreach my $file (qw|a1 a2|) {
    open my $fh,"<".$file.".txt" or die;
    while (<$fh>) {
    $hash{$1}{$file}=t2i($2) if
        /^(\d+_\S+_\S+_\S+)\s(\d+:\d+:\d+)/;
    };
    close $fh;
};
map {
    printf "%-50s %s  %s  %s\n", $_,
        i2t($hash{$_}{'a1'}), i2t($hash{$_}{'a2'}),
        i2t($hash{$_}{'a1'} - $hash{$_}{'a2'}) if
        defined($hash{$_}{'a1'}) && defined($hash{$_}{'a2'});
} keys %hash;

That gives:

80_wireframe_view_redraws_with_DATUMS_on_(GR)      00:00:50  00:00:50  00:00:00
2_shaded_mouse_spins_(GR)                          00:00:21  00:00:20  00:00:01
270_shaded_view_redraws_(GR)                       00:01:35  00:01:40  00:00:55
3_hidden_view_redraws_(GR)                         00:01:34  00:01:37  00:00:57
120_hidden_view_redraws_with_Fast_HLR_(GR)         00:00:40  00:00:44  00:00:56
6_Fast_HLR_activations_(CP)                        00:01:09  00:01:12  00:00:57

Or, sorted and split up more cleanly:

#!/usr/bin/perl -w

use strict;
my %joinHash;
my %files=('a'=>'a1.txt','b'=>'a2.txt');

sub readFile {
    open my $fh,"<".$files{$_[0]} or die;
    while (my $line=<$fh>) {
    $joinHash{$1}{$_[0]}=timeToInteger($2) if
        $line =~ /^(\d+_\S+_\S+_\S+)\s(\d+:\d+:\d+)/;
    };
    close $fh;
};
sub timeToInteger {
    my ($hour,$mins,$secs)=split(":",$_[0]);
    return $hour*3600+$mins*60+$secs;
};
sub integerToTime {
    return sprintf "%02d:%02d:%02d", $_[0]/3600,$_[0]/60%60,$_[0]%60;
};

foreach my $fileKey (keys %files) { readFile $fileKey };

map {
    my ($aVal,$bVal)=(0,0);
    $aVal=$joinHash{$_}{'a'} if defined $joinHash{$_}{'a'};
    $bVal=$joinHash{$_}{'b'} if defined $joinHash{$_}{'b'};
    printf "%-50s %s  %s  %s\n", $_,
        integerToTime($aVal), integerToTime($bVal),
        integerToTime($aVal-$bVal);
} sort {
    (my $x=$a)=~s/_.*$//g;
    (my $y=$b)=~s/_.*$//g;
    $x<=>$y
} keys %joinHash;

This gives numerically sorted output (missing values are filled with zeros):

2_shaded_mouse_spins_(GR)                          00:00:21  00:00:20  00:00:01
3_hidden_view_redraws_(GR)                         00:01:34  00:01:37  00:00:57
6_Fast_HLR_activations_(CP)                        00:01:09  00:01:12  00:00:57
80_wireframe_view_redraws_with_DATUMS_on_(GR)      00:00:50  00:00:50  00:00:00
120_hidden_view_redraws_with_Fast_HLR_(GR)         00:00:40  00:00:44  00:00:56
240_realtime_rendered_redraws_(GR)_1               00:00:00  00:13:24  00:47:36
270_shaded_view_redraws_(GR)                       00:01:35  00:01:40  00:00:55
300_wireframe_view_redraws_(GR)                    00:01:55  00:00:00  00:01:55
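
The delta column in this listing is unsigned: a difference of -3 seconds shows up as 00:00:57, because in Perl `-3 % 60` is 57 (with a positive right operand, `%` never returns a negative result). A minimal sketch of one way to format signed differences, assuming the sign should be printed in front of the magnitude, could look like this; `signed_hms` is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format a possibly negative number of seconds
# as "-hh:mm:ss", or " hh:mm:ss" for non-negative values.
sub signed_hms {
    my ($seconds) = @_;
    my $sign = $seconds < 0 ? '-' : ' ';
    my $abs  = abs $seconds;    # format the magnitude, not the signed value
    return sprintf '%s%02d:%02d:%02d',
        $sign, int( $abs / 3600 ), int( $abs / 60 ) % 60, $abs % 60;
}

# 3_hidden_view_redraws_(GR): 00:01:34 in a1.txt minus 00:01:37 in a2.txt.
print signed_hms( 94 - 97 ), "\n";    # prints -00:00:03
```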

Edit 3: a fully usable tool!

Here is a tool that can be run with the files as arguments, plus a few switches to control the sort order.

#!/usr/bin/perl -w
# Demo of parsing via hash variable
# using Getopt and different sort methods
# (C) 2012 F-Hauri.ch - Use, copy , distribute or modify via License LGPL V3.

use strict;
use Getopt::Std;

my $formatString="> %-45s%-20s%-20s%s\n";
my @files=qw|a1.txt a2.txt|;
my %opt;
my %joinHash;

sub usage {
    print <<eousage ;
Usage: $0 [-a|-b|-r|-c|-n] [file1] [file2]
    -a Sort by file A times
    -b Sort by file B times
    -r Sort by result times
    -c Sort alphabetically by case name
    -C Sort alphabetically by case name (Case insensitive)
    -n Sort numerically by case num (default)
    -R Reverse sort order
    file1 and file2 are by default: '$files[0]' and '$files[1]'.
eousage
exit 0;
}
sub mydie {
    printf STDERR "Error: %s\n",$_[0];
    usage();
}
sub readFile {
    open my $fh,"<".$files[$_[0]] or mydie "Can't open '$files[$_[0]]'.";
    while (my $line=<$fh>) {
    $joinHash{$1}[$_[0]]=timeToInt($2) if
        $line =~ /^(\d+_\S+_\S+_\S+)\s(\d+:\d+:\d+)/;
    };
    close $fh;
};
sub timeToInt {
    my ($hour,$mins,$secs)=split(":",$_[0]);
    return $hour*3600+$mins*60+$secs;
};
sub intToTime {
    my $sign=' ';
    $sign='-' if $_[0] < 0;
    my $abs=abs($_[0]);   # format the magnitude: % on a negative value wraps around
    return sprintf "%s%02d:%02d:%02d", $sign, $abs/3600,$abs/60%60,$abs%60;
};
sub getJoined {
    # $_0 = caseName, $_1 = filenr ( 0,1 ) or result (2), $_2 = flag: toNumber
    my $asNumber=$_[2];
    my $default=do{$asNumber ? 9e9 : ' N/A' };
    return map { getJoined($_[0],$_,$asNumber) } (0..2) unless defined $_[1]; 
    my $index  =$_[1];
    my @crtLine=@{$joinHash{$_[0]}};
    return do { defined $crtLine[$index] ? 
                do { $asNumber ?
                         $crtLine[$index] : intToTime($crtLine[$index] ) }
            : $default } if $index < 2;   # numeric comparison, not string `lt`
    return $default unless defined($crtLine[0]) && defined($crtLine[1]);
    return do { $asNumber ?  $crtLine[0] - $crtLine[1] :
                intToTime($crtLine[0] - $crtLine[1]) };
}
sub sortByOpt {
    my ($x,$y)=@_;
    if ($opt{'c'} || $opt{'C'}) {        # sort by Case name
    $x =~ s/^\d+_//g; $y =~ s/^\d+_//g;
    if ($opt{'C'}) {
        $x=~tr|a-z|A-Z|;
        $y=~tr|a-z|A-Z|;
    };
    ($y,$x)=($x,$y) if $opt{'R'};
    return $x cmp $y;
    } elsif ($opt{'a'}||$opt{'b'}||$opt{'r'}) {   # sort by times
    my $abr=0;                                # default to `a`
    $abr=1 if $opt{'b'};
    $abr=2 if $opt{'r'};
    $x = getJoined($x,$abr,1);
    $y = getJoined($y,$abr,1);
    } else {                # sort numericaly by case number
    $x =~ s/_.*$//g; $y =~ s/_.*$//g;
    };
    ($y,$x)=($x,$y) if $opt{'R'};
    return $x<=>$y;
}

getopts('abCchnRr',\%opt) or mydie 'Unknown option.';
usage if ($opt{'h'});

foreach my $fileKey (0..1) {
    if (defined($ARGV[$fileKey])) {
    mydie 'Arg "'.$ARGV[$fileKey].'" is not a file.' unless
        -f $ARGV[$fileKey];
    $files[$fileKey]=$ARGV[$fileKey];
    };
    readFile $fileKey
};

my @fileNames=map {s/\.txt$//;$_} @files;
my $headLine=sprintf $formatString, 'Test Cases', 
    map {' '.$_.'(hh:mm:ss)'}  @fileNames, 'delta ('.join("-",@fileNames).')';
print $headLine.('-' x ( length($headLine) - 1) )."\n";

map {
    printf $formatString, $_, getJoined($_);
} sort { sortByOpt($a,$b) } keys %joinHash;

Where:

Usage: ./mycode.pl [-a|-b|-r|-c|-n] [file1] [file2]
    -a Sort by file A times
    -b Sort by file B times
    -r Sort by result times
    -c Sort alphabetically by case name
    -C Sort alphabetically by case name (Case insensitive)
    -n Sort numerically by case num (default)
    -R Reverse sort order
    file1 and file2 are by default: 'a1.txt' and 'a2.txt'.

For example:

./mycode.pl -RC d1.txt d2.txt
> Test Cases                                    d1(hh:mm:ss)        d2(hh:mm:ss)        delta (d1-d2)(hh:mm:ss)
---------------------------------------------------------------------------------------------------------------
> 80_wireframe_view_redraws_with_DATUMS_on_(GR) 00:00:50            00:00:50            00:00:00
> 300_wireframe_view_redraws_(GR)               N/A                 00:01:55            N/A
> 270_shaded_view_redraws_(GR)                  00:01:40            00:01:35            00:00:05
> 2_shaded_mouse_spins_(GR)                     00:00:20            00:00:21           -00:00:01
> 240_realtime_rendered_redraws_(GR)_1          00:13:24            N/A                 N/A
> 6_Last_HLR_activations_(CP)                   00:01:12            00:01:09            00:00:03
> 120_hidden_view_redraws_with_Last_HLR_(GR)    00:00:44            00:00:40            00:00:04
> 3_hidden_view_redraws_(GR)                    00:01:37            00:01:34            00:00:03

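Note: I copied a1.txt to d2.txt and a2.txt to d1.txt, and changed Fast to Last (using sed with s/Fast/Last/), because in character order every upper-case letter sorts before every lower-case one...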
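
To make the point about letter case concrete, here is a small sketch comparing a plain `cmp` sort with a case-insensitive one on three shortened, illustrative case names. The two helper subs are hypothetical and are not taken from the tool above, which upper-cases both names with `tr` instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Plain string comparison: upper-case ASCII letters sort before lower-case ones.
sub sort_by_codepoint {
    my @sorted = sort { $a cmp $b } @_;
    return @sorted;
}

# Case-insensitive comparison: compare lower-cased copies of the names.
sub sort_ignoring_case {
    my @sorted = sort { lc($a) cmp lc($b) } @_;
    return @sorted;
}

my @names = ( 'hidden_view', 'Last_HLR', 'shaded_view' );

print join( ' ', sort_by_codepoint(@names) ),  "\n";    # Last_HLR hidden_view shaded_view
print join( ' ', sort_ignoring_case(@names) ), "\n";    # hidden_view Last_HLR shaded_view
```

With the original name Fast_HLR both sorts would give the same order, since F comes before h either way, which is why the test data was changed to Last.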