Perl:如何完美匹配两个文件之间的特定数据并进行比较?

时间:2013-07-23 14:24:25

标签: perl design-patterns match

我有两个文件(文件A和文件B),格式如下。我想匹配两个文件中的某些数据模式并进行匹配。我的编码使用了很长时间来生成结果。除此之外,在某处导致不完全提取是错误的。任何替代方法或改进?

我从两个文件中提取每个行名称和分数,并将它们存储在两个输出文件中。每个输出文件都包含提取的名称和分数。首先,如果文件A中的得分为负值,请忽略特定的行提取。否则,如果文件A中的得分为正值,则将文件A的名称与文件B匹配。将生成三个条件和三个结果报告(pass.rpt,fail.rpt和noCheck.rpt)。

对于那些匹配的名称,它将继续进行比较。如果文件A得分> 50和文件B得分> 40,打印匹配的名称,从文件A(得分A)得分和从文件B得分(得分B)到pass.rpt和pass_counter($ pc)加1每次比较。否则,如果< 50且< 40,则打印匹配的名称,得分A和得分B为fail.rpt和fail_counter($ fc)加1。

最后一个条件是来自文件A的那些负值。如果两个文件中的名称匹配,则打印名称,得分A和得分B到noCheck.rpt和noCheck_counter加一。

档案A

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
报告:学生A
        -science
        -math
        -language。
日期:星期五7月19日17:00:31 2013
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

名称科学数学。评分


Jane_let [0](sa)58.78 r 66.15 0.00 -33

Alfert_pipe(sa)74.72 r 92.72 0.00 82

Olive_pipe [8](sa)64.28 f 25.40 0.00 58

mass / excel / i60 86.21 r 59.90 0.00 68

Anne_let(sa)51.98 f 12.69 0.00 -39

yuki / 099 / pipe 76.52 r 94.32 0.00 -82

frey / let / sa / y589 47.79 f 99.00 0.00 78

alan / excel / sa / y589 97.00 f 96.00 0.00 -70

..
..

档案B

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
报告:学生B
        -science
        -math
        -language。
日期:星期五7月19日17:00:31 2013
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

名称科学数学。评分


Ash_let [9](sa)58.78 r 66.15 0.00 33

Alfert_pipe(sa)74.72 r 92.72 0.00 57

Olive_pipe [8](sa)64.28 f 25.40 0.00 20

mass / excel / i60 86.21 r 59.90 0.00 16

Sam_let(sa)51.98 f 12.69 0.00 -39

yuki / 099 / pipe 76.52 r 94.32 0.00 82

frey / let / sa / y589 47.79 f 99.00 0.00 30

alan / excel / sa / y589 67.00 f 96.00 0.00 -90

..
..

编码:

use Getopt::Long qw(:config no_ignore_case);
use Data::Dumper;
use POSIX qw(floor);
use strict;
use warning;

my $orig = '';
my $new = '';

GetOptions('orig=s' => \$orig, 'new=s' => \$new);

if (!$orig|!$new) {
        print "\n\t Help: test.pl -orig <file A> -new <file B>\n";
        exit;
}

open (PASS, ">pass.rpt") || die "ERROR: cannot open";
open (FAIL, ">fail.rpt") || die "ERROR: cannot open";
open (NC, ">noCheck.rpt") || die "ERROR: cannot open";
open (t1, ">t1") || die "ERROR: cannot open";
open (t2, ">t2") || die "ERROR: cannot open";

my (@array, $line, $end1, $slack1, $b1, $THIS, @arr1, @arr2, @tmp1, @tmp2, @emp, @emp2, $data1, $data2,$emp1,$emp2,$emp3,$emp4,$ep1,$s1,$ep2,$s2,$ncc,$pc,$fc);

$ncc = 0;
$pc = 0;
$fc = 0;

fileA_ext();
fileB_ext();
check();

#_______________________________________________________________________________________________ 
sub fileA_ext() {

if ($orig =~ /\S+\.gz$/) {
   open (FileA,"gunzip -c $orig |") || die "ERROR: can't read $orig\n";
} else {
   open (FileA,"$orig") || die "ERROR: can't read $orig\n";
}

while (@array = <FileA>)    {

foreach $line(@array) {

        if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {


        if ($line !~ m/\((sa)\)/) {

            @arr1 = @emp;
            next if ($line =~ m/Name/);
                    $name1 = "$1";
            $score1 = "$12";

            my $data1 = join(";",$name1,$score1);    
            push (@arr1, $data1);

            }

        if ($line =~ m/\((sa)\)/) {

            @arr1 = @emp2;
            @tmp1 = @emp;
            next if ($line =~ m/Name/);
            push (@tmp1, $line);
            #print t "@tmp1\n";

            foreach $line (@tmp1) {

                if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {

                    my $name2 = "$1";
                    substr($name2, -13) = '';
                    my $score2 = "$12";

                    my $data1 = join(";",$name2,$score2);    
                    push (@arr1, $data1);
                    $name2 = $score2 ="";
                    #print "@arr1\n\n";
                                }
                            }
                        }
print t1 "@arr1\n\n";
}        
}
}
close (FileA);
}

#____________________________________________________________________________________________


sub FileB_ext() {

if ($new =~ /\S+\.gz$/) {
   open (FileB,"gunzip -c $new |") || die "ERROR: $THIS can't read $new\n";
} else {
   open (FileB,"$new") || die "ERROR: $THIS can't read $new\n";
}

while (@array = <FileB>)  {

foreach $line(@array) {

     if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {
        #print "$line\n";

        if ($line !~ m/\((sa)\)/) {

            @arr2 = @emp;
            next if ($line =~ m/Name/);
                    my $name3 = "$1";
            my $score3 = "$12";

            my $data2 = join(";",$name3,$score3);    
            push (@arr2, $data2);

            }

        if ($line =~ m/\((sa)\)/) {

            @arr2 = @emp2;
            @tmp2 = @emp;
            next if ($line =~ m/Name/);
            push (@tmp2, $line);
            #print t "@tmp2\n";

            foreach $line (@tmp2) {

                if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {

                    my $name4 = "$1";
                    substr($name4, -13) = '';
                    my $score4 = "$12";

                    my $data2 = join(";",$name4,$score4);    
                    push (@arr2, $data2);
                    $name4 = $score4 ="";
                    #print "@arr2\n\n";
                                }
                            }
                        }
print t2 "@arr2\n\n";
}         
}
}
close (FileB);
}


sub check() {

foreach $data1 (@arr1) {
    if ($data1 ne ""){

        if ($data1 =~ m/(.*)\;(.*)/) {
            $ep1 = $emp1;
            $s1 = $emp2;
            my $ep1 = "$1";
            my $s1 = "$2";
            #print r "$ep1  $s1\n\n";

        foreach $data2 (@arr2) {
            if ($data2 ne "") {

                if ($data2 =~ m/(.*)\;(.*)/) {
                    $ep2 = $emp3;
                    $s2 = $emp4;
                    my $ep2 = "$1";
                    my $s2 = "$2";
                    #print R "$ep2 $s2\n";


                if ( $ep1 eq $ep2 && $s1 =~ m/-/g) {

                    $ncc++;
                    #print NC "Total match: $ncc\n\n";
                    print NC "$ep1  $s1 $s2\n";
                                    }

                if ( $ep1 eq $ep2 && $s1 !~ m/-/g && $s1 > 50 && $s2 > 40) {

                    $pc++;
                    print PASS "$ep1    $s1 $s2\n";
                                        }

                if ( $ep1 eq $ep2 && $s1 !~ m/-/g && $s1 < 50 && $s2 < 40) {

                    $fc++;
                    print FAIL "$ep1    $s1 $s2\n";
                                        }


}
}
}
}
}
}
print NC "\nTotal match: $ncc\n\n";
print PASS "\nTotal match: $pc\n\n";
print FAIL "\nTotal match: $fc\n\n";


}  

预期结果:

pass.rpt

名称得分A得分B
Alfert_pipe(sa)82 57

fail.rpt

Olive_pipe [8](sa)58 20

mass / excel / i60 68 16

frey / let / sa / y589 78 30

noCheck.rpt

yuki / 099 / pipe -82 82

alan / excel / sa / y589 -70 -90

1 个答案:

答案 0 :(得分:0)

<强>第一即可。这段代码从未运行过!至少,不是

use strict;
use warning;  #should be warnings

所以,这句话

  

我的编码使用了很长时间来生成结果。

是一个简单的谎言。您在希望发布代码之前添加了strictwarnings行:有人会调试您的代码。

请参阅,如果您需要帮助 - 首先 - 尝试自己的帮助,并尝试使用两个基本pargma use strict; use warnings;

运行您的代码

你会得到许多错误的建议。

<强>第二

下一个可能不会做你真正想要的......

foreach $line (@array) {    #line 104
    ...
    foreach $line (@tmp1) { #line 129
        ...
    }
}