我有两个文件(文件A和文件B),格式如下。我想匹配两个文件中的某些数据模式并进行匹配。我的编码使用了很长时间来生成结果。除此之外,在某处导致不完全提取是错误的。任何替代方法或改进?
我从两个文件中提取每个行名称和分数,并将它们存储在两个输出文件中。每个输出文件都包含提取的名称和分数。首先,如果文件A中的得分为负值,请忽略特定的行提取。否则,如果文件A中的得分为正值,则将文件A的名称与文件B匹配。将生成三个条件和三个结果报告(pass.rpt,fail.rpt和noCheck.rpt)。
对于那些匹配的名称,它将继续进行比较。如果文件A得分> 50和文件B得分> 40,打印匹配的名称,从文件A(得分A)得分和从文件B得分(得分B)到pass.rpt和pass_counter($ pc)加1每次比较。否则,如果< 50且< 40,则打印匹配的名称,得分A和得分B为fail.rpt和fail_counter($ fc)加1。
最后一个条件是来自文件A的那些负值。如果两个文件中的名称匹配,则打印名称,得分A和得分B到noCheck.rpt和noCheck_counter加一。
档案A
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
报告:学生A
-science
-math
-language。
日期:星期五7月19日17:00:31 2013
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jane_let [0](sa)58.78 r 66.15 0.00 -33
Alfert_pipe(sa)74.72 r 92.72 0.00 82
Olive_pipe [8](sa)64.28 f 25.40 0.00 58
mass / excel / i60 86.21 r 59.90 0.00 68
Anne_let(sa)51.98 f 12.69 0.00 -39
yuki / 099 / pipe 76.52 r 94.32 0.00 -82
frey / let / sa / y589 47.79 f 99.00 0.00 78
alan / excel / sa / y589 97.00 f 96.00 0.00 -70
..
..
档案B
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
报告:学生B
-science
-math
-language。
日期:星期五7月19日17:00:31 2013
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ash_let [9](sa)58.78 r 66.15 0.00 33
Alfert_pipe(sa)74.72 r 92.72 0.00 57
Olive_pipe [8](sa)64.28 f 25.40 0.00 20
mass / excel / i60 86.21 r 59.90 0.00 16
Sam_let(sa)51.98 f 12.69 0.00 -39
yuki / 099 / pipe 76.52 r 94.32 0.00 82
frey / let / sa / y589 47.79 f 99.00 0.00 30
alan / excel / sa / y589 67.00 f 96.00 0.00 -90
..
..
编码:
use Getopt::Long qw(:config no_ignore_case);
use Data::Dumper;
use POSIX qw(floor);
use strict;
use warning;
my $orig = '';
my $new = '';
GetOptions('orig=s' => \$orig, 'new=s' => \$new);
if (!$orig|!$new) {
print "\n\t Help: test.pl -orig <file A> -new <file B>\n";
exit;
}
open (PASS, ">pass.rpt") || die "ERROR: cannot open";
open (FAIL, ">fail.rpt") || die "ERROR: cannot open";
open (NC, ">noCheck.rpt") || die "ERROR: cannot open";
open (t1, ">t1") || die "ERROR: cannot open";
open (t2, ">t2") || die "ERROR: cannot open";
my (@array, $line, $end1, $slack1, $b1, $THIS, @arr1, @arr2, @tmp1, @tmp2, @emp, @emp2, $data1, $data2,$emp1,$emp2,$emp3,$emp4,$ep1,$s1,$ep2,$s2,$ncc,$pc,$fc);
$ncc = 0;
$pc = 0;
$fc = 0;
fileA_ext();
fileB_ext();
check();
#_______________________________________________________________________________________________
sub fileA_ext() {
if ($orig =~ /\S+\.gz$/) {
open (FileA,"gunzip -c $orig |") || die "ERROR: can't read $orig\n";
} else {
open (FileA,"$orig") || die "ERROR: can't read $orig\n";
}
while (@array = <FileA>) {
foreach $line(@array) {
if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {
if ($line !~ m/\((sa)\)/) {
@arr1 = @emp;
next if ($line =~ m/Name/);
$name1 = "$1";
$score1 = "$12";
my $data1 = join(";",$name1,$score1);
push (@arr1, $data1);
}
if ($line =~ m/\((sa)\)/) {
@arr1 = @emp2;
@tmp1 = @emp;
next if ($line =~ m/Name/);
push (@tmp1, $line);
#print t "@tmp1\n";
foreach $line (@tmp1) {
if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {
my $name2 = "$1";
substr($name2, -13) = '';
my $score2 = "$12";
my $data1 = join(";",$name2,$score2);
push (@arr1, $data1);
$name2 = $score2 ="";
#print "@arr1\n\n";
}
}
}
print t1 "@arr1\n\n";
}
}
}
close (FileA);
}
#____________________________________________________________________________________________
sub FileB_ext() {
if ($new =~ /\S+\.gz$/) {
open (FileB,"gunzip -c $new |") || die "ERROR: $THIS can't read $new\n";
} else {
open (FileB,"$new") || die "ERROR: $THIS can't read $new\n";
}
while (@array = <FileB>) {
foreach $line(@array) {
if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {
#print "$line\n";
if ($line !~ m/\((sa)\)/) {
@arr2 = @emp;
next if ($line =~ m/Name/);
my $name3 = "$1";
my $score3 = "$12";
my $data2 = join(";",$name3,$score3);
push (@arr2, $data2);
}
if ($line =~ m/\((sa)\)/) {
@arr2 = @emp2;
@tmp2 = @emp;
next if ($line =~ m/Name/);
push (@tmp2, $line);
#print t "@tmp2\n";
foreach $line (@tmp2) {
if ($line =~ m/(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)\s+(.*)/) {
my $name4 = "$1";
substr($name4, -13) = '';
my $score4 = "$12";
my $data2 = join(";",$name4,$score4);
push (@arr2, $data2);
$name4 = $score4 ="";
#print "@arr2\n\n";
}
}
}
print t2 "@arr2\n\n";
}
}
}
close (FileB);
}
sub check() {
foreach $data1 (@arr1) {
if ($data1 ne ""){
if ($data1 =~ m/(.*)\;(.*)/) {
$ep1 = $emp1;
$s1 = $emp2;
my $ep1 = "$1";
my $s1 = "$2";
#print r "$ep1 $s1\n\n";
foreach $data2 (@arr2) {
if ($data2 ne "") {
if ($data2 =~ m/(.*)\;(.*)/) {
$ep2 = $emp3;
$s2 = $emp4;
my $ep2 = "$1";
my $s2 = "$2";
#print R "$ep2 $s2\n";
if ( $ep1 eq $ep2 && $s1 =~ m/-/g) {
$ncc++;
#print NC "Total match: $ncc\n\n";
print NC "$ep1 $s1 $s2\n";
}
if ( $ep1 eq $ep2 && $s1 !~ m/-/g && $s1 > 50 && $s2 > 40) {
$pc++;
print PASS "$ep1 $s1 $s2\n";
}
if ( $ep1 eq $ep2 && $s1 !~ m/-/g && $s1 < 50 && $s2 < 40) {
$fc++;
print FAIL "$ep1 $s1 $s2\n";
}
}
}
}
}
}
}
print NC "\nTotal match: $ncc\n\n";
print PASS "\nTotal match: $pc\n\n";
print FAIL "\nTotal match: $fc\n\n";
}
预期结果:
名称得分A得分B
Alfert_pipe(sa)82 57
Olive_pipe [8](sa)58 20
mass / excel / i60 68 16
frey / let / sa / y589 78 30
yuki / 099 / pipe -82 82
alan / excel / sa / y589 -70 -90
答案 0 :(得分:0)
<强>第一即可。这段代码从未运行过!至少,不是
use strict;
use warning; #should be warnings
所以,这句话
我的编码使用了很长时间来生成结果。
是一个简单的谎言。您在希望发布代码之前添加了strict
和warnings
行:有人会调试您的代码。
请参阅,如果您需要帮助 - 首先 - 尝试自己的帮助,并尝试使用两个基本pargma use strict; use warnings;
你会得到许多错误的建议。
<强>第二强>:
下一个可能不会做你真正想要的......
foreach $line (@array) { #line 104
...
foreach $line (@tmp1) { #line 129
...
}
}