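Firstly, my apologies if my formatting is incorrect. I am very new to writing scripts (3 days) and this is my first post on this site.
I have two tab-separated files: File a contains 14 columns and File b contains 8 columns.
One column in File b holds a numeric value that corresponds to a range of numbers generated from two numeric fields in File a.
For every line in File a, I need to search through File b and print a combination of data from fields in both files. Because a range of numbers is accepted, each line of File a will have multiple matches.
The code I have written does exactly what I want, but only for the first line of File a; the loop does not continue. I have looked all over the internet and I believe it may have something to do with the fact that both files are read from standard input. I have tried to correct this, but I can't seem to get anything to work.
My current understanding is that by changing one file to read from a different file descriptor my loop may work, using something such as >$3, but despite my research I don't really understand this. Or possibly by using the grep function, which I am also struggling with.
Here is an outline of the code I am using now: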
use strict;
use warnings;
print "which file read from?\n";
my $filea = <STDIN>;
chomp $filea;
{
    unless (open ( FILEA, $filea)) {
        print "cannot open, do you want to try again? y/n?\n?";
        my $again = <STDIN>;
        chomp $again;
        if ($again =~ 'n') {
            exit;
        } else {
            print "\n";
            $filea = <STDIN>;
            chomp $filea;
            redo;
        }
    }
}
#I also open fileb the same way, but won't write it all out to save space and your time.
my $output = 'output.txt';
open (OUTPUT, ">>$output");
while (my $loop1 = <FILEA>) {
    chomp $loop1;
    ( my $var1, my $var2, my $var3, my $var4, my $var5, my $var6,
      my $var7, my $var8, my $var9, my $var10, my $var11, my $var12,
      my $var13, my $var14 ) = split ( "\t", $loop1);
    #create the range of numbers which needs to be matched from file b.
    my $length = length ($var4);
    my $range = ($var2 + $length);
    #perform the search loop through fileb
    while (my $loop2 = <FILEB>) {
        chomp $loop2;
        ( my $vala, my $valb, my $valc, my $vald, my $vale, my $valf,
          my $valg) = split ( "\t", $loop2 );
        #there are then several functions and additions of the data, which all work basically so I'll just use a quick example.
        if ($vald >= $var3 && $vald <= $range) {
            print OUTPUT "$var1, $vald, $var11, $valf, $vala, $var5 \n";
        }
    }
}
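I hope this all makes sense; I have tried to make everything as clear as possible. If anyone could help me edit the code so that the loop continues through the whole of both files, that would be great.
If possible, please explain what you have done. Ideally, I would love to get this result without changing the code too much.
Thank you all!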
Answer 0: (score: 2)
Avoid bareword filehandles wherever possible; use a lexical filehandle such as $fh instead of FH.
You can use until instead of unless, and skip the redo:
print "Enter the file name\n";
my $file_a = <STDIN>;
chomp $file_a;
my $fh_a;
until(open $fh_a, '<', $file_a) {
    print "Re-enter the file name or 'n' to cancel\n";
    $file_a = <STDIN>;
    chomp $file_a;
    if($file_a eq 'n') {
        exit;
    }
}
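You can (and should) use an array instead of all those individual column variables: my @cols_a = split /\t/, $line;
You should read file B into an array once, then search that array each time you need it, rather than reading from the filehandle again: my @file_b = <$fh_b>;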
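A filehandle keeps its read position, so once a loop has read it to end-of-file, later reads return nothing until the handle is rewound. That is most likely why the original inner loop only produces output for the first line of File a. The following is a minimal sketch of that behaviour, not the asker's actual program: it uses an in-memory filehandle with made-up data in place of File b, and the helper count_lines is a hypothetical name introduced only for illustration.

```perl
use strict;
use warnings;

# Made-up stand-in for file B, opened as an in-memory filehandle
# so the sketch runs without any real files.
my $data_b = "b1\nb2\nb3\n";
open my $fh_b, '<', \$data_b or die "cannot open in-memory file: $!";

# Hypothetical helper: count the lines a handle yields from its current position.
sub count_lines {
    my ($fh) = @_;
    my $n = 0;
    while (defined(my $line = <$fh>)) {
        $n++;
    }
    return $n;
}

my $first_pass  = count_lines($fh_b);   # reads every line
my $second_pass = count_lines($fh_b);   # handle is already at EOF

# Rewind to the start of the data (offset 0, whence 0 = from the beginning).
seek $fh_b, 0, 0 or die "cannot rewind: $!";
my $third_pass  = count_lines($fh_b);   # every line is available again

print "$first_pass $second_pass $third_pass\n";   # prints "3 0 3"
```

If you would rather change the original code as little as possible, calling seek on the File b handle before each pass of the inner loop should have the same effect, at the cost of re-reading the file from disk for every line of File a.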
Putting these suggestions together, the result will look something like this:
#Assume we have opened both files already . . .
my @file_b = <$fh_b>;
chomp @file_b;
while(my $line = <$fh_a>) {
    chomp $line;
    my @cols_a = split /\t/, $line;
    #Remember, arrays in most languages (perl included) are zero-indexed,
    #so $cols_a[1] is actually the SECOND column.
    my $range = ($cols_a[1] + length $cols_a[3]);
    foreach my $line_b (@file_b) {
        #This inner loop runs over all of file B once for every single line of file A.
        #Not efficient, but it will work.
        #There are, of course, lots of optimisations you can make
        #(starting with, for example, storing file B as an array of array
        #references so you don't have to split each line every time)
        my @cols_b = split /\t/, $line_b;
        #Same test as in the question: third column of A <= fourth column of B <= $range
        if($cols_b[3] >= $cols_a[2] && $cols_b[3] <= $range) {
            #Do whatever here
        }
    }
}
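The optimisation mentioned in the comments, storing file B as an array of array references, could be sketched roughly as follows. This is only an illustration under stated assumptions: the sample lines in @lines_a and @lines_b are invented, only four columns are shown instead of 14 and 8, and the comparison copies the condition from the question.

```perl
use strict;
use warnings;

# Invented sample data standing in for the two tab-separated files.
my @lines_a = ("id1\t10\t12\tACGT", "id2\t20\t21\tAC");
my @lines_b = ("x\ty\tz\t13", "x\ty\tz\t22", "x\ty\tz\t99");

# Split every line of file B exactly once, keeping one array reference per row.
my @rows_b = map { [ split /\t/, $_ ] } @lines_b;

my @matches;
for my $line_a (@lines_a) {
    my @cols_a = split /\t/, $line_a;
    # Upper bound as in the question: second column plus the length of the fourth.
    my $range = $cols_a[1] + length $cols_a[3];
    for my $cols_b (@rows_b) {
        # $cols_b is an array reference, so its fields are read with ->[index].
        if ($cols_b->[3] >= $cols_a[2] && $cols_b->[3] <= $range) {
            push @matches, "$cols_a[0], $cols_b->[3]";
        }
    }
}

print "$_\n" for @matches;
```

With real files you would fill @rows_b from the File b handle once before the outer loop. The inner loop still visits every row of B for every line of A, but it no longer repeats the split each time.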