I'm new to Perl and am working on a bioinformatics project at university. My FILE1 contains a list of positions, in the format:
99269
550
100
126477
1700
and FILE2 has the format:
517 1878 forward
700 2500 forward
2156 3289 forward
99000 100000 forward
22000 23000 backward
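I want to compare each position in FILE1 against every range of values in FILE2, and if a position falls within one of the ranges, print the position, the range and the direction.
So my expected output is: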
99269 99000 100000 forward
550 517 1878 forward
1700 517 1878 forward
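At the moment it runs without errors, but it doesn't output any information, so I'm not sure where I'm going wrong! When I split the final 'if' rule it does run, but it only works when the position is exactly the same as the range.
My code is as follows: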
#!/usr/bin/perl
use strict;
use warnings;
my $outputfile = "/Users/edwardtickle/Documents/CC22CDS.txt";
open FILE1, "/Users/edwardtickle/Documents/CC22positions.txt"
    or die "cannot open > CC22: $!";
open FILE2, "/Users/edwardtickle/Documents/CDSpositions.txt"
    or die "cannot open > CDS: $!";
open( OUTPUTFILE, ">$outputfile" ) or die "Could not open output file: $! \n";
while (<FILE1>) {
    if (/^(\d+)/) {
        my $CC22 = $1;
        while (<FILE2>) {
            if (/^(\d+)\s+(\d+)\s+(\S+)/) {
                my $CDS1 = $1;
                my $CDS2 = $2;
                my $CDS3 = $3;
                if ( $CC22 > $CDS1 && $CC22 < $CDS2 ) {
                    print OUTPUTFILE "$CC22 $CDS1 $CDS2 $CDS3\n";
                }
            }
        }
    }
}
close(FILE1);
close(FILE2);
Answer 0: (score: 2)
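This happens because FILE2 is read to the end while the first line of FILE1 is being compared. Every later line of FILE1 is then compared against a filehandle that is already at end of file, so the inner loop has nothing left to read.
Store the lines of FILE1 in an array, then compare each line of FILE2 against every array entry, like this: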
#!/usr/bin/perl
use strict;
use warnings;
my $outputfile = "out.txt";
open FILE1, "file1.txt"
    or die "cannot open > CC22: $!";
open FILE2, "file2.txt"
    or die "cannot open > CDS: $!";
open( OUTPUTFILE, ">$outputfile" ) or die "Could not open output file: $! \n";
my @file1list = ();
while (<FILE1>) {
    if (/^(\d+)/) {
        push @file1list, $1;
    }
}
while (<FILE2>) {
    if (/^(\d+)\s+(\d+)\s+(\S+)/) {
        my $CDS1 = $1;
        my $CDS2 = $2;
        my $CDS3 = $3;
        for my $CC22 (@file1list) {
            if ( $CC22 > $CDS1 && $CC22 < $CDS2 ) {
                print OUTPUTFILE "$CC22 $CDS1 $CDS2 $CDS3\n";
            }
        }
    }
}
close(FILE1);
close(FILE2);
close(OUTPUTFILE) or die "Could not close output file: $! \n";
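(The program also has some style issues, such as capital letters in variable names, but I have ignored those; this is a very good program for a beginner.)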
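To see the exhausted-filehandle effect in isolation, here is a minimal sketch rather than a recommended fix. It assumes in-memory filehandles holding the sample data from the question, and `find_matches` is a hypothetical helper name. With rewinding switched off it behaves like the original nested loops; with it switched on, `seek` resets the ranges handle to the start before each position is checked.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data from the question, held in strings (stand-ins for FILE1 and FILE2).
my $positions = "99269\n550\n100\n126477\n1700\n";
my $ranges    = "517 1878 forward\n700 2500 forward\n2156 3289 forward\n"
              . "99000 100000 forward\n22000 23000 backward\n";

# Hypothetical helper: returns the matching lines.
# When $rewind is true, the ranges handle is rewound before each position.
sub find_matches {
    my ($rewind) = @_;
    open my $pos_fh,   '<', \$positions or die "cannot open positions: $!";
    open my $range_fh, '<', \$ranges    or die "cannot open ranges: $!";
    my @matches;
    while ( my $pos = <$pos_fh> ) {
        chomp $pos;
        seek $range_fh, 0, 0 if $rewind;    # offset 0 from the start of the data
        while ( my $line = <$range_fh> ) {
            my ( $start, $end, $dir ) = split ' ', $line;
            push @matches, "$pos $start $end $dir"
                if $pos > $start && $pos < $end;
        }
    }
    return @matches;
}

my @without = find_matches(0);
my @with    = find_matches(1);
print "without rewind: ", scalar(@without), " match(es)\n";
print "with rewind:    ", scalar(@with),    " match(es)\n";
print "$_\n" for @with;
```

Rewinding re-reads the ranges once per position, so for large files the array approach above is likely the better choice; the sketch is only meant to make the cause of the problem visible.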
Answer 1: (score: 0)
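I thought I could simplify some of this by using split instead of a regex, but I think my code actually ended up longer and harder to read! Either way, keep in mind that split works well for problems like this: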
use strict;
use warnings;

# User config area
my $positions_file = 'input_positions.txt';
my $ranges_file    = 'input_ranges.txt';
my $output_file    = 'output_data.txt';

# Reading data
open my $positions_fh, "<", $positions_file or die "Cannot open $positions_file: $!";
open my $ranges_fh,    "<", $ranges_file    or die "Cannot open $ranges_file: $!";
chomp( my @positions = <$positions_fh> );

# Store the range data in an array of hash references,
# to be used like $range_data[0] = {start => $start, end => $end, dir => $dir}
my @range_data;
while (<$ranges_fh>) {
    chomp;
    my ( $start, $end, $dir ) = split;    # splits $_ on whitespace
    push @range_data, { start => $start, end => $end, dir => $dir };
    #print "start: $start, end: $end, direction: $dir\n";
}    #/while
close $positions_fh;
close $ranges_fh;

# Data processing:
open my $output_fh, ">", $output_file or die "Cannot open $output_file: $!";

# It feels like it should be more efficient to process one range at a time for all data points
foreach my $range (@range_data) {    # one range at a time
    # each $range is one element of @range_data, i.e. a hash reference
    foreach my $position (@positions) {    # check all positions
        if ( ( $range->{start} <= $position ) and ( $position <= $range->{end} ) ) {
            my $output_string = "$position " . $range->{start} . " " . $range->{end} . " " . $range->{dir} . "\n";
            print $output_fh $output_string;
        }    #/if
    }    #/foreach position
}    #/foreach range
close $output_fh;
This code might run faster if the data processing were done inside the while loop that reads the range data.
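As a rough sketch of that idea (not benchmarked, so any speed gain is an assumption), each range can be tested against all positions as soon as its line is read, which avoids building the array of hashes. The helper name `match_while_reading` and the in-memory sample data are hypothetical; a real script would pass handles opened on the actual files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: positions stay in memory, and each range line is
# checked against them immediately after it is read.
# Returns the number of lines written to $output_fh.
sub match_while_reading {
    my ( $positions_ref, $ranges_fh, $output_fh ) = @_;
    my $count = 0;
    while ( my $line = <$ranges_fh> ) {
        my ( $start, $end, $dir ) = split ' ', $line;
        next unless defined $dir;    # skip blank or incomplete lines
        for my $position (@$positions_ref) {
            # Inclusive bounds, as in the code above
            if ( $start <= $position and $position <= $end ) {
                print {$output_fh} "$position $start $end $dir\n";
                $count++;
            }
        }
    }
    return $count;
}

# Sample data from the question, read through an in-memory filehandle.
my @positions = ( 99269, 550, 100, 126477, 1700 );
my $ranges    = "517 1878 forward\n700 2500 forward\n2156 3289 forward\n"
              . "99000 100000 forward\n22000 23000 backward\n";
open my $ranges_fh, '<', \$ranges or die "Cannot open in-memory ranges: $!";
match_while_reading( \@positions, $ranges_fh, \*STDOUT );
close $ranges_fh;
```

The output is grouped by range rather than by position, the same ordering the code above produces.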
Answer 2: (score: 0)
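Your error is caused by nesting the file processing: the inner loop only goes through the file's contents once and then stays stuck at eof.
The simplest solution is to load the inner loop's file completely into memory first.
The following demonstrates this using more Modern Perl techniques: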
#!/usr/bin/perl
use strict;
use warnings;
use autodie;
my $cc22file = "/Users/edwardtickle/Documents/CC22positions.txt";
my $cdsfile  = "/Users/edwardtickle/Documents/CDSpositions.txt";
my $outfile  = "/Users/edwardtickle/Documents/CC22CDS.txt";
my @ranges = do {
    # open my $fh, '<', $cdsfile; # Using Fake Data instead below
    open my $fh, '<', \ "517 1878 forward\n700 2500 forward\n2156 3289 forward\n99000 100000 forward\n22000 23000 backward\n";
    map {[split]} <$fh>;
};
# open my $infh, '<', $cc22file; # Using Fake Data instead below
open my $infh, '<', \ "99269\n550\n100\n126477\n1700\n";
# open my $outfh, '>', $outfile; # Using STDOUT instead below
my $outfh = \*STDOUT;
CC22:
while (my $cc22 = <$infh>) {
    chomp $cc22;
    for my $cds (@ranges) {
        if ($cc22 > $cds->[0] && $cc22 < $cds->[1]) {
            print $outfh "$cc22 @$cds\n";
            next CC22;
        }
    }
    # warn "$cc22 No match found\n";
}
Output:
99269 99000 100000 forward
550 517 1878 forward
1700 517 1878 forward
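Note that 1700 also lies between 700 and 2500. The output has only one line for it because `next CC22` moves on to the next position after the first matching range. The sketch below contrasts the two behaviours; `ranges_containing` is a hypothetical helper written for illustration, and the ranges are the sample data from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample ranges from the question: [start, end, direction]
my @ranges = (
    [ 517,   1878,   'forward' ],
    [ 700,   2500,   'forward' ],
    [ 2156,  3289,   'forward' ],
    [ 99000, 100000, 'forward' ],
    [ 22000, 23000,  'backward' ],
);

# Hypothetical helper: returns one output line per range that strictly
# contains $position. With $first_only set it stops after the first hit,
# which mirrors the effect of "next CC22".
sub ranges_containing {
    my ( $position, $first_only ) = @_;
    my @lines;
    for my $range (@ranges) {
        my ( $start, $end, $dir ) = @$range;
        next unless $position > $start && $position < $end;
        push @lines, "$position $start $end $dir";
        last if $first_only;
    }
    return @lines;
}

print "first match only:\n";
print "  $_\n" for ranges_containing( 1700, 1 );
print "all matches:\n";
print "  $_\n" for ranges_containing( 1700, 0 );
```

If every overlapping range should be reported, dropping the `next CC22` line would be one way to do it, at the cost of an extra line for 1700 compared with the expected output in the question.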