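My goal is to open a file containing a single fixed-width column (1 character = 2 bytes on my Mac) and read lines of the file into an array, beginning and ending at specified points. The file is very long, so I am using the seek command to jump to the appropriate starting line. The file is a chromosome sequence, arranged as a single column. I can jump to the right place in the file, but I cannot read the sequence into the array.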
my @seq = (); # to contain the stretch of sequence I am seeking to retrieve from file.
my $from_bytes = 2*$from - 2; # specifies the "start point" in terms of bytes.
seek( SEQUENCE, $from_bytes, 0 );
my $from_base = <SEQUENCE>;
push ( @seq, $from_base ); # script is going to the correct line and retrieving correct base.
my $count = $from + 1; # here I am trying to continue the read into @seq
while ( <SEQUENCE> ) {
    if ( $count = $to ) { # $to specifies the line at which to stop
        last;
    }
    else {
        push( @seq, $_ );
        $count++;
        next;
    }
}
print "seq is: @seq\n\n"; # script prints only the first base
Answer 0 (score: 1)
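It looks like you are reading fixed-width records, each consisting of $to lines, where every line is 2 bytes long (1 character + 1 newline). If so, you can read each chromosome sequence with a single read. A short example: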
use strict;
use warnings;
use autodie;
my $record_number = $ARGV[0];
my $lines_per_record = 4; # change to the correct value
my $record_length = $lines_per_record * 2;
my $offset = $record_length * $record_number;
my $fasta_test = "fasta_test.txt";
if (open my $SEQUENCE, '<', $fasta_test) {
    my $sequence_string;
    seek $SEQUENCE, $offset, 0;
    my $chars_read = read($SEQUENCE, $sequence_string, $record_length);
    if ($chars_read) {
        my @seq = split /\n/, $sequence_string; # if you want it as an array
        $sequence_string =~ s/\n//g; # if you want the chromosome sequence as a single string without newlines
        print $sequence_string, "\n";
    } else {
        print STDERR "Failed to read record $record_number!\n";
    }
    close $SEQUENCE;
}
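The same single-read idea can be expressed directly in terms of the question's $from and $to. The following is only a sketch under stated assumptions: line numbers are 1-based and inclusive, every line is exactly 2 bytes (one base plus "\n", not "\r\n"), and `read_bases` is a hypothetical helper name, not something from the original script. The demo reads from an in-memory string so it can run without a real sequence file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the bases on lines $from..$to (1-based, inclusive).
# Assumes each line is exactly 2 bytes: one base followed by "\n".
sub read_bases {
    my ($fh, $from, $to) = @_;
    my $bytes_per_line = 2;
    my $offset = $bytes_per_line * ($from - 1);        # same as 2*$from - 2
    my $length = $bytes_per_line * ($to - $from + 1);  # bytes covering all requested lines
    seek($fh, $offset, 0) or die "seek failed: $!";
    my $buffer = '';
    my $got = read($fh, $buffer, $length);
    die "read failed: $!" unless defined $got;
    return split /\n/, $buffer;                        # one array element per base
}

# Demo on an in-memory "file" with six one-base lines.
my $data = "A\nC\nG\nT\nA\nC\n";
open my $fh, '<', \$data or die "open failed: $!";
my @seq = read_bases($fh, 2, 4);
print "seq is: @seq\n";   # prints "seq is: C G T"
close $fh;
```

If the file uses a different line ending or contains header lines, the byte arithmetic would need to be adjusted accordingly.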
With more information, someone could probably offer a better solution.
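As for why the loop in the question prints only the first base: a likely cause is the condition `if ( $count = $to )`, which assigns $to to $count instead of comparing them. The assignment is true for any non-zero $to, so `last` fires on the first pass. A minimal sketch of the loop with a numeric comparison follows; `read_until` is a hypothetical helper name, the handle is assumed to be positioned at the start of line $from already, and, as in the original loop, line $to itself is not included.

```perl
use strict;
use warnings;

# Hypothetical helper: collect lines $from .. $to-1 from a handle that is
# already positioned at the start of line $from (1-based line numbers).
sub read_until {
    my ($fh, $from, $to) = @_;
    my @seq;
    my $count = $from;
    while ( my $line = <$fh> ) {
        last if $count == $to;   # numeric comparison (==), not assignment (=)
        chomp $line;             # drop the trailing newline
        push @seq, $line;
        $count++;
    }
    return @seq;
}

# Demo on an in-memory "file" with six one-base lines.
my $data = "A\nC\nG\nT\nA\nC\n";
open my $fh, '<', \$data or die "open failed: $!";
my ($from, $to) = (2, 5);
seek($fh, 2 * $from - 2, 0) or die "seek failed: $!";
my @seq = read_until($fh, $from, $to);
print "seq is: @seq\n";   # prints "seq is: C G T"
close $fh;
```

To include line $to as well, the check could be changed to `last if $count > $to;`.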