I want to look up, in file.tbl, the entries for the names listed in file.txt. The file details and my script are below.
file.txt has this format:
#comp120_c2_seq3 918 0.0
#comp120_c2_seq1 918 0.0
#comp21106_c0_seq1 874 0.0
#comp120_c2_seq2 835 0.0
And file.tbl has this format:
#comp788_c0_seq1_ CCTAATCATTTAATGTTTTTTT
#comp1107_c0_seq1_ CAAAAAAAAAAAAAAAAAAAAAATTGTCA
#comp1570_c0_seq3_ TTTTTTTTCTTTTAACAAC
#......
My script is below:
#!/usr/bin/perl -w
#This script reads in a list of sequence names from one file and finds the associated sequences in another file
open(NAME,"<$ARGV[0]")||die;
open(SEQ,"<$ARGV[1]")||die;
$name = "";
$seq = "";
%pair = ();
while(<SEQ>){
    s/\cM/\n/g;
    s/\r\n/\n/g;
    s/\r/\n/g;
    @line = split("\t",$_);
    $name = $line[0];
    $name =~s/\_+/\_/g;
    if ($name=~/^(comp\S*)\_(seq)/){
        $name = $1;
    }
    $seq = $line[1];
    $pair{$name} = $seq;
}
while (<NAME>){
    s/\cM/\n/g;
    s/\r\n/\n/g;
    s/\r/\n/g;
    if (/^(comp\S*)\s+(seq)/){
        print ">$1\n$pair{$1}";
    }
}
close NAME;
close SEQ;
Please help. Thanks in advance.
Answer 0 (score: 0)
Can I assume file.tbl is a FASTA file?
If so, you can use Bio::SeqIO to read the file:
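use strict;
use warnings;
use Bio::SeqIO;

my %seq_for;   # sequence ID => sequence string
my $in = Bio::SeqIO->new('-file' => "file.tbl",
                         '-format' => 'fasta');
while (my $seq = $in->next_seq()) {
    $seq_for{ $seq->id } = $seq->seq;   # save each record into a hash
}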
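The answer above depends on file.tbl being FASTA. In the sample, each line appears to be a name and a sequence on one line, and the question's script splits lines on "\t", so the file is probably tab-separated; a FASTA reader would then likely not parse it as-is. A minimal core-Perl sketch for converting it first, assuming exactly one name<TAB>sequence pair per line (the sub name tbl_to_fasta and the output file name are hypothetical):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch only. Assumption: each file.tbl line is "name<TAB>sequence",
# as implied by the question's split("\t", $_).
# Writes one FASTA record (">name" line, then the sequence) per input line.
sub tbl_to_fasta {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        $line =~ s/[\r\n]+\z//;         # strip Unix or Windows line endings
        my ($name, $seq) = split /\t/, $line;
        next unless defined $seq && length $seq;   # skip blank or malformed lines
        $name =~ s/_+\z//;              # drop trailing "_" as in "comp788_c0_seq1_"
        print {$out} ">$name\n$seq\n";
    }
    return;
}

# Usage: perl tbl_to_fasta.pl file.tbl file.fa
if (@ARGV == 2) {
    open my $in,  '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    open my $out, '>', $ARGV[1] or die "Cannot open $ARGV[1]: $!";
    tbl_to_fasta($in, $out);
    close $out or die "Cannot close $ARGV[1]: $!";
}
```

The resulting file.fa can then be passed to Bio::SeqIO with '-format' => 'fasta'. The sub takes filehandles rather than file names so it can also be fed in-memory strings.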
Answer 1 (score: 0)
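The pattern /^(comp\S*)\s+(seq)/ in the while (<NAME>) loop does not match the sequence names, because the character before the seq part is not whitespace (\s) but an underscore (_). The pattern here should be exactly the same as the one in the while (<SEQ>) loop.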
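The key point is that both loops must normalise names identically. One caveat: copying /^(comp\S*)\_(seq)/ verbatim into the second loop captures only comp120_c2 from comp120_c2_seq3, so seq1, seq2 and seq3 would all share one hash key and the last one read from file.tbl would win. A minimal sketch that shares one normalisation routine between both files and keeps the full name as the key, assuming data lines start with comp (as both of the question's regexes require) and file.tbl is name<TAB>sequence; the sub names normalise_name, load_table and lookup_names are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch only. Assumptions: data lines start with "comp" (no leading "#"),
# and file.tbl lines are "name<TAB>sequence".

# Normalise a raw name the same way for both files: collapse runs of "_"
# and drop a trailing "_", so "comp788_c0_seq1_" becomes "comp788_c0_seq1".
sub normalise_name {
    my ($name) = @_;
    $name =~ s/_+/_/g;
    $name =~ s/_\z//;
    return $name;
}

# Read file.tbl from a filehandle into a hash: full name => sequence.
sub load_table {
    my ($fh) = @_;
    my %seq_for;
    while (my $line = <$fh>) {
        $line =~ s/[\r\n]+\z//;    # strip Unix or Windows line endings
        my ($name, $seq) = split /\t/, $line;
        next unless defined $seq && $name =~ /^comp/;
        $seq_for{ normalise_name($name) } = $seq;
    }
    return \%seq_for;
}

# For each name in file.txt (first field, e.g. "comp120_c2_seq3"),
# return a FASTA record; warn about names with no match.
sub lookup_names {
    my ($fh, $seq_for) = @_;
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/[\r\n]+\z//;
        next unless $line =~ /^(comp\S*)/;
        my $name = normalise_name($1);
        if (exists $seq_for->{$name}) {
            push @records, ">$name\n$seq_for->{$name}\n";
        }
        else {
            warn "No sequence found for $name\n";
        }
    }
    return @records;
}

# Usage: perl find_seqs.pl file.txt file.tbl
if (@ARGV == 2) {
    open my $names_fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    open my $tbl_fh,   '<', $ARGV[1] or die "Cannot open $ARGV[1]: $!";
    my @records = lookup_names($names_fh, load_table($tbl_fh));
    print @records;
    close $names_fh;
    close $tbl_fh;
}
```

Putting the normalisation in one sub is what guarantees the two loops agree; the warn also makes silent misses like the original one visible.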