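I am trying to read a FASTA file into a hash so that I can manipulate it later, with the ID as the key and the sequence as the value. However, my output prints only the last ID, with all the sequences concatenated together.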
>cel-mir-35 MI0000006 Caenorhabditis elegans miR-35 stem-loop
UCUCGGAUCAGAUCGAGCCAUUGCUGGUUUCUUCCACAGUGGUACUUUCCAUUAGAACUA
UCACCGGGUGGAAACUAGCAGUGGCUCGAUCUUUUCC
>cel-mir-36 MI0000007 Caenorhabditis elegans miR-36 stem-loop
CACCGCUGUCGGGGAACCGCGCCAAUUUUCGCUUCAGUGCUAGACCAUCCAAAGUGUCUA
UCACCGGGUGAAAAUUCGCAUGGGUCCCCGACGCGGA
>cel-mir-37 MI0000008 Caenorhabditis elegans miR-37 stem-loop
UUCUAGAAACCCUUGGACCAGUGUGGGUGUCCGUUGCGGUGCUACAUUCUCUAAUCUGUA
UCACCGGGUGAACACUUGCAGUGGUCCUCGUGGUUUCU
>cel-mir-38 MI0000009 Caenorhabditis elegans miR-38 stem-loop
GUGAGCCAGGUCCUGUUCCGGUUUUUUCCGUGGUGAUAACGCAUCCAAAAGUCUCUAUCA
CCGGGAGAAAAACUGGAGUAGGACCUGUGACUCAU
I want each ID and its corresponding sequence as output, but instead I get:
cel-mir-38 MI0000009 Caenorhabditis elegans miR-38 stem-loop UCUCGGAUCAGAUCGAGCCAUUGCUGGUUUCUUCCACAGUGGUACUUUCCAUUAGAACUAUCACCGGGUGGAAACUAGGGCUCGAUCUUUUCCCACCGCUGUCGGGGAACCGCGCCAAUUUUCGCUUCAGUGCUAGACCAUCCAAAGUGUCUAUCACCGGGUGAAAAUUCGCAUGGGUCCCCGACGCGGAUUCUAGAAACCCUUGGACCAGUGUGGGUGUCCGUUGCGGUGCUACAUUCUCUAAUCUGUAUCACCGGGUGAACACUUGCAGUGGUCCUCGUGGUUUCUGUGAGCCAGGUCCUGUUCCGGUUUUUUCCGUGGUGAUAACGCAUCCAAAAGUCUCUAUCACCGGGAGAAAAACUGGAGUAGGACCUGUGACUCAU
cel-mir-38 MI0000009 Caenorhabditis elegans miR-38 stem-loop UCUCGGAUCAGAUCGAGCCAUUGCUGGUUUCUUCCACAGUGGUACUUUCCAUUAGAACUAUCACCGGGUGGAAACUAGGGCUCGAUCUUUUCCCACCGCUGUCGGGGAACCGCGCCAAUUUUCGCUUCAGUGCUAGACCAUCCAAAGUGUCUAUCACCGGGUGAAAAUUCGCAUGGGUCCCCGACGCGGAUUCUAGAAACCCUUGGACCAGUGUGGGUGUCCGUUGCGGUGCUACAUUCUCUAAUCUGUAUCACCGGGUGAACACUUGCAGUGGUCCUCGUGGUUUCUGUGAGCCAGGUCCUGUUCCGGUUUUUUCCGUGGUGAUAACGCAUCCAAAAGUCUCUAUCACCGGGAGAAAAACUGGAGUAGGACCUGUGACUCAU
cel-mir-38 MI0000009 Caenorhabditis elegans miR-38 stem-loop UCUCGGAUCAGAUCGAGCCAUUGCUGGUUUCUUCCACAGUGGUACUUUCCAUUAGAACUAUCACCGGGUGGAAACUAGGGCUCGAUCUUUUCCCACCGCUGUCGGGGAACCGCGCCAAUUUUCGCUUCAGUGCUAGACCAUCCAAAGUGUCUAUCACCGGGUGAAAAUUCGCAUGGGUCCCCGACGCGGAUUCUAGAAACCCUUGGACCAGUGUGGGUGUCCGUUGCGGUGCUACAUUCUCUAAUCUGUAUCACCGGGUGAACACUUGCAGUGGUCCUCGUGGUUUCUGUGAGCCAGGUCCUGUUCCGGUUUUUUCCGUGGUGAUAACGCAUCCAAAAGUCUCUAUCACCGGGAGAAAAACUGGAGUAGGACCUGUGACUCAU
cel-mir-38 MI0000009 Caenorhabditis elegans miR-38 stem-loop UCUCGGAUCAGAUCGAGCCAUUGCUGGUUUCUUCCACAGUGGUACUUUCCAUUAGAACUAUCACCGGGUGGAAACUAGGGCUCGAUCUUUUCCCACCGCUGUCGGGGAACCGCGCCAAUUUUCGCUUCAGUGCUAGACCAUCCAAAGUGUCUAUCACCGGGUGAAAAUUCGCAUGGGUCCCCGACGCGGAUUCUAGAAACCCUUGGACCAGUGUGGGUGUCCGUUGCGGUGCUACAUUCUCUAAUCUGUAUCACCGGGUGAACACUUGCAGUGGUCCUCGUGGUUUCUGUGAGCCAGGUCCUGUUCCGGUUUUUUCCGUGGUGAUAACGCAUCCAAAAGUCUCUAUCACCGGGAGAAAAACUGGAGUAGGACCUGUGACUCAU
Which part should I change?
Also, how can I make the sequence the key and the ID the value?
Answer 0 (score: 1)
You are not accumulating the hash correctly, and you are not printing it either.
while (<FILE>) {
    chomp;
    if (/^>(.+)/) {
        $id = $1;                       # A header line starts a new record.
    } elsif (defined $id and /^[A-Z]+$/) {
        $fastahash{$id} .= $_;          # Populate the hash.
    }
}
for my $id (keys %fastahash) {
    print "$id $fastahash{$id}\n";      # Print it.
}
Answer 1 (score: 0)
I think you need to assign $seq to fastahash rather than $_. Also, you never reset id or seq, so there is a potential bug. Try something like this:
while (<FILE>) {
    chomp;
    if (/^>(.+)/) {
        $fastahash{$id} = $seq if defined $id;  # Store the previous record.
        $id  = $1;
        $seq = '';                              # Reset for the new record.
    } elsif (/^[A-Z]+$/) {
        $seq .= $_;
    }
}
$fastahash{$id} = $seq if defined $id;          # Store the final record.
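As an alternative to tracking $id and $seq by hand, the input record separator $/ can be set so that each read returns one whole FASTA record. The following is only a sketch of that different technique, not the approach used in the answer above; the sample data and the in-memory filehandle are stand-ins for the real file.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the FASTA file.
my $fasta = <<'END';
>id1 first record
ACGU
UGCA
>id2 second record
GGCC
END

# Open an in-memory filehandle on the string (an assumption for the demo).
open my $fh, '<', \$fasta or die $!;

my %fastahash;
{
    local $/ = "\n>";                 # Each read now returns one whole record.
    while ( my $record = <$fh> ) {
        chomp $record;                # Removes a trailing "\n>" separator, if present.
        $record =~ s/^>//;            # Only the first record still carries its ">".
        my ($id, @lines) = split /\n/, $record;
        next unless defined $id;
        $fastahash{$id} = join '', @lines;
    }
}

print "$_ $fastahash{$_}\n" for sort keys %fastahash;
```

Because every record is handled in one iteration, there is no state to reset and no separate step needed for the last record.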
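Answer 2 (score: 0)
I realise this isn't Code Review, but I think a few comments on your code would be useful.
It is usually unnecessary to initialise variables when you declare them. In fact, setting a scalar variable to the empty string hides the "uninitialized value" warnings that could otherwise point to a bug.
Best practice is to use lexical filehandles and the three-argument form of open. So
open FILE, "file.fasta", or die $!;
is better written as
open my $fh, '<', 'file.fasta' or die $!;
(Note that your original code also has a spurious comma.)
Lexical filehandles usually make an explicit close unnecessary, because they are closed implicitly when they go out of scope and are destroyed.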
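The scoping behaviour of a lexical filehandle can be sketched as follows. The helper name first_line is hypothetical and only serves as an illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first line of a file, without its newline.
sub first_line {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $line = <$fh>;
    chomp $line if defined $line;
    return $line;
}   # $fh goes out of scope here, so Perl closes the file implicitly.
```

An explicit close is still worth writing when you want to check its return value, for example after writing to a file.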
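You may not be familiar with Perl's default variable $_, but the code can be clearer and more concise if you use it.
You are already using it with chomp, which is equivalent to chomp $_, and $_ =~ /^>(.+)/ can be written simply as /^>(.+)/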
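A small sketch of that equivalence, using made-up input lines: the explicit and implicit forms of the match capture exactly the same text.

```perl
use strict;
use warnings;

# Hypothetical input lines; in the real program they come from the file.
my @lines = ( ">id1 first record\n", "ACGU\n" );

my (@explicit, @implicit);
for (@lines) {                               # Each element is aliased to $_ in turn.
    chomp;                                   # Same as: chomp $_
    push @explicit, $1 if $_ =~ /^>(.+)/;    # Explicit match against $_
    push @implicit, $1 if /^>(.+)/;          # Implicit match against $_
}

print "explicit: @explicit\n";
print "implicit: @implicit\n";
```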
Note that foreach is exactly equivalent to for, and most programmers familiar with Perl prefer the shorter for.
I would write your program like this:
use strict;
use warnings;

open my $fh, '<', 'file.fasta' or die $!;

my %fasta_hash;
my ($id, $seq);

while ( <$fh> ) {
    chomp;
    if ( /^>(.+)/ ) {
        $fasta_hash{$id} = $seq if defined $id;   # Store the previous record
        $id  = $1;
        $seq = '';                                # Start a new record
    }
    elsif ( /\S/ and not /[^ACGTU]/ ) {
        $seq .= $_;
    }
}
$fasta_hash{$id} = $seq if defined $id;           # Store the final record

for my $id ( sort keys %fasta_hash ) {
    print "$id -- $fasta_hash{$id}\n";
}

Output:
cel-mir-35 MI0000006 Caenorhabditis elegans miR-35 stem-loop -- UCUCGGAUCAGAUCGAGCCAUUGCUGGUUUCUUCCACAGUGGUACUUUCCAUUAGAACUAUCACCGGGUGGAAACUAGCAGUGGCUCGAUCUUUUCC
cel-mir-36 MI0000007 Caenorhabditis elegans miR-36 stem-loop -- CACCGCUGUCGGGGAACCGCGCCAAUUUUCGCUUCAGUGCUAGACCAUCCAAAGUGUCUAUCACCGGGUGAAAAUUCGCAUGGGUCCCCGACGCGGA
cel-mir-37 MI0000008 Caenorhabditis elegans miR-37 stem-loop -- UUCUAGAAACCCUUGGACCAGUGUGGGUGUCCGUUGCGGUGCUACAUUCUCUAAUCUGUAUCACCGGGUGAACACUUGCAGUGGUCCUCGUGGUUUCU
cel-mir-38 MI0000009 Caenorhabditis elegans miR-38 stem-loop -- GUGAGCCAGGUCCUGUUCCGGUUUUUUCCGUGGUGAUAACGCAUCCAAAAGUCUCUAUCACCGGGAGAAAAACUGGAGUAGGACCUGUGACUCAU
As for how to invert the hash so that the sequences are used as keys, in my version above you just need to change the assignment $fasta_hash{$id} = $seq to $fasta_hash{$seq} = $id
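If the hash has already been built with IDs as keys, it can also be inverted afterwards with the built-in reverse. This is a minimal sketch with made-up data; the name %by_seq is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample data: ID => sequence.
my %fasta_hash = (
    'id1 first record'  => 'ACGUUGCA',
    'id2 second record' => 'GGCC',
);

# In list context the hash flattens to (key, value, ...); reverse turns
# that list around, so each value becomes a key and each key a value.
my %by_seq = reverse %fasta_hash;

print "$_ -- $by_seq{$_}\n" for sort keys %by_seq;
```

Keep in mind that hash keys are unique: if two IDs share the same sequence, only one of those IDs survives in the inverted hash.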