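I have successfully created a hash of arrays and am using it to compute a log-odds score for each DNA sequence in a file (see "Creating a hash of arrays for DNA sequences, Perl" for the input file format). I get a score for every sequence, but each calculation also produces a warning. Naturally, I would like to get rid of it. The warning is: Use of uninitialized value in string eq at line 148.
Here is an abridged version of the code (I can post the full code if necessary):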
    use strict;
    use warnings;
    use Data::Dumper;

    #USER SPECIFICATIONS
    print "Please enter the filename of the fasta sequence data: ";
    my $filename1 = <STDIN>;
    #Remove newline from the filename
    chomp $filename1;

    #Open the file and store each dna seq in hash
    my %id2seq = ();
    my %HoA = ();
    my %loscore = ();
    my $id = '';
    open (FILE, '<', $filename1) or die "Cannot open $filename1: $!";
    my $dna;
    while (<FILE>)
    {
        if($_ =~ /^>(.+)/)
        {
            $id = $1; #Stores 'Sequence 1' as the first $id, for example
        }
        else
        {
            $HoA{$id} = [ split(//) ]; #Splits the contents to allow for position reference later
            $id2seq{$id} .= $_; #Creates a hash with each seq associated to an id number, used for calculating tables that have been omitted for space
            $loscore{$id} = 0; #Initializes the log-odds score of each id to 0
        }
    }
    close FILE;

    #User specifies motif width
    print "Please enter the motif width:\n";
    my $width = <STDIN>;
    #Remove newline from the input
    chomp $width;
    #Default width is 3 (arbitrary number chosen)
    if ($width eq '')
    {
        $width = 3;
    }

    #Omitting code about $width<=0, creation of log-odds score hash to save space
    foreach $id (keys %HoA, %loscore)
    {
        for my $pos (0..($width-1))
        {
            for my $base (qw( A C G T))
            {
                if ($HoA{$id}[$pos] eq $base) #ERROR OCCURS HERE
                {
                    $loscore{$id} += $logodds{$base}[$pos];
                }
                elsif ( ! defined $HoA{$id}[$pos])
                {
                    print "$pos\n";
                }
            }
        }
    }
    print Dumper(\%loscore);
The output I get is:
Use of uninitialized value in string eq at line 148, <STDIN> line 2.
2
(This error repeats 4 times for each position - most likely due to matching to each $base?)
$VAR1 = {
'Sequence 15' => '-1.27764697876093',
'Sequence 4' => '0.437512962981119',
(continues for 29 sequences)
}
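In summary, I want to compute the log-odds score of each sequence. I have a hash of log-odds scores, %logodds, which holds the score of each base at every position within the motif. A sequence's log-odds score is the sum of the table values for its bases. For example, if the log-odds table is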
A 4 3 2
C 7 2 1
G 6 9 2
T 1 0 3
then the sequence CAG has a log-odds score of 7+3+2=12.
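The arithmetic above can be sketched in a few lines of Perl. This is only an illustration: the table values are the ones from the example, and score_motif is a hypothetical helper name that does not appear in the original code.

```perl
use strict;
use warnings;

# Log-odds table from the example: one row per base, one column per motif position.
my %logodds = (
    A => [4, 3, 2],
    C => [7, 2, 1],
    G => [6, 9, 2],
    T => [1, 0, 3],
);

# Sum the table entry for the base found at each motif position.
# score_motif is a hypothetical helper, not part of the original script.
sub score_motif {
    my ($seq, $table) = @_;
    my @bases = split //, $seq;
    my $score = 0;
    for my $pos (0 .. $#bases) {
        my $row = $table->{ $bases[$pos] };
        # Skip characters that are not in the table and positions past its width.
        next unless defined $row && defined $row->[$pos];
        $score += $row->[$pos];
    }
    return $score;
}

print score_motif('CAG', \%logodds), "\n";   # C at 0, A at 1, G at 2: 7 + 3 + 2
```

Checking that each lookup is defined before adding it keeps the sum free of "uninitialized value" warnings when a sequence contains a newline or an unexpected character.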
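At the moment I think the warning comes from the way I split the DNA strings into the hash of arrays. As mentioned, I can provide all of the code if you need to copy and paste it. I think the solution is fairly simple and I just need someone to point me in the right direction. Thanks for all the help; I can clarify if there are any questions. Also, any tips that would help me post more concise questions are appreciated (I know this is long, I just wanted to give enough background).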
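One possible source of the warning, sketched under the assumption that the loop header reads exactly `foreach $id (keys %HoA, %loscore)`: `keys` applies only to %HoA, so %loscore is flattened into the list as id/score pairs. Each score value then becomes an $id that has no entry in %HoA, and comparing `$HoA{$id}[$pos]` with `eq` would warn. The data below is hypothetical, and this may not be the only cause; a sequence line shorter than the motif width would produce the same warning.

```perl
use strict;
use warnings;

# Hypothetical two-sequence data set standing in for the real file.
my %HoA     = ('Sequence 1' => [qw(C A G)], 'Sequence 2' => [qw(T T A)]);
my %loscore = ('Sequence 1' => 0,           'Sequence 2' => 0);

# keys binds only to %HoA; %loscore contributes its keys AND its values.
my @mixed = (keys %HoA, %loscore);

# Items in the list that are not sequence ids (here: the two 0 scores).
my @bogus = grep { !exists $HoA{$_} } @mixed;

# Iterating over the ids alone never looks up a missing entry.
my @ids = sort keys %HoA;

print scalar(@mixed), " items in the mixed list, ", scalar(@bogus), " of them are not ids\n";
print "ids: @ids\n";
```

Looping over `keys %HoA` alone visits each sequence exactly once, which also avoids scoring every sequence twice.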
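Answer 0 (score: 0)
Here is the code I used to loop over %HoA. It computes the log-odds score for each sequence, then walks along each sequence to find its maximum score. Thank you all very much for the help!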
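    foreach $id (keys %HoA)
    {
        #Window start positions; length() on an array reference does not return the number of bases
        for my $pos1 (0 .. @{ $HoA{$id} } - $width)
        {
            for my $pos2 ($pos1..$pos1+($width-1))
            {
                for my $base (qw( A C G T))
                {
                    if ($HoA{$id}[$pos2] eq $base)
                    {
                        for my $pos3 (0..$width-1)
                        {
                            $loscore{$id} += $logodds{$base}[$pos3];
                            #Guard against the first comparison, when no maximum has been stored yet
                            if (!defined $maxscore{$id} || $loscore{$id} > $maxscore{$id})
                            {
                                $maxscore{$id} = $loscore{$id};
                            }
                        }
                    }
                    elsif ( ! defined $HoA{$id}[$pos2])
                    {
                        print "$pos2\n";
                    }
                }
            }
        }
    }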
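For comparison, here is a minimal sketch of one way to find the best-scoring window per sequence. It is not the code above: it assumes that each window of $width bases is scored independently against the table (the base at window offset 0 uses column 0, and so on) and that the score is reset for every window. The table is the one from the question's example and the input sequence is hypothetical.

```perl
use strict;
use warnings;

# Log-odds table from the question's example (motif width 3).
my %logodds = (
    A => [4, 3, 2],
    C => [7, 2, 1],
    G => [6, 9, 2],
    T => [1, 0, 3],
);
my $width = 3;

# Hypothetical input; the real %HoA is built from the FASTA file.
my %HoA = ('Sequence 1' => [ split //, 'TCAGG' ]);

my %maxscore;
for my $id (keys %HoA) {
    my @bases = @{ $HoA{$id} };
    # A window starting at $start covers $start .. $start + $width - 1,
    # so the last valid start is (number of bases - $width).
    for my $start (0 .. @bases - $width) {
        my $score = 0;    # reset for every window
        for my $offset (0 .. $width - 1) {
            my $row = $logodds{ $bases[$start + $offset] };
            $score += $row->[$offset] if defined $row;    # ignore non-ACGT characters
        }
        $maxscore{$id} = $score
            if !defined $maxscore{$id} || $score > $maxscore{$id};
    }
}

print "$_: $maxscore{$_}\n" for sort keys %maxscore;
```

For the hypothetical sequence TCAGG the three windows TCA, CAG and AGG score 5, 12 and 15, so the stored maximum is 15. Looking up the base directly in the table replaces the inner loop over A, C, G and T, and a sequence shorter than the motif width simply produces no windows.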