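I'm writing some code to process a dataset, but there's a problem I haven't been able to track down for a while now. I think (and hope!) the solution is fairly simple, and I'd be very grateful if someone could point me in the right direction.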
#!/usr/bin/perl
use strict;
use warnings;
my %predictions = ('DAMAGING'  => 'Disease',
                   'TOLERATED' => 'Polymorphism',
                   'A' => 'Ala',
                   'C' => 'Cys',
                   'D' => 'Asp',
                   'P' => 'Pro',
                   'V' => 'Val',
                   'L' => 'Leu',
                   'I' => 'Ile',
                   'M' => 'Met',
                   'F' => 'Phe',
                   'Y' => 'Tyr',
                   'W' => 'Trp',
                   'H' => 'His',
                   'K' => 'Lys',
                   'R' => 'Arg',
                   'Q' => 'Gln',
                   'N' => 'Asn',
                   'E' => 'Glu',
                   'S' => 'Ser',
                   'T' => 'Thr',
                   'G' => 'Gly');
while(<>) {
    chomp;
    {
        if (length($_)) {
            ProcessData($_);
        }
    }
}
sub ProcessData {
    my ($line) = @_;
    my @fields = split(/\s+/,$line);
    if ($fields[2] =~ /(.)(\d+)(.)/) {
        my $native = $1;
        my $resnum = $2;
        my $mutant = $3;
        print "$fields[1] $predictions{$native} $resnum $pedictions{$mutant} \n";
    }
}
The field I'm trying to convert using the hash looks like this:
A8726P
and the expected output is something like Ala 8726 Pro.
Thanks in advance.
Answer
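It would be nice to see some sample input data, but I think your problem is that you are accessing the hash %pedictions instead of %predictions. Under use strict, the code you show won't even compile; it dies with the error
Global symbol "%pedictions" requires explicit package name
which is a dead giveaway.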
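To see how use strict turns this typo into a compile-time error instead of a silent undef lookup, here is a small sketch. The hash names come from the question; wrapping the snippet in a string eval is only an illustrative device (not part of the original code) so the error can be captured and printed rather than aborting the script.

```perl
use strict;
use warnings;

# Compile a snippet that refers to the misspelled hash %pedictions.
# Under "use strict", an undeclared package variable is a compile-time
# error, so the string eval returns undef and the message lands in $@.
my $ok = eval q{
    use strict;
    my %predictions = (A => 'Ala');
    my $name = $pedictions{A};   # typo: %pedictions was never declared
    1;
};

if ($ok) {
    print "compiled fine\n";
}
else {
    print "compile error: $@";
}
```

Without use strict the same snippet would compile and $name would simply be undef, which is why the typo is so easy to miss.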
If I fix the typo and create a file containing the single record
AA BB A8726P DD EE FF
then I get the output
BB Ala 8726 Pro
which seems to be what you expect.
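The single-record check above can also be reproduced without an input file. This is a minimal sketch that assumes the mutation is always the third whitespace-separated field; convert_record is a hypothetical helper name, and only a few entries of %predictions are included for brevity.

```perl
use strict;
use warnings;

# Subset of the lookup table from the question (enough for this example).
my %predictions = (A => 'Ala', P => 'Pro', C => 'Cys', R => 'Arg');

# Hypothetical helper: turn one record into "ID Native ResNum Mutant".
# Returns undef (in scalar context) if the third field isn't
# <one char><digits><one char>. Codes missing from %predictions would
# interpolate as undef and trigger a warning.
sub convert_record {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return unless defined $fields[2]
        && $fields[2] =~ /\A(.)(\d+)(.)\z/;
    my ($native, $resnum, $mutant) = ($1, $2, $3);
    return "$fields[1] $predictions{$native} $resnum $predictions{$mutant}";
}

print convert_record("AA BB A8726P DD EE FF"), "\n";   # BB Ala 8726 Pro
```

The \A and \z anchors make the match fail on a field with extra characters instead of silently picking out a substring.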
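This tidied-up version of your program may also help. Note that split ' ' (or just split, if you are splitting $_) is better than split /\s+/, because the latter returns an empty first field if the record has any leading whitespace.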
use strict;
use warnings;
my %predictions = (
    DAMAGING  => 'Disease',
    TOLERATED => 'Polymorphism',
    A => 'Ala',
    C => 'Cys',
    D => 'Asp',
    P => 'Pro',
    V => 'Val',
    L => 'Leu',
    I => 'Ile',
    M => 'Met',
    F => 'Phe',
    Y => 'Tyr',
    W => 'Trp',
    H => 'His',
    K => 'Lys',
    R => 'Arg',
    Q => 'Gln',
    N => 'Asn',
    E => 'Glu',
    S => 'Ser',
    T => 'Thr',
    G => 'Gly'
);
my ($filename) = @ARGV;
open my $fh, '<', $filename or die qq{Unable to open "$filename" for input: $!};
while (<$fh>) {
    ProcessData($_) if /\S/;
}
sub ProcessData {
    my ($line) = @_;
    my @fields = split ' ', $line;
    if ($fields[2] =~ /\A(.)(\d+)(.)\z/) {
        my $native = $1;
        my $resnum = $2;
        my $mutant = $3;
        print "$fields[1] $predictions{$native} $resnum $predictions{$mutant} \n";
    }
}
Output
P45381 Arg 168 Cys
Q06187 Lys 430 Glu
P15529 Ser 240 Pro
P00966 Pro 96 Ser
P15289 Asp 255 His
P10275 Gly 820 Ala
P10275 Asp 864 Gly
O75828 Val 93 Ile
P04075 Cys 339 Tyr
O60885 Ala 371 Gly
P03950 Lys 84 Glu
P35670 Val 1146 Met
P11597 Ala 390 Pro
Q9UM73 Arg 1275 Leu
Q99856 Lys 320 Glu
P12821 Thr 1187 Met
P10275 Gly 708 Ala
P15529 Cys 35 Tyr
P05156 His 183 Arg
Q06187 Ile 370 Met
P15056 Glu 586 Lys
P15289 Pro 231 Thr
P68133 Gly 270 Cys
Q9BZ11 Ala 365 Ser
P15289 Ile 179 Ser
P35520 Ile 435 Thr
Q9BWV1 Val 713 Met
P68133 Pro 334 Ser
P21549 Gly 190 Arg
P49748 Gln 159 Arg
P05067 Ile 716 Val
P06732 Gly 243 Ala
P42773 Ala 72 Pro
P49748 Lys 247 Glu
O15382 Thr 186 Arg
P45954 Glu 376 Gly
Q8WVQ1 Leu 224 Pro
P02768 Glu 382 Lys
P06276 Ala 229 Thr
Q8WXF7 Tyr 196 Cys
P37023 His 314 Tyr
Q16790 Gln 326 Arg
P07451 Val 31 Ile
P06727 Asn 147 Ser
P00966 Asp 296 Gly
P00813 Ala 215 Thr
P42771 Pro 114 Leu
P30566 Pro 100 Ala
P21549 Leu 153 Val
Q9H8M2 Ala 170 Thr
O75828 Val 244 Met
P42771 Gln 50 Arg
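The difference between split ' ' and split /\s+/ mentioned in the answer is easy to check on a record with leading whitespace. The sample line below is made up for illustration.

```perl
use strict;
use warnings;

my $record = "   AA BB A8726P";   # note the leading spaces

# split ' ' is special-cased: it skips leading whitespace (awk-style).
my @awk_style = split ' ', $record;

# split /\s+/ treats the leading spaces as a separator, so the
# first field is an empty string.
my @regex_style = split /\s+/, $record;

printf "split ' '   -> %d fields: [%s]\n", scalar @awk_style,   join '|', @awk_style;
printf "split /\\s+/ -> %d fields: [%s]\n", scalar @regex_style, join '|', @regex_style;
```

This should print 3 fields for split ' ' and 4 for split /\s+/. With the /\s+/ version, $fields[2] on such a line would be BB rather than A8726P, shifting every column the program relies on.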