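I have a Perl script, but it only calculates the molecular weight when a sequence is typed in. I want to calculate the molecular weight of the protein sequences in a FASTA file instead.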
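print "Enter the amino acid sequence:\n";
$a = <STDIN>;    # no spaces inside <>, otherwise Perl treats it as a file glob
chomp($a);
# Do not redeclare $a or @a here: that would discard the input just read.
$x = length($a);
print "Length of sequence is : $x";
@a = split('', $a);
$b = 0;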
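# Average residue masses in Da (free amino acid minus one water).
# Corrected entries: R (was 16.19), D and N (were swapped), W (was 186.12).
my %data = (
    A=>71.09, R=>156.19, D=>115.09, N=>114.11,
    C=>103.15, E=>129.12, Q=>128.14, G=>57.05,
    H=>137.14, I=>113.16, L=>113.16, K=>128.17,
    M=>131.19, F=>147.18, P=>97.12, S=>87.08,
    T=>101.11, W=>186.21, Y=>163.18, V=>99.14
);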
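foreach $i (@a) {
    $b += $data{$i};
}
# %data holds residue masses, so the water lost per peptide bond is already
# excluded. Add one water (18.02 Da) for the free N- and C-termini instead
# of subtracting 18 per bond.
$c = $b + 18.02;
print "\nThe molecular weight of the sequence is $c";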
Answer 0 (score: 1)
First, you need to tell us the format of your .fasta file. As far as I know, they look like this:
>seq_ID_1 descriptions etc
ASDGDSAHSAHASDFRHGSDHSDGEWTSHSDHDSHFSDGSGASGADGHHAH
ASDSADGDASHDASHSAREWAWGDASHASGASGASGSDGASDGDSAHSHAS
SFASGDASGDSSDFDSFSDFSD
>seq_ID_2 descriptions etc
ASDGDSAHSAHASDFRHGSDHSDGEWTSHSDHDSHFSDGSGASGADGHHAH
ASDSADGDASHDASHSAREWAWGDASHASGASGASG
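If we assume that your code works and calculates the molecular weight correctly, all we need to do is read the FASTA files, parse them, and calculate the weight the way your code does. It is easier than it sounds.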
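#!/usr/bin/perl
use strict;
use warnings;

for my $file (@ARGV) {
    open my $fh, '<:encoding(UTF-8)', $file
        or die "Cannot open $file: $!";
    my $input = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    # A record is a header line starting with '>' followed by the
    # sequence lines up to the next '>' or the end of the file.
    while ( $input =~ /^(>[^\n]*)\n?([^>]*)/mg ) {
        my $name = $1;
        my $seq  = $2;
        $seq =~ s/\s+//g;    # join the wrapped sequence lines
        my $mass = calc_mass($seq);
        print "$name has mass $mass\n";
    }
}
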
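sub calc_mass {
    my ($seq) = @_;
    # Average residue masses in Da (free amino acid minus one water).
    # Corrected entries: R (was 16.19), D and N (were swapped), W (was 186.12).
    my %data = (
        A=>71.09, R=>156.19, D=>115.09, N=>114.11,
        C=>103.15, E=>129.12, Q=>128.14, G=>57.05,
        H=>137.14, I=>113.16, L=>113.16, K=>128.17,
        M=>131.19, F=>147.18, P=>97.12, S=>87.08,
        T=>101.11, W=>186.21, Y=>163.18, V=>99.14
    );
    my $sum = 0;
    for my $res ( split //, uc $seq ) {
        if ( !exists $data{$res} ) {
            warn "Skipping unknown residue '$res'\n";
            next;
        }
        $sum += $data{$res};
    }
    # Residue masses already exclude the water lost per peptide bond,
    # so add one water (18.02 Da) for the free termini.
    return $sum + 18.02;
}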
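
If the FASTA files are large, reading them line by line avoids holding a whole file in memory. The following is only a sketch of that alternative, not part of the script above: `read_fasta` is a hypothetical helper name, and the demo reads from an in-memory string so that it runs without an input file. It assumes every record starts with a `>` header line and that sequence lines contain no internal whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from an open filehandle,
# one line at a time, and return a list of [header, sequence] pairs.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while ( my $line = <$fh> ) {
        $line =~ s/\s+\z//;             # strip the newline and any trailing \r
        next if $line eq '';            # ignore blank lines
        if ( $line =~ /^>(.*)/ ) {
            push @records, [ $1, '' ];  # header line starts a new record
        }
        elsif (@records) {
            $records[-1][1] .= $line;   # append a wrapped sequence line
        }
    }
    return @records;
}

# Demo on an in-memory file (assumed sample data, shortened from the example).
my $fasta = ">seq_ID_1 descriptions etc\nASDGDS\nAHSA\n"
          . ">seq_ID_2 descriptions etc\nASDG\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
for my $rec ( read_fasta($fh) ) {
    my ( $header, $seq ) = @$rec;
    printf "%s: %d residues\n", $header, length $seq;
}
close $fh;
```

Each sequence returned this way can be passed to `calc_mass` in place of the `length` call to print the mass per record.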