I am trying to translate the complementary strand of DNA into its corresponding amino acids. So far, I have this code:
#!/usr/bin/perl
open (INFILE, "sumaira2.out");
open (OUTFILE3, ">>sumaira3.out");
%aacode = (
TTT => "F", TTC => "F", TTA => "L", TTG => "L",
TCT => "S", TCC => "S", TCA => "S", TCG => "S",
TAT => "Y", TAC => "Y", TAA => "STOP", TAG => "STOP",
TGT => "C", TGC => "C", TGA => "STOP", TGG => "W",
CTT => "L", CTC => "L", CTA => "L", CTG => "L",
CCT => "P", CCC => "P", CCA => "P", CCG => "P",
CAT => "H", CAC => "H", CAA => "Q", CAG => "Q",
CGT => "R", CGC => "R", CGA => "R", CGG => "R",
ATT => "I", ATC => "I", ATA => "I", ATG => "M",
ACT => "T", ACC => "T", ACA => "T", ACG => "T",
AAT => "N", AAC => "N", AAA => "K", AAG => "K",
AGT => "S", AGC => "S", AGA => "R", AGG => "R",
GTT => "V", GTC => "V", GTA => "V", GTG => "V",
GCT => "A", GCC => "A", GCA => "A", GCG => "A",
GAT => "D", GAC => "D", GAA => "E", GAG => "E",
GGT => "G", GGC => "G", GGA => "G", GGG => "G",
); # this is the hash table for the amino acids
while ($line=<INFILE>){
$codon = $codon.$line;
@array = split "",$codon;
} # splits all the characters in the text
for ($count = 0; $count<scalar@array; $count= $count + 3) {
$codon = $codon.$array[$count].$array[$count+1].$array[$count+2];
$aminoacid = $aacode{$codon};
} # tells how to read the codon and execute the hash table
$protein = $protein.$aminoacid; #catenate the string
print OUTFILE3 $protein;
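My infile already contains the reverse-complemented DNA; I just want to translate it. For some reason, there is nothing in my output. I don't know what is wrong, because the Terminal isn't giving me any errors either. Any help would be highly appreciated.
Here is a sample of the file I am trying to translate: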
TCGTCGCCTCCCCAACCTAGGTAGTCCGTTGCTGCCCGACGACGGCCGGTAGTCGCCT GCGTCCCTCCTGAAAGGCGTTGGCCGGCAAGCTACGCCGTGGCTACCGGAAGCGCGTCCCCATCAC GCGGTCCTAACTGAACGCGACGGGATGGAGAGTGATCACTCCCCGCCGTCGCGTAGTTCGCCACTC
It goes on like this for another 17 lines.
Answer 0 (score: 1)
Perhaps the following will be helpful:
use strict;
use warnings;
my %aacode = (
TTT => "F", TTC => "F", TTA => "L", TTG => "L",
TCT => "S", TCC => "S", TCA => "S", TCG => "S",
TAT => "Y", TAC => "Y", TAA => "STOP", TAG => "STOP",
TGT => "C", TGC => "C", TGA => "STOP", TGG => "W",
CTT => "L", CTC => "L", CTA => "L", CTG => "L",
CCT => "P", CCC => "P", CCA => "P", CCG => "P",
CAT => "H", CAC => "H", CAA => "Q", CAG => "Q",
CGT => "R", CGC => "R", CGA => "R", CGG => "R",
ATT => "I", ATC => "I", ATA => "I", ATG => "M",
ACT => "T", ACC => "T", ACA => "T", ACG => "T",
AAT => "N", AAC => "N", AAA => "K", AAG => "K",
AGT => "S", AGC => "S", AGA => "R", AGG => "R",
GTT => "V", GTC => "V", GTA => "V", GTG => "V",
GCT => "A", GCC => "A", GCA => "A", GCG => "A",
GAT => "D", GAC => "D", GAA => "E", GAG => "E",
GGT => "G", GGC => "G", GGA => "G", GGG => "G",
); # this is the hash table for the amino acids
my $compDNA = uc do { local $/; <> };
$compDNA =~ s/\s+//g;
my @codons = unpack '(A3)*', $compDNA;
my @aminoAcids = map { exists $aacode{$_} ? $aacode{$_} : "?$_?" } @codons;
print join '', @aminoAcids;
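Usage: perl script.pl compDNA_File [>aminoAcid_File]
The last, optional parameter directs the output to a file.
First, the entire file is slurped into a variable (and converted to all uppercase). Next, all whitespace is removed. unpack is used to create a list of three-character elements (codons). map is used to translate the codons into amino acids using the hash you provided. (Note that if a codon has no key in the hash, the codon itself is inserted, surrounded by question marks.) Finally, those amino acids are joined to form a single string, and the result is printed.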
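The steps described above can be tried on a tiny input. This is only a sketch: translate_codons is a hypothetical helper name, and the three-entry table is a stand-in for the full %aacode hash. It shows that unpack '(A3)*' leaves a shorter final element when the length is not a multiple of three, which then falls through to the question-mark branch.

```perl
use strict;
use warnings;

# Tiny illustrative table; a real run would use the full %aacode hash.
my %aacode = (ATG => "M", GCC => "A", TAA => "STOP");

# Hypothetical helper (not part of the answer): returns the translated string.
sub translate_codons {
    my ($dna) = @_;
    $dna = uc $dna;                       # normalise to uppercase
    $dna =~ s/\s+//g;                     # drop spaces and newlines
    my @codons = unpack '(A3)*', $dna;    # last element may be shorter than 3
    return join '', map { exists $aacode{$_} ? $aacode{$_} : "?$_?" } @codons;
}

print translate_codons("atg gcc\nTA"), "\n";   # prints MA?TA?
```

Wrapping the logic in a sub is just a convenience for experimenting; the answer's straight-line script does the same work on the whole file.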
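Answer 1 (score: 0)
Don't you want to put
print OUTFILE3 $protein;
inside your for loop, so that you print every amino acid as you process it, rather than only the last one left over after the for loop has finished? Like this: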
for ($count = 0; $count<scalar@array; $count= $count + 3) {
$codon = $codon.$array[$count].$array[$count+1].$array[$count+2];
$aminoacid = $aacode{$codon};
print OUTFILE3 $aminoacid;
} # tells how to read the codon and execute the hash table
Answer 2 (score: 0)
Try running the script below as scriptname < sumaira2.out >> sumaira3.out
If it works as expected, set $DEBUG to zero to remove the debug output.
#!/usr/bin/perl
use strict; use warnings;
my $DEBUG = 2;
my %aacode = (
TTT => "F", TTC => "F", TTA => "L", TTG => "L",
TCT => "S", TCC => "S", TCA => "S", TCG => "S",
TAT => "Y", TAC => "Y", TAA => "STOP", TAG => "STOP",
TGT => "C", TGC => "C", TGA => "STOP", TGG => "W",
CTT => "L", CTC => "L", CTA => "L", CTG => "L",
CCT => "P", CCC => "P", CCA => "P", CCG => "P",
CAT => "H", CAC => "H", CAA => "Q", CAG => "Q",
CGT => "R", CGC => "R", CGA => "R", CGG => "R",
ATT => "I", ATC => "I", ATA => "I", ATG => "M",
ACT => "T", ACC => "T", ACA => "T", ACG => "T",
AAT => "N", AAC => "N", AAA => "K", AAG => "K",
AGT => "S", AGC => "S", AGA => "R", AGG => "R",
GTT => "V", GTC => "V", GTA => "V", GTG => "V",
GCT => "A", GCC => "A", GCA => "A", GCG => "A",
GAT => "D", GAC => "D", GAA => "E", GAG => "E",
GGT => "G", GGC => "G", GGA => "G", GGG => "G",
); # this is the hash table for the amino acids
my ($sequence, $protein) = ('','');
while (<STDIN>){
chomp; # remove end of line characters
s/\s//g; # remove whitespaces
$sequence .= $_; # accumulate the whole DNA sequence
}
print STDERR "DBG Sequence: ", $sequence, "\n" if $DEBUG >= 1;
my @codons = ( $sequence =~ /(...)/sg ); # complete triplets only
print STDERR "DBG Codons: ", join(" ", @codons), "\n" if $DEBUG >= 2;
for my $codon (@codons) {
die "Unknown codon: $codon\n" unless exists $aacode{$codon};
$protein .= $aacode{$codon};
}
print STDERR "DBG Protein: ", $protein, "\n" if $DEBUG >= 1;
print $protein, "\n";
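One thing to be aware of with the /(...)/sg pattern: it only captures complete triplets, so any one or two bases left at the end of the sequence are dropped without a message. A small check like the following could make that visible. It is a sketch only, and split_codons is a hypothetical helper name, not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a cleaned sequence into full codons and
# report how many trailing bases did not fit into a triplet.
sub split_codons {
    my ($sequence) = @_;
    my @codons   = $sequence =~ /(...)/sg;   # complete triplets only
    my $leftover = length($sequence) % 3;    # 0, 1 or 2 unused bases
    return (\@codons, $leftover);
}

my ($codons, $leftover) = split_codons("ATGGCCTA");
print STDERR "Warning: $leftover trailing base(s) ignored\n" if $leftover;
print join(" ", @$codons), "\n";             # prints ATG GCC
```

Whether a leftover should be a warning or a fatal error depends on the data; for a sequence that is supposed to be a full coding region, dying might be the safer choice.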
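Answer 3 (score: 0)
I would highly recommend using BioPerl, or some other library/toolkit, for these tasks. The reason is that there are 16 codon tables, in addition to there being 3 reading frames. In my opinion, people have already spent too much effort on this problem (and I don't see any correct solutions here either), and doing anything beyond the trivial will take a lot more work and code. Here is a simple example of translating with the standard codon table.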
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SeqIO;
my $usage = "$0 nt.fasta";
my $file = shift or die $usage;
my $seqio = Bio::SeqIO->new(-file => $file);
my $seqobj = $seqio->next_seq; # create a Bio::Seq object
my $trans = $seqobj->translate; # call the translate method
# on the Bio::Seq object
print $trans->seq; # $trans is a Bio::Seq object,
# so we call the seq method to get the sequence
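You can modify this slightly to handle multiple sequences, or to use a different codon table. You can also supply a custom codon table. The BioPerl HOWTO page on translating sequences has a nice tutorial.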
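To make the point about reading frames concrete, here is a plain-Perl sketch (deliberately not BioPerl) that translates the same string starting at offsets 0, 1 and 2. Everything in it is an assumption for illustration: translate_frame is a hypothetical helper, the table has only five entries, stop codons are shown as '*', and codons missing from the table are shown as 'X'.

```perl
use strict;
use warnings;

# Deliberately tiny table for illustration; '*' marks a stop codon.
my %table = (ATG => 'M', GCC => 'A', TGG => 'W', CCT => 'P', TAA => '*');

# Hypothetical helper: translate $dna starting at offset $frame (0, 1 or 2).
# Codons missing from the table are shown as 'X'.
sub translate_frame {
    my ($dna, $frame) = @_;
    my $protein = '';
    for (my $i = $frame; $i + 3 <= length($dna); $i += 3) {
        my $codon = substr($dna, $i, 3);
        $protein .= exists $table{$codon} ? $table{$codon} : 'X';
    }
    return $protein;
}

my $dna = 'ATGGCCTAA';
# Prints "frame 0: MA*", "frame 1: WP" and "frame 2: XX"
print "frame $_: ", translate_frame($dna, $_), "\n" for 0 .. 2;
```

The same nine bases give three different results, which is why a hand-rolled translator that only ever starts at position 0 can miss the intended protein. A toolkit handles this, plus the alternative codon tables, for you.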
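Edit: The other two solutions I tried only handle a single sequence, and they do not parse FASTA format as I had assumed they would. One major practical consideration is that you should insert a symbol in your translation (the default in BioPerl is an asterisk, but you can change it to anything you like) instead of the word "STOP", because the word will not be recognized by any other tool. It is also hard to pick out visually.
Answer 4 (score: 0)
OK,
So I asked my professor, and there were a couple of problems with my code. First, I used $codon twice while expecting it to do two different things (once in the while loop and once in the for loop). So it treated the entire infile as $codon and then did the hash lookup on that. The second problem (as others mentioned earlier) was that $protein was not inside the for loop, so it would only ever give me the last amino acid. Anyway, here is the corrected, working code: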
open (INFILE, "sumaira2.out");
open (OUTFILE3, ">sumaira3.out");
%aacode = (
TTT => "F", TTC => "F", TTA => "L", TTG => "L",
TCT => "S", TCC => "S", TCA => "S", TCG => "S",
TAT => "Y", TAC => "Y", TAA => "STOP", TAG => "STOP",
TGT => "C", TGC => "C", TGA => "STOP", TGG => "W",
CTT => "L", CTC => "L", CTA => "L", CTG => "L",
CCT => "P", CCC => "P", CCA => "P", CCG => "P",
CAT => "H", CAC => "H", CAA => "Q", CAG => "Q",
CGT => "R", CGC => "R", CGA => "R", CGG => "R",
ATT => "I", ATC => "I", ATA => "I", ATG => "M",
ACT => "T", ACC => "T", ACA => "T", ACG => "T",
AAT => "N", AAC => "N", AAA => "K", AAG => "K",
AGT => "S", AGC => "S", AGA => "R", AGG => "R",
GTT => "V", GTC => "V", GTA => "V", GTG => "V",
GCT => "A", GCC => "A", GCA => "A", GCG => "A",
GAT => "D", GAC => "D", GAA => "E", GAG => "E",
GGT => "G", GGC => "G", GGA => "G", GGG => "G",
); # this is the hash table for the amino acids
while ($line=<INFILE>){
$line =~ s/\s+$//;
$sequence = $sequence.$line;
@array = split "",$sequence;
} # splits all the characters in the text
for ($count = 0; $count<=scalar @array-3; $count= $count + 3) {
$codon = $array[$count].$array[$count+1].$array[$count+2];
$aminoacid = $aacode{$codon};
$protein = $protein.$aminoacid; #catenate the string
} # tells how to read the codon and execute the hash table
print OUTFILE3 $protein;
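For anyone reading later, the two fixes described above can be seen in isolation in the sketch below. It is not the code my professor and I ended up with, just a reduced version under some assumptions: translate_sequence is a made-up helper name, the table has three entries instead of the full %aacode hash, and codons missing from the table are shown as '?'.

```perl
use strict;
use warnings;

# Three codons are enough to show the idea; the full %aacode hash works as-is.
my %aacode = (TCG => "S", CCT => "P", TAG => "STOP");

# Hypothetical helper: $sequence holds the whole input, $codon only ever
# holds three bases, and $protein grows inside the loop.
sub translate_sequence {
    my ($sequence) = @_;
    $sequence =~ s/\s+//g;                    # remove spaces and newlines
    my $protein = '';
    for (my $count = 0; $count + 3 <= length($sequence); $count += 3) {
        my $codon = substr($sequence, $count, 3);
        $protein .= exists $aacode{$codon} ? $aacode{$codon} : '?';
    }
    return $protein;
}

print translate_sequence("TCGTCG CCT\n"), "\n";   # prints SSP
```

With use strict and use warnings switched on, the original mistake of looking up the whole file as one hash key would have produced an "uninitialized value" warning when concatenating the result, which would have been a useful hint.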
Thanks again to everyone for the help, and sorry it took me so long to get back!