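Hi, I have a file that I want to open so I can find the start and stop positions of its genes, and I also have some extra information. The start of each gene is mapped by the following pattern: there is an 8-letter consensus known as the Shine-Dalgarno sequence (TAAGGAGG), followed 4-10 bases downstream by the start codon (ATG). However, there are variants of the Shine-Dalgarno sequence, the most common of which is [TA][AC]AGGA[GA][GA]. The end of a gene is specified by one of the stop codons TAA, TAG or TGA. Care must be taken that the stop codon is in the correct open reading frame (ORF). I have made a txt file containing the genome and I open it with the code below. The errors start when I go on to read the genome and look for the starts and ends. Any help? Thanks a lot: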
#!/usr/bin/perl -w
use strict;
use warnings;
# Searching for motifs
# Ask the user for the filename of the file containing
my $proteinfilename = "yersinia_genome.fasta";
print "\nYou open the filename of the protein sequence data: yersinia_genome.fasta \n";
# Remove the newline from the protein filename
chomp $proteinfilename;
# open the file, or exit
unless (open(PROTEINFILE, $proteinfilename) )
{
print "Cannot open file \"$proteinfilename\"\n\n";
exit;
}
# Read the protein sequence data from the file, and store it
# into the array variable @protein
my @protein = <PROTEINFILE>;
# Close the file - we've read all the data into @protein now.
close PROTEINFILE;
# Put the protein sequence data into a single string, as it's easier
# to search for a motif in a string than in an array of
# lines (what if the motif occurs over a line break?)
my $protein = join( '', @protein);
# Remove whitespace.
$protein =~ s/\s//g;
# In a loop, ask the user for a motif, search for the motif,
# and report if it was found.
my $motif='TAAGGAGG';
do
{
print "\n Your motif is:$motif\n";
# Remove the newline at the end of $motif
chomp $motif;
# Look for the motif
if ( $protein =~ /$motif/ )
{
print "I found it!This is the motif: $motif in line $.. \n\n";
}
else
{
print "I couldn't find it.\n\n";
}
}
until ($motif =~ /TAAGGAGG/g);
my $reverse=reverse $motif;
print "Here is the reverse Motif: $reverse. \n\n";
#HERE STARTS THE PROBLEMS,I DONT KNOW WHERE I MAKE THE MISTAKES
#$genome=$motif;
#$genome = $_[0];
my $ORF = 0;
while (my $genome = $proteinfilename) {
chomp $genome;
print "processing $genome\n";
my $mrna = split(/\s+/, $genome);
while ($mrna =~ /ATG/g) {
# $start and $stop are 0-based indexes
my $start = pos($mrna) - 3; # back up to include the start sequence
# discard remnant if no stop sequence can be found
last unless $mrna=~ /TAA|TAG|TGA/g;
#m/^ATG(?:[ATGC]{3}){8,}?(?:TAA|TAG|TGA)/gm;
my $stop = pos($mrna);
my $genlength = $stop - $start;
my $genome = substr($mrna, $start, $genlength);
print "\t" . join(' ', $start+1, $stop, $genome, $genlength) . "\n";
# $ORF ++;
#print "$ORF\n";
}
}
exit;
Answer 0 (score: 0)
while (my $genome = $proteinfilename) {
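This creates an infinite loop: you copy the file name (not the $protein data) over and over. The condition is always true because the file name is a non-empty string, so the purpose of the while loop is unclear and it never terminates.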
Perhaps you meant
my ($genome) = $protein;
Here is a simple attempt at fixing the obvious problems in the code.
#!/usr/bin/perl -w
use strict;
use warnings;
my $proteinfilename = "yersinia_genome.fasta";
chomp $proteinfilename;
unless (open(PROTEINFILE, $proteinfilename) )
{
# die, don't print & exit
die "Cannot open file \"$proteinfilename\"\n";
}
# Avoid creating a potentially large temporary array
# Read directly into $protein instead
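# Skip FASTA header lines (those starting with ">") so they do not end up in the sequence
my $protein = join ('', grep { !/^>/ } <PROTEINFILE>);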
close PROTEINFILE;
$protein =~ s/\s//g;
# As this is a static variable, no point in looping
my $motif='TAAGGAGG';
chomp $motif;
if ( $protein =~ /$motif/ )
{
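# $. (the input line number) is meaningless once the file is closed; report the match offset instead
print "I found it! The motif $motif starts at offset ", $-[0], ".\n\n";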
}
else
{
print "I couldn't find it.\n\n";
}
my $reverse=reverse $motif;
print "Here is the reverse Motif: $reverse. \n\n";
# $ORF isn't used; removed
# Again, no point in writing a loop
# Also, $genome is a copy of the data, not the filename
my $genome = $protein;
# It was already chomped, so no need to do that again
# split() in scalar context returns a field count, not the sequence, so just copy the string
my $mrna = $genome;
while ($mrna =~ /ATG/g) {
my $start = pos($mrna) - 3; # back up to include the start sequence
last unless $mrna=~ /TAA|TAG|TGA/g;
my $stop = pos($mrna);
my $genlength = $stop - $start;
my $genome = substr($mrna, $start, $genlength);
print "\t" . join(' ', $start+1, $stop, $genome, $genlength) . "\n";
}
exit;
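One thing the loop above does not do is check the reading frame: it takes the first TAA, TAG or TGA after the ATG, wherever it falls. Below is a minimal sketch of an in-frame search, assuming the sequence is already a single uppercase string with no whitespace. The name `find_orfs` and the toy sequence are made up for illustration and are not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns a list of
# [start, stop, orf] triples, one for every ATG that has a stop codon
# in the same reading frame. $start is a 0-based offset; $stop is the
# 0-based offset just past the stop codon.
sub find_orfs {
    my ($seq) = @_;
    my @orfs;
    while ($seq =~ /ATG/g) {
        my $start = pos($seq) - 3;    # back up to include the start codon
        # Walk codon by codon from the start codon so we stay in frame.
        for (my $i = $start + 3; $i + 3 <= length $seq; $i += 3) {
            my $codon = substr($seq, $i, 3);
            if ($codon =~ /^(?:TAA|TAG|TGA)$/) {
                my $stop = $i + 3;
                push @orfs, [$start, $stop, substr($seq, $start, $stop - $start)];
                last;
            }
        }
        # Resume one base after this ATG so that start codons in other
        # frames are considered as well.
        pos($seq) = $start + 1;
    }
    return @orfs;
}

# Toy sequence, made up for illustration: it has an out-of-frame TAA
# at offset 4 and an in-frame TGA at offset 9.
my @orfs = find_orfs('ATGCTAACCTGA');
print join(' ', $_->[0] + 1, $_->[1], $_->[2], $_->[1] - $_->[0]), "\n" for @orfs;
```

On the toy sequence this skips the out-of-frame TAA and reports the ORF that ends at the in-frame TGA. An ATG with no in-frame stop codon is simply dropped, which mirrors the "discard remnant" intent of the original loop.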
Answer 1 (score: 0)
Thanks, I have solved the problem:
local $_=$protein;
while(/ATG/g){
my $start = pos()-3;
# the stop codons are TAA, TAG and TGA; no extra leading T is required
if(/TAA|TAG|TGA/g){
my $stop = pos;
print $start, " " , $stop, " " ,$stop - $start, " " ,
substr ($_,$start,$stop - $start),$/;
}
}
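The loop above does not yet use the Shine-Dalgarno information from the question. Here is a rough sketch of how that pattern could be written as a single regex, assuming an uppercase sequence made up only of A, C, G and T. The name `sd_start_sites` and the toy sequence are hypothetical and only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the 0-based offset of every ATG that is
# preceded by a Shine-Dalgarno-like motif and a spacer of 4-10 bases.
# The character classes follow the variant given in the question,
# [TA][AC]AGGA[GA][GA], which also matches the consensus TAAGGAGG.
sub sd_start_sites {
    my ($seq) = @_;
    my @starts;
    # The spacer is non-greedy so the nearest ATG within range is taken.
    while ($seq =~ /[TA][AC]AGGA[GA][GA][ACGT]{4,10}?(ATG)/g) {
        push @starts, $-[1];    # offset where the captured ATG begins
    }
    return @starts;
}

# Toy sequence, made up for illustration: TAAGGAGG, a 5-base spacer, then ATG.
my @starts = sd_start_sites('GGTAAGGAGGTTCACATGAAATAG');
print "start codon at offset $_\n" for @starts;
```

Each offset found this way could then be used as the starting point for the stop-codon search. Note that the /g scan resumes after each full match, so a second motif overlapping the first match would be skipped; whether that matters depends on the genome.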