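Please help me improve the following code. I can't get the sequence printed on a single line. I want the output printed as four lines, each giving the frequency of one of the four nucleotides. Thanks in advance.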
#!/usr/bin/perl
use strict;
use warnings;
my $A;
my $T;
my $G;
my $C;
my $fileIN;
my $fileOUT;
my $seq;
open ($fileIN, "basecount.nfasta") or die "can't open file ";
open ($fileOUT, ">basecount.out") or die "can't open file ";
while (<$fileIN>)
{
    if ($_ =~ /^>/) # ignore header line
    {next;}
    else
    {
        $seq = $_; # copy the whole line, which holds only nucleotide characters ATGC
    }
    $seq =~ s/\n//g; # create one single line containing all ATGC characters
    print "$seq\n"; # verify previous step
    my @dna = split ("", $seq); # create an array to include each nucleotide as array element
    foreach my $element (@dna)
    {
        if ($element =~ /A/) # match nucleotide pattern and count
        {
            $A++;
        }
        if ($element =~ /T/)
        {
            $T++;
        }
        if ($element =~ /G/)
        {
            $G++;
        }
        if ($element =~ /C/)
        {
            $C++;
        }
    }
    print $fileOUT "A=$A\n";
    print $fileOUT "T=$T\n";
    print $fileOUT "G=$G\n";
    print $fileOUT "C=$C\n";
}
close ($fileIN);
close ($fileOUT);
Answer 0: (score: 1)
First, I would use a few shortcuts. They make the code easier to read:
use strict;
use warnings;
use feature 'say';

# Start the counters at 0 so a base that never occurs prints as 0
# instead of triggering an "uninitialized value" warning.
my $A = 0;
my $T = 0;
my $G = 0;
my $C = 0;
my $fileIN;
my $fileOUT;

# $! holds the system error text, so the message says why the open failed.
open $fileIN, '<', 'basecount.nfasta' or die "can't open file basecount.nfasta for reading: $!";
open $fileOUT, '>', 'basecount.out' or die "can't open file basecount.out for writing: $!";

while ( my $seq = <$fileIN> ) {
    next if $seq =~ /^>/;    # ignore header lines
    $seq =~ s/\n//g;
    say $seq;
    my @dna = split //, $seq;
    foreach my $element ( @dna ) {
        $A++ if $element =~ m/A/;
        $T++ if $element =~ m/T/;
        $G++ if $element =~ m/G/;
        $C++ if $element =~ m/C/;
    }
    say $fileOUT "A=$A";
    say $fileOUT "T=$T";
    say $fileOUT "G=$G";
    say $fileOUT "C=$C";
}

close $fileIN;
close $fileOUT;
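Note that the loop above still writes the running totals after every sequence line, so the output file gets four lines per input line. If the goal is the whole sequence on one line and exactly four lines of counts, one possible approach is to join the lines first and count once at the end. The following is only a sketch: the helper name read_sequence and the in-memory sample data are made up for illustration and stand in for basecount.nfasta.

```perl
use strict;
use warnings;
use feature 'say';

# Join every non-header line of a FASTA-style filehandle into one string.
# (read_sequence is a hypothetical helper, not part of the original code.)
sub read_sequence {
    my ($fh) = @_;
    my $seq = '';
    while ( my $line = <$fh> ) {
        next if $line =~ /^>/;    # skip header lines
        chomp $line;
        $seq .= $line;
    }
    return $seq;
}

# Hypothetical sample data standing in for basecount.nfasta.
my $fasta = ">seq1\nATGC\nAATT\n>seq2\nGGCA\n";
open my $in, '<', \$fasta or die "can't read sample data: $!";
my $seq = read_sequence($in);
close $in;

say $seq;    # the whole sequence on a single line

# tr/X// with an empty replacement list counts matches and leaves $seq unchanged.
my %count = (
    A => ( $seq =~ tr/A// ),
    T => ( $seq =~ tr/T// ),
    G => ( $seq =~ tr/G// ),
    C => ( $seq =~ tr/C// ),
);

# Printed once, after all input has been read: exactly four lines.
say "$_=$count{$_}" for qw(A T G C);
```

Like the original, this ignores header lines and therefore treats all records in the file as one sequence. To write to basecount.out instead of the terminal, pass an output filehandle to say.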
I also recommend using the three-argument form of open (along with a good die message).
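As a rough illustration of that advice, the sketch below opens a file with three arguments and includes both the file name and $! in the error message. The temporary file is only there so the example runs anywhere; it is an assumption of this sketch, not part of the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file so the example does not depend on basecount.nfasta.
my ( $tmp_fh, $path ) = tempfile( UNLINK => 1 );
print {$tmp_fh} ">seq1\nATGC\n";
close $tmp_fh or die "can't close $path: $!";

# Three-argument open: the mode and the file name are separate arguments,
# so a file name that starts with '>' or '|' cannot change the mode.
open my $in, '<', $path or die "can't open $path for reading: $!";
my @lines = <$in>;
close $in;

# When open fails, $! holds the system error text (for example
# "No such file or directory"), which makes the message far more useful.
my $ok = open my $missing, '<', "$path.does-not-exist";
warn "can't open $path.does-not-exist for reading: $!\n" unless $ok;
```

The low-precedence or matters here: with || the die would bind to the file name instead of to the result of open.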
Edit:
I used use feature 'say' here because all of your print statements end with a newline. say is exactly the same as print, except that it appends a newline at the end.
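A small sketch of that equivalence, writing into two in-memory strings (an assumption made purely so the results can be compared):

```perl
use strict;
use warnings;
use feature 'say';

my ( $with_print, $with_say ) = ( '', '' );

# print needs the newline spelled out.
open my $p, '>', \$with_print or die "can't open in-memory buffer: $!";
print {$p} "A=4\n";
close $p;

# say appends the newline itself.
open my $s, '>', \$with_say or die "can't open in-memory buffer: $!";
say {$s} "A=4";
close $s;

# Both buffers now hold the same text.
print "same output\n" if $with_print eq $with_say;
```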