Question

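I wrote a script that uses a subroutine to calculate the percentage of a given nucleotide in a sequence. When I run the script, the percentage it prints is always zero, whichever nucleotide I ask for.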

Here is my code:

#!/usr/bin/perl
use strict;
use warnings;

#### Subroutine to report percentage of each nucleotide in DNA sequence ####

my $input = $ARGV[0];
my $nt = $ARGV[1];
my $args = $#ARGV +1;

if($args != 2){
    print "Error!!! Insufficient number of arguments\n";
    print "Usage: $0 <input fasta file>\n";
}

my($FH, $line);

open($FH, '<', $input) || die "Could\'nt open file: $input\n";

$line = do{
    local $/;
    <$FH>;
};

$line =~ s/>(.*)//g;
$line =~ s/\s+//g;

my $perc = perc_nucleotide($line , $nt);
printf("The percentage of $nt nucleotide in given sequence is %.0f", $perc);
print "\n";


sub perc_nucleotide {
    my($line, $nt) = @_;
    print "$nt\n";
    my $count = 0;
    if( $nt eq "A" || $nt eq "T" || $nt eq "G" || $nt eq "C"){
    $count++;
    }
    my $total_len = length($line);
    my $perc = ($count/$total_len)*100;

}

I think I'm setting the $count variable incorrectly. I have tried different approaches but can't figure it out.

Here is the input file:

>XM_024894547.1 Trichoderma citrinoviride Redoxin (BBK36DRAFT_1163529), partial mRNA
ATGGCCTTCCGTCTCCCTCTGCGCCGCATTGCCCTGGCCCGCCCCGCCACCGTTGCGCGTGGCTTCCACT
CGACGCCCCGCGCCCTGGTCAAGGTCGGCGACGAGGTCCCGAGCTTGGAGCTGTTCGAGAAGTCGGCCGC
CAGCAAGATCAACCTGGCCGACGAGTTCAAGAAGGGCGACGGCTACATTGTCGGCGTCCCGGGCGCCTTC
TCCGGCACCTGCTCCGGCACCCACGTCCCGTCGTACATCAACCACCCTGACATCAAGACGGCCGGCCAGG
TCTTTGTCGTCTCCGTCAACGACCCCTTTGTCATGAAGGCTTGGGCAGACCAGCTGGATCCCGCCGGAGA
GACAGGAATCCGGTTCGTTGCCGACCCCACGGCTGAGTTCACAAAGGCTCTGGAACTGGGATTCGACGAC
GCTGCTCCTCTGTTCGGAGGCACCCGAAGCAAGCGCTATGCTCTCAAGGTTAAGGATGGCAAGGTCACTG
CCGCCTTTGTTGAGCCCGACAACACGGGCACTTCCGTGTCAATGGCCGACAAGGTCCTCAGCTAA

Answer 1

The problem is here:

my $perc = perc_nucleotide($line , $nt);
printf("The percentage of $nt nucleotide in given sequence is %.0f", $perc);

perc_nucleotide is returning 0.18018018018018, but the format %.0f means "print with no decimal places", so the value is rounded to 0. You should use something more like %.2f.
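The 0.18018... value follows from the subroutine's if statement. It compares $nt with "A", "T", "G" and "C", not with the characters of $line, so for any valid nucleotide $count is incremented exactly once, and 1/555*100 is about 0.18. The short sketch below hard-codes those values (555 is the length of the example sequence) and prints the result with both formats.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Values reproduced from the question: the if() tests $nt itself,
# so $count is 1 rather than the number of matching bases.
my $count     = 1;
my $total_len = 555;    # length of the example sequence
my $perc      = ($count / $total_len) * 100;

printf("%.0f\n", $perc);      # no decimal places: prints 0
printf("%.2f\n", $perc);      # two decimal places: prints 0.18
printf("%.2f%%\n", $perc);    # %% is a literal percent sign: prints 0.18%
```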

It's also worth noting that perc_nucleotide has no return statement. It still works, but the reason may not be obvious.

perc_nucleotide sets my $perc = ($count/$total_len)*100; but never uses $perc. The $perc in the main program is a different variable.
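Here is a minimal illustration of the scoping point, using a hypothetical sub named set_inner_perc. A my variable declared inside a sub is a separate lexical, so assigning to it does not change the file-level variable that has the same name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# File-level lexical, analogous to $perc in the main program.
my $perc = 'unset';

# Hypothetical sub: its own "my $perc" is a new variable that exists
# only inside this block and hides the outer one.
sub set_inner_perc {
    my $perc = 42;
    return;    # return nothing, so the inner value does not escape
}

set_inner_perc();
print "$perc\n";    # prints "unset": the outer $perc was never assigned
```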

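perc_nucleotide does return something: every Perl subroutine without an explicit return returns the "last evaluated expression". In this case, that expression is my $perc = ($count/$total_len)*100;, but the last-evaluated-expression rule can get a bit tricky.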
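The following sketch shows the implicit return with a hypothetical sub named implicit_perc, which has the same shape as perc_nucleotide. Its final statement is an assignment, and the value of that assignment becomes the sub's return value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# No return statement: the value of the last evaluated expression
# (the assignment to $perc) is what the caller receives.
sub implicit_perc {
    my ($count, $total_len) = @_;
    my $perc = ($count / $total_len) * 100;
}

my $result = implicit_perc(1, 4);
print "$result\n";    # prints 25
```

This works for a plain assignment. However, perlsub notes that when the last statement is a loop such as foreach or while, the return value is unspecified. That is one reason an explicit return is safer.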

An explicit return is easier to read and safer: return ($count/$total_len)*100;
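Here is a sketch of perc_nucleotide with an explicit return. It also counts the occurrences of $nt in $line, which the original subroutine never did. The list-context match idiom `my $count = () = ...` is one way to count them, and it is swapped in here as an assumption, not the answerer's code. The guard for an empty sequence is also an addition.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Explicit return; the counting technique is an assumption (global match
# assigned to an empty list, whose scalar value is the number of matches).
sub perc_nucleotide {
    my ($line, $nt) = @_;
    my $total_len = length $line;
    return 0 if $total_len == 0;             # avoid division by zero
    my $count = () = $line =~ /\Q$nt\E/g;    # occurrences of $nt in $line
    return ($count / $total_len) * 100;
}

printf("%.2f%%\n", perc_nucleotide("AACGT", "A"));    # prints 40.00%
```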

Answer 2

I corrected the script, and it now gives me the correct answer.

#!/usr/bin/perl
use strict;
use warnings;

##### Subroutine to calculate percentage of all nucleotides in a DNA sequence #####

my $input = $ARGV[0];
my $nt = $ARGV[1];
my $args = $#ARGV + 1;

if($args != 2){
    print "Error!!! Insufficient number of arguments\n";
    print "Usage: $0 <input_fasta_file> <nucleotide>\n";
    exit 1;    # stop here instead of continuing with missing arguments
}

my($FH, $line);

open($FH, '<', $input) || die "Couldn\'t open input file: $input\n";

$line = do{
    local $/;
    <$FH>;
};

chomp $line;

#print $line;

$line =~ s/>(.*)//g;
$line =~ s/\s+//g;

#print "$line\n";

my $total_len = length($line);
my $perc_of_nt = perc($line, $nt);

printf("The percentage of nucleotide $nt in a given sequence is %.2f%%", $perc_of_nt);
print "\n";


#print "$total_len\n";

sub perc{
    my($line, $nt) = @_;
    my $char; my $count = 0;
    foreach $char (split //, $line){
        if($char eq $nt){
            $count += 1;
        }
    }
    return (($count/$total_len)*100);
}

For the input file above, the output is:

Total_len = 555
The percentage of nucleotide A in a given sequence is 18.02%
The percentage of nucleotide T in a given sequence is 18.74%
The percentage of nucleotide G in a given sequence is 28.47%

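The changes I made are the printf format in the main program (now %.2f%%), the foreach loop that compares each character of the sequence with $nt, and the explicit return at the end of perc.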
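The script's header comment mentions the percentage of all nucleotides. One option is to tally every base in a single pass with a hash, rather than rerunning the script once per nucleotide. This is a sketch with a hypothetical sub named base_percentages. Upper-casing the sequence first, so that soft-masked lowercase bases are counted together with uppercase ones, is an assumption that is not in the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns a hash of base => percentage.
sub base_percentages {
    my ($seq) = @_;
    my %count;
    $count{$_}++ for split //, uc $seq;    # tally each character once
    my $total = length $seq;
    my %perc;
    for my $base (keys %count) {           # loop is empty when $seq is empty
        $perc{$base} = $count{$base} / $total * 100;
    }
    return %perc;
}

my %perc = base_percentages("ATGGCC");
printf("%s: %.2f%%\n", $_, $perc{$_}) for sort keys %perc;
```

For "ATGGCC", this prints A and T at 16.67% and C and G at 33.33%.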

Thanks for the amazing insights!!!

Subroutine output returns 0
