How to calculate character frequencies in a FASTA file in Perl

Posted: 2014-03-23 04:13:33

Tags: perl printing fasta

I am trying to calculate the percentage of certain characters in the strings of a FASTA-format file. The file looks like this:

>label
sequence
>label
sequence
>label
sequence
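For reference, each FASTA record is a header line starting with `>` followed by its sequence. A minimal sketch of reading records into label/sequence pairs, assuming sequences might wrap over several lines (the sample above has one line each); `read_fasta` is a hypothetical helper name, and the demo reads from an in-memory string instead of `Lab1_seq.fasta.txt`:

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from an open filehandle and
# return a list of [label, sequence] pairs in file order.
# Sequence lines that appear before the first header are ignored.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;          # skip blank lines
        if ($line =~ /^>(.*)/) {
            push @records, [$1, ''];       # header: start a new record
        }
        elsif (@records) {
            $records[-1][1] .= $line;      # append sequence text to the current record
        }
    }
    return @records;
}

# Example: read from an in-memory string instead of a real file.
my $sample = ">label1\nGGCCAT\n>label2\nAT\nGC\n";
open(my $fh, '<', \$sample) or die "cannot open sample: $!";
for my $rec (read_fasta($fh)) {
    my ($label, $seq) = @$rec;
    print "$label: $seq\n";
}
close $fh;
```

Keeping the label and its sequence together in one record means the label is still available when the percentage is computed later.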

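I am trying to calculate the percentage of a particular character (e.g. G's) in the "sequence" strings. After calculating that (which I am able to do), I want to print a sentence such as: "The percentage of G's in label 1 is 53%".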
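The percentage part can be sketched as a small sub; `gc_percent` is a hypothetical name, and counting both upper- and lower-case letters is an assumption about the input. `printf` with `%.1f` rounds the value for a sentence like the one above:

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of G and C bases in a sequence
# (case-insensitive). tr/// returns the number of characters it matched;
# with an empty replacement list it leaves the string unchanged.
sub gc_percent {
    my ($seq) = @_;
    my $len = length $seq;
    return 0 if $len == 0;                 # avoid division by zero
    my $gc = ($seq =~ tr/GCgc//);
    return 100 * $gc / $len;
}

my ($label, $seq) = ('label1', 'GGCCATAT');
printf "The percentage of G's and C's for %s is %.1f%%\n", $label, gc_percent($seq);
```

`tr///` does not interpolate variables, so to count a different character you would change the list itself (e.g. `tr/Gg//` for G alone).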

So my question is: how do I do the calculation on each sequence string and then name each result in the output by the label above it?

My code so far calculates the percentage, but I can't get it to show which label each result belongs to.

#!/usr/bin/perl 
use strict; 

# opens file
my $infile = "Lab1_seq.fasta.txt";
open(INFILE, $infile) or die "$infile: $!\n";

# reads each line
while (my $line = <INFILE>){ 
    chomp $line;

    #creates an array
    my @seq = split (/>/, $line);

    # Calculates percent
    if ($line !~ />/){
        my $G = ($line =~ tr/G//);
        my $C = ($line =~ tr/C//);
        my $total = $G + $C;
        my $length = length($line);
        my $percent = ($total / $length) * 100;

        #prints the percentage of G's and C's for label is x%
        print "The percentage of G's and C's for @seq[1] is $percent\n";
    }
    else{

    }
}

close INFILE;

When I try to make it also print the name of the label corresponding to each sequence, it spits out this output instead:

The percentage of G's and C's for  is 53.4868841970569
The percentage of G's and C's for  is 52.5443110348771
The percentage of G's and C's for  is 50.8746355685131
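The empty label comes from `split`. `@seq` is declared with `my` inside the loop, so it is rebuilt from the current line every time; the `print` only runs on sequence lines, which contain no `>`, so `split` returns a single field and `@seq[1]` is undefined (it interpolates as an empty string). A small demonstration, using made-up sample lines:

```perl
use strict;
use warnings;

# split keeps leading empty fields, so a header line yields ('', 'label1'),
# while a sequence line contains no '>' and yields a single field.
for my $line ('>label1', 'GGCCATAT') {
    my @seq = split />/, $line;
    my $second = defined $seq[1] ? $seq[1] : '(undef)';
    printf "%-10s -> %d field(s), second field: %s\n", $line, scalar(@seq), $second;
}
```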

1 Answer:

Answer 0 (score: 1)

You just need to match your label and save it in a variable:

my $label;

# reads each line
while (my $line = <INFILE>){ 
    ...

    if ($line =~ /^>(.*)/){
        # header line: remember the label for the following sequence
        $label = $1;
    } else{
        # Calculates percent
        ...
        print "The percentage of G's and C's for $label is $percent\n";
    }
}
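Putting the answer's idea together, here is a runnable sketch under the assumption of one sequence line per record, as in the question's file. The sub name `gc_report` and the sample data are hypothetical; the header match is anchored with `^` so only lines that start with `>` count as labels:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Remember the most recent header label and report the GC percentage of
# each sequence line under that label. Assumes one sequence line per
# record, as in the sample file in the question.
sub gc_report {
    my ($fh) = @_;
    my @lines;
    my $label = '(no label)';
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';               # skip empty lines (also avoids dividing by zero)
        if ($line =~ /^>(.*)/) {
            $label = $1;                   # save the label for the next sequence
            next;
        }
        my $gc = ($line =~ tr/GCgc//);
        my $percent = 100 * $gc / length($line);
        push @lines, sprintf("The percentage of G's and C's for %s is %.2f", $label, $percent);
    }
    return @lines;
}

# Demo on an in-memory sample; for the real file use
#   open(my $in, '<', 'Lab1_seq.fasta.txt') or die "Lab1_seq.fasta.txt: $!\n";
my $sample = ">seq1\nGGCCATAT\n>seq2\nGGGC\n";
open(my $in, '<', \$sample) or die "cannot open sample: $!";
print "$_\n" for gc_report($in);
close $in;
```

Returning the sentences from a sub instead of printing directly just makes the logic easy to check on a small sample; the `$label` variable declared outside the loop is what carries each label over to the following sequence line.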